sfrench/cifs-2.6.git
5 weeks agoRevert "net: txgbe: fix clk_name exceed MAX_DEV_ID limits"
Duanqiang Wen [Mon, 22 Apr 2024 08:41:09 +0000 (16:41 +0800)]
Revert "net: txgbe: fix clk_name exceed MAX_DEV_ID limits"

This reverts commit e30cef001da259e8df354b813015d0e5acc08740.
commit 99f4570cfba1 ("clkdev: Update clkdev id usage to allow
for longer names") can fix clk_name exceed MAX_DEV_ID limits,
so this commit is meaningless.

Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240422084109.3201-2-duanqiangwen@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoRevert "net: txgbe: fix i2c dev name cannot match clkdev"
Duanqiang Wen [Mon, 22 Apr 2024 08:41:08 +0000 (16:41 +0800)]
Revert "net: txgbe: fix i2c dev name cannot match clkdev"

This reverts commit c644920ce9220d83e070f575a4df711741c07f07.
when register i2c dev, txgbe shorten "i2c_designware" to "i2c_dw",
will cause this i2c dev can't match platfom driver i2c_designware_platform.

Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240422084109.3201-1-duanqiangwen@net-swift.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'mlxsw-various-acl-fixes'
Jakub Kicinski [Thu, 25 Apr 2024 02:33:04 +0000 (19:33 -0700)]
Merge branch 'mlxsw-various-acl-fixes'

Petr Machata says:

====================
mlxsw: Various ACL fixes

Ido Schimmel writes:

Fix various problems in the ACL (i.e., flower offload) code. See the
commit messages for more details.
====================

Link: https://lore.kernel.org/r/cover.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work
Ido Schimmel [Mon, 22 Apr 2024 15:26:02 +0000 (17:26 +0200)]
mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work

The rehash delayed work is rescheduled with a delay if the number of
credits at end of the work is not negative as supposedly it means that
the migration ended. Otherwise, it is rescheduled immediately.

After "mlxsw: spectrum_acl_tcam: Fix possible use-after-free during
rehash" the above is no longer accurate as a non-negative number of
credits is no longer indicative of the migration being done. It can also
happen if the work encountered an error in which case the migration will
resume the next time the work is scheduled.

The significance of the above is that it is possible for the work to be
pending and associated with hints that were allocated when the migration
started. This leads to the hints being leaked [1] when the work is
canceled while pending as part of ACL region dismantle.

Fix by freeing the hints if hints are associated with a work that was
canceled while pending.

Blame the original commit since the reliance on not having a pending
work associated with hints is fragile.

[1]
unreferenced object 0xffff88810e7c3000 (size 256):
  comm "kworker/0:16", pid 176, jiffies 4295460353
  hex dump (first 32 bytes):
    00 30 95 11 81 88 ff ff 61 00 00 00 00 00 00 80  .0......a.......
    00 00 61 00 40 00 00 00 00 00 00 00 04 00 00 00  ..a.@...........
  backtrace (crc 2544ddb9):
    [<00000000cf8cfab3>] kmalloc_trace+0x23f/0x2a0
    [<000000004d9a1ad9>] objagg_hints_get+0x42/0x390
    [<000000000b143cf3>] mlxsw_sp_acl_erp_rehash_hints_get+0xca/0x400
    [<0000000059bdb60a>] mlxsw_sp_acl_tcam_vregion_rehash_work+0x868/0x1160
    [<00000000e81fd734>] process_one_work+0x59c/0xf20
    [<00000000ceee9e81>] worker_thread+0x799/0x12c0
    [<00000000bda6fe39>] kthread+0x246/0x300
    [<0000000070056d23>] ret_from_fork+0x34/0x70
    [<00000000dea2b93e>] ret_from_fork_asm+0x1a/0x30

Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/0cc12ebb07c4d4c41a1265ee2c28b392ff997a86.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Fix incorrect list API usage
Ido Schimmel [Mon, 22 Apr 2024 15:26:01 +0000 (17:26 +0200)]
mlxsw: spectrum_acl_tcam: Fix incorrect list API usage

Both the function that migrates all the chunks within a region and the
function that migrates all the entries within a chunk call
list_first_entry() on the respective lists without checking that the
lists are not empty. This is incorrect usage of the API, which leads to
the following warning [1].

Fix by returning if the lists are empty as there is nothing to migrate
in this case.

[1]
WARNING: CPU: 0 PID: 6437 at drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:1266 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0>
Modules linked in:
CPU: 0 PID: 6437 Comm: kworker/0:37 Not tainted 6.9.0-rc3-custom-00883-g94a65f079ef6 #39
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_sp_acl_tcam_vchunk_migrate_all+0x1f1/0x2c0
[...]
Call Trace:
 <TASK>
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x4a0
 process_one_work+0x151/0x370
 worker_thread+0x2cb/0x3e0
 kthread+0xd0/0x100
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/4628e9a22d1d84818e28310abbbc498e7bc31bc9.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Fix warning during rehash
Ido Schimmel [Mon, 22 Apr 2024 15:26:00 +0000 (17:26 +0200)]
mlxsw: spectrum_acl_tcam: Fix warning during rehash

As previously explained, the rehash delayed work migrates filters from
one region to another. This is done by iterating over all chunks (all
the filters with the same priority) in the region and in each chunk
iterating over all the filters.

When the work runs out of credits it stores the current chunk and entry
as markers in the per-work context so that it would know where to resume
the migration from the next time the work is scheduled.

Upon error, the chunk marker is reset to NULL, but without resetting the
entry markers despite being relative to it. This can result in migration
being resumed from an entry that does not belong to the chunk being
migrated. In turn, this will eventually lead to a chunk being iterated
over as if it is an entry. Because of how the two structures happen to
be defined, this does not lead to KASAN splats, but to warnings such as
[1].

Fix by creating a helper that resets all the markers and call it from
all the places the currently only reset the chunk marker. For good
measures also call it when starting a completely new rehash. Add a
warning to avoid future cases.

[1]
WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0
Modules linked in:
CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G        W          6.9.0-rc3-custom-00880-g29e61d91b77b #29
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:mlxsw_afk_encode+0x242/0x2f0
[...]
Call Trace:
 <TASK>
 mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0
 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470
 process_one_work+0x151/0x370
 worker_thread+0x2cb/0x3e0
 kthread+0xd0/0x100
 ret_from_fork+0x34/0x50
 </TASK>

Fixes: 6f9579d4e302 ("mlxsw: spectrum_acl: Remember where to continue rehash migration")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Fix memory leak during rehash
Ido Schimmel [Mon, 22 Apr 2024 15:25:59 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix memory leak during rehash

The rehash delayed work migrates filters from one region to another.
This is done by iterating over all chunks (all the filters with the same
priority) in the region and in each chunk iterating over all the
filters.

If the migration fails, the code tries to migrate the filters back to
the old region. However, the rollback itself can also fail in which case
another migration will be erroneously performed. Besides the fact that
this ping pong is not a very good idea, it also creates a problem.

Each virtual chunk references two chunks: The currently used one
('vchunk->chunk') and a backup ('vchunk->chunk2'). During migration the
first holds the chunk we want to migrate filters to and the second holds
the chunk we are migrating filters from.

The code currently assumes - but does not verify - that the backup chunk
does not exist (NULL) if the currently used chunk does not reference the
target region. This assumption breaks when we are trying to rollback a
rollback, resulting in the backup chunk being overwritten and leaked
[1].

Fix by not rolling back a failed rollback and add a warning to avoid
future cases.

[1]
WARNING: CPU: 5 PID: 1063 at lib/parman.c:291 parman_destroy+0x17/0x20
Modules linked in:
CPU: 5 PID: 1063 Comm: kworker/5:11 Tainted: G        W          6.9.0-rc2-custom-00784-gc6a05c468a0b #14
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:parman_destroy+0x17/0x20
[...]
Call Trace:
 <TASK>
 mlxsw_sp_acl_atcam_region_fini+0x19/0x60
 mlxsw_sp_acl_tcam_region_destroy+0x49/0xf0
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x1f1/0x470
 process_one_work+0x151/0x370
 worker_thread+0x2cb/0x3e0
 kthread+0xd0/0x100
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Fixes: 843500518509 ("mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/d5edd4f4503934186ae5cfe268503b16345b4e0f.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Rate limit error message
Ido Schimmel [Mon, 22 Apr 2024 15:25:58 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Rate limit error message

In the rare cases when the device resources are exhausted it is likely
that the rehash delayed work will fail. An error message will be printed
whenever this happens which can be overwhelming considering the fact
that the work is per-region and that there can be hundreds of regions.

Fix by rate limiting the error message.

Fixes: e5e7962ee5c2 ("mlxsw: spectrum_acl: Implement region migration according to hints")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/c510763b2ebd25e7990d80183feff91cde593145.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash
Ido Schimmel [Mon, 22 Apr 2024 15:25:57 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash

The rehash delayed work migrates filters from one region to another
according to the number of available credits.

The migrated from region is destroyed at the end of the work if the
number of credits is non-negative as the assumption is that this is
indicative of migration being complete. This assumption is incorrect as
a non-negative number of credits can also be the result of a failed
migration.

The destruction of a region that still has filters referencing it can
result in a use-after-free [1].

Fix by not destroying the region if migration failed.

[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
Read of size 8 at addr ffff8881735319e8 by task kworker/0:31/3858

CPU: 0 PID: 3858 Comm: kworker/0:31 Tainted: G        W          6.9.0-rc2-custom-00782-gf2275c2157d8 #5
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
Call Trace:
 <TASK>
 dump_stack_lvl+0xc6/0x120
 print_report+0xce/0x670
 kasan_report+0xd7/0x110
 mlxsw_sp_acl_ctcam_region_entry_remove+0x21d/0x230
 mlxsw_sp_acl_ctcam_entry_del+0x2e/0x70
 mlxsw_sp_acl_atcam_entry_del+0x81/0x210
 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3cd/0xb50
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
 process_one_work+0x8eb/0x19b0
 worker_thread+0x6c9/0xf70
 kthread+0x2c9/0x3b0
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 174:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x8f/0xa0
 __kmalloc+0x19c/0x360
 mlxsw_sp_acl_tcam_region_create+0xdf/0x9c0
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x954/0x1300
 process_one_work+0x8eb/0x19b0
 worker_thread+0x6c9/0xf70
 kthread+0x2c9/0x3b0
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30

Freed by task 7:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 poison_slab_object+0x102/0x170
 __kasan_slab_free+0x14/0x30
 kfree+0xc1/0x290
 mlxsw_sp_acl_tcam_region_destroy+0x272/0x310
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x731/0x1300
 process_one_work+0x8eb/0x19b0
 worker_thread+0x6c9/0xf70
 kthread+0x2c9/0x3b0
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30

Fixes: c9c9af91f1d9 ("mlxsw: spectrum_acl: Allow to interrupt/continue rehash work")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/3e412b5659ec2310c5c615760dfe5eac18dd7ebd.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update
Ido Schimmel [Mon, 22 Apr 2024 15:25:56 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update

The rule activity update delayed work periodically traverses the list of
configured rules and queries their activity from the device.

As part of this task it accesses the entry pointed by 'ventry->entry',
but this entry can be changed concurrently by the rehash delayed work,
leading to a use-after-free [1].

Fix by closing the race and perform the activity query under the
'vregion->lock' mutex.

[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
Read of size 8 at addr ffff8881054ed808 by task kworker/0:18/181

CPU: 0 PID: 181 Comm: kworker/0:18 Not tainted 6.9.0-rc2-custom-00781-gd5ab772d32f7 #2
Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work
Call Trace:
 <TASK>
 dump_stack_lvl+0xc6/0x120
 print_report+0xce/0x670
 kasan_report+0xd7/0x110
 mlxsw_sp_acl_tcam_flower_rule_activity_get+0x121/0x140
 mlxsw_sp_acl_rule_activity_update_work+0x219/0x400
 process_one_work+0x8eb/0x19b0
 worker_thread+0x6c9/0xf70
 kthread+0x2c9/0x3b0
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 1039:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x8f/0xa0
 __kmalloc+0x19c/0x360
 mlxsw_sp_acl_tcam_entry_create+0x7b/0x1f0
 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x30d/0xb50
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
 process_one_work+0x8eb/0x19b0
 worker_thread+0x6c9/0xf70
 kthread+0x2c9/0x3b0
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30

Freed by task 1039:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 poison_slab_object+0x102/0x170
 __kasan_slab_free+0x14/0x30
 kfree+0xc1/0x290
 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x3d7/0xb50
 mlxsw_sp_acl_tcam_vregion_rehash_work+0x157/0x1300
 process_one_work+0x8eb/0x19b0
 worker_thread+0x6c9/0xf70
 kthread+0x2c9/0x3b0
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30

Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1fcce0a60b231ebeb2515d91022284ba7b4ffe7a.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Fix race during rehash delayed work
Ido Schimmel [Mon, 22 Apr 2024 15:25:55 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix race during rehash delayed work

The purpose of the rehash delayed work is to reduce the number of masks
(eRPs) used by an ACL region as the eRP bank is a global and limited
resource.

This is done in three steps:

1. Creating a new set of masks and a new ACL region which will use the
   new masks and to which the existing filters will be migrated to. The
   new region is assigned to 'vregion->region' and the region from which
   the filters are migrated from is assigned to 'vregion->region2'.

2. Migrating all the filters from the old region to the new region.

3. Destroying the old region and setting 'vregion->region2' to NULL.

Only the second steps is performed under the 'vregion->lock' mutex
although its comments says that among other things it "Protects
consistency of region, region2 pointers".

This is problematic as the first step can race with filter insertion
from user space that uses 'vregion->region', but under the mutex.

Fix by holding the mutex across the entirety of the delayed work and not
only during the second step.

Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1ec1d54edf2bad0a369e6b4fa030aba64e1f124b.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: spectrum_acl_tcam: Fix race in region ID allocation
Ido Schimmel [Mon, 22 Apr 2024 15:25:54 +0000 (17:25 +0200)]
mlxsw: spectrum_acl_tcam: Fix race in region ID allocation

Region identifiers can be allocated both when user space tries to insert
a new tc filter and when filters are migrated from one region to another
as part of the rehash delayed work.

There is no lock protecting the bitmap from which these identifiers are
allocated from, which is racy and leads to bad parameter errors from the
device's firmware.

Fix by converting the bitmap to IDA which handles its own locking. For
consistency, do the same for the group identifiers that are part of the
same structure.

Fixes: 2bffc5322fd8 ("mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()")
Reported-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Alexander Zubkov <green@qrator.net>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/ce494b7940cadfe84f3e18da7785b51ef5f776e3.1713797103.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: openvswitch: Fix Use-After-Free in ovs_ct_exit
Hyunwoo Kim [Mon, 22 Apr 2024 09:37:17 +0000 (05:37 -0400)]
net: openvswitch: Fix Use-After-Free in ovs_ct_exit

Since kfree_rcu, which is called in the hlist_for_each_entry_rcu traversal
of ovs_ct_limit_exit, is not part of the RCU read critical section, it
is possible that the RCU grace period will pass during the traversal and
the key will be free.

To prevent this, it should be changed to hlist_for_each_entry_safe.

Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/ZiYvzQN/Ry5oeFQW@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: phy: mediatek-ge-soc: follow netdev LED trigger semantics
Daniel Golle [Sun, 21 Apr 2024 00:08:31 +0000 (01:08 +0100)]
net: phy: mediatek-ge-soc: follow netdev LED trigger semantics

Only blink if the link is up on a LED which is programmed to also
indicate link-status.

Otherwise, if both LEDs are in use to indicate different speeds, the
resulting blinking being inverted on LEDs which aren't switched on at
a specific speed is quite counter-intuitive.

Also make sure that state left behind by reset or the bootloader is
recognized correctly including the half-duplex and full-duplex bits as
well as the (unsupported by Linux netdev trigger semantics) link-down
bit.

Fixes: c66937b0f8db ("net: phy: mediatek-ge-soc: support PHY LEDs")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: gtp: Fix Use-After-Free in gtp_dellink
Hyunwoo Kim [Mon, 22 Apr 2024 09:39:30 +0000 (05:39 -0400)]
net: gtp: Fix Use-After-Free in gtp_dellink

Since call_rcu, which is called in the hlist_for_each_entry_rcu traversal
of gtp_dellink, is not part of the RCU read critical section, it
is possible that the RCU grace period will pass during the traversal and
the key will be free.

To prevent this, it should be changed to hlist_for_each_entry_safe.

Fixes: 94dc550a5062 ("gtp: fix an use-after-free in ipv4_pdp_find()")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agotcp: Fix Use-After-Free in tcp_ao_connect_init
Hyunwoo Kim [Mon, 22 Apr 2024 09:33:40 +0000 (05:33 -0400)]
tcp: Fix Use-After-Free in tcp_ao_connect_init

Since call_rcu, which is called in the hlist_for_each_entry_rcu traversal
of tcp_ao_connect_init, is not part of the RCU read critical section, it
is possible that the RCU grace period will pass during the traversal and
the key will be free.

To prevent this, it should be changed to hlist_for_each_entry_safe.

Fixes: 7c2ffaf21bd6 ("net/tcp: Calculate TCP-AO traffic keys")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://lore.kernel.org/r/ZiYu9NJ/ClR8uSkH@v4bel-B760M-AORUS-ELITE-AX
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: usb: ax88179_178a: stop lying about skb->truesize
Eric Dumazet [Sun, 21 Apr 2024 19:38:28 +0000 (19:38 +0000)]
net: usb: ax88179_178a: stop lying about skb->truesize

Some usb drivers try to set small skb->truesize and break
core networking stacks.

In this patch, I removed one of the skb->truesize overide.

I also replaced one skb_clone() by an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize
in rx path")

Fixes: f8ebb3ac881b ("net: usb: ax88179_178a: Fix packet receiving")
Reported-by: shironeko <shironeko@tesaguri.club>
Closes: https://lore.kernel.org/netdev/c110f41a0d2776b525930f213ca9715c@tesaguri.club/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jose Alonso <joalonsof@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240421193828.1966195-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoipv4: check for NULL idev in ip_route_use_hint()
Eric Dumazet [Sun, 21 Apr 2024 18:43:26 +0000 (18:43 +0000)]
ipv4: check for NULL idev in ip_route_use_hint()

syzbot was able to trigger a NULL deref in fib_validate_source()
in an old tree [1].

It appears the bug exists in latest trees.

All calls to __in_dev_get_rcu() must be checked for a NULL result.

[1]
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 2 PID: 3257 Comm: syz-executor.3 Not tainted 5.10.0-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
 RIP: 0010:fib_validate_source+0xbf/0x15a0 net/ipv4/fib_frontend.c:425
Code: 18 f2 f2 f2 f2 42 c7 44 20 23 f3 f3 f3 f3 48 89 44 24 78 42 c6 44 20 27 f3 e8 5d 88 48 fc 4c 89 e8 48 c1 e8 03 48 89 44 24 18 <42> 80 3c 20 00 74 08 4c 89 ef e8 d2 15 98 fc 48 89 5c 24 10 41 bf
RSP: 0018:ffffc900015fee40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800f7a4000 RCX: ffff88800f4f90c0
RDX: 0000000000000000 RSI: 0000000004001eac RDI: ffff8880160c64c0
RBP: ffffc900015ff060 R08: 0000000000000000 R09: ffff88800f7a4000
R10: 0000000000000002 R11: ffff88800f4f90c0 R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800f7a4000
FS:  00007f938acfe6c0(0000) GS:ffff888058c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f938acddd58 CR3: 000000001248e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  ip_route_use_hint+0x410/0x9b0 net/ipv4/route.c:2231
  ip_rcv_finish_core+0x2c4/0x1a30 net/ipv4/ip_input.c:327
  ip_list_rcv_finish net/ipv4/ip_input.c:612 [inline]
  ip_sublist_rcv+0x3ed/0xe50 net/ipv4/ip_input.c:638
  ip_list_rcv+0x422/0x470 net/ipv4/ip_input.c:673
  __netif_receive_skb_list_ptype net/core/dev.c:5572 [inline]
  __netif_receive_skb_list_core+0x6b1/0x890 net/core/dev.c:5620
  __netif_receive_skb_list net/core/dev.c:5672 [inline]
  netif_receive_skb_list_internal+0x9f9/0xdc0 net/core/dev.c:5764
  netif_receive_skb_list+0x55/0x3e0 net/core/dev.c:5816
  xdp_recv_frames net/bpf/test_run.c:257 [inline]
  xdp_test_run_batch net/bpf/test_run.c:335 [inline]
  bpf_test_run_xdp_live+0x1818/0x1d00 net/bpf/test_run.c:363
  bpf_prog_test_run_xdp+0x81f/0x1170 net/bpf/test_run.c:1376
  bpf_prog_test_run+0x349/0x3c0 kernel/bpf/syscall.c:3736
  __sys_bpf+0x45c/0x710 kernel/bpf/syscall.c:5115
  __do_sys_bpf kernel/bpf/syscall.c:5201 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:5199 [inline]
  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5199

Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240421184326.1704930-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: fix sk_memory_allocated_{add|sub} vs softirqs
Eric Dumazet [Sun, 21 Apr 2024 17:52:48 +0000 (17:52 +0000)]
net: fix sk_memory_allocated_{add|sub} vs softirqs

Jonathan Heathcote reported a regression caused by blamed commit
on aarch64 architecture.

x86 happens to have irq-safe __this_cpu_add_return()
and __this_cpu_sub(), but this is not generic.

I think my confusion came from "struct sock" argument,
because these helpers are called with a locked socket.
But the memory accounting is per-proto (and per-cpu after
the blamed commit). We might cleanup these helpers later
to directly accept a "struct proto *proto" argument.

Switch to this_cpu_add_return() and this_cpu_xchg()
operations, and get rid of preempt_disable()/preempt_enable() pairs.

Fast path becomes a bit faster as a result :)

Many thanks to Jonathan Heathcote for his awesome report and
investigations.

Fixes: 3cd3399dd7a8 ("net: implement per-cpu reserves for memory_allocated")
Reported-by: Jonathan Heathcote <jonathan.heathcote@bbc.co.uk>
Closes: https://lore.kernel.org/netdev/VI1PR01MB42407D7947B2EA448F1E04EFD10D2@VI1PR01MB4240.eurprd01.prod.exchangelabs.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/r/20240421175248.1692552-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agotools: ynl: don't ignore errors in NLMSG_DONE messages
Jakub Kicinski [Sat, 20 Apr 2024 02:08:26 +0000 (19:08 -0700)]
tools: ynl: don't ignore errors in NLMSG_DONE messages

NLMSG_DONE contains an error code, it has to be extracted.
Prior to this change all dumps will end in success,
and in case of failure the result is silently truncated.

Fixes: e4b48ed460d3 ("tools: ynl: add a completely generic client")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240420020827.3288615-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoax25: Fix netdev refcount issue
Duoming Zhou [Fri, 19 Apr 2024 02:04:56 +0000 (10:04 +0800)]
ax25: Fix netdev refcount issue

The dev_tracker is added to ax25_cb in ax25_bind(). When the
ax25 device is detaching, the dev_tracker of ax25_cb should be
deallocated in ax25_kill_by_device() instead of the dev_tracker
of ax25_dev. The log reported by ref_tracker is shown below:

[   80.884935] ref_tracker: reference already released.
[   80.885150] ref_tracker: allocated in:
[   80.885349]  ax25_dev_device_up+0x105/0x540
[   80.885730]  ax25_device_event+0xa4/0x420
[   80.885730]  notifier_call_chain+0xc9/0x1e0
[   80.885730]  __dev_notify_flags+0x138/0x280
[   80.885730]  dev_change_flags+0xd7/0x180
[   80.885730]  dev_ifsioc+0x6a9/0xa30
[   80.885730]  dev_ioctl+0x4d8/0xd90
[   80.885730]  sock_do_ioctl+0x1c2/0x2d0
[   80.885730]  sock_ioctl+0x38b/0x4f0
[   80.885730]  __se_sys_ioctl+0xad/0xf0
[   80.885730]  do_syscall_64+0xc4/0x1b0
[   80.885730]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   80.885730] ref_tracker: freed in:
[   80.885730]  ax25_device_event+0x272/0x420
[   80.885730]  notifier_call_chain+0xc9/0x1e0
[   80.885730]  dev_close_many+0x272/0x370
[   80.885730]  unregister_netdevice_many_notify+0x3b5/0x1180
[   80.885730]  unregister_netdev+0xcf/0x120
[   80.885730]  sixpack_close+0x11f/0x1b0
[   80.885730]  tty_ldisc_kill+0xcb/0x190
[   80.885730]  tty_ldisc_hangup+0x338/0x3d0
[   80.885730]  __tty_hangup+0x504/0x740
[   80.885730]  tty_release+0x46e/0xd80
[   80.885730]  __fput+0x37f/0x770
[   80.885730]  __x64_sys_close+0x7b/0xb0
[   80.885730]  do_syscall_64+0xc4/0x1b0
[   80.885730]  entry_SYSCALL_64_after_hwframe+0x67/0x6f
[   80.893739] ------------[ cut here ]------------
[   80.894030] WARNING: CPU: 2 PID: 140 at lib/ref_tracker.c:255 ref_tracker_free+0x47b/0x6b0
[   80.894297] Modules linked in:
[   80.894929] CPU: 2 PID: 140 Comm: ax25_conn_rel_6 Not tainted 6.9.0-rc4-g8cd26fd90c1a #11
[   80.895190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qem4
[   80.895514] RIP: 0010:ref_tracker_free+0x47b/0x6b0
[   80.895808] Code: 83 c5 18 4c 89 eb 48 c1 eb 03 8a 04 13 84 c0 0f 85 df 01 00 00 41 83 7d 00 00 75 4b 4c 89 ff 9
[   80.896171] RSP: 0018:ffff888009edf8c0 EFLAGS: 00000286
[   80.896339] RAX: 1ffff1100141ac00 RBX: 1ffff1100149463b RCX: dffffc0000000000
[   80.896502] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff88800a0d6518
[   80.896925] RBP: ffff888009edf9b0 R08: ffff88806d3288d3 R09: 1ffff1100da6511a
[   80.897212] R10: dffffc0000000000 R11: ffffed100da6511b R12: ffff88800a4a31d4
[   80.897859] R13: ffff88800a4a31d8 R14: dffffc0000000000 R15: ffff88800a0d6518
[   80.898279] FS:  00007fd88b7fe700(0000) GS:ffff88806d300000(0000) knlGS:0000000000000000
[   80.899436] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   80.900181] CR2: 00007fd88c001d48 CR3: 000000000993e000 CR4: 00000000000006f0
...
[   80.935774] ref_tracker: sp%d@000000000bb9df3d has 1/1 users at
[   80.935774]      ax25_bind+0x424/0x4e0
[   80.935774]      __sys_bind+0x1d9/0x270
[   80.935774]      __x64_sys_bind+0x75/0x80
[   80.935774]      do_syscall_64+0xc4/0x1b0
[   80.935774]      entry_SYSCALL_64_after_hwframe+0x67/0x6f

Change ax25_dev->dev_tracker to the dev_tracker of ax25_cb
in order to mitigate the bug.

Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20240419020456.29826-1-duoming@zju.edu.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoNFC: trf7970a: disable all regulators on removal
Paul Geurts [Thu, 18 Apr 2024 19:25:38 +0000 (21:25 +0200)]
NFC: trf7970a: disable all regulators on removal

During module probe, regulator 'vin' and 'vdd-io' are used and enabled,
but the vdd-io regulator overwrites the 'vin' regulator pointer. During
remove, only the vdd-io is disabled, as the vin regulator pointer is not
available anymore. When regulator_put() is called during resource
cleanup a kernel warning is given, as the regulator is still enabled.

Store the two regulators in separate pointers and disable both the
regulators on module remove.

Fixes: 49d22c70aaf0 ("NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage")
Signed-off-by: Paul Geurts <paul_geurts@live.nl>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/DB7PR09MB26847A4EBF88D9EDFEB1DA0F950E2@DB7PR09MB2684.eurprd09.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMAINTAINERS: eth: mark IBM eHEA as an Orphan
David Christensen [Thu, 18 Apr 2024 19:55:17 +0000 (15:55 -0400)]
MAINTAINERS: eth: mark IBM eHEA as an Orphan

Current maintainer Douglas Miller has left IBM and no replacement has
been assigned for the driver. The eHEA hardware was last used on
IBM POWER7 systems, the last of which reached end-of-support at the
end of 2020.

Signed-off-by: David Christensen <drc@linux.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20240418195517.528577-1-drc@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: dsa: mv88e6xx: fix supported_interfaces setup in mv88e6250_phylink_get_caps()
Matthias Schiffer [Wed, 17 Apr 2024 10:37:37 +0000 (12:37 +0200)]
net: dsa: mv88e6xx: fix supported_interfaces setup in mv88e6250_phylink_get_caps()

With the recent PHYLINK changes requiring supported_interfaces to be set,
MV88E6250 family switches like the 88E6020 fail to probe - cmode is
never initialized on these devices, so mv88e6250_phylink_get_caps() does
not set any supported_interfaces flags.

Instead of a cmode, on 88E6250 we have a read-only port mode value that
encodes similar information. There is no reason to bother mapping port
mode to the cmodes of other switch models; instead we introduce a
mv88e6250_setup_supported_interfaces() that is called directly from
mv88e6250_phylink_get_caps().

Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://lore.kernel.org/r/20240417103737.166651-1-matthias.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'bnxt_en-aer-fixes'
David S. Miller [Mon, 22 Apr 2024 13:13:19 +0000 (14:13 +0100)]
Merge branch 'bnxt_en-aer-fixes'

Michael Chan says:

====================
bnxt_en: AER fixes

This patchset fixes issues in the AER recovery logic.  The first patch
refactors the code to make a shutdown function available for AER fatal
errors.  The second patch fixes the AER fatal recovery logic.  The
third patch fixes the health register logic to fix AER recovery failure
for the new P7 chips.
====================

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
5 weeks agobnxt_en: Fix error recovery for 5760X (P7) chips
Michael Chan [Fri, 19 Apr 2024 18:34:49 +0000 (11:34 -0700)]
bnxt_en: Fix error recovery for 5760X (P7) chips

During error recovery, such as AER fatal error slot reset, we call
bnxt_try_map_fw_health_reg() to try to get access to the health
register to determine the firmware state.  Fix
bnxt_try_map_fw_health_reg() to recognize the P7 chip correctly
and set up the health register.

This fixes this type of AER slot reset failure:

bnxt_en 0000:04:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID)
bnxt_en 0000:04:00.0 enp4s0f0np0: PCI I/O error detected
bnxt_en 0000:04:00.0 bnxt_re0: Handle device suspend call
bnxt_en 0000:04:00.1 enp4s0f1np1: PCI I/O error detected
bnxt_en 0000:04:00.1 bnxt_re1: Handle device suspend call
pcieport 0000:00:02.0: AER: Root Port link has been reset (0)
bnxt_en 0000:04:00.0 enp4s0f0np0: PCI Slot Reset
bnxt_en 0000:04:00.0: enabling device (0000 -> 0002)
bnxt_en 0000:04:00.0: Firmware not ready
bnxt_en 0000:04:00.1 enp4s0f1np1: PCI Slot Reset
bnxt_en 0000:04:00.1: enabling device (0000 -> 0002)
bnxt_en 0000:04:00.1: Firmware not ready
pcieport 0000:00:02.0: AER: device recovery failed

Fixes: a432a45bdba4 ("bnxt_en: Define basic P7 macros")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agobnxt_en: Fix the PCI-AER routines
Vikas Gupta [Fri, 19 Apr 2024 18:34:48 +0000 (11:34 -0700)]
bnxt_en: Fix the PCI-AER routines

We do not support two simultaneous recoveries so check for reset
flag, BNXT_STATE_IN_FW_RESET, and do not proceed with AER further.
When the pci channel state is pci_channel_io_frozen, the PCIe link
can not be trusted so we disable the traffic immediately and stop
BAR access by calling bnxt_fw_fatal_close().  BAR access after
AER fatal error can cause an NMI.

Fixes: f75d9a0aa967 ("bnxt_en: Re-write PCI BARs after PCI fatal error.")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agobnxt_en: refactor reset close code
Vikas Gupta [Fri, 19 Apr 2024 18:34:47 +0000 (11:34 -0700)]
bnxt_en: refactor reset close code

Introduce bnxt_fw_fatal_close() API which can be used
to stop data path and disable device when firmware
is in fatal state.

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agobridge/br_netlink.c: no need to return void function
Hangbin Liu [Fri, 19 Apr 2024 08:02:00 +0000 (16:02 +0800)]
bridge/br_netlink.c: no need to return void function

br_info_notify is a void function. There is no need to return.

Fixes: b6d0425b816e ("bridge: cfm: Netlink Notifications.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agomailmap: add entries for Alex Elder
Alex Elder [Sun, 21 Apr 2024 19:44:58 +0000 (14:44 -0500)]
mailmap: add entries for Alex Elder

Define my kernel.org address to be the canonical one, and add mailmap
entries for the various addresses (including typos) that have been
used over the years.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoicmp: prevent possible NULL dereferences from icmp_build_probe()
Eric Dumazet [Sat, 20 Apr 2024 07:01:16 +0000 (07:01 +0000)]
icmp: prevent possible NULL dereferences from icmp_build_probe()

First problem is a double call to __in_dev_get_rcu(), because
the second one could return NULL.

if (__in_dev_get_rcu(dev) && __in_dev_get_rcu(dev)->ifa_list)

Second problem is a read from dev->ip6_ptr with no NULL check:

if (!list_empty(&rcu_dereference(dev->ip6_ptr)->addr_list))

Use the correct RCU API to fix these.

v2: add missing include <net/addrconf.h>

Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: usb: qmi_wwan: add Telit FN920C04 compositions
Daniele Palmas [Thu, 18 Apr 2024 11:12:07 +0000 (13:12 +0200)]
net: usb: qmi_wwan: add Telit FN920C04 compositions

Add the following Telit FN920C04 compositions:

0x10a0: rmnet + tty (AT/NMEA) + tty (AT) + tty (diag)
T:  Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#=  5 Spd=480  MxCh= 0
D:  Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=10a0 Rev=05.15
S:  Manufacturer=Telit Cinterion
S:  Product=FN920
S:  SerialNumber=92c4c4d8
C:  #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x10a4: rmnet + tty (AT) + tty (AT) + tty (diag)
T:  Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#=  8 Spd=480  MxCh= 0
D:  Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=10a4 Rev=05.15
S:  Manufacturer=Telit Cinterion
S:  Product=FN920
S:  SerialNumber=92c4c4d8
C:  #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x10a9: rmnet + tty (AT) + tty (diag) + DPL (data packet logging) + adb
T:  Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#=  9 Spd=480  MxCh= 0
D:  Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=10a9 Rev=05.15
S:  Manufacturer=Telit Cinterion
S:  Product=FN920
S:  SerialNumber=92c4c4d8
C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none)
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoMerge branch 'mlxsw-fixes'
Jakub Kicinski [Sat, 20 Apr 2024 03:43:17 +0000 (20:43 -0700)]
Merge branch 'mlxsw-fixes'

Petr Machata says:

====================
mlxsw: Fixes

This patchset fixes the following issues:

- During driver de-initialization the driver unregisters the EMAD
  response trap by setting its action to DISCARD. However the manual
  only permits TRAP and FORWARD, and future firmware versions will
  enforce this.

  In patch #1, suppress the error message by aligning the driver to the
  manual and use a FORWARD (NOP) action when unregistering the trap.

- The driver queries the Management Capabilities Mask (MCAM) register
  during initialization to understand if certain features are supported.

  However, not all firmware versions support this register, leading to
  the driver failing to load.

  Patches #2 and #3 fix this issue by treating an error in the register
  query as an indication that the feature is not supported.

v2:
- Patch #2:
    - Make mlxsw_env_max_module_eeprom_len_query() void
====================

Link: https://lore.kernel.org/r/cover.1713446092.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: pci: Fix driver initialization with old firmware
Ido Schimmel [Thu, 18 Apr 2024 13:46:08 +0000 (15:46 +0200)]
mlxsw: pci: Fix driver initialization with old firmware

The driver queries the Management Capabilities Mask (MCAM) register
during initialization to understand if a new and deeper reset flow is
supported.

However, not all firmware versions support this register, leading to the
driver failing to load.

Fix by treating an error in the register query as an indication that the
feature is not supported.

Fixes: f257c73e5356 ("mlxsw: pci: Add support for new reset flow")
Reported-by: Tim 'mithro' Ansell <me@mith.ro>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/ee968c49d53bac96a4c66d1b09ebbd097d81aca5.1713446092.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: core_env: Fix driver initialization with old firmware
Ido Schimmel [Thu, 18 Apr 2024 13:46:07 +0000 (15:46 +0200)]
mlxsw: core_env: Fix driver initialization with old firmware

The driver queries the Management Capabilities Mask (MCAM) register
during initialization to understand if it can read up to 128 bytes from
transceiver modules.

However, not all firmware versions support this register, leading to the
driver failing to load.

Fix by treating an error in the register query as an indication that the
feature is not supported.

Fixes: 1f4aea1f72da ("mlxsw: core_env: Read transceiver module EEPROM in 128 bytes chunks")
Reported-by: Tim 'mithro' Ansell <me@mith.ro>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/0afa8b2e8bac178f5f88211344429176dcc72281.1713446092.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agomlxsw: core: Unregister EMAD trap using FORWARD action
Ido Schimmel [Thu, 18 Apr 2024 13:46:06 +0000 (15:46 +0200)]
mlxsw: core: Unregister EMAD trap using FORWARD action

The device's manual (PRM - Programmer's Reference Manual) classifies the
trap that is used to deliver EMAD responses as an "event trap". Among
other things, it means that the only actions that can be associated with
the trap are TRAP and FORWARD (NOP).

Currently, during driver de-initialization the driver unregisters the
trap by setting its action to DISCARD, which violates the above
guideline. Future firmware versions will prevent such misuses by
returning an error. This does not prevent the driver from working, but
an error will be printed to the kernel log during module removal /
devlink reload:

mlxsw_spectrum 0000:03:00.0: Reg cmd access status failed (status=7(bad parameter))
mlxsw_spectrum 0000:03:00.0: Reg cmd access failed (reg_id=7003(hpkt),type=write)

Suppress the error message by aligning the driver to the manual and use
a FORWARD (NOP) action when unregistering the trap.

Fixes: 4ec14b7634b2 ("mlxsw: Add interface to access registers and process events")
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/753a89e14008fde08cb4a2c1e5f537b81d8eb2d6.1713446092.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: bcmasp: fix memory leak when bringing down interface
Justin Chen [Thu, 18 Apr 2024 18:05:41 +0000 (11:05 -0700)]
net: bcmasp: fix memory leak when bringing down interface

When bringing down the TX rings we flush the rings but forget to
reclaimed the flushed packets. This leads to a memory leak since we
do not free the dma mapped buffers. This also leads to tx control
block corruption when bringing down the interface for power
management.

Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240418180541.2271719-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoudp: preserve the connected status if only UDP cmsg
Yick Xie [Thu, 18 Apr 2024 17:06:10 +0000 (01:06 +0800)]
udp: preserve the connected status if only UDP cmsg

If "udp_cmsg_send()" returned 0 (i.e. only UDP cmsg),
"connected" should not be set to 0. Otherwise it stops
the connected socket from using the cached route.

Fixes: 2e8de8576343 ("udp: add gso segment cmsg")
Signed-off-by: Yick Xie <yick.xie@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240418170610.867084-1-yick.xie@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agovxlan: drop packets from invalid src-address
David Bauer [Thu, 18 Apr 2024 13:29:08 +0000 (15:29 +0200)]
vxlan: drop packets from invalid src-address

The VXLAN driver currently does not check if the inner layer2
source-address is valid.

In case source-address snooping/learning is enabled, a entry in the FDB
for the invalid address is created with the layer3 address of the tunnel
endpoint.

If the frame happens to have a non-unicast address set, all this
non-unicast traffic is subsequently not flooded to the tunnel network
but sent to the learnt host in the FDB. To make matters worse, this FDB
entry does not expire.

Apply the same filtering for packets as it is done for bridges. This not
only drops these invalid packets but avoids them from being learnt into
the FDB.

Fixes: d342894c5d2f ("vxlan: virtual extensible lan")
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David Bauer <mail@david-bauer.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agonet: libwx: fix alloc msix vectors failed
Duanqiang Wen [Thu, 18 Apr 2024 02:15:56 +0000 (10:15 +0800)]
net: libwx: fix alloc msix vectors failed

driver needs queue msix vectors and one misc irq vector,
but only queue vectors need irq affinity.
when num_online_cpus is less than chip max msix vectors,
driver will acquire (num_online_cpus + 1) vecotrs, and
call pci_alloc_irq_vectors_affinity functions with affinity
params without setting pre_vectors or post_vectors, it will
cause return error code -ENOSPC.
Misc irq vector is vector 0, driver need to set affinity params
.pre_vectors = 1.

Fixes: 3f703186113f ("net: libwx: Add irq flow functions")
Signed-off-by: Duanqiang Wen <duanqiangwen@net-swift.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 weeks agoMerge tag 'net-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 18 Apr 2024 18:40:54 +0000 (11:40 -0700)]
Merge tag 'net-6.9-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "A little calmer than usual, probably just the timing of sub-tree PRs.

  Including fixes from netfilter.

  Current release - regressions:

   - inet: bring NLM_DONE out to a separate recv() again, fix user space
     which assumes multiple recv()s will happen and gets blocked forever

   - drv: mlx5:
       - restore mistakenly dropped parts in register devlink flow
       - use channel mdev reference instead of global mdev instance for
         coalescing
       - acquire RTNL lock before RQs/SQs activation/deactivation

  Previous releases - regressions:

   - net: change maximum number of UDP segments to 128, fix virtio
     compatibility with Windows peers

   - usb: ax88179_178a: avoid writing the mac address before first
     reading

  Previous releases - always broken:

   - sched: fix mirred deadlock on device recursion

   - netfilter:
       - br_netfilter: skip conntrack input hook for promisc packets
       - fixes removal of duplicate elements in the pipapo set backend
       - various fixes for abort paths and error handling

   - af_unix: don't peek OOB data without MSG_OOB

   - drv: flower: fix fragment flags handling in multiple drivers

   - drv: ravb: fix jumbo frames and packet stats accounting

  Misc:

   - kselftest_harness: fix Clang warning about zero-length format

   - tun: limit printing rate when illegal packet received by tun dev"

* tag 'net-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
  net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them
  net: usb: ax88179_178a: avoid writing the mac address before first reading
  net: ravb: Fix RX byte accounting for jumbo packets
  net: ravb: Fix GbEth jumbo packet RX checksum handling
  net: ravb: Allow RX loop to move past DMA mapping errors
  net: ravb: Count packets instead of descriptors in R-Car RX path
  net: ethernet: mtk_eth_soc: fix WED + wifi reset
  net:usb:qmi_wwan: support Rolling modules
  selftests: kselftest_harness: fix Clang warning about zero-length format
  net/sched: Fix mirred deadlock on device recursion
  netfilter: nf_tables: fix memleak in map from abort path
  netfilter: nf_tables: restore set elements when delete set fails
  netfilter: nf_tables: missing iterator type in lookup walk
  s390/ism: Properly fix receive message buffer allocation
  net: dsa: mt7530: fix port mirroring for MT7988 SoC switch
  net: dsa: mt7530: fix mirroring frames received on local port
  tun: limit printing rate when illegal packet received by tun dev
  ice: Fix checking for unsupported keys on non-tunnel device
  ice: tc: allow zero flags in parsing tc flower
  ice: tc: check src_vsi in case of traffic from VF
  ...

5 weeks agoMerge tag 'gpio-fixes-for-v6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 Apr 2024 17:18:03 +0000 (10:18 -0700)]
Merge tag 'gpio-fixes-for-v6.9-rc5' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

 - use -ENOTSUPP consistently in Intel GPIO drivers

 - don't include dt-bindings headers in gpio-swnode code

 - add missing of device table to gpio-lpc32xx and fix autoloading

* tag 'gpio-fixes-for-v6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: swnode: Remove wrong header inclusion
  gpio: lpc32xx: fix module autoloading
  gpio: crystalcove: Use -ENOTSUPP consistently
  gpio: wcove: Use -ENOTSUPP consistently

5 weeks agonet: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them
Siddharth Vadapalli [Wed, 17 Apr 2024 09:54:25 +0000 (15:24 +0530)]
net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before using them

The TX and RX DMA Channels used by the driver to exchange data with CPSW
are not guaranteed to be in a clean state during driver initialization.
The Bootloader could have used the same DMA Channels without cleaning them
up in the event of failure. Thus, reset and disable the DMA Channels to
ensure that they are in a clean state before using them.

Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Reported-by: Schuyler Patton <spatton@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://lore.kernel.org/r/20240417095425.2253876-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet: usb: ax88179_178a: avoid writing the mac address before first reading
Jose Ignacio Tornos Martinez [Wed, 17 Apr 2024 08:55:13 +0000 (10:55 +0200)]
net: usb: ax88179_178a: avoid writing the mac address before first reading

After the commit d2689b6a86b9 ("net: usb: ax88179_178a: avoid two
consecutive device resets"), reset operation, in which the default mac
address from the device is read, is not executed from bind operation and
the random address, that is pregenerated just in case, is direclty written
the first time in the device, so the default one from the device is not
even read. This writing is not dangerous because is volatile and the
default mac address is not missed.

In order to avoid this and keep the simplification to have only one
reset and reduce the delays, restore the reset from bind operation and
remove the reset that is commanded from open operation. The behavior is
the same but everything is ready for usbnet_probe.

Tested with ASIX AX88179 USB Gigabit Ethernet devices.
Restore the old behavior for the rest of possible devices because I don't
have the hardware to test.

cc: stable@vger.kernel.org # 6.6+
Fixes: d2689b6a86b9 ("net: usb: ax88179_178a: avoid two consecutive device resets")
Reported-by: Jarkko Palviainen <jarkko.palviainen@gmail.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Link: https://lore.kernel.org/r/20240417085524.219532-1-jtornosm@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'random-6.9-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 Apr 2024 16:49:08 +0000 (09:49 -0700)]
Merge tag 'random-6.9-rc5-for-linus' of git://git./linux/kernel/git/crng/random

Pull random number generator fixes from Jason Donenfeld:

 - The input subsystem contributes entropy in some places where a
   spinlock is held, but the entropy accounting code only handled
   callers being in an interrupt or non-atomic process context, but not
   atomic process context. We fix this by removing an optimization and
   just calling queue_work() unconditionally.

 - Greg accidently sent up a patch not intended for his tree and that
   had been nack'd, so that's now reverted.

* tag 'random-6.9-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  Revert "vmgenid: emit uevent when VMGENID updates"
  random: handle creditable entropy from atomic process context

5 weeks agoMerge tag 'platform-drivers-x86-v6.9-3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 Apr 2024 14:15:33 +0000 (07:15 -0700)]
Merge tag 'platform-drivers-x86-v6.9-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:

 - amd/pmf: Add SPS notifications quirk (+ quirk support)

 - amd/pmf: Lower Smart PC check message severity

 - x86/ISST: New HW support

 - x86/intel-uncore-freq: Bump minor version to avoid "unsupported" message

 - amd/pmc: New BIOS version still needs Spurious IRQ1 quirk

* tag 'platform-drivers-x86-v6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86/amd/pmc: Extend Framework 13 quirk to more BIOSes
  platform/x86/intel-uncore-freq: Increase minor number support
  platform/x86: ISST: Add Granite Rapids-D to HPM CPU list
  platform/x86/amd: pmf: Add quirk for ROG Zephyrus G14
  platform/x86/amd: pmf: Add infrastructure for quirking supported funcs
  platform/x86/amd: pmf: Decrease error message to debug

5 weeks agoRevert "vmgenid: emit uevent when VMGENID updates"
Jason A. Donenfeld [Thu, 18 Apr 2024 11:45:17 +0000 (13:45 +0200)]
Revert "vmgenid: emit uevent when VMGENID updates"

This reverts commit ad6bcdad2b6724e113f191a12f859a9e8456b26d. I had
nak'd it, and Greg said on the thread that it links that he wasn't going
to take it either, especially since it's not his code or his tree, but
then, seemingly accidentally, it got pushed up some months later, in
what looks like a mistake, with no further discussion in the linked
thread. So revert it, since it's clearly not intended.

Fixes: ad6bcdad2b67 ("vmgenid: emit uevent when VMGENID updates")
Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20230531095119.11202-2-bchalios@amazon.es
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
6 weeks agoMerge tag 'nf-24-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 18 Apr 2024 11:12:36 +0000 (13:12 +0200)]
Merge tag 'nf-24-04-18' of git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 amends a missing spot where the set iterator type is unset.
 This is fixing a issue in the previous pull request.

Patch #2 fixes the delete set command abort path by restoring state
         of the elements. Reverse logic for the activate (abort) case
 otherwise element state is not restored, this requires to move
 the check for active/inactive elements to the set iterator
 callback. From the deactivate path, toggle the next generation
 bit and from the activate (abort) path, clear the next generation
 bitmask.

Patch #3 skips elements already restored by delete set command from the
 abort path in case there is a previous delete element command in
 the batch. Check for the next generation bit just like it is done
 via set iteration to restore maps.

netfilter pull request 24-04-18

* tag 'nf-24-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: fix memleak in map from abort path
  netfilter: nf_tables: restore set elements when delete set fails
  netfilter: nf_tables: missing iterator type in lookup walk
====================

Link: https://lore.kernel.org/r/20240418010948.3332346-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoMerge branch 'ravb-ethernet-driver-bugfixes'
Paolo Abeni [Thu, 18 Apr 2024 09:34:31 +0000 (11:34 +0200)]
Merge branch 'ravb-ethernet-driver-bugfixes'

Paul Barker says:

====================
ravb Ethernet driver bugfixes

These patches fix bugs found during recent work on the ravb driver.

Patches 1 & 2 affect the R-Car code paths so have been tested on an
R-Car M3N Salvator-XS board - this is the only R-Car board I currently
have access to.

Patches 2, 3 & 4 affect the GbEth code paths so have been tested on
RZ/G2L and RZ/G2UL SMARC EVK boards.

Changes v2->v3:
  * Incorporate feedback from Niklas and add Reviewed-by tag to patch
    "net: ravb: Count packets instead of descriptors in R-Car RX path".
Changes v1->v2:
  * Fixed typos in commit message of patch
    "net: ravb: Allow RX loop to move past DMA mapping errors".
  * Added Sergey's Reviewed-by tags.
  * Expanded Cc list as Patchwork complained that I had missed people.
  * Trimmed the call trace in accordance with the docs [1] in patch
    "net: ravb: Fix GbEth jumbo packet RX checksum handling".

[1]: https://docs.kernel.org/process/submitting-patches.html#backtraces-in-commit-messages
====================

Link: https://lore.kernel.org/r/20240416120254.2620-1-paul.barker.ct@bp.renesas.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: ravb: Fix RX byte accounting for jumbo packets
Paul Barker [Tue, 16 Apr 2024 12:02:54 +0000 (13:02 +0100)]
net: ravb: Fix RX byte accounting for jumbo packets

The RX byte accounting for jumbo packets was changed to fix a potential
use-after-free bug. However, that fix used the wrong variable and so
only accounted for the number of bytes in the final descriptor, not the
number of bytes in the whole packet.

To fix this, we can simply update our stats with the correct number of
bytes before calling napi_gro_receive().

Also rename pkt_len to desc_len in ravb_rx_gbeth() to avoid any future
confusion. The variable name pkt_len is correct in ravb_rx_rcar() as
that function does not handle packets spanning multiple descriptors.

Fixes: 5a5a3e564de6 ("ravb: Fix potential use-after-free in ravb_rx_gbeth()")
Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: ravb: Fix GbEth jumbo packet RX checksum handling
Paul Barker [Tue, 16 Apr 2024 12:02:53 +0000 (13:02 +0100)]
net: ravb: Fix GbEth jumbo packet RX checksum handling

Sending a 7kB ping packet to the RZ/G2L in v6.9-rc2 causes the following
backtrace:

WARNING: CPU: 0 PID: 0 at include/linux/skbuff.h:3127 skb_trim+0x30/0x38
Hardware name: Renesas SMARC EVK based on r9a07g044l2 (DT)
pc : skb_trim+0x30/0x38
lr : ravb_rx_csum_gbeth+0x40/0x90
Call trace:
 skb_trim+0x30/0x38
 ravb_rx_gbeth+0x56c/0x5cc
 ravb_poll+0xa0/0x204
 __napi_poll+0x38/0x17c

This is caused by ravb_rx_gbeth() calling ravb_rx_csum_gbeth() with the
wrong skb for a packet which spans multiple descriptors. To fix this,
use the correct skb.

Fixes: c2da9408579d ("ravb: Add Rx checksum offload support for GbEth")
Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: ravb: Allow RX loop to move past DMA mapping errors
Paul Barker [Tue, 16 Apr 2024 12:02:52 +0000 (13:02 +0100)]
net: ravb: Allow RX loop to move past DMA mapping errors

The RX loops in ravb_rx_gbeth() and ravb_rx_rcar() skip to the next loop
iteration if a zero-length descriptor is seen (indicating a DMA mapping
error). However, the current RX descriptor index `priv->cur_rx[q]` was
incremented at the end of the loop and so would not be incremented when
we skip to the next loop iteration. This would cause the loop to keep
seeing the same zero-length descriptor instead of moving on to the next
descriptor.

As the loop counter `i` still increments, the loop would eventually
terminate so there is no risk of being stuck here forever - but we
should still fix this to avoid wasting cycles.

To fix this, the RX descriptor index is incremented at the top of the
loop, in the for statement itself. The assignments of `entry` and `desc`
are brought into the loop to avoid the need for duplication.

Fixes: d8b48911fd24 ("ravb: fix ring memory allocation")
Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: ravb: Count packets instead of descriptors in R-Car RX path
Paul Barker [Tue, 16 Apr 2024 12:02:51 +0000 (13:02 +0100)]
net: ravb: Count packets instead of descriptors in R-Car RX path

The units of "work done" in the RX path should be packets instead of
descriptors.

Descriptors which are used by the hardware to record error conditions or
are empty in the case of a DMA mapping error should not count towards
our RX work budget.

Also make the limit variable unsigned as it can never be negative.

Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: ethernet: mtk_eth_soc: fix WED + wifi reset
Felix Fietkau [Tue, 16 Apr 2024 08:23:29 +0000 (10:23 +0200)]
net: ethernet: mtk_eth_soc: fix WED + wifi reset

The WLAN + WED reset sequence relies on being able to receive interrupts from
the card, in order to synchronize individual steps with the firmware.
When WED is stopped, leave interrupts running and rely on the driver turning
off unwanted ones.
WED DMA also needs to be disabled before resetting.

Fixes: f78cd9c783e0 ("net: ethernet: mtk_wed: update mtk_wed_stop")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20240416082330.82564-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonet:usb:qmi_wwan: support Rolling modules
Vanillan Wang [Tue, 16 Apr 2024 12:07:13 +0000 (20:07 +0800)]
net:usb:qmi_wwan: support Rolling modules

Update the qmi_wwan driver support for the Rolling
LTE modules.

- VID:PID 33f8:0104, RW101-GL for laptop debug M.2 cards(with RMNET
interface for /Linux/Chrome OS)
0x0104: RMNET, diag, at, pipe

Here are the outputs of usb-devices:
T:  Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=5000 MxCh= 0
D:  Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs=  1
P:  Vendor=33f8 ProdID=0104 Rev=05.04
S:  Manufacturer=Rolling Wireless S.a.r.l.
S:  Product=Rolling Module
S:  SerialNumber=ba2eb033
C:  #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=896mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=40 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=88(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
E:  Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
I:  If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
E:  Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
E:  Ad=89(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms

Signed-off-by: Vanillan Wang <vanillanwang@163.com>
Link: https://lore.kernel.org/r/20240416120713.24777-1-vanillanwang@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 18 Apr 2024 01:38:34 +0000 (18:38 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-04-16 (ice)

This series contains updates to ice driver only.

Michal fixes a couple of issues with TC filter parsing; always add match
for src_vsi and remove flag check that could prevent addition of valid
filters.

Marcin adds additional checks for unsupported flower filters.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: Fix checking for unsupported keys on non-tunnel device
  ice: tc: allow zero flags in parsing tc flower
  ice: tc: check src_vsi in case of traffic from VF
====================

Link: https://lore.kernel.org/r/20240416202409.2008383-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoselftests: kselftest_harness: fix Clang warning about zero-length format
Jakub Kicinski [Tue, 16 Apr 2024 15:10:48 +0000 (08:10 -0700)]
selftests: kselftest_harness: fix Clang warning about zero-length format

Apparently it's more legal to pass the format as NULL, than
it is to use an empty string. Clang complains about empty
formats:

./../kselftest_harness.h:1207:30: warning: format string is empty
[-Wformat-zero-length]
 1207 |            diagnostic ? "%s" : "", diagnostic);
      |                                 ^~
1 warning generated.

Reported-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240409224256.1581292-1-seanjc@google.com
Fixes: 378193eff339 ("selftests: kselftest_harness: let PASS / FAIL provide diagnostic")
Tested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240416151048.1682352-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoMerge tag 'for-6.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Thu, 18 Apr 2024 01:25:40 +0000 (18:25 -0700)]
Merge tag 'for-6.9-rc4-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fixup in zoned mode for out-of-order writes of metadata that are no
   longer necessary, this used to be tracked in a separate list but now
   the old locaion needs to be zeroed out, also add assertions

 - fix bulk page allocation retry, this may stall after first failure
   for compression read/write

* tag 'for-6.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: do not wait for short bulk allocation
  btrfs: zoned: add ASSERT and WARN for EXTENT_BUFFER_ZONED_ZEROOUT handling
  btrfs: zoned: do not flag ZEROOUT on non-dirty extent buffer

6 weeks agonet/sched: Fix mirred deadlock on device recursion
Eric Dumazet [Mon, 15 Apr 2024 21:07:28 +0000 (18:07 -0300)]
net/sched: Fix mirred deadlock on device recursion

When the mirred action is used on a classful egress qdisc and a packet is
mirrored or redirected to self we hit a qdisc lock deadlock.
See trace below.

[..... other info removed for brevity....]
[   82.890906]
[   82.890906] ============================================
[   82.890906] WARNING: possible recursive locking detected
[   82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G        W
[   82.890906] --------------------------------------------
[   82.890906] ping/418 is trying to acquire lock:
[   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
__dev_queue_xmit+0x1778/0x3550
[   82.890906]
[   82.890906] but task is already holding lock:
[   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
__dev_queue_xmit+0x1778/0x3550
[   82.890906]
[   82.890906] other info that might help us debug this:
[   82.890906]  Possible unsafe locking scenario:
[   82.890906]
[   82.890906]        CPU0
[   82.890906]        ----
[   82.890906]   lock(&sch->q.lock);
[   82.890906]   lock(&sch->q.lock);
[   82.890906]
[   82.890906]  *** DEADLOCK ***
[   82.890906]
[..... other info removed for brevity....]

Example setup (eth0->eth0) to recreate
tc qdisc add dev eth0 root handle 1: htb default 30
tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth0

Another example(eth0->eth1->eth0) to recreate
tc qdisc add dev eth0 root handle 1: htb default 30
tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth1

tc qdisc add dev eth1 root handle 1: htb default 30
tc filter add dev eth1 handle 1: protocol ip prio 2 matchall \
     action mirred egress redirect dev eth0

We fix this by adding an owner field (CPU id) to struct Qdisc set after
root qdisc is entered. When the softirq enters it a second time, if the
qdisc owner is the same CPU, the packet is dropped to break the loop.

Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
Closes: https://lore.kernel.org/netdev/20240314111713.5979-1-renmingshuai@huawei.com/
Fixes: 3bcb846ca4cf ("net: get rid of spin_trylock() in net_tx_action()")
Fixes: e578d9c02587 ("net: sched: use counter to break reclassify loops")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20240415210728.36949-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agonetfilter: nf_tables: fix memleak in map from abort path
Pablo Neira Ayuso [Wed, 17 Apr 2024 15:43:21 +0000 (17:43 +0200)]
netfilter: nf_tables: fix memleak in map from abort path

The delete set command does not rely on the transaction object for
element removal, therefore, a combination of delete element + delete set
from the abort path could result in restoring twice the refcount of the
mapping.

Check for inactive element in the next generation for the delete element
command in the abort path, skip restoring state if next generation bit
has been already cleared. This is similar to the activate logic using
the set walk iterator.

[ 6170.286929] ------------[ cut here ]------------
[ 6170.286939] WARNING: CPU: 6 PID: 790302 at net/netfilter/nf_tables_api.c:2086 nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.287071] Modules linked in: [...]
[ 6170.287633] CPU: 6 PID: 790302 Comm: kworker/6:2 Not tainted 6.9.0-rc3+ #365
[ 6170.287768] RIP: 0010:nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.287886] Code: df 48 8d 7d 58 e8 69 2e 3b df 48 8b 7d 58 e8 80 1b 37 df 48 8d 7d 68 e8 57 2e 3b df 48 8b 7d 68 e8 6e 1b 37 df 48 89 ef eb c4 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 0f
[ 6170.287895] RSP: 0018:ffff888134b8fd08 EFLAGS: 00010202
[ 6170.287904] RAX: 0000000000000001 RBX: ffff888125bffb28 RCX: dffffc0000000000
[ 6170.287912] RDX: 0000000000000003 RSI: ffffffffa20298ab RDI: ffff88811ebe4750
[ 6170.287919] RBP: ffff88811ebe4700 R08: ffff88838e812650 R09: fffffbfff0623a55
[ 6170.287926] R10: ffffffff8311d2af R11: 0000000000000001 R12: ffff888125bffb10
[ 6170.287933] R13: ffff888125bffb10 R14: dead000000000122 R15: dead000000000100
[ 6170.287940] FS:  0000000000000000(0000) GS:ffff888390b00000(0000) knlGS:0000000000000000
[ 6170.287948] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6170.287955] CR2: 00007fd31fc00710 CR3: 0000000133f60004 CR4: 00000000001706f0
[ 6170.287962] Call Trace:
[ 6170.287967]  <TASK>
[ 6170.287973]  ? __warn+0x9f/0x1a0
[ 6170.287986]  ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288092]  ? report_bug+0x1b1/0x1e0
[ 6170.287986]  ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288092]  ? report_bug+0x1b1/0x1e0
[ 6170.288104]  ? handle_bug+0x3c/0x70
[ 6170.288112]  ? exc_invalid_op+0x17/0x40
[ 6170.288120]  ? asm_exc_invalid_op+0x1a/0x20
[ 6170.288132]  ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables]
[ 6170.288243]  ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288366]  ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables]
[ 6170.288483]  nf_tables_trans_destroy_work+0x588/0x590 [nf_tables]

Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 weeks agogpiolib: swnode: Remove wrong header inclusion
Andy Shevchenko [Wed, 17 Apr 2024 14:19:13 +0000 (17:19 +0300)]
gpiolib: swnode: Remove wrong header inclusion

The flags in the software node properties are supposed to be
the GPIO lookup flags, which are provided by gpio/machine.h,
as the software nodes are the kernel internal thing and doesn't
need to rely to any of ABIs.

Fixes: e7f9ff5dc90c ("gpiolib: add support for software nodes")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
6 weeks agoMerge tag 'pwm/for-6.9-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 17 Apr 2024 17:04:40 +0000 (10:04 -0700)]
Merge tag 'pwm/for-6.9-rc5-fixes' of git://git./linux/kernel/git/ukleinek/linux

Pull pwm fixes from Uwe Kleine-König:
 "The first patch fixes a regression in the suspend/resume path for the
  dwc pwm driver that was introduced in v6.9-rc1 when support for 16
  channel devices was added.

  The second patch fixes a bunch of device tree binding check warnings"

* tag 'pwm/for-6.9-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  dt-bindings: pwm: mediatek,pwm-disp: Document power-domains property
  pwm: dwc: allow suspend/resume for 16 channels

6 weeks agonetfilter: nf_tables: restore set elements when delete set fails
Pablo Neira Ayuso [Wed, 17 Apr 2024 15:43:11 +0000 (17:43 +0200)]
netfilter: nf_tables: restore set elements when delete set fails

From abort path, nft_mapelem_activate() needs to restore refcounters to
the original state. Currently, it uses the set->ops->walk() to iterate
over these set elements. The existing set iterator skips inactive
elements in the next generation, this does not work from the abort path
to restore the original state since it has to skip active elements
instead (not inactive ones).

This patch moves the check for inactive elements to the set iterator
callback, then it reverses the logic for the .activate case which
needs to skip active elements.

Toggle next generation bit for elements when delete set command is
invoked and call nft_clear() from .activate (abort) path to restore the
next generation bit.

The splat below shows an object in mappings memleak:

[43929.457523] ------------[ cut here ]------------
[43929.457532] WARNING: CPU: 0 PID: 1139 at include/net/netfilter/nf_tables.h:1237 nft_setelem_data_deactivate+0xe4/0xf0 [nf_tables]
[...]
[43929.458014] RIP: 0010:nft_setelem_data_deactivate+0xe4/0xf0 [nf_tables]
[43929.458076] Code: 83 f8 01 77 ab 49 8d 7c 24 08 e8 37 5e d0 de 49 8b 6c 24 08 48 8d 7d 50 e8 e9 5c d0 de 8b 45 50 8d 50 ff 89 55 50 85 c0 75 86 <0f> 0b eb 82 0f 0b eb b3 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90
[43929.458081] RSP: 0018:ffff888140f9f4b0 EFLAGS: 00010246
[43929.458086] RAX: 0000000000000000 RBX: ffff8881434f5288 RCX: dffffc0000000000
[43929.458090] RDX: 00000000ffffffff RSI: ffffffffa26d28a7 RDI: ffff88810ecc9550
[43929.458093] RBP: ffff88810ecc9500 R08: 0000000000000001 R09: ffffed10281f3e8f
[43929.458096] R10: 0000000000000003 R11: ffff0000ffff0000 R12: ffff8881434f52a0
[43929.458100] R13: ffff888140f9f5f4 R14: ffff888151c7a800 R15: 0000000000000002
[43929.458103] FS:  00007f0c687c4740(0000) GS:ffff888390800000(0000) knlGS:0000000000000000
[43929.458107] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43929.458111] CR2: 00007f58dbe5b008 CR3: 0000000123602005 CR4: 00000000001706f0
[43929.458114] Call Trace:
[43929.458118]  <TASK>
[43929.458121]  ? __warn+0x9f/0x1a0
[43929.458127]  ? nft_setelem_data_deactivate+0xe4/0xf0 [nf_tables]
[43929.458188]  ? report_bug+0x1b1/0x1e0
[43929.458196]  ? handle_bug+0x3c/0x70
[43929.458200]  ? exc_invalid_op+0x17/0x40
[43929.458211]  ? nft_setelem_data_deactivate+0xd7/0xf0 [nf_tables]
[43929.458271]  ? nft_setelem_data_deactivate+0xe4/0xf0 [nf_tables]
[43929.458332]  nft_mapelem_deactivate+0x24/0x30 [nf_tables]
[43929.458392]  nft_rhash_walk+0xdd/0x180 [nf_tables]
[43929.458453]  ? __pfx_nft_rhash_walk+0x10/0x10 [nf_tables]
[43929.458512]  ? rb_insert_color+0x2e/0x280
[43929.458520]  nft_map_deactivate+0xdc/0x1e0 [nf_tables]
[43929.458582]  ? __pfx_nft_map_deactivate+0x10/0x10 [nf_tables]
[43929.458642]  ? __pfx_nft_mapelem_deactivate+0x10/0x10 [nf_tables]
[43929.458701]  ? __rcu_read_unlock+0x46/0x70
[43929.458709]  nft_delset+0xff/0x110 [nf_tables]
[43929.458769]  nft_flush_table+0x16f/0x460 [nf_tables]
[43929.458830]  nf_tables_deltable+0x501/0x580 [nf_tables]

Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 weeks agonetfilter: nf_tables: missing iterator type in lookup walk
Pablo Neira Ayuso [Wed, 17 Apr 2024 15:43:01 +0000 (17:43 +0200)]
netfilter: nf_tables: missing iterator type in lookup walk

Add missing decorator type to lookup expression and tighten WARN_ON_ONCE
check in pipapo to spot earlier that this is unset.

Fixes: 29b359cf6d95 ("netfilter: nft_set_pipapo: walk over current view on netlink dump")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 weeks agoplatform/x86/amd/pmc: Extend Framework 13 quirk to more BIOSes
Mario Limonciello [Wed, 10 Apr 2024 14:10:46 +0000 (09:10 -0500)]
platform/x86/amd/pmc: Extend Framework 13 quirk to more BIOSes

BIOS 03.05 still hasn't fixed the spurious IRQ1 issue.  As it's still
being worked on there is still a possibility that it won't need to
apply to future BIOS releases.

Add a quirk for BIOS 03.05 as well.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240410141046.433-1-mario.limonciello@amd.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
6 weeks agos390/ism: Properly fix receive message buffer allocation
Gerd Bayer [Mon, 15 Apr 2024 13:15:07 +0000 (15:15 +0200)]
s390/ism: Properly fix receive message buffer allocation

Since [1], dma_alloc_coherent() does not accept requests for GFP_COMP
anymore, even on archs that may be able to fulfill this. Functionality that
relied on the receive buffer being a compound page broke at that point:
The SMC-D protocol, that utilizes the ism device driver, passes receive
buffers to the splice processor in a struct splice_pipe_desc with a
single entry list of struct pages. As the buffer is no longer a compound
page, the splice processor now rejects requests to handle more than a
page worth of data.

Replace dma_alloc_coherent() and allocate a buffer with folio_alloc and
create a DMA map for it with dma_map_page(). Since only receive buffers
on ISM devices use DMA, qualify the mapping as FROM_DEVICE.
Since ISM devices are available on arch s390, only, and on that arch all
DMA is coherent, there is no need to introduce and export some kind of
dma_sync_to_cpu() method to be called by the SMC-D protocol layer.

Analogously, replace dma_free_coherent by a two step dma_unmap_page,
then folio_put to free the receive buffer.

[1] https://lore.kernel.org/all/20221113163535.884299-1-hch@lst.de/

Fixes: c08004eede4b ("s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agorandom: handle creditable entropy from atomic process context
Jason A. Donenfeld [Wed, 17 Apr 2024 11:38:29 +0000 (13:38 +0200)]
random: handle creditable entropy from atomic process context

The entropy accounting changes a static key when the RNG has
initialized, since it only ever initializes once. Static key changes,
however, cannot be made from atomic context, so depending on where the
last creditable entropy comes from, the static key change might need to
be deferred to a worker.

Previously the code used the execute_in_process_context() helper
function, which accounts for whether or not the caller is
in_interrupt(). However, that doesn't account for the case where the
caller is actually in process context but is holding a spinlock.

This turned out to be the case with input_handle_event() in
drivers/input/input.c contributing entropy:

  [<ffffffd613025ba0>] die+0xa8/0x2fc
  [<ffffffd613027428>] bug_handler+0x44/0xec
  [<ffffffd613016964>] brk_handler+0x90/0x144
  [<ffffffd613041e58>] do_debug_exception+0xa0/0x148
  [<ffffffd61400c208>] el1_dbg+0x60/0x7c
  [<ffffffd61400c000>] el1h_64_sync_handler+0x38/0x90
  [<ffffffd613011294>] el1h_64_sync+0x64/0x6c
  [<ffffffd613102d88>] __might_resched+0x1fc/0x2e8
  [<ffffffd613102b54>] __might_sleep+0x44/0x7c
  [<ffffffd6130b6eac>] cpus_read_lock+0x1c/0xec
  [<ffffffd6132c2820>] static_key_enable+0x14/0x38
  [<ffffffd61400ac08>] crng_set_ready+0x14/0x28
  [<ffffffd6130df4dc>] execute_in_process_context+0xb8/0xf8
  [<ffffffd61400ab30>] _credit_init_bits+0x118/0x1dc
  [<ffffffd6138580c8>] add_timer_randomness+0x264/0x270
  [<ffffffd613857e54>] add_input_randomness+0x38/0x48
  [<ffffffd613a80f94>] input_handle_event+0x2b8/0x490
  [<ffffffd613a81310>] input_event+0x6c/0x98

According to Guoyong, it's not really possible to refactor the various
drivers to never hold a spinlock there. And in_atomic() isn't reliable.

So, rather than trying to be too fancy, just punt the change in the
static key to a workqueue always. There's basically no drawback of doing
this, as the code already needed to account for the static key not
changing immediately, and given that it's just an optimization, there's
not exactly a hurry to change the static key right away, so deferal is
fine.

Reported-by: Guoyong Wang <guoyong.wang@mediatek.com>
Cc: stable@vger.kernel.org
Fixes: f5bda35fba61 ("random: use static branch for crng_ready()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
6 weeks agoMerge branch 'mt7530-fixes'
David S. Miller [Wed, 17 Apr 2024 07:56:51 +0000 (08:56 +0100)]
Merge branch 'mt7530-fixes'

Merge branch 'mr7530-fixes'

Arınç ÜNAL says:

====================
Fix port mirroring on MT7530 DSA subdriver

This patch series fixes the frames received on the local port (monitor
port) not being mirrored, and port mirroring for the MT7988 SoC switch.
====================

Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
6 weeks agonet: dsa: mt7530: fix port mirroring for MT7988 SoC switch
Arınç ÜNAL [Sat, 13 Apr 2024 13:01:40 +0000 (16:01 +0300)]
net: dsa: mt7530: fix port mirroring for MT7988 SoC switch

The "MT7988A Wi-Fi 7 Generation Router Platform: Datasheet (Open Version)
v0.1" document shows bits 16 to 18 as the MIRROR_PORT field of the CPU
forward control register. Currently, the MT7530 DSA subdriver configures
bits 0 to 2 of the CPU forward control register which breaks the port
mirroring feature for the MT7988 SoC switch.

Fix this by using the MT7531_MIRROR_PORT_GET() and MT7531_MIRROR_PORT_SET()
macros which utilise the correct bits.

Fixes: 110c18bfed41 ("net: dsa: mt7530: introduce driver for MT7988 built-in switch")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agonet: dsa: mt7530: fix mirroring frames received on local port
Arınç ÜNAL [Sat, 13 Apr 2024 13:01:39 +0000 (16:01 +0300)]
net: dsa: mt7530: fix mirroring frames received on local port

This switch intellectual property provides a bit on the ARL global control
register which controls allowing mirroring frames which are received on the
local port (monitor port). This bit is unset after reset.

This ability must be enabled to fully support the port mirroring feature on
this switch intellectual property.

Therefore, this patch fixes the traffic not being reflected on a port,
which would be configured like below:

  tc qdisc add dev swp0 clsact

  tc filter add dev swp0 ingress matchall skip_sw \
  action mirred egress mirror dev swp0

As a side note, this configuration provides the hairpinning feature for a
single port.

Fixes: 37feab6076aa ("net: dsa: mt7530: add support for port mirroring")
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agotun: limit printing rate when illegal packet received by tun dev
Lei Chen [Mon, 15 Apr 2024 02:02:46 +0000 (22:02 -0400)]
tun: limit printing rate when illegal packet received by tun dev

vhost_worker will call tun call backs to receive packets. If too many
illegal packets arrives, tun_do_read will keep dumping packet contents.
When console is enabled, it will costs much more cpu time to dump
packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a59542 ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 weeks agoice: Fix checking for unsupported keys on non-tunnel device
Marcin Szycik [Tue, 9 Apr 2024 15:45:44 +0000 (17:45 +0200)]
ice: Fix checking for unsupported keys on non-tunnel device

Add missing FLOW_DISSECTOR_KEY_ENC_* checks to TC flower filter parsing.
Without these checks, it would be possible to add filters with tunnel
options on non-tunnel devices. enc_* options are only valid for tunnel
devices.

Example:
  devlink dev eswitch set $PF1_PCI mode switchdev
  echo 1 > /sys/class/net/$PF1/device/sriov_numvfs
  tc qdisc add dev $VF1_PR ingress
  ethtool -K $PF1 hw-tc-offload on
  tc filter add dev $VF1_PR ingress flower enc_ttl 12 skip_sw action drop

Fixes: 9e300987d4a8 ("ice: VXLAN and Geneve TC support")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoice: tc: allow zero flags in parsing tc flower
Michal Swiatkowski [Fri, 15 Mar 2024 11:08:21 +0000 (12:08 +0100)]
ice: tc: allow zero flags in parsing tc flower

The check for flags is done to not pass empty lookups to adding switch
rule functions. Since metadata is always added to lookups there is no
need to check against the flag.

It is also fixing the problem with such rule:
$ tc filter add dev gtp_dev ingress protocol ip prio 0 flower \
enc_dst_port 2123 action drop
Switch block in case of GTP can't parse the destination port, because it
should always be set to GTP specific value. The same with ethertype. The
result is that there is no other matching criteria than GTP tunnel. In
this case flags is 0, rule can't be added only because of defensive
check against flags.

Fixes: 9a225f81f540 ("ice: Support GTP-U and GTP-C offload in switchdev")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoice: tc: check src_vsi in case of traffic from VF
Michal Swiatkowski [Fri, 15 Mar 2024 11:08:20 +0000 (12:08 +0100)]
ice: tc: check src_vsi in case of traffic from VF

In case of traffic going from the VF (so ingress for port representor)
source VSI should be consider during packet classification. It is
needed for hardware to not match packets from different ports with
filters added on other port.

It is only for "from VF" traffic, because other traffic direction
doesn't have source VSI.

Set correct ::src_vsi in rule_info to pass it to the hardware filter.

For example this rule should drop only ipv4 packets from eth10, not from
the others VF PRs. It is needed to check source VSI in this case.
$tc filter add dev eth10 ingress protocol ip flower skip_sw action drop

Fixes: 0d08a441fb1a ("ice: ndo_setup_tc implementation for PF")
Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 weeks agoMerge branch 'net-stmmac-fix-mac-capabilities-procedure'
Paolo Abeni [Tue, 16 Apr 2024 13:24:57 +0000 (15:24 +0200)]
Merge branch 'net-stmmac-fix-mac-capabilities-procedure'

Serge Semin says:

====================
net: stmmac: Fix MAC-capabilities procedure

The series got born as a result of the discussions around the recent
Yanteng' series adding the Loongson LS7A1000, LS2K1000, LS7A2000, LS2K2000
MACs support:
Link: https://lore.kernel.org/netdev/fu3f6uoakylnb6eijllakeu5i4okcyqq7sfafhp5efaocbsrwe@w74xe7gb6x7p
In particular the Yanteng' patchset needed to implement the Loongson
MAC-specific constraints applied to the link speed and link duplex mode.
As a result of the discussion with Russel the next preliminary patch was
born:
Link: https://lore.kernel.org/netdev/df31e8bcf74b3b4ddb7ddf5a1c371390f16a2ad5.1712917541.git.siyanteng@loongson.cn
The patch above was a temporal solution utilized by Yanteng for further
developments and to move on with the on-going review. This patchset is a
refactored version of that single patch with formatting required for the
fixes patches.

In particular the series starts with fixing the half-duplex-less
constraint currently applied for all IP-cores. In fact it's specific for
the DW QoS Eth only (DW GMAC v4.x/v5.x).

The next patch fixes the MAC-capabilities setting up during the active
Tx/Rx queues re-initialization procedure. Particularly the procedure
missed the max-speed limit thus possibly activating speeds prohibited on
the respective platforms.

Third patch fixes the incorrect MAC-capabilities initialization for DW
MAC100, DW XGMAC and DW XLGMAC devices by moving the correct
initialization to the IP-core specific setup() methods.

That's it for now. Thanks for review and testing in advance.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
Cc: Samuel Holland <samuel@sholland.org>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-sunxi@lists.linux.dev
Cc: linux-kernel@vger.kernel.org
====================

Link: https://lore.kernel.org/r/20240412180340.7965-1-fancer.lancer@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: stmmac: Fix IP-cores specific MAC capabilities
Serge Semin [Fri, 12 Apr 2024 18:03:16 +0000 (21:03 +0300)]
net: stmmac: Fix IP-cores specific MAC capabilities

Here is the list of the MAC capabilities specific to the particular DW MAC
IP-cores currently supported by the driver:

DW MAC100: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
   MAC_10 | MAC_100

DW GMAC:  MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
          MAC_10 | MAC_100 | MAC_1000

Allwinner sun8i MAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
                     MAC_10 | MAC_100 | MAC_1000

DW QoS Eth: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
            MAC_10 | MAC_100 | MAC_1000 | MAC_2500FD
if there is more than 1 active Tx/Rx queues:
   MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
           MAC_10FD | MAC_100FD | MAC_1000FD | MAC_2500FD

DW XGMAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
          MAC_1000FD | MAC_2500FD | MAC_5000FD | MAC_10000FD

DW XLGMAC: MAC_ASYM_PAUSE | MAC_SYM_PAUSE |
          MAC_1000FD | MAC_2500FD | MAC_5000FD | MAC_10000FD |
          MAC_25000FD | MAC_40000FD | MAC_50000FD | MAC_100000FD

As you can see there are only two common capabilities:
MAC_ASYM_PAUSE | MAC_SYM_PAUSE.
Meanwhile what is currently implemented defines 10/100/1000 link speeds
for all IP-cores, which is definitely incorrect for DW MAC100, DW XGMAC
and DW XLGMAC devices.

Seeing the flow-control is implemented as a callback for each MAC IP-core
(see dwmac100_flow_ctrl(), dwmac1000_flow_ctrl(), sun8i_dwmac_flow_ctrl(),
etc) and since the MAC-specific setup() method is supposed to be called
for each available DW MAC-based device, the capabilities initialization
can be freely moved to these setup() functions, thus correctly setting up
the MAC-capabilities for each IP-core (including the Allwinner Sun8i). A
new stmmac_link::caps field was specifically introduced for that so to
have all link-specific info preserved in a single structure.

Note the suggested change fixes three earlier commits at a time. The
commit 5b0d7d7da64b ("net: stmmac: Add the missing speeds that XGMAC
supports") permitted the 10-100 link speeds and 1G half-duplex mode for DW
XGMAC IP-core even though it doesn't support them. The commit df7699c70c1b
("net: stmmac: Do not cut down 1G modes") incorrectly added the MAC1000
capability to the DW MAC100 IP-core. Similarly to the DW XGMAC the commit
8a880936e902 ("net: stmmac: Add XLGMII support") incorrectly permitted the
10-100 link speeds and 1G half-duplex mode for DW XLGMAC IP-core.

Fixes: 5b0d7d7da64b ("net: stmmac: Add the missing speeds that XGMAC supports")
Fixes: df7699c70c1b ("net: stmmac: Do not cut down 1G modes")
Fixes: 8a880936e902 ("net: stmmac: Add XLGMII support")
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: stmmac: Fix max-speed being ignored on queue re-init
Serge Semin [Fri, 12 Apr 2024 18:03:15 +0000 (21:03 +0300)]
net: stmmac: Fix max-speed being ignored on queue re-init

It's possible to have the maximum link speed being artificially limited on
the platform-specific basis. It's done either by setting up the
plat_stmmacenet_data::max_speed field or by specifying the "max-speed"
DT-property. In such cases it's required that any specific
MAC-capabilities re-initializations would take the limit into account. In
particular the link speed capabilities may change during the number of
active Tx/Rx queues re-initialization. But the currently implemented
procedure doesn't take the speed limit into account.

Fix that by calling phylink_limit_mac_speed() in the
stmmac_reinit_queues() method if the speed limitation was required in the
same way as it's done in the stmmac_phy_setup() function.

Fixes: 95201f36f395 ("net: stmmac: update MAC capabilities when tx queues are updated")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agonet: stmmac: Apply half-duplex-less constraint for DW QoS Eth only
Serge Semin [Fri, 12 Apr 2024 18:03:14 +0000 (21:03 +0300)]
net: stmmac: Apply half-duplex-less constraint for DW QoS Eth only

There are three DW MAC IP-cores which can have the multiple Tx/Rx queues
enabled:
DW GMAC v3.7+ with AV feature,
DW QoS Eth v4.x/v5.x,
DW XGMAC/XLGMAC
Based on the respective HW databooks, only the DW QoS Eth IP-core doesn't
support the half-duplex link mode in case if more than one queues enabled:

"In multiple queue/channel configurations, for half-duplex operation,
enable only the Q0/CH0 on Tx and Rx. For single queue/channel in
full-duplex operation, any queue/channel can be enabled."

The rest of the IP-cores don't have such constraint. Thus in order to have
the constraint applied for the DW QoS Eth MACs only, let's move the it'
implementation to the respective MAC-capabilities getter and make sure the
getter is called in the queues re-init procedure.

Fixes: b6cfffa7ad92 ("stmmac: fix DMA channel hang in half-duplex mode")
Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoplatform/x86/intel-uncore-freq: Increase minor number support
Srinivas Pandruvada [Mon, 15 Apr 2024 22:06:25 +0000 (15:06 -0700)]
platform/x86/intel-uncore-freq: Increase minor number support

No new changes will be added for minor version 2. Change the minor
version number to 2 and stop displaying log message for unsupported
minor version 2.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240415220625.2828339-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
6 weeks agoplatform/x86: ISST: Add Granite Rapids-D to HPM CPU list
Srinivas Pandruvada [Mon, 15 Apr 2024 21:28:53 +0000 (14:28 -0700)]
platform/x86: ISST: Add Granite Rapids-D to HPM CPU list

Add Granite Rapids-D to hpm_cpu_ids, so that MSR 0x54 can be used.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20240415212853.2820470-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
6 weeks agoplatform/x86/amd: pmf: Add quirk for ROG Zephyrus G14
Mario Limonciello [Wed, 10 Apr 2024 14:09:56 +0000 (09:09 -0500)]
platform/x86/amd: pmf: Add quirk for ROG Zephyrus G14

ROG Zephyrus G14 advertises support for SPS notifications to the
BIOS but doesn't actually use them. Instead the asus-nb-wmi driver
utilizes such events.

Add a quirk to prevent the system from registering for ACPI platform
profile when this system is found to avoid conflicts.

Reported-by: al0uette@outlook.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218685
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240410140956.385-3-mario.limonciello@amd.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
6 weeks agoplatform/x86/amd: pmf: Add infrastructure for quirking supported funcs
Mario Limonciello [Wed, 10 Apr 2024 14:09:55 +0000 (09:09 -0500)]
platform/x86/amd: pmf: Add infrastructure for quirking supported funcs

In the event of a BIOS bug add infrastructure that will be utilized
to override the return value for supported_funcs to avoid enabling
features.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20240410140956.385-2-mario.limonciello@amd.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
6 weeks agoplatform/x86/amd: pmf: Decrease error message to debug
Mario Limonciello [Wed, 10 Apr 2024 14:09:54 +0000 (09:09 -0500)]
platform/x86/amd: pmf: Decrease error message to debug

ASUS ROG Zephyrus G14 doesn't have _CRS in AMDI0102 device and so
there are no resources to walk.  This is expected behavior because
it doesn't support Smart PC.  Decrease error message to debug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218685
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240410140956.385-1-mario.limonciello@amd.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
6 weeks agoMerge branch 'selftests-net-tcp_ao-a-bunch-of-fixes-for-tcp-ao-selftests'
Paolo Abeni [Tue, 16 Apr 2024 11:35:09 +0000 (13:35 +0200)]
Merge branch 'selftests-net-tcp_ao-a-bunch-of-fixes-for-tcp-ao-selftests'

Dmitry Safonov via says:

====================
selftests/net/tcp_ao: A bunch of fixes for TCP-AO selftests

Started as addressing the flakiness issues in rst_ipv*, that affect
netdev dashboard.

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
====================

Link: https://lore.kernel.org/r/20240413-tcp-ao-selftests-fixes-v1-0-f9c41c96949d@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoselftests/tcp_ao: Printing fixes to confirm with format-security
Dmitry Safonov [Sat, 13 Apr 2024 01:42:55 +0000 (02:42 +0100)]
selftests/tcp_ao: Printing fixes to confirm with format-security

On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces
> lib/setup.c: In function ‘__test_msg’:
> lib/setup.c:20:9: error: format not a string literal and no format arguments [-Werror=format-security]
>    20 |         ksft_print_msg(buf);
>       |         ^~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_ok’:
> lib/setup.c:26:9: error: format not a string literal and no format arguments [-Werror=format-security]
>    26 |         ksft_test_result_pass(buf);
>       |         ^~~~~~~~~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_fail’:
> lib/setup.c:32:9: error: format not a string literal and no format arguments [-Werror=format-security]
>    32 |         ksft_test_result_fail(buf);
>       |         ^~~~~~~~~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_xfail’:
> lib/setup.c:38:9: error: format not a string literal and no format arguments [-Werror=format-security]
>    38 |         ksft_test_result_xfail(buf);
>       |         ^~~~~~~~~~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_error’:
> lib/setup.c:44:9: error: format not a string literal and no format arguments [-Werror=format-security]
>    44 |         ksft_test_result_error(buf);
>       |         ^~~~~~~~~~~~~~~~~~~~~~
> lib/setup.c: In function ‘__test_skip’:
> lib/setup.c:50:9: error: format not a string literal and no format arguments [-Werror=format-security]
>    50 |         ksft_test_result_skip(buf);
>       |         ^~~~~~~~~~~~~~~~~~~~~
> cc1: some warnings being treated as errors

As the buffer was already pre-printed into, print it as a string
rather than a format-string.

Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoselftests/tcp_ao: Fix fscanf() call for format-security
Dmitry Safonov [Sat, 13 Apr 2024 01:42:54 +0000 (02:42 +0100)]
selftests/tcp_ao: Fix fscanf() call for format-security

On my new laptop with packages from nixos-unstable, gcc 12.3.0 produces:
> lib/proc.c: In function ‘netstat_read_type’:
> lib/proc.c:89:9: error: format not a string literal and no format arguments [-Werror=format-security]
>    89 |         if (fscanf(fnetstat, type->header_name) == EOF)
>       |         ^~
> cc1: some warnings being treated as errors

Here the selftests lib parses header name, while expectes non-space word
ending with a column.

Fixes: cfbab37b3da0 ("selftests/net: Add TCP-AO library")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoselftests/tcp_ao: Zero-init tcp_ao_info_opt
Dmitry Safonov [Sat, 13 Apr 2024 01:42:53 +0000 (02:42 +0100)]
selftests/tcp_ao: Zero-init tcp_ao_info_opt

The structure is on the stack and has to be zero-initialized as
the kernel checks for:
> if (in.reserved != 0 || in.reserved2 != 0)
> return -EINVAL;

Fixes: b26660531cf6 ("selftests/net: Add test for TCP-AO add setsockopt() command")
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoselftests/tcp_ao: Make RST tests less flaky
Dmitry Safonov [Sat, 13 Apr 2024 01:42:52 +0000 (02:42 +0100)]
selftests/tcp_ao: Make RST tests less flaky

Currently, "active reset" cases are flaky, because select() is called
for 3 sockets, while only 2 are expected to receive RST.
The idea of the third socket was to get into request_sock_queue,
but the test mistakenly attempted to connect() after the listener
socket was shut down.

Repair this test, it's important to check the different kernel
code-paths for signing RST TCP-AO segments.

Fixes: c6df7b2361d7 ("selftests/net: Add TCP-AO RST test")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 weeks agoMerge tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Mon, 15 Apr 2024 21:09:47 +0000 (14:09 -0700)]
Merge tag 'nfsd-6.9-3' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a potential tracepoint crash

 - Fix NFSv4 GETATTR on big-endian platforms

* tag 'nfsd-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: fix endianness issue in nfsd4_encode_fattr4
  SUNRPC: Fix rpcgss_context trace event acceptor field

6 weeks agoMerge tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Mon, 15 Apr 2024 18:01:11 +0000 (11:01 -0700)]
Merge tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs

Pull yet more bcachefs fixes from Kent Overstreet:
 "This gets recovery working again for the affected user I've been
  working with, and I'm still waiting to hear back on other bug reports
  but should fix it for everyone else who's been having issues with
  recovery.

   - Various recovery fixes:

       - fixes for the btree_insert_entry being resized on path
         allocation btree_path array recently became dynamically
         resizable, and btree_insert_entry along with it; this was being
         observed during journal replay, when write buffer btree updates
         don't use the write buffer and instead use the normal btree
         update path

       - multiple fixes for deadlock in recovery when we need to do lots
         of btree node merges; excessive merges were clocking up the
         whole pipeline

       - write buffer path now correctly does btree node merges when
         needed

       - fix failure to go RW when superblock indicates recovery passes
         needed (i.e. to complete an unfinished upgrade)

   - Various unsafety fixes - test case contributed by a user who had
     two drives out of a six drive array write out a whole bunch of
     garbage after power failure

   - New (tiny) on disk format feature: since it appears the btree node
     scan tool will be a more regular thing (crappy hardware, user
     error) - this adds a 64 bit per-device bitmap of regions that have
     ever had btree nodes.

   - A path->should_be_locked fix, from a larger patch series tightening
     up invariants and assertions around btree transaction and path
     locking state.

     This particular fix prevents us from keeping around btree_paths
     that are no longer needed"

* tag 'bcachefs-2024-04-15' of https://evilpiepirate.org/git/bcachefs: (24 commits)
  bcachefs: set_btree_iter_dontneed also clears should_be_locked
  bcachefs: fix error path of __bch2_read_super()
  bcachefs: Check for backpointer bucket_offset >= bucket size
  bcachefs: bch_member.btree_allocated_bitmap
  bcachefs: sysfs internal/trigger_journal_flush
  bcachefs: Fix bch2_btree_node_fill() for !path
  bcachefs: add safety checks in bch2_btree_node_fill()
  bcachefs: Interior known are required to have known key types
  bcachefs: add missing bounds check in __bch2_bkey_val_invalid()
  bcachefs: Fix btree node merging on write buffer btrees
  bcachefs: Disable merges from interior update path
  bcachefs: Run merges at BCH_WATERMARK_btree
  bcachefs: Fix missing write refs in fs fio paths
  bcachefs: Fix deadlock in journal replay
  bcachefs: Go rw if running any explicit recovery passes
  bcachefs: Standardize helpers for printing enum strs with bounds checks
  bcachefs: don't queue btree nodes for rewrites during scan
  bcachefs: fix race in bch2_btree_node_evict()
  bcachefs: fix unsafety in bch2_stripe_to_text()
  bcachefs: fix unsafety in bch2_extent_ptr_to_text()
  ...

6 weeks agobcachefs: set_btree_iter_dontneed also clears should_be_locked
Kent Overstreet [Sat, 13 Apr 2024 22:02:15 +0000 (18:02 -0400)]
bcachefs: set_btree_iter_dontneed also clears should_be_locked

This is part of a larger series cleaning up the semantics of
should_be_locked and adding assertions around it; if we don't need an
iterator/path anymore, it clearly doesn't need to be locked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: fix error path of __bch2_read_super()
Chao Yu [Fri, 12 Apr 2024 06:36:38 +0000 (14:36 +0800)]
bcachefs: fix error path of __bch2_read_super()

In __bch2_read_super(), if kstrdup() fails, it needs to release memory
in sb->holder, fix to call bch2_free_super() in the error path.

Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agoMerge tag 'linux_kselftest-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Mon, 15 Apr 2024 17:08:10 +0000 (10:08 -0700)]
Merge tag 'linux_kselftest-fixes-6.9-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "A fix to kselftest harness to prevent infinite loop triggered in an
  assert in FIXTURE_TEARDOWN and a fix to a problem seen in being able
  to stop subsystem-enable tests when sched events are being traced"

* tag 'linux_kselftest-fixes-6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN
  selftests/ftrace: Limit length in subsystem-enable tests

6 weeks agodt-bindings: pwm: mediatek,pwm-disp: Document power-domains property
AngeloGioacchino Del Regno [Thu, 4 Apr 2024 08:18:08 +0000 (10:18 +0200)]
dt-bindings: pwm: mediatek,pwm-disp: Document power-domains property

Allow the power-domains property to the PWM_DISP block as on some SoCs
this does need at most one power domain.

Fixes: b09b179bac0a ("dt-bindings: pwm: Convert pwm-mtk-disp.txt to mediatek,pwm-disp.yaml format")
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240404081808.92199-1-angelogioacchino.delregno@collabora.com
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
6 weeks agopwm: dwc: allow suspend/resume for 16 channels
Raag Jadav [Mon, 15 Apr 2024 07:40:51 +0000 (13:10 +0530)]
pwm: dwc: allow suspend/resume for 16 channels

With 16 channel pwm support, we're registering two instances of pwm_chip
with 8 channels each. We need to update PM functions to use both instances
of pwm_chip during power state transitions.

Introduce struct dwc_pwm_drvdata and use it as driver_data, which will
maintain both instances of pwm_chip along with dwc_pwm_info and allow us
to use them inside suspend/resume handles.

Fixes: ebf2c89eb95e ("pwm: dwc: Add 16 channel support for Intel Elkhart Lake")
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20240415074051.14681-1-raag.jadav@intel.com
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
6 weeks agoocteontx2-pf: fix FLOW_DIS_IS_FRAGMENT implementation
Asbjørn Sloth Tønnesen [Fri, 12 Apr 2024 12:02:56 +0000 (12:02 +0000)]
octeontx2-pf: fix FLOW_DIS_IS_FRAGMENT implementation

Upon reviewing the flower control flags handling in
this driver, I notice that the key wasn't being used,
only the mask.

Ie. `tc flower ... ip_flags nofrag` was hardware
offloaded as `... ip_flags frag`.

Only compile tested, no access to HW.

Fixes: c672e3727989 ("octeontx2-pf: Add support to filter packet based on IP fragment")
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agoinet: bring NLM_DONE out to a separate recv() again
Jakub Kicinski [Thu, 11 Apr 2024 18:02:02 +0000 (11:02 -0700)]
inet: bring NLM_DONE out to a separate recv() again

Commit under Fixes optimized the number of recv() calls
needed during RTM_GETROUTE dumps, but we got multiple
reports of applications hanging on recv() calls.
Applications expect that a route dump will be terminated
with a recv() reading an individual NLM_DONE message.

Coalescing NLM_DONE is perfectly legal in netlink,
but even tho reporters fixed the code in respective
projects, chances are it will take time for those
applications to get updated. So revert to old behavior
(for now)?

Old kernel (5.19):

 $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
            --dump getroute --json '{"rtm-family": 2}'
 Recv: read 692 bytes, 11 messages
   nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
 ...
   nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
 Recv: read 20 bytes, 1 messages
   nl_len = 20 (4) nl_flags = 0x2 nl_type = 3

Before (6.9-rc2):

 $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
            --dump getroute --json '{"rtm-family": 2}'
 Recv: read 712 bytes, 12 messages
   nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
 ...
   nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
   nl_len = 20 (4) nl_flags = 0x2 nl_type = 3

After:

 $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
            --dump getroute --json '{"rtm-family": 2}'
 Recv: read 692 bytes, 11 messages
   nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
 ...
   nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
 Recv: read 20 bytes, 1 messages
   nl_len = 20 (4) nl_flags = 0x2 nl_type = 3

Reported-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 weeks agobcachefs: Check for backpointer bucket_offset >= bucket size
Kent Overstreet [Sun, 14 Apr 2024 04:51:48 +0000 (00:51 -0400)]
bcachefs: Check for backpointer bucket_offset >= bucket size

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: bch_member.btree_allocated_bitmap
Kent Overstreet [Fri, 12 Apr 2024 22:45:47 +0000 (18:45 -0400)]
bcachefs: bch_member.btree_allocated_bitmap

This adds a small (64 bit) per-device bitmap that tracks ranges that
have btree nodes, for accelerating btree node scan if it is ever needed.

- New helpers, bch2_dev_btree_bitmap_marked() and
  bch2_dev_bitmap_mark(), for checking and updating the bitmap

- Interior btree update path updates the bitmaps when required

- The check_allocations pass has a new fsck_err check,
  btree_bitmap_not_marked

- New on disk format version, mi_btree_mitmap, which indicates the new
  bitmap is present

- Upgrade table lists the required recovery pass and expected fsck error

- Btree node scan uses the bitmap to skip ranges if we're on the new
  version

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 weeks agobcachefs: sysfs internal/trigger_journal_flush
Kent Overstreet [Sun, 14 Apr 2024 02:43:11 +0000 (22:43 -0400)]
bcachefs: sysfs internal/trigger_journal_flush

Add a sysfs knob for immediately flushing the entire journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>