sfrench/cifs-2.6.git
5 weeks agoMerge tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 13 Apr 2024 17:10:18 +0000 (10:10 -0700)]
Merge tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - fix for oops in cifs_get_fattr of deleted files

 - fix for the remote open counter going negative in some directory
   lease cases

 - fix for mkfifo to instantiate dentry to avoid possible crash

 - important fix to allow handling key rotation for mount and remount
   (ie cases that are becoming more common when password that was used
   for the mount will expire soon but will be replaced by new password)

* tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix broken reconnect when password changing on the server by allowing password rotation
  smb: client: instantiate when creating SFU files
  smb3: fix Open files on server counter going negative
  smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()

5 weeks agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 12 Apr 2024 20:08:39 +0000 (13:08 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix the TLBI RANGE operand calculation causing live migration under
  KVM/arm64 to miss dirty pages due to stale TLB entries"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: tlb: Fix TLBI RANGE operand

5 weeks agoMerge tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Fri, 12 Apr 2024 20:02:27 +0000 (13:02 -0700)]
Merge tag 'soc-fixes-6.9-1' of git://git./linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "The device tree changes this time are all for NXP i.MX platforms,
  addressing issues with clocks and regulators on i.MX7 and i.MX8.

  The old OMAP2 based Nokia N8x0 tablet get a couple of code fixes for
  regressions that came in.

  The ARM SCMI and FF-A firmware interfaces get a couple of minor bug
  fixes.

  A regression fix for RISC-V cache management addresses a problem with
  probe order on Sifive cores"

* tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (23 commits)
  MAINTAINERS: Change Krzysztof Kozlowski's email address
  arm64: dts: imx8qm-ss-dma: fix can lpcg indices
  arm64: dts: imx8-ss-dma: fix can lpcg indices
  arm64: dts: imx8-ss-dma: fix adc lpcg indices
  arm64: dts: imx8-ss-dma: fix pwm lpcg indices
  arm64: dts: imx8-ss-dma: fix spi lpcg indices
  arm64: dts: imx8-ss-conn: fix usb lpcg indices
  arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
  ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
  ARM: dts: imx7-mba7: Use 'no-mmc' property
  arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
  arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
  arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
  cache: sifive_ccache: Partially convert to a platform driver
  firmware: arm_scmi: Make raw debugfs entries non-seekable
  firmware: arm_scmi: Fix wrong fastchannel initialization
  firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()
  ARM: OMAP2+: fix USB regression on Nokia N8x0
  mmc: omap: restore original power up/down steps
  mmc: omap: fix deferred probe
  ...

5 weeks agoMerge tag 'iommu-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 12 Apr 2024 19:56:19 +0000 (12:56 -0700)]
Merge tag 'iommu-fixes-v6.9-rc3' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d Fixes:
     - Allocate local memory for PRQ page
     - Fix WARN_ON in iommu probe path
     - Fix wrong use of pasid config

 - AMD IOMMU Fixes:
     - Lock inversion fix
     - Log message severity fix
     - Disable SNP when v2 page-tables are used

 - Mediatek driver:
     - Fix module autoloading

* tag 'iommu-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Change log message severity
  iommu/vt-d: Fix WARN_ON in iommu probe path
  iommu/vt-d: Allocate local memory for page request queue
  iommu/vt-d: Fix wrong use of pasid config
  iommu: mtk: fix module autoloading
  iommu/amd: Do not enable SNP when V2 page table is enabled
  iommu/amd: Fix possible irq lock inversion dependency issue

5 weeks agoMerge tag 'pci-v6.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 12 Apr 2024 19:47:48 +0000 (12:47 -0700)]
Merge tag 'pci-v6.9-fixes-1' of git://git./linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Revert a quirk that prevented Secondary Bus Reset for LSI / Agere
   FW643.

   We thought the device was broken, but the reset does work correctly
   on other platforms, and the reset avoids leaking data out of VMs
   (Bjorn Helgaas)

 - Update MAINTAINERS to reflect that Gustavo Pimentel is no longer
   reachable (Manivannan Sadhasivam)

* tag 'pci-v6.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  Revert "PCI: Mark LSI FW643 to avoid bus reset"
  MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer

5 weeks agoMerge tag 'block-6.9-20240412' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 12 Apr 2024 17:22:33 +0000 (10:22 -0700)]
Merge tag 'block-6.9-20240412' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - MD pull request via Song:
       - UAF fix (Yu)

 - Avoid out-of-bounds shift in blk-iocost (Rik)

 - Fix for q->blkg_list corruption (Ming)

 - Relax virt boundary mask/size segment checking (Ming)

* tag 'block-6.9-20240412' of git://git.kernel.dk/linux:
  block: fix that blk_time_get_ns() doesn't update time after schedule
  block: allow device to have both virt_boundary_mask and max segment size
  block: fix q->blkg_list corruption during disk rebind
  blk-iocost: avoid out of bounds shift
  raid1: fix use-after-free for original bio in raid1_write_request()

5 weeks agoMerge tag 'io_uring-6.9-20240412' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 12 Apr 2024 17:19:36 +0000 (10:19 -0700)]
Merge tag 'io_uring-6.9-20240412' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

 - Fix for sigmask restoring while waiting for events (Alexey)

 - Typo fix in comment (Haiyue)

 - Fix for a msg_control retstore on SEND_ZC retries (Pavel)

* tag 'io_uring-6.9-20240412' of git://git.kernel.dk/linux:
  io-uring: correct typo in comment for IOU_F_TWQ_LAZY_WAKE
  io_uring/net: restore msg_control on sendzc retry
  io_uring: Fix io_cqring_wait() not restoring sigmask on get_timespec64() failure

5 weeks agoMerge tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client
Linus Torvalds [Fri, 12 Apr 2024 17:15:46 +0000 (10:15 -0700)]
Merge tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two CephFS fixes marked for stable and a MAINTAINERS update"

* tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client:
  MAINTAINERS: remove myself as a Reviewer for Ceph
  ceph: switch to use cap_delay_lock for the unlink delay list
  ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE

5 weeks agoKconfig: add some hidden tabs on purpose
Linus Torvalds [Fri, 12 Apr 2024 17:05:10 +0000 (10:05 -0700)]
Kconfig: add some hidden tabs on purpose

Commit d96c36004e31 ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig
entry") removed a hidden tab because it apparently showed breakage in
some third-party kernel config parsing tool.

It wasn't clear what tool it was, but let's make sure it gets fixed.
Because if you can't parse tabs as whitespace, you should not be parsing
the kernel Kconfig files.

In fact, let's make such breakage more obvious than some esoteric ftrace
record size option.  If you can't parse tabs, you can't have page sizes.

Yes, tab-vs-space confusion is sadly a traditional Unix thing, and
'make' is famous for being broken in this regard.  But no, that does not
mean that it's ok.

I'd add more random tabs to our Kconfig files, but I don't want to make
things uglier than necessary.  But it *might* bbe necessary if it turns
out we see more of this kind of silly tooling.

Fixes: d96c36004e31 ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry")
Link: https://lore.kernel.org/lkml/CAHk-=wj-hLLN_t_m5OL4dXLaxvXKy_axuoJYXif7iczbfgAevQ@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 weeks agoMerge tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Fri, 12 Apr 2024 16:02:24 +0000 (09:02 -0700)]
Merge tag 'trace-v6.9-rc3' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix the buffer_percent accounting as it is dependent on three
   variables:

     1) pages_read - number of subbuffers read
     2) pages_lost - number of subbuffers lost due to overwrite
     3) pages_touched - number of pages that a writer entered

   These three counters only increment, and to know how many active
   pages there are on the buffer at any given time, the pages_read and
   pages_lost are subtracted from pages_touched.

   But the pages touched was incremented whenever any writer went to the
   next subbuffer even if it wasn't the only one, so it was incremented
   more than it should be causing the counter for how many subbuffers
   currently have content incorrect, which caused the buffer_percent
   that holds waiters until the ring buffer is filled to a given
   percentage to wake up early.

 - Fix warning of unused functions when PERF_EVENTS is not configured in

 - Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE

 - Fix to some kerneldoc function comments in eventfs code.

* tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Only update pages_touched when a new page is touched
  tracing: hide unused ftrace_event_id_fops
  tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry
  eventfs: Fix kernel-doc comments to functions

5 weeks agoMerge tag 'mips-fixes_6.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips...
Linus Torvalds [Fri, 12 Apr 2024 15:46:58 +0000 (08:46 -0700)]
Merge tag 'mips-fixes_6.9_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:
 "Fix for syscall_get_nr() to make it work even if tracing is disabled"

* tag 'mips-fixes_6.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: scall: Save thread_info.syscall unconditionally on entry

5 weeks agoMerge tag 'drm-fixes-2024-04-12' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 12 Apr 2024 15:27:09 +0000 (08:27 -0700)]
Merge tag 'drm-fixes-2024-04-12' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Looks like everyone woke up after holidays, this weeks pull has a
  bunch of stuff all over, 2 weeks worth of amdgpu is a lot of it, then
  i915/xe have a few, a bunch of msm fixes, then some scattered driver
  fixes.

  I expect things will settle down for rc5.

  client:
   - Protect connector modes with mode_config mutex

  ast:
   - Fix soft lockup

  host1x:
   - Do not setup DMA for virtual addresses

  ivpu:
   - Fix deadlock in context_xa
   - PCI fixes
   - Fixes to error handling

  nouveau:
   - gsp: Fix OOB access
   - Fix casting

  panfrost:
   - Fix error path in MMU code

  qxl:
   - Revert "drm/qxl: simplify qxl_fence_wait"

  vmwgfx:
   - Enable DMA for SEV mappings

  i915:
   - Couple CDCLK programming fixes
   - HDCP related fix
   - 4 Bigjoiner related fixes
   - Fix for a circular locking around GuC on reset+wedged case

  xe:
   - Fix double display mutex initializations
   - Fix u32 -> u64 implicit conversions
   - Fix RING_CONTEXT_CONTROL not marked as masked

  msm:
   - DP refcount leak fix on disconnect
   - Add missing newlines to prints in msm_fb and msm_kms
   - fix dpu debugfs entry permissions
   - Fix the interface table for the catalog of X1E80100
   - fix irq message printing
   - Bindings fix to add DP node as child of mdss for mdss node
   - Minor typo fix in DP driver API which handles port status change
   - fix CHRASHDUMP_READ()
   - fix HHB (highest bank bit) for a619 to fix UBWC corruption

  amdgpu:
   - GPU reset fixes
   - Fix some confusing logging
   - UMSCH fix
   - Aborted suspend fix
   - DCN 3.5 fixes
   - S4 fix
   - MES logging fixes
   - SMU 14 fixes
   - SDMA 4.4.2 fix
   - KASAN fix
   - SMU 13.0.10 fix
   - VCN partition fix
   - GFX11 fixes
   - DWB fixes
   - Plane handling fix
   - FAMS fix
   - DCN 3.1.6 fix
   - VSC SDP fixes
   - OLED panel fix
   - GFX 11.5 fix

  amdkfd:
   - GPU reset fixes
   - fix ioctl integer overflow"

* tag 'drm-fixes-2024-04-12' of https://gitlab.freedesktop.org/drm/kernel: (65 commits)
  amdkfd: use calloc instead of kzalloc to avoid integer overflow
  drm/xe: Label RING_CONTEXT_CONTROL as masked
  drm/xe/xe_migrate: Cast to output precision before multiplying operands
  drm/xe/hwmon: Cast result to output precision on left shift of operand
  drm/xe/display: Fix double mutex initialization
  drm/amdgpu: differentiate external rev id for gfx 11.5.0
  drm/amd/display: Adjust dprefclk by down spread percentage.
  drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST
  drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4
  drm/amd/display: fix disable otg wa logic in DCN316
  drm/amd/display: Do not recursively call manual trigger programming
  drm/amd/display: always reset ODM mode in context when adding first plane
  drm/amdgpu: fix incorrect number of active RBs for gfx11
  drm/amd/display: Return max resolution supported by DWB
  amd/amdkfd: sync all devices to wait all processes being evicted
  drm/amdgpu: clear set_q_mode_offs when VM changed
  drm/amdgpu: Fix VCN allocation in CPX partition
  drm/amd/pm: fix the high voltage issue after unload
  drm/amd/display: Skip on writeback when it's not applicable
  drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2
  ...

5 weeks agoblock: fix that blk_time_get_ns() doesn't update time after schedule
Yu Kuai [Thu, 11 Apr 2024 03:23:48 +0000 (11:23 +0800)]
block: fix that blk_time_get_ns() doesn't update time after schedule

While monitoring the throttle time of IO from iocost, it's found that
such time is always zero after the io_schedule() from ioc_rqos_throttle,
for example, with the following debug patch:

+       printk("%s-%d: %s enter %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
        while (true) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                if (wait.committed)
                        break;
                io_schedule();
        }
+       printk("%s-%d: %s exit  %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());

It can be observerd that blk_time_get_ns() always return the same time:

[ 1068.096579] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.272587] fio-1268: ioc_rqos_throttle exit  1067901962288
[ 1068.274389] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.472690] fio-1268: ioc_rqos_throttle exit  1067901962288
[ 1068.474485] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.672656] fio-1268: ioc_rqos_throttle exit  1067901962288
[ 1068.674451] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.872655] fio-1268: ioc_rqos_throttle exit  1067901962288

And I think the root cause is that 'PF_BLOCK_TS' is always cleared
by blk_flush_plug() before scheduel(), hence blk_plug_invalidate_ts()
will never be called:

blk_time_get_ns
 plug->cur_ktime = ktime_get_ns();
 current->flags |= PF_BLOCK_TS;

io_schedule:
 io_schedule_prepare
  blk_flush_plug
   __blk_flush_plug
    /* the flag is cleared, while time is not */
    current->flags &= ~PF_BLOCK_TS;
 schedule
 sched_update_worker
  /* the flag is not set, hence plug->cur_ktime is not cleared */
  if (tsk->flags & PF_BLOCK_TS)
   blk_plug_invalidate_ts()

blk_time_get_ns
 /* got the time stashed before schedule */
 return plug->cur_ktime;

Fix the problem by clearing cached time in __blk_flush_plug().

Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240411032349.3051233-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
5 weeks agoiommu/amd: Change log message severity
Vasant Hegde [Wed, 10 Apr 2024 10:16:43 +0000 (10:16 +0000)]
iommu/amd: Change log message severity

Use consistent log severity (pr_warn) to log all messages in SNP
enable path.

Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240410101643.32309-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
5 weeks agoiommu/vt-d: Fix WARN_ON in iommu probe path
Lu Baolu [Thu, 11 Apr 2024 03:07:44 +0000 (11:07 +0800)]
iommu/vt-d: Fix WARN_ON in iommu probe path

Commit 1a75cc710b95 ("iommu/vt-d: Use rbtree to track iommu probed
devices") adds all devices probed by the iommu driver in a rbtree
indexed by the source ID of each device. It assumes that each device
has a unique source ID. This assumption is incorrect and the VT-d
spec doesn't state this requirement either.

The reason for using a rbtree to track devices is to look up the device
with PCI bus and devfunc in the paths of handling ATS invalidation time
out error and the PRI I/O page faults. Both are PCI ATS feature related.

Only track the devices that have PCI ATS capabilities in the rbtree to
avoid unnecessary WARN_ON in the iommu probe path. Otherwise, on some
platforms below kernel splat will be displayed and the iommu probe results
in failure.

 WARNING: CPU: 3 PID: 166 at drivers/iommu/intel/iommu.c:158 intel_iommu_probe_device+0x319/0xd90
 Call Trace:
  <TASK>
  ? __warn+0x7e/0x180
  ? intel_iommu_probe_device+0x319/0xd90
  ? report_bug+0x1f8/0x200
  ? handle_bug+0x3c/0x70
  ? exc_invalid_op+0x18/0x70
  ? asm_exc_invalid_op+0x1a/0x20
  ? intel_iommu_probe_device+0x319/0xd90
  ? debug_mutex_init+0x37/0x50
  __iommu_probe_device+0xf2/0x4f0
  iommu_probe_device+0x22/0x70
  iommu_bus_notifier+0x1e/0x40
  notifier_call_chain+0x46/0x150
  blocking_notifier_call_chain+0x42/0x60
  bus_notify+0x2f/0x50
  device_add+0x5ed/0x7e0
  platform_device_add+0xf5/0x240
  mfd_add_devices+0x3f9/0x500
  ? preempt_count_add+0x4c/0xa0
  ? up_write+0xa2/0x1b0
  ? __debugfs_create_file+0xe3/0x150
  intel_lpss_probe+0x49f/0x5b0
  ? pci_conf1_write+0xa3/0xf0
  intel_lpss_pci_probe+0xcf/0x110 [intel_lpss_pci]
  pci_device_probe+0x95/0x120
  really_probe+0xd9/0x370
  ? __pfx___driver_attach+0x10/0x10
  __driver_probe_device+0x73/0x150
  driver_probe_device+0x19/0xa0
  __driver_attach+0xb6/0x180
  ? __pfx___driver_attach+0x10/0x10
  bus_for_each_dev+0x77/0xd0
  bus_add_driver+0x114/0x210
  driver_register+0x5b/0x110
  ? __pfx_intel_lpss_pci_driver_init+0x10/0x10 [intel_lpss_pci]
  do_one_initcall+0x57/0x2b0
  ? kmalloc_trace+0x21e/0x280
  ? do_init_module+0x1e/0x210
  do_init_module+0x5f/0x210
  load_module+0x1d37/0x1fc0
  ? init_module_from_file+0x86/0xd0
  init_module_from_file+0x86/0xd0
  idempotent_init_module+0x17c/0x230
  __x64_sys_finit_module+0x56/0xb0
  do_syscall_64+0x6e/0x140
  entry_SYSCALL_64_after_hwframe+0x71/0x79

Fixes: 1a75cc710b95 ("iommu/vt-d: Use rbtree to track iommu probed devices")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10689
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20240407011429.136282-1-baolu.lu@linux.intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
5 weeks agoiommu/vt-d: Allocate local memory for page request queue
Jacob Pan [Thu, 11 Apr 2024 03:07:43 +0000 (11:07 +0800)]
iommu/vt-d: Allocate local memory for page request queue

The page request queue is per IOMMU, its allocation should be made
NUMA-aware for performance reasons.

Fixes: a222a7f0bb6c ("iommu/vt-d: Implement page request handling")
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240403214007.985600-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
5 weeks agoiommu/vt-d: Fix wrong use of pasid config
Xuchun Shang [Thu, 11 Apr 2024 03:07:42 +0000 (11:07 +0800)]
iommu/vt-d: Fix wrong use of pasid config

The commit "iommu/vt-d: Add IOMMU perfmon support" introduce IOMMU
PMU feature, but use the wrong config when set pasid filter.

Fixes: 7232ab8b89e9 ("iommu/vt-d: Add IOMMU perfmon support")
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240401060753.3321318-1-xuchun.shang@linux.alibaba.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
5 weeks agoiommu: mtk: fix module autoloading
Krzysztof Kozlowski [Wed, 10 Apr 2024 16:41:09 +0000 (18:41 +0200)]
iommu: mtk: fix module autoloading

Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded
based on the alias from of_device_id table.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20240410164109.233308-1-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
5 weeks agoiommu/amd: Do not enable SNP when V2 page table is enabled
Vasant Hegde [Wed, 10 Apr 2024 08:57:02 +0000 (08:57 +0000)]
iommu/amd: Do not enable SNP when V2 page table is enabled

DTE[Mode]=0 is not supported when SNP is enabled in the host. That means
to support SNP, IOMMU must be configured with V1 page table (See IOMMU
spec [1] for the details). If user passes kernel command line to configure
IOMMU domains with v2 page table (amd_iommu=pgtbl_v2) then disable SNP
as the user asked by not forcing the page table to v1.

[1] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_IOMMU.pdf

Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240410085702.31869-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
5 weeks agoiommu/amd: Fix possible irq lock inversion dependency issue
Vasant Hegde [Thu, 4 Apr 2024 10:27:17 +0000 (10:27 +0000)]
iommu/amd: Fix possible irq lock inversion dependency issue

LOCKDEP detector reported below warning:
----------------------------------------
[   23.796949] ========================================================
[   23.796950] WARNING: possible irq lock inversion dependency detected
[   23.796952] 6.8.0fix+ #811 Not tainted
[   23.796954] --------------------------------------------------------
[   23.796954] kworker/0:1/8 just changed the state of lock:
[   23.796956] ff365325e084a9b8 (&domain->lock){..-.}-{3:3}, at: amd_iommu_flush_iotlb_all+0x1f/0x50
[   23.796969] but this lock took another, SOFTIRQ-unsafe lock in the past:
[   23.796970]  (pd_bitmap_lock){+.+.}-{3:3}
[   23.796972]

               and interrupts could create inverse lock ordering between them.

[   23.796973]
               other info that might help us debug this:
[   23.796974] Chain exists of:
                 &domain->lock --> &dev_data->lock --> pd_bitmap_lock

[   23.796980]  Possible interrupt unsafe locking scenario:

[   23.796981]        CPU0                    CPU1
[   23.796982]        ----                    ----
[   23.796983]   lock(pd_bitmap_lock);
[   23.796985]                                local_irq_disable();
[   23.796985]                                lock(&domain->lock);
[   23.796988]                                lock(&dev_data->lock);
[   23.796990]   <Interrupt>
[   23.796991]     lock(&domain->lock);

Fix this issue by disabling interrupt when acquiring pd_bitmap_lock.

Note that this is temporary fix. We have a plan to replace custom bitmap
allocator with IDA allocator.

Fixes: 87a6f1f22c97 ("iommu/amd: Introduce per-device domain ID to fix potential TLB aliasing issue")
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240404102717.6705-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
5 weeks agoamdkfd: use calloc instead of kzalloc to avoid integer overflow
Dave Airlie [Thu, 11 Apr 2024 20:11:25 +0000 (06:11 +1000)]
amdkfd: use calloc instead of kzalloc to avoid integer overflow

This uses calloc instead of doing the multiplication which might
overflow.

Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
5 weeks agoMerge tag 'drm-msm-next-2024-04-11' of https://gitlab.freedesktop.org/drm/msm into...
Dave Airlie [Fri, 12 Apr 2024 01:01:44 +0000 (11:01 +1000)]
Merge tag 'drm-msm-next-2024-04-11' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Fixes for v6.9

Display:
- Fixes for PM refcount leak when DP goes to disconnected state and
  also when link training fails. This is also one of the issues found
  with the pm runtime series
- Add missing newlines to prints in msm_fb and msm_kms
- Change permissions of some dpu debugfs entries which write to const
  data from catalog to read-only to avoid protection faults
- Fix the interface table for the catalog of X1E80100. This is an
  important fix to bringup DP for X1E80100.
- Logging fix to print the callback symbol in the invalid IRQ message
  case rather than printing when its known to be NULL.
- Bindings fix to add DP node as child of mdss for mdss node
- Minor typo fix in DP driver API which handles port status change

GPU:
- fix CHRASHDUMP_READ()
- fix HHB (highest bank bit) for a619 to fix UBWC corruption

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvFwRUcHGWva7oDeydq1PTiZMduuykCD2MWaFrT4iMGZA@mail.gmail.com
5 weeks agoMerge tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Linus Torvalds [Thu, 11 Apr 2024 23:49:11 +0000 (16:49 -0700)]
Merge tag 'cxl-fixes-6.9-rc4' of git://git./linux/kernel/git/cxl/cxl

Pull cxl fixes from Dave Jiang:

 - Fix index of Clear Event Record handles in cxl_clear_event_record()

 - Fix use before init of map->reg_type in cxl_decode_regblock()

 - Fix initialization of mbox_cmd.size_out in cxl_mem_get_records_log()

 - Fix CXL path access_coordinate computation:
     - Remove unneded check of iter in loop
     - Fix of retrieving of access_coordinate in PCI topology walk
     - Fix of incorrect region access_coordinate data calculation
     - Consolidate of access_coordinates attached to downstream port
       context
     - Add check to validate access_coordinate validity to prevent
       incorrect data being exposed via sysfs

* tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl: Add checks to access_coordinate calculation to fail missing data
  cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
  cxl: Fix incorrect region perf data calculation
  cxl: Fix retrieving of access_coordinates in PCIe path
  cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
  cxl/core: Fix initialization of mbox_cmd.size_out in get event
  cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
  cxl/mem: Fix for the index of Clear Event Record Handle

5 weeks agoMerge tag 'hyperv-fixes-signed-20240411' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 11 Apr 2024 23:23:56 +0000 (16:23 -0700)]
Merge tag 'hyperv-fixes-signed-20240411' of git://git./linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Some cosmetic changes (Erni Sri Satya Vennela, Li Zhijian)

 - Introduce hv_numa_node_to_pxm_info() (Nuno Das Neves)

 - Fix KVP daemon to handle IPv4 and IPv6 combination for keyfile format
   (Shradha Gupta)

 - Avoid freeing decrypted memory in a confidential VM (Rick Edgecombe
   and Michael Kelley)

* tag 'hyperv-fixes-signed-20240411' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted
  uio_hv_generic: Don't free decrypted memory
  hv_netvsc: Don't free decrypted memory
  Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl
  Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails
  hv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile format
  hv: vmbus: Convert sprintf() family to sysfs_emit() family
  mshyperv: Introduce hv_numa_node_to_pxm_info()
  x86/hyperv: Cosmetic changes for hv_apic.c

5 weeks agoring-buffer: Only update pages_touched when a new page is touched
Steven Rostedt (Google) [Tue, 9 Apr 2024 19:13:09 +0000 (15:13 -0400)]
ring-buffer: Only update pages_touched when a new page is touched

The "buffer_percent" logic that is used by the ring buffer splice code to
only wake up the tasks when there's no data after the buffer is filled to
the percentage of the "buffer_percent" file is dependent on three
variables that determine the amount of data that is in the ring buffer:

 1) pages_read - incremented whenever a new sub-buffer is consumed
 2) pages_lost - incremented every time a writer overwrites a sub-buffer
 3) pages_touched - incremented when a write goes to a new sub-buffer

The percentage is the calculation of:

  (pages_touched - (pages_lost + pages_read)) / nr_pages

Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.

It was observed that over time, the amount read from the splice was
constantly decreasing the longer the trace was running. That is, if one
asked for 60%, it would read over 60% when it first starts tracing, but
then it would be woken up at under 60% and would slowly decrease the
amount of data read after being woken up, where the amount becomes much
less than the buffer percent.

This was due to an accounting of the pages_touched incrementation. This
value is incremented whenever a writer transfers to a new sub-buffer. But
the place where it was incremented was incorrect. If a writer overflowed
the current sub-buffer it would go to the next one. If it gets preempted
by an interrupt at that time, and the interrupt performs a trace, it too
will end up going to the next sub-buffer. But only one should increment
the counter. Unfortunately, that was not the case.

Change the cmpxchg() that does the real switch of the tail-page into a
try_cmpxchg(), and on success, perform the increment of pages_touched. This
will only increment the counter once for when the writer moves to a new
sub-buffer, and not when there's a race and is incremented for when a
writer and its preempting writer both move to the same new sub-buffer.

Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agotracing: hide unused ftrace_event_id_fops
Arnd Bergmann [Wed, 3 Apr 2024 08:06:24 +0000 (10:06 +0200)]
tracing: hide unused ftrace_event_id_fops

When CONFIG_PERF_EVENTS, a 'make W=1' build produces a warning about the
unused ftrace_event_id_fops variable:

kernel/trace/trace_events.c:2155:37: error: 'ftrace_event_id_fops' defined but not used [-Werror=unused-const-variable=]
 2155 | static const struct file_operations ftrace_event_id_fops = {

Hide this in the same #ifdef as the reference to it.

Link: https://lore.kernel.org/linux-trace-kernel/20240403080702.3509288-7-arnd@kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ajay Kaher <akaher@vmware.com>
Cc: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Clément Léger <cleger@rivosinc.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
Fixes: 620a30e97feb ("tracing: Don't pass file_operations array to event_create_dir()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agotracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry
Prasad Pandit [Fri, 22 Mar 2024 12:18:01 +0000 (17:48 +0530)]
tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry

Fix FTRACE_RECORD_RECURSION_SIZE entry, replace tab with
a space character. It helps Kconfig parsers to read file
without error.

Link: https://lore.kernel.org/linux-trace-kernel/20240322121801.1803948-1-ppandit@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 773c16705058 ("ftrace: Add recording of functions that caused recursion")
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agoeventfs: Fix kernel-doc comments to functions
Yang Li [Fri, 22 Mar 2024 06:26:04 +0000 (14:26 +0800)]
eventfs: Fix kernel-doc comments to functions

This commit fix kernel-doc style comments with complete parameter
descriptions for the lookup_file(),lookup_dir_entry() and
lookup_file_dentry().

Link: https://lore.kernel.org/linux-trace-kernel/20240322062604.28862-1-yang.lee@linux.alibaba.com
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
5 weeks agosmb3: fix broken reconnect when password changing on the server by allowing password... v6.9-rc3-SMB3-client-fixes
Steve French [Thu, 4 Apr 2024 23:06:56 +0000 (18:06 -0500)]
smb3: fix broken reconnect when password changing on the server by allowing password rotation

There are various use cases that are becoming more common in which password
changes are scheduled on a server(s) periodically but the clients connected
to this server need to stay connected (even in the face of brief network
reconnects) due to mounts which can not be easily unmounted and mounted at
will, and servers that do password rotation do not always have the ability
to tell the clients exactly when to the new password will be effective,
so add support for an alt password ("password2=") on mount (and also
remount) so that we can anticipate the upcoming change to the server
without risking breaking existing mounts.

An alternative would have been to use the kernel keyring for this but the
processes doing the reconnect do not have access to the keyring but do
have access to the ses structure.

Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
5 weeks agosmb: client: instantiate when creating SFU files
Paulo Alcantara [Tue, 9 Apr 2024 14:28:59 +0000 (11:28 -0300)]
smb: client: instantiate when creating SFU files

In cifs_sfu_make_node(), on success, instantiate rather than leave it
with dentry unhashed negative to support callers that expect mknod(2)
to always instantiate.

This fixes the following test case:

  mount.cifs //srv/share /mnt -o ...,sfu
  mkfifo /mnt/fifo
  ./xfstests/ltp/growfiles -b -W test -e 1 -u -i 0 -L 30 /mnt/fifo
  ...
  BUG: unable to handle page fault for address: 000000034cec4e58
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 1 PREEMPT SMP PTI
  CPU: 0 PID: 138098 Comm: growfiles Kdump: loaded Not tainted
  5.14.0-436.3987_1240945149.el9.x86_64 #1
  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  RIP: 0010:_raw_callee_save__kvm_vcpu_is_preempted+0x0/0x20
  Code: e8 15 d9 61 00 e9 63 ff ff ff 41 bd ea ff ff ff e9 58 ff ff ff e8
  d0 71 c0 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <48> 8b 04
  fd 60 2b c1 99 80 b8 90 50 03 00 00 0f 95 c0 c3 cc cc cc
  RSP: 0018:ffffb6a143cf7cf8 EFLAGS: 00010206
  RAX: ffff8a9bc30fb038 RBX: ffff8a9bc666a200 RCX: ffff8a9cc0260000
  RDX: 00000000736f622e RSI: ffff8a9bc30fb038 RDI: 000000007665645f
  RBP: ffffb6a143cf7d70 R08: 0000000000001000 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000000 R12: ffff8a9bc666a200
  R13: 0000559a302a12b0 R14: 0000000000001000 R15: 0000000000000000
  FS: 00007fbed1dbb740(0000) GS:ffff8a9cf0000000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000034cec4e58 CR3: 0000000128ec6006 CR4: 0000000000770ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? show_trace_log_lvl+0x1c4/0x2df
   ? show_trace_log_lvl+0x1c4/0x2df
   ? __mutex_lock.constprop.0+0x5f7/0x6a0
   ? __die_body.cold+0x8/0xd
   ? page_fault_oops+0x134/0x170
   ? exc_page_fault+0x62/0x150
   ? asm_exc_page_fault+0x22/0x30
   ? _pfx_raw_callee_save__kvm_vcpu_is_preempted+0x10/0x10
   __mutex_lock.constprop.0+0x5f7/0x6a0
   ? __mod_memcg_lruvec_state+0x84/0xd0
   pipe_write+0x47/0x650
   ? do_anonymous_page+0x258/0x410
   ? inode_security+0x22/0x60
   ? selinux_file_permission+0x108/0x150
   vfs_write+0x2cb/0x410
   ksys_write+0x5f/0xe0
   do_syscall_64+0x5c/0xf0
   ? syscall_exit_to_user_mode+0x22/0x40
   ? do_syscall_64+0x6b/0xf0
   ? sched_clock_cpu+0x9/0xc0
   ? exc_page_fault+0x62/0x150
   entry_SYSCALL_64_after_hwframe+0x6e/0x76

Cc: stable@vger.kernel.org
Fixes: 72bc63f5e23a ("smb3: fix creating FIFOs when mounting with "sfu" mount option")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
5 weeks agosmb3: fix Open files on server counter going negative
Steve French [Sun, 7 Apr 2024 04:16:08 +0000 (23:16 -0500)]
smb3: fix Open files on server counter going negative

We were decrementing the count of open files on server twice
for the case where we were closing cached directories.

Fixes: 8e843bf38f7b ("cifs: return a single-use cfid if we did not get a lease")
Cc: stable@vger.kernel.org
Acked-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
5 weeks agoMAINTAINERS: remove myself as a Reviewer for Ceph
Jeff Layton [Tue, 9 Apr 2024 11:01:57 +0000 (07:01 -0400)]
MAINTAINERS: remove myself as a Reviewer for Ceph

It has been a couple of years since I stepped down as CephFS maintainer.
I'm not involved in any meaningful way with the project these days, so
while I'm happy to help review the occasional patch, I don't need to be
cc'ed on all of them.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks agoceph: switch to use cap_delay_lock for the unlink delay list
Xiubo Li [Tue, 9 Apr 2024 00:56:03 +0000 (08:56 +0800)]
ceph: switch to use cap_delay_lock for the unlink delay list

The same list item will be used in both cap_delay_list and
cap_unlink_delay_list, so it's buggy to use two different locks
to protect them.

Cc: stable@vger.kernel.org
Fixes: dbc347ef7f0c ("ceph: add ceph_cap_unlink_work to fire check_caps() immediately")
Link: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AODC76VXRAMXKLFDCTK4TKFDDPWUSCN5
Reported-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks agoMerge tag 'drm-xe-fixes-2024-04-11' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Thu, 11 Apr 2024 19:37:15 +0000 (05:37 +1000)]
Merge tag 'drm-xe-fixes-2024-04-11' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Fix double display mutex initializations
- Fix u32 -> u64 implicit conversions
- Fix RING_CONTEXT_CONTROL not marked as masked

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ewvvtgcb2gonxvccws6nt6fqswoyfp4g43t5ex24vpqwtrxdzm@hgjoz5uirmxx
5 weeks agoMerge tag 'drm-misc-fixes-2024-04-11' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Thu, 11 Apr 2024 19:35:37 +0000 (05:35 +1000)]
Merge tag 'drm-misc-fixes-2024-04-11' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

ast:
- Fix soft lockup

client:
- Protect connector modes with mode_config mutex

host1x:
- Do not setup DMA for virtual addresses

ivpu:
- Fix deadlock in context_xa
- PCI fixes
- Fixes to error handling

nouveau:
- gsp: Fix OOB access
- Fix casting

panfrost:
- Fix error path in MMU code

qxl:
- Revert "drm/qxl: simplify qxl_fence_wait"

vmwgfx:
- Enable DMA for SEV mappings

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411073403.GA9895@localhost.localdomain
5 weeks agoMerge tag 'acpi-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Thu, 11 Apr 2024 19:03:43 +0000 (12:03 -0700)]
Merge tag 'acpi-6.9-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix the handling of dependencies between devices in the ACPI
  device enumeration code and address a _UID matching regression from
  the 6.8 development cycle.

  Specifics:

   - Modify the ACPI device enumeration code to avoid counting
     dependencies that have been met already as unmet (Hans de Goede)

   - Make _UID matching take the integer value of 0 into account as
     appropriate (Raag Jadav)"

* tag 'acpi-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: bus: allow _UID matching for integer zero
  ACPI: scan: Do not increase dep_unmet for already met dependencies

5 weeks agoMerge tag 'pm-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Thu, 11 Apr 2024 19:00:25 +0000 (12:00 -0700)]
Merge tag 'pm-6.9-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix the suspend-to-idle core code to guarantee that timers queued on
  CPUs other than the one that has first left the idle state, which
  should expire directly after resume, will be handled (Anna-Maria
  Behnsen)"

* tag 'pm-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: s2idle: Make sure CPUs will wakeup directly on resume

5 weeks agoMerge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 11 Apr 2024 18:46:31 +0000 (11:46 -0700)]
Merge tag 'net-6.9-rc4' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth.

  Current release - new code bugs:

   - netfilter: complete validation of user input

   - mlx5: disallow SRIOV switchdev mode when in multi-PF netdev

  Previous releases - regressions:

   - core: fix u64_stats_init() for lockdep when used repeatedly in one
     file

   - ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr

   - bluetooth: fix memory leak in hci_req_sync_complete()

   - batman-adv: avoid infinite loop trying to resize local TT

   - drv: geneve: fix header validation in geneve[6]_xmit_skb

   - drv: bnxt_en: fix possible memory leak in
     bnxt_rdma_aux_device_init()

   - drv: mlx5: offset comp irq index in name by one

   - drv: ena: avoid double-free clearing stale tx_info->xdpf value

   - drv: pds_core: fix pdsc_check_pci_health deadlock

  Previous releases - always broken:

   - xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING

   - bluetooth: fix setsockopt not validating user input

   - af_unix: clear stale u->oob_skb.

   - nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies

   - drv: virtio_net: fix guest hangup on invalid RSS update

   - drv: mlx5e: Fix mlx5e_priv_init() cleanup flow

   - dsa: mt7530: trap link-local frames regardless of ST Port State"

* tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
  net: ena: Set tx_info->xdpf value to NULL
  net: ena: Fix incorrect descriptor free behavior
  net: ena: Wrong missing IO completions check order
  net: ena: Fix potential sign extension issue
  af_unix: Fix garbage collector racing against connect()
  net: dsa: mt7530: trap link-local frames regardless of ST Port State
  Revert "s390/ism: fix receive message buffer allocation"
  net: sparx5: fix wrong config being used when reconfiguring PCS
  net/mlx5: fix possible stack overflows
  net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
  net/mlx5e: RSS, Block XOR hash with over 128 channels
  net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
  net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
  net/mlx5e: Fix mlx5e_priv_init() cleanup flow
  net/mlx5e: RSS, Block changing channels number when RXFH is configured
  net/mlx5: Correctly compare pkt reformat ids
  net/mlx5: Properly link new fs rules into the tree
  net/mlx5: offset comp irq index in name by one
  net/mlx5: Register devlink first under devlink lock
  net/mlx5: E-switch, store eswitch pointer before registering devlink_param
  ...

5 weeks agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Thu, 11 Apr 2024 18:42:11 +0000 (11:42 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "The most important fix is the sg one because the regression it fixes
  (spurious warning and use after final put) is already backported to
  stable.

  The next biggest impact is the target fix for wrong credentials used
  to load a module because it's affecting new kernels installed on
  selinux based distributions.

  The other three fixes are an obvious off by one and SATA protocol
  issues"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
  scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
  scsi: hisi_sas: Handle the NCQ error returned by D2H frame
  scsi: target: Fix SELinux error when systemd-modules loads the target module
  scsi: sg: Avoid race in error handling & drop bogus warn

5 weeks agoMerge tag 'loongarch-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 11 Apr 2024 18:30:42 +0000 (11:30 -0700)]
Merge tag 'loongarch-fixes-6.9-1' of git://git./linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:

 - make {virt, phys, page, pfn} translation work with KFENCE for
   LoongArch (otherwise NVMe and virtio-blk cannot work with KFENCE
   enabled)

 - update dts files for Loongson-2K series to make devices work
   correctly

 - fix a build error

* tag 'loongarch-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Include linux/sizes.h in addrspace.h to prevent build errors
  LoongArch: Update dts for Loongson-2K2000 to support GMAC/GNET
  LoongArch: Update dts for Loongson-2K2000 to support PCI-MSI
  LoongArch: Update dts for Loongson-2K2000 to support ISA/LPC
  LoongArch: Update dts for Loongson-2K1000 to support ISA/LPC
  LoongArch: Make virt_addr_valid()/__virt_addr_valid() work with KFENCE
  LoongArch: Make {virt, phys, page, pfn} translation work with KFENCE
  mm: Move lowmem_page_address() a little later

5 weeks agoMerge tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Thu, 11 Apr 2024 18:24:55 +0000 (11:24 -0700)]
Merge tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs fixes from Kent Overstreet:
 "Notable user impacting bugs

   - On multi device filesystems, recovery was looping in
     btree_trans_too_many_iters(). This checks if a transaction has
     touched too many btree paths (because of iteration over many keys),
     and isuses a restart to drop unneeded paths.

     But it's now possible for some paths to exceed the previous limit
     without iteration in the interior btree update path, since the
     transaction commit will do alloc updates for every old and new
     btree node, and during journal replay we don't use the btree write
     buffer for locking reasons and thus those updates use btree paths
     when they wouldn't normally.

   - Fix a corner case in rebalance when moving extents on a
     durability=0 device. This wouldn't be hit when a device was
     formatted with durability=0 since in that case we'll only use it as
     a write through cache (only cached extents will live on it), but
     durability can now be changed on an existing device.

   - bch2_get_acl() could rarely forget to handle a transaction restart;
     this manifested as the occasional missing acl that came back after
     dropping caches.

   - Fix a major performance regression on high iops multithreaded write
     workloads (only since 6.9-rc1); a previous fix for a deadlock in
     the interior btree update path to check the journal watermark
     introduced a dependency on the state of btree write buffer flushing
     that we didn't want.

   - Assorted other repair paths and recovery fixes"

* tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs: (25 commits)
  bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()
  bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
  bcachefs: Fix a race in btree_update_nodes_written()
  bcachefs: btree_node_scan: Respect member.data_allowed
  bcachefs: Don't scan for btree nodes when we can reconstruct
  bcachefs: Fix check_topology() when using node scan
  bcachefs: fix eytzinger0_find_gt()
  bcachefs: fix bch2_get_acl() transaction restart handling
  bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list
  bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()
  bcachefs: Rename struct field swap to prevent macro naming collision
  MAINTAINERS: Add entry for bcachefs documentation
  Documentation: filesystems: Add bcachefs toctree
  bcachefs: JOURNAL_SPACE_LOW
  bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINE
  bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystems
  bcachefs: fix rand_delete unit test
  bcachefs: fix ! vs ~ typo in __clear_bit_le64()
  bcachefs: Fix rebalance from durability=0 device
  bcachefs: Print shutdown journal sequence number
  ...

5 weeks agoMerge tag 'tag-chrome-platform-fixes-for-v6.9-rc4' of git://git.kernel.org/pub/scm...
Linus Torvalds [Thu, 11 Apr 2024 18:15:09 +0000 (11:15 -0700)]
Merge tag 'tag-chrome-platform-fixes-for-v6.9-rc4' of git://git./linux/kernel/git/chrome-platform/linux

Pull chrome platform fix from Tzung-Bi Shih:
 "Fix a NULL pointer dereference"

* tag 'tag-chrome-platform-fixes-for-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_uart: properly fix race condition

5 weeks agoMerge branch 'acpi-bus'
Rafael J. Wysocki [Thu, 11 Apr 2024 17:36:35 +0000 (19:36 +0200)]
Merge branch 'acpi-bus'

* acpi-bus:
  ACPI: bus: allow _UID matching for integer zero

5 weeks agoceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
NeilBrown [Sun, 24 Mar 2024 22:21:20 +0000 (09:21 +1100)]
ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE

The page has been marked clean before writepage is called.  If we don't
redirty it before postponing the write, it might never get written.

Cc: stable@vger.kernel.org
Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
5 weeks agodrm/xe: Label RING_CONTEXT_CONTROL as masked
Ashutosh Dixit [Thu, 4 Apr 2024 16:12:56 +0000 (09:12 -0700)]
drm/xe: Label RING_CONTEXT_CONTROL as masked

RING_CONTEXT_CONTROL is a masked register.

v2: Also clean up setting register value (Lucas)

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404161256.3852502-1-ashutosh.dixit@intel.com
(cherry picked from commit dc30c6e7149baaae4288c742de95212b31f07438)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 weeks agodrm/xe/xe_migrate: Cast to output precision before multiplying operands
Himal Prasad Ghimiray [Mon, 1 Apr 2024 17:53:00 +0000 (23:23 +0530)]
drm/xe/xe_migrate: Cast to output precision before multiplying operands

Addressing potential overflow in result of  multiplication of two lower
precision (u32) operands before widening it to higher precision
(u64).

-v2
Fix commit message and description. (Rodrigo)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240401175300.3823653-1-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 34820967ae7b45411f8f4f737c2d63b0c608e0d7)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 weeks agodrm/xe/hwmon: Cast result to output precision on left shift of operand
Karthik Poosa [Fri, 5 Apr 2024 13:01:27 +0000 (18:31 +0530)]
drm/xe/hwmon: Cast result to output precision on left shift of operand

Address potential overflow in result of left shift of a
lower precision (u32) operand before assignment to higher
precision (u64) variable.

v2:
 - Update commit message. (Himal)

Fixes: 4446fcf220ce ("drm/xe/hwmon: Expose power1_max_interval")
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405130127.1392426-5-karthik.poosa@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 883232b47b81108b0252197c747f396ecd51455a)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 weeks agodrm/xe/display: Fix double mutex initialization
Lucas De Marchi [Fri, 5 Apr 2024 20:07:11 +0000 (13:07 -0700)]
drm/xe/display: Fix double mutex initialization

All of these mutexes are already initialized by the display side since
commit 3fef3e6ff86a ("drm/i915: move display mutex inits to display
code"), so the xe shouldn´t initialize them.

Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Arun R Murthy <arun.r.murthy@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405200711.2041428-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 117de185edf2c5767f03575219bf7a43b161ff0d)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 weeks agoMerge branch 'ena-driver-bug-fixes'
Paolo Abeni [Thu, 11 Apr 2024 09:21:05 +0000 (11:21 +0200)]
Merge branch 'ena-driver-bug-fixes'

David Arinzon says:

====================
ENA driver bug fixes

From: David Arinzon <darinzon@amazon.com>

This patchset contains multiple bug fixes for the
ENA driver.
====================

Link: https://lore.kernel.org/r/20240410091358.16289-1-darinzon@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: ena: Set tx_info->xdpf value to NULL
David Arinzon [Wed, 10 Apr 2024 09:13:58 +0000 (09:13 +0000)]
net: ena: Set tx_info->xdpf value to NULL

The patch mentioned in the `Fixes` tag removed the explicit assignment
of tx_info->xdpf to NULL with the justification that there's no need
to set tx_info->xdpf to NULL and tx_info->num_of_bufs to 0 in case
of a mapping error. Both values won't be used once the mapping function
returns an error, and their values would be overridden by the next
transmitted packet.

While both values do indeed get overridden in the next transmission
call, the value of tx_info->xdpf is also used to check whether a TX
descriptor's transmission has been completed (i.e. a completion for it
was polled).

An example scenario:
1. Mapping failed, tx_info->xdpf wasn't set to NULL
2. A VF reset occurred leading to IO resource destruction and
   a call to ena_free_tx_bufs() function
3. Although the descriptor whose mapping failed was freed by the
   transmission function, it still passes the check
     if (!tx_info->skb)

   (skb and xdp_frame are in a union)
4. The xdp_frame associated with the descriptor is freed twice

This patch returns the assignment of NULL to tx_info->xdpf to make the
cleaning function knows that the descriptor is already freed.

Fixes: 504fd6a5390c ("net: ena: fix DMA mapping function issues in XDP")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: ena: Fix incorrect descriptor free behavior
David Arinzon [Wed, 10 Apr 2024 09:13:57 +0000 (09:13 +0000)]
net: ena: Fix incorrect descriptor free behavior

ENA has two types of TX queues:
- queues which only process TX packets arriving from the network stack
- queues which only process TX packets forwarded to it by XDP_REDIRECT
  or XDP_TX instructions

The ena_free_tx_bufs() cycles through all descriptors in a TX queue
and unmaps + frees every descriptor that hasn't been acknowledged yet
by the device (uncompleted TX transactions).
The function assumes that the processed TX queue is necessarily from
the first category listed above and ends up using napi_consume_skb()
for descriptors belonging to an XDP specific queue.

This patch solves a bug in which, in case of a VF reset, the
descriptors aren't freed correctly, leading to crashes.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: ena: Wrong missing IO completions check order
David Arinzon [Wed, 10 Apr 2024 09:13:56 +0000 (09:13 +0000)]
net: ena: Wrong missing IO completions check order

Missing IO completions check is called every second (HZ jiffies).
This commit fixes several issues with this check:

1. Duplicate queues check:
   Max of 4 queues are scanned on each check due to monitor budget.
   Once reaching the budget, this check exits under the assumption that
   the next check will continue to scan the remainder of the queues,
   but in practice, next check will first scan the last already scanned
   queue which is not necessary and may cause the full queue scan to
   last a couple of seconds longer.
   The fix is to start every check with the next queue to scan.
   For example, on 8 IO queues:
   Bug: [0,1,2,3], [3,4,5,6], [6,7]
   Fix: [0,1,2,3], [4,5,6,7]

2. Unbalanced queues check:
   In case the number of active IO queues is not a multiple of budget,
   there will be checks which don't utilize the full budget
   because the full scan exits when reaching the last queue id.
   The fix is to run every TX completion check with exact queue budget
   regardless of the queue id.
   For example, on 7 IO queues:
   Bug: [0,1,2,3], [4,5,6], [0,1,2,3]
   Fix: [0,1,2,3], [4,5,6,0], [1,2,3,4]
   The budget may be lowered in case the number of IO queues is less
   than the budget (4) to make sure there are no duplicate queues on
   the same check.
   For example, on 3 IO queues:
   Bug: [0,1,2,0], [1,2,0,1]
   Fix: [0,1,2], [0,1,2]

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: ena: Fix potential sign extension issue
David Arinzon [Wed, 10 Apr 2024 09:13:55 +0000 (09:13 +0000)]
net: ena: Fix potential sign extension issue

Small unsigned types are promoted to larger signed types in
the case of multiplication, the result of which may overflow.
In case the result of such a multiplication has its MSB
turned on, it will be sign extended with '1's.
This changes the multiplication result.

Code example of the phenomenon:
-------------------------------
u16 x, y;
size_t z1, z2;

x = y = 0xffff;
printk("x=%x y=%x\n",x,y);

z1 = x*y;
z2 = (size_t)x*y;

printk("z1=%lx z2=%lx\n", z1, z2);

Output:
-------
x=ffff y=ffff
z1=fffffffffffe0001 z2=fffe0001

The expected result of ffff*ffff is fffe0001, and without the
explicit casting to avoid the unwanted sign extension we got
fffffffffffe0001.

This commit adds an explicit casting to avoid the sign extension
issue.

Fixes: 689b2bdaaa14 ("net: ena: add functions for handling Low Latency Queues in ena_com")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoMerge tag 'for-net-2024-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Paolo Abeni [Thu, 11 Apr 2024 08:42:43 +0000 (10:42 +0200)]
Merge tag 'for-net-2024-04-10' of git://git./linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

  - L2CAP: Don't double set the HCI_CONN_MGMT_CONNECTED bit
  - Fix memory leak in hci_req_sync_complete
  - hci_sync: Fix using the same interval and window for Coded PHY
  - Fix not validating setsockopt user input

* tag 'for-net-2024-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit
  Bluetooth: hci_sock: Fix not validating setsockopt user input
  Bluetooth: ISO: Fix not validating setsockopt user input
  Bluetooth: L2CAP: Fix not validating setsockopt user input
  Bluetooth: RFCOMM: Fix not validating setsockopt user input
  Bluetooth: SCO: Fix not validating setsockopt user input
  Bluetooth: Fix memory leak in hci_req_sync_complete()
  Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY
  Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset
====================

Link: https://lore.kernel.org/r/20240410191610.4156653-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoaf_unix: Fix garbage collector racing against connect()
Michal Luczaj [Tue, 9 Apr 2024 20:09:39 +0000 (22:09 +0200)]
af_unix: Fix garbage collector racing against connect()

Garbage collector does not take into account the risk of embryo getting
enqueued during the garbage collection. If such embryo has a peer that
carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children. Leading to an incorrectly elevated inflight
count, and then a dangling pointer within the gc_inflight_list.

sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V's fd will be passed via sendmsg(), gets inflight count bumped

connect(S, addr) sendmsg(S, [V]); close(V) __unix_gc()
---------------- ------------------------- -----------

NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
// V count=1 inflight=0

  NS = unix_peer(S)
  skb2 = sock_alloc()
skb_queue_tail(NS, skb2[V])

// V became in-flight
// V count=2 inflight=1

close(V)

// V count=1 inflight=1
// GC candidate condition met

for u in gc_inflight_list:
  if (total_refs == inflight_refs)
    add u to gc_candidates

// gc_candidates={L, V}

for u in gc_candidates:
  scan_children(u, dec_inflight)

// embryo (skb1) was not
// reachable from L yet, so V's
// inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
for u in gc_candidates:
  if (u.inflight)
    scan_children(u, inc_inflight_move_tail)

// V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock its state. This
makes GC wait until the end of any ongoing connect() to that socket. After
flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
this point, unix_inflight() can not happen because unix_gc_lock is already
taken. Inflight graph remains unaffected.

Fixes: 1fd05ba5a2f2 ("[AF_UNIX]: Rewrite garbage collector, fixes race.")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240409201047.1032217-1-mhal@rbox.co
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: dsa: mt7530: trap link-local frames regardless of ST Port State
Arınç ÜNAL [Tue, 9 Apr 2024 15:01:14 +0000 (18:01 +0300)]
net: dsa: mt7530: trap link-local frames regardless of ST Port State

In Clause 5 of IEEE Std 802-2014, two sublayers of the data link layer
(DLL) of the Open Systems Interconnection basic reference model (OSI/RM)
are described; the medium access control (MAC) and logical link control
(LLC) sublayers. The MAC sublayer is the one facing the physical layer.

In 8.2 of IEEE Std 802.1Q-2022, the Bridge architecture is described. A
Bridge component comprises a MAC Relay Entity for interconnecting the Ports
of the Bridge, at least two Ports, and higher layer entities with at least
a Spanning Tree Protocol Entity included.

Each Bridge Port also functions as an end station and shall provide the MAC
Service to an LLC Entity. Each instance of the MAC Service is provided to a
distinct LLC Entity that supports protocol identification, multiplexing,
and demultiplexing, for protocol data unit (PDU) transmission and reception
by one or more higher layer entities.

It is described in 8.13.9 of IEEE Std 802.1Q-2022 that in a Bridge, the LLC
Entity associated with each Bridge Port is modeled as being directly
connected to the attached Local Area Network (LAN).

On the switch with CPU port architecture, CPU port functions as Management
Port, and the Management Port functionality is provided by software which
functions as an end station. Software is connected to an IEEE 802 LAN that
is wholly contained within the system that incorporates the Bridge.
Software provides access to the LLC Entity associated with each Bridge Port
by the value of the source port field on the special tag on the frame
received by software.

We call frames that carry control information to determine the active
topology and current extent of each Virtual Local Area Network (VLAN),
i.e., spanning tree or Shortest Path Bridging (SPB) and Multiple VLAN
Registration Protocol Data Units (MVRPDUs), and frames from other link
constrained protocols, such as Extensible Authentication Protocol over LAN
(EAPOL) and Link Layer Discovery Protocol (LLDP), link-local frames. They
are not forwarded by a Bridge. Permanently configured entries in the
filtering database (FDB) ensure that such frames are discarded by the
Forwarding Process. In 8.6.3 of IEEE Std 802.1Q-2022, this is described in
detail:

Each of the reserved MAC addresses specified in Table 8-1
(01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]) shall be
permanently configured in the FDB in C-VLAN components and ERs.

Each of the reserved MAC addresses specified in Table 8-2
(01-80-C2-00-00-[01,02,03,04,05,06,07,08,09,0A,0E]) shall be permanently
configured in the FDB in S-VLAN components.

Each of the reserved MAC addresses specified in Table 8-3
(01-80-C2-00-00-[01,02,04,0E]) shall be permanently configured in the FDB
in TPMR components.

The FDB entries for reserved MAC addresses shall specify filtering for all
Bridge Ports and all VIDs. Management shall not provide the capability to
modify or remove entries for reserved MAC addresses.

The addresses in Table 8-1, Table 8-2, and Table 8-3 determine the scope of
propagation of PDUs within a Bridged Network, as follows:

  The Nearest Bridge group address (01-80-C2-00-00-0E) is an address that
  no conformant Two-Port MAC Relay (TPMR) component, Service VLAN (S-VLAN)
  component, Customer VLAN (C-VLAN) component, or MAC Bridge can forward.
  PDUs transmitted using this destination address, or any other addresses
  that appear in Table 8-1, Table 8-2, and Table 8-3
  (01-80-C2-00-00-[00,01,02,03,04,05,06,07,08,09,0A,0B,0C,0D,0E,0F]), can
  therefore travel no further than those stations that can be reached via a
  single individual LAN from the originating station.

  The Nearest non-TPMR Bridge group address (01-80-C2-00-00-03), is an
  address that no conformant S-VLAN component, C-VLAN component, or MAC
  Bridge can forward; however, this address is relayed by a TPMR component.
  PDUs using this destination address, or any of the other addresses that
  appear in both Table 8-1 and Table 8-2 but not in Table 8-3
  (01-80-C2-00-00-[00,03,05,06,07,08,09,0A,0B,0C,0D,0F]), will be relayed
  by any TPMRs but will propagate no further than the nearest S-VLAN
  component, C-VLAN component, or MAC Bridge.

  The Nearest Customer Bridge group address (01-80-C2-00-00-00) is an
  address that no conformant C-VLAN component, MAC Bridge can forward;
  however, it is relayed by TPMR components and S-VLAN components. PDUs
  using this destination address, or any of the other addresses that appear
  in Table 8-1 but not in either Table 8-2 or Table 8-3
  (01-80-C2-00-00-[00,0B,0C,0D,0F]), will be relayed by TPMR components and
  S-VLAN components but will propagate no further than the nearest C-VLAN
  component or MAC Bridge.

Because the LLC Entity associated with each Bridge Port is provided via CPU
port, we must not filter these frames but forward them to CPU port.

In a Bridge, the transmission Port is majorly decided by ingress and egress
rules, FDB, and spanning tree Port State functions of the Forwarding
Process. For link-local frames, only CPU port should be designated as
destination port in the FDB, and the other functions of the Forwarding
Process must not interfere with the decision of the transmission Port. We
call this process trapping frames to CPU port.

Therefore, on the switch with CPU port architecture, link-local frames must
be trapped to CPU port, and certain link-local frames received by a Port of
a Bridge comprising a TPMR component or an S-VLAN component must be
excluded from it.

A Bridge of the switch with CPU port architecture cannot comprise a
Two-Port MAC Relay (TPMR) component as a TPMR component supports only a
subset of the functionality of a MAC Bridge. A Bridge comprising two Ports
(Management Port doesn't count) of this architecture will either function
as a standard MAC Bridge or a standard VLAN Bridge.

Therefore, a Bridge of this architecture can only comprise S-VLAN
components, C-VLAN components, or MAC Bridge components. Since there's no
TPMR component, we don't need to relay PDUs using the destination addresses
specified on the Nearest non-TPMR section, and the proportion of the
Nearest Customer Bridge section where they must be relayed by TPMR
components.

One option to trap link-local frames to CPU port is to add static FDB
entries with CPU port designated as destination port. However, because that
Independent VLAN Learning (IVL) is being used on every VID, each entry only
applies to a single VLAN Identifier (VID). For a Bridge comprising a MAC
Bridge component or a C-VLAN component, there would have to be 16 times
4096 entries. This switch intellectual property can only hold a maximum of
2048 entries. Using this option, there also isn't a mechanism to prevent
link-local frames from being discarded when the spanning tree Port State of
the reception Port is discarding.

The remaining option is to utilise the BPC, RGAC1, RGAC2, RGAC3, and RGAC4
registers. Whilst this applies to every VID, it doesn't contain all of the
reserved MAC addresses without affecting the remaining Standard Group MAC
Addresses. The REV_UN frame tag utilised using the RGAC4 register covers
the remaining 01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F] destination
addresses. It also includes the 01-80-C2-00-00-22 to 01-80-C2-00-00-FF
destination addresses which may be relayed by MAC Bridges or VLAN Bridges.
The latter option provides better but not complete conformance.

This switch intellectual property also does not provide a mechanism to trap
link-local frames with specific destination addresses to CPU port by
Bridge, to conform to the filtering rules for the distinct Bridge
components.

Therefore, regardless of the type of the Bridge component, link-local
frames with these destination addresses will be trapped to CPU port:

01-80-C2-00-00-[00,01,02,03,0E]

In a Bridge comprising a MAC Bridge component or a C-VLAN component:

  Link-local frames with these destination addresses won't be trapped to
  CPU port which won't conform to IEEE Std 802.1Q-2022:

  01-80-C2-00-00-[04,05,06,07,08,09,0A,0B,0C,0D,0F]

In a Bridge comprising an S-VLAN component:

  Link-local frames with these destination addresses will be trapped to CPU
  port which won't conform to IEEE Std 802.1Q-2022:

  01-80-C2-00-00-00

  Link-local frames with these destination addresses won't be trapped to
  CPU port which won't conform to IEEE Std 802.1Q-2022:

  01-80-C2-00-00-[04,05,06,07,08,09,0A]

Currently on this switch intellectual property, if the spanning tree Port
State of the reception Port is discarding, link-local frames will be
discarded.

To trap link-local frames regardless of the spanning tree Port State, make
the switch regard them as Bridge Protocol Data Units (BPDUs). This switch
intellectual property only lets the frames regarded as BPDUs bypass the
spanning tree Port State function of the Forwarding Process.

With this change, the only remaining interference is the ingress rules.
When the reception Port has no PVID assigned on software, VLAN-untagged
frames won't be allowed in. There doesn't seem to be a mechanism on the
switch intellectual property to have link-local frames bypass this function
of the Forwarding Process.

Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20240409-b4-for-net-mt7530-fix-link-local-when-stp-discarding-v2-1-07b1150164ac@arinc9.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoRevert "s390/ism: fix receive message buffer allocation"
Gerd Bayer [Tue, 9 Apr 2024 11:37:53 +0000 (13:37 +0200)]
Revert "s390/ism: fix receive message buffer allocation"

This reverts commit 58effa3476536215530c9ec4910ffc981613b413.
Review was not finished on this patch. So it's not ready for
upstreaming.

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Link: https://lore.kernel.org/r/20240409113753.2181368-1-gbayer@linux.ibm.com
Fixes: 58effa347653 ("s390/ism: fix receive message buffer allocation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agonet: sparx5: fix wrong config being used when reconfiguring PCS
Daniel Machon [Tue, 9 Apr 2024 10:41:59 +0000 (12:41 +0200)]
net: sparx5: fix wrong config being used when reconfiguring PCS

The wrong port config is being used if the PCS is reconfigured. Fix this
by correctly using the new config instead of the old one.

Fixes: 946e7fd5053a ("net: sparx5: add port module support")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240409-link-mode-reconfiguration-fix-v2-1-db6a507f3627@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 weeks agoMerge tag 'amd-drm-fixes-6.9-2024-04-10' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Thu, 11 Apr 2024 04:47:29 +0000 (14:47 +1000)]
Merge tag 'amd-drm-fixes-6.9-2024-04-10' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.9-2024-04-10:

amdgpu:
- GPU reset fixes
- Fix some confusing logging
- UMSCH fix
- Aborted suspend fix
- DCN 3.5 fixes
- S4 fix
- MES logging fixes
- SMU 14 fixes
- SDMA 4.4.2 fix
- KASAN fix
- SMU 13.0.10 fix
- VCN partition fix
- GFX11 fixes
- DWB fixes
- Plane handling fix
- FAMS fix
- DCN 3.1.6 fix
- VSC SDP fixes
- OLED panel fix
- GFX 11.5 fix

amdkfd:
- GPU reset fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411013425.6431-1-alexander.deucher@amd.com
5 weeks agoMerge tag 'drm-intel-fixes-2024-04-10' of https://anongit.freedesktop.org/git/drm...
Dave Airlie [Thu, 11 Apr 2024 03:52:27 +0000 (13:52 +1000)]
Merge tag 'drm-intel-fixes-2024-04-10' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-fixes

Display fixes:
- Couple CDCLK programming fixes (Ville)
- HDCP related fix (Suraj)
- 4 Bigjoiner related fixes (Ville)

Core fix:
- Fix for a circular locking around GuC on reset+wedged case (John)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZhcJxlzc6zLMC1c-@intel.com
5 weeks agonet/mlx5: fix possible stack overflows
Arnd Bergmann [Mon, 8 Apr 2024 07:41:10 +0000 (09:41 +0200)]
net/mlx5: fix possible stack overflows

A couple of debug functions use a 512 byte temporary buffer and call another
function that has another buffer of the same size, which in turn exceeds the
usual warning limit for excessive stack usage:

drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:1073:1: error: stack frame size (1448) exceeds limit (1024) in 'dr_dump_start' [-Werror,-Wframe-larger-than]
dr_dump_start(struct seq_file *file, loff_t *pos)
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:1009:1: error: stack frame size (1120) exceeds limit (1024) in 'dr_dump_domain' [-Werror,-Wframe-larger-than]
dr_dump_domain(struct seq_file *file, struct mlx5dr_domain *dmn)
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_dbg.c:705:1: error: stack frame size (1104) exceeds limit (1024) in 'dr_dump_matcher_rx_tx' [-Werror,-Wframe-larger-than]
dr_dump_matcher_rx_tx(struct seq_file *file, bool is_rx,

Rework these so that each of the various code paths only ever has one of
these buffers in it, and exactly the functions that declare one have
the 'noinline_for_stack' annotation that prevents them from all being
inlined into the same caller.

Fixes: 917d1e799ddf ("net/mlx5: DR, Change SWS usage to debug fs seq_file interface")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/all/20240219100506.648089-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240408074142.3007036-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge branch 'mlx5-misc-fixes'
Jakub Kicinski [Thu, 11 Apr 2024 02:48:16 +0000 (19:48 -0700)]
Merge branch 'mlx5-misc-fixes'

Tariq Toukan says:

====================
mlx5 misc fixes

This patchset provides bug fixes to mlx5 driver.

This is V2 of the series previously submitted as PR by Saeed:
https://lore.kernel.org/netdev/20240326144646.2078893-1-saeed@kernel.org/T/

Series generated against:
commit 237f3cf13b20 ("xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING")
====================

Link: https://lore.kernel.org/r/20240409190820.227554-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
Tariq Toukan [Tue, 9 Apr 2024 19:08:19 +0000 (22:08 +0300)]
net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev

Adaptations need to be made for the auxiliary device management in the
core driver level. Block this combination for now.

Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: RSS, Block XOR hash with over 128 channels
Carolina Jubran [Tue, 9 Apr 2024 19:08:18 +0000 (22:08 +0300)]
net/mlx5e: RSS, Block XOR hash with over 128 channels

When supporting more than 128 channels, the RQT size is
calculated by multiplying the number of channels by 2
and rounding up to the nearest power of 2.

The index of the RQT is derived from the RSS hash
calculations. If XOR8 is used as the RSS hash function,
there are only 256 possible hash results, and therefore,
only 256 indexes can be reached in the RQT.

Block setting the RSS hash function to XOR when the number
of channels exceeds 128.

Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
Rahul Rameshbabu [Tue, 9 Apr 2024 19:08:17 +0000 (22:08 +0300)]
net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit

Free Tx port timestamping metadata entries in the NAPI poll context and
consume metadata enties in the WQE xmit path. Do not free a Tx port
timestamping metadata entry in the WQE xmit path even in the error path to
avoid a race between two metadata entry producers.

Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: HTB, Fix inconsistencies with QoS SQs number
Carolina Jubran [Tue, 9 Apr 2024 19:08:16 +0000 (22:08 +0300)]
net/mlx5e: HTB, Fix inconsistencies with QoS SQs number

When creating a new HTB class while the interface is down,
the variable that follows the number of QoS SQs (htb_max_qos_sqs)
may not be consistent with the number of HTB classes.

Previously, we compared these two values to ensure that
the node_qid is lower than the number of QoS SQs, and we
allocated stats for that SQ when they are equal.

Change the check to compare the node_qid with the current
number of leaf nodes and fix the checking conditions to
ensure allocation of stats_list and stats for each node.

Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: Fix mlx5e_priv_init() cleanup flow
Carolina Jubran [Tue, 9 Apr 2024 19:08:15 +0000 (22:08 +0300)]
net/mlx5e: Fix mlx5e_priv_init() cleanup flow

When mlx5e_priv_init() fails, the cleanup flow calls mlx5e_selq_cleanup which
calls mlx5e_selq_apply() that assures that the `priv->state_lock` is held using
lockdep_is_held().

Acquire the state_lock in mlx5e_selq_cleanup().

Kernel log:
=============================
WARNING: suspicious RCU usage
6.8.0-rc3_net_next_841a9b5 #1 Not tainted
-----------------------------
drivers/net/ethernet/mellanox/mlx5/core/en/selq.c:124 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by systemd-modules/293:
 #0: ffffffffa05067b0 (devices_rwsem){++++}-{3:3}, at: ib_register_client+0x109/0x1b0 [ib_core]
 #1: ffff8881096c65c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x104/0x1c0 [ib_core]

stack backtrace:
CPU: 4 PID: 293 Comm: systemd-modules Not tainted 6.8.0-rc3_net_next_841a9b5 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x8a/0xa0
 lockdep_rcu_suspicious+0x154/0x1a0
 mlx5e_selq_apply+0x94/0xa0 [mlx5_core]
 mlx5e_selq_cleanup+0x3a/0x60 [mlx5_core]
 mlx5e_priv_init+0x2be/0x2f0 [mlx5_core]
 mlx5_rdma_setup_rn+0x7c/0x1a0 [mlx5_core]
 rdma_init_netdev+0x4e/0x80 [ib_core]
 ? mlx5_rdma_netdev_free+0x70/0x70 [mlx5_core]
 ipoib_intf_init+0x64/0x550 [ib_ipoib]
 ipoib_intf_alloc+0x4e/0xc0 [ib_ipoib]
 ipoib_add_one+0xb0/0x360 [ib_ipoib]
 add_client_context+0x112/0x1c0 [ib_core]
 ib_register_client+0x166/0x1b0 [ib_core]
 ? 0xffffffffa0573000
 ipoib_init_module+0xeb/0x1a0 [ib_ipoib]
 do_one_initcall+0x61/0x250
 do_init_module+0x8a/0x270
 init_module_from_file+0x8b/0xd0
 idempotent_init_module+0x17d/0x230
 __x64_sys_finit_module+0x61/0xb0
 do_syscall_64+0x71/0x140
 entry_SYSCALL_64_after_hwframe+0x46/0x4e
 </TASK>

Fixes: 8bf30be75069 ("net/mlx5e: Introduce select queue parameters")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5e: RSS, Block changing channels number when RXFH is configured
Carolina Jubran [Tue, 9 Apr 2024 19:08:14 +0000 (22:08 +0300)]
net/mlx5e: RSS, Block changing channels number when RXFH is configured

Changing the channels number after configuring the receive flow hash
indirection table may affect the RSS table size. The previous
configuration may no longer be compatible with the new receive flow
hash indirection table.

Block changing the channels number when RXFH is configured and changing
the channels number requires resizing the RSS table size.

Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5: Correctly compare pkt reformat ids
Cosmin Ratiu [Tue, 9 Apr 2024 19:08:13 +0000 (22:08 +0300)]
net/mlx5: Correctly compare pkt reformat ids

struct mlx5_pkt_reformat contains a naked union of a u32 id and a
dr_action pointer which is used when the action is SW-managed (when
pkt_reformat.owner is set to MLX5_FLOW_RESOURCE_OWNER_SW). Using id
directly in that case is incorrect, as it maps to the least significant
32 bits of the 64-bit pointer in mlx5_fs_dr_action and not to the pkt
reformat id allocated in firmware.

For the purpose of comparing whether two rules are identical,
interpreting the least significant 32 bits of the mlx5_fs_dr_action
pointer as an id mostly works... until it breaks horribly and produces
the outcome described in [1].

This patch fixes mlx5_flow_dests_cmp to correctly compare ids using
mlx5_fs_dr_action_get_pkt_reformat_id for the SW-managed rules.

Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u
Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5: Properly link new fs rules into the tree
Cosmin Ratiu [Tue, 9 Apr 2024 19:08:12 +0000 (22:08 +0300)]
net/mlx5: Properly link new fs rules into the tree

Previously, add_rule_fg would only add newly created rules from the
handle into the tree when they had a refcount of 1. On the other hand,
create_flow_handle tries hard to find and reference already existing
identical rules instead of creating new ones.

These two behaviors can result in a situation where create_flow_handle
1) creates a new rule and references it, then
2) in a subsequent step during the same handle creation references it
   again,
resulting in a rule with a refcount of 2 that is not linked into the
tree, will have a NULL parent and root and will result in a crash when
the flow group is deleted because del_sw_hw_rule, invoked on rule
deletion, assumes node->parent is != NULL.

This happened in the wild, due to another bug related to incorrect
handling of duplicate pkt_reformat ids, which lead to the code in
create_flow_handle incorrectly referencing a just-added rule in the same
flow handle, resulting in the problem described above. Full details are
at [1].

This patch changes add_rule_fg to add new rules without parents into
the tree, properly initializing them and avoiding the crash. This makes
it more consistent with how rules are added to an FTE in
create_flow_handle.

Fixes: 74491de93712 ("net/mlx5: Add multi dest support")
Link: https://lore.kernel.org/netdev/ea5264d6-6b55-4449-a602-214c6f509c1e@163.com/T/#u
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5: offset comp irq index in name by one
Michael Liang [Tue, 9 Apr 2024 19:08:11 +0000 (22:08 +0300)]
net/mlx5: offset comp irq index in name by one

The mlx5 comp irq name scheme is changed a little bit between
commit 3663ad34bc70 ("net/mlx5: Shift control IRQ to the last index")
and commit 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation").
The index in the comp irq name used to start from 0 but now it starts
from 1. There is nothing critical here, but it's harmless to change
back to the old behavior, a.k.a starting from 0.

Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5: Register devlink first under devlink lock
Shay Drory [Tue, 9 Apr 2024 19:08:10 +0000 (22:08 +0300)]
net/mlx5: Register devlink first under devlink lock

In case device is having a non fatal FW error during probe, the
driver will report the error to user via devlink. This will trigger
a WARN_ON, since mlx5 is calling devlink_register() last.
In order to avoid the WARN_ON[1], change mlx5 to invoke devl_register()
first under devlink lock.

[1]
WARNING: CPU: 5 PID: 227 at net/devlink/health.c:483 devlink_recover_notify.constprop.0+0xb8/0xc0
CPU: 5 PID: 227 Comm: kworker/u16:3 Not tainted 6.4.0-rc5_for_upstream_min_debug_2023_06_12_12_38 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_reporter_err_work [mlx5_core]
RIP: 0010:devlink_recover_notify.constprop.0+0xb8/0xc0
Call Trace:
 <TASK>
 ? __warn+0x79/0x120
 ? devlink_recover_notify.constprop.0+0xb8/0xc0
 ? report_bug+0x17c/0x190
 ? handle_bug+0x3c/0x60
 ? exc_invalid_op+0x14/0x70
 ? asm_exc_invalid_op+0x16/0x20
 ? devlink_recover_notify.constprop.0+0xb8/0xc0
 devlink_health_report+0x4a/0x1c0
 mlx5_fw_reporter_err_work+0xa4/0xd0 [mlx5_core]
 process_one_work+0x1bb/0x3c0
 ? process_one_work+0x3c0/0x3c0
 worker_thread+0x4d/0x3c0
 ? process_one_work+0x3c0/0x3c0
 kthread+0xc6/0xf0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>

Fixes: cf530217408e ("devlink: Notify users when objects are accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agonet/mlx5: E-switch, store eswitch pointer before registering devlink_param
Shay Drory [Tue, 9 Apr 2024 19:08:09 +0000 (22:08 +0300)]
net/mlx5: E-switch, store eswitch pointer before registering devlink_param

Next patch will move devlink register to be first. Therefore, whenever
mlx5 will register a param, the user will be notified.
In order to notify the user, devlink is using the get() callback of
the param. Hence, resources that are being used by the get() callback
must be set before the devlink param is registered.

Therefore, store eswitch pointer inside mdev before registering the
param.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20240409190820.227554-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'probes-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 11 Apr 2024 02:48:05 +0000 (19:48 -0700)]
Merge tag 'probes-fixes-v6.9-rc3' of git://git./linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:
 "Fix possible use-after-free issue on kprobe registration.

  check_kprobe_address_safe() uses `is_module_text_address()` and
  `__module_text_address()` separately.

  As a result, if the probed address is in a module that is being
  unloaded, the first `is_module_text_address()` might return true but
  then the `__module_text_address()` call might return NULL if the
  module has been unloaded between the two.

  The result is that kprobe believes the probe is on the kernel text,
  and skips getting a module reference. In this case, when it arms a
  breakpoint on the probe address, it may cause a use-after-free.

  To fix this issue, only use `__module_text_address()` once and get a
  reference to the module then. If it fails, reject the probe"

* tag 'probes-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  kprobes: Fix possible use-after-free issue on kprobe registration

5 weeks agonetfilter: complete validation of user input
Eric Dumazet [Tue, 9 Apr 2024 12:07:41 +0000 (12:07 +0000)]
netfilter: complete validation of user input

In my recent commit, I missed that do_replace() handlers
use copy_from_sockptr() (which I fixed), followed
by unsafe copy_from_sockptr_offset() calls.

In all functions, we can perform the @optlen validation
before even calling xt_alloc_table_info() with the following
check:

if ((u64)optlen < (u64)tmp.size + sizeof(tmp))
        return -EINVAL;

Fixes: 0c83842df40f ("netfilter: validate user input for expected length")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://lore.kernel.org/r/20240409120741.3538135-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoMerge tag 'bootconfig-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 11 Apr 2024 02:42:45 +0000 (19:42 -0700)]
Merge tag 'bootconfig-fixes-v6.9-rc3' of git://git./linux/kernel/git/trace/linux-trace

Pull bootconfig fixes from Masami Hiramatsu:

 - show the original cmdline only once, and only if it was modeified by
   bootconfig

* tag 'bootconfig-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fs/proc: Skip bootloader comment if no embedded kernel parameters
  fs/proc: remove redundant comments from /proc/bootconfig

5 weeks agobcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()
Kent Overstreet [Wed, 10 Apr 2024 05:30:22 +0000 (01:30 -0400)]
bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()

We weren't respecting trans->journal_replay_not_finished - we shouldn't
be searching the journal keys unless we have a ref on them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
Kent Overstreet [Wed, 10 Apr 2024 04:10:18 +0000 (00:10 -0400)]
bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()

dropping read locks in bch2_btree_node_lock_write_nofail() dates from
before we had the cycle detector; we can now tell the cycle detector
directly when taking a lock may not fail because we can't handle
transaction restarts.

This is needed for adding should_be_locked asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agobcachefs: Fix a race in btree_update_nodes_written()
Kent Overstreet [Wed, 10 Apr 2024 16:53:28 +0000 (12:53 -0400)]
bcachefs: Fix a race in btree_update_nodes_written()

One btree update might have terminated in a node update, and then while
it is in flight another btree update might free that original node.

This race has to be handled in btree_update_nodes_written() - we were
missing a READ_ONCE().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
5 weeks agor8169: add missing conditional compiling for call to r8169_remove_leds
Heiner Kallweit [Wed, 10 Apr 2024 13:11:28 +0000 (15:11 +0200)]
r8169: add missing conditional compiling for call to r8169_remove_leds

Add missing dependency on CONFIG_R8169_LEDS. As-is a link error occurs
if config option CONFIG_R8169_LEDS isn't enabled.

Fixes: 19fa4f2a85d7 ("r8169: fix LED-related deadlock on module removal")
Reported-by: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-By: Venkat Rao Bagalkote <venkat88@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/d080038c-eb6b-45ac-9237-b8c1cdd7870f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agoplatform/chrome: cros_ec_uart: properly fix race condition
Noah Loomans [Wed, 10 Apr 2024 18:26:19 +0000 (20:26 +0200)]
platform/chrome: cros_ec_uart: properly fix race condition

The cros_ec_uart_probe() function calls devm_serdev_device_open() before
it calls serdev_device_set_client_ops(). This can trigger a NULL pointer
dereference:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    ...
    Call Trace:
     <TASK>
     ...
     ? ttyport_receive_buf

A simplified version of crashing code is as follows:

    static inline size_t serdev_controller_receive_buf(struct serdev_controller *ctrl,
                                                      const u8 *data,
                                                      size_t count)
    {
            struct serdev_device *serdev = ctrl->serdev;

            if (!serdev || !serdev->ops->receive_buf) // CRASH!
                return 0;

            return serdev->ops->receive_buf(serdev, data, count);
    }

It assumes that if SERPORT_ACTIVE is set and serdev exists, serdev->ops
will also exist. This conflicts with the existing cros_ec_uart_probe()
logic, as it first calls devm_serdev_device_open() (which sets
SERPORT_ACTIVE), and only later sets serdev->ops via
serdev_device_set_client_ops().

Commit 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race
condition") attempted to fix a similar race condition, but while doing
so, made the window of error for this race condition to happen much
wider.

Attempt to fix the race condition again, making sure we fully setup
before calling devm_serdev_device_open().

Fixes: 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race condition")
Cc: stable@vger.kernel.org
Signed-off-by: Noah Loomans <noah@noahloomans.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Link: https://lore.kernel.org/r/20240410182618.169042-2-noah@noahloomans.com
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
5 weeks agonet: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards
Arınç ÜNAL [Mon, 8 Apr 2024 07:08:53 +0000 (10:08 +0300)]
net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards

The commit 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
brought EEE support but did not enable EEE on MT7531 switch MACs. EEE is
enabled on MT7531 switch MACs by pulling the LAN2LED0 pin low on the board
(bootstrapping), unsetting the EEE_DIS bit on the trap register, or setting
the internal EEE switch bit on the CORE_PLL_GROUP4 register. Thanks to
SkyLake Huang (黃啟澤) from MediaTek for providing information on the
internal EEE switch bit.

There are existing boards that were not designed to pull the pin low.
Because of that, the EEE status currently depends on the board design.

The EEE_DIS bit on the trap pertains to the LAN2LED0 pin which is usually
used to control an LED. Once the bit is unset, the pin will be low. That
will make the active low LED turn on. The pin is controlled by the switch
PHY. It seems that the PHY controls the pin in the way that it inverts the
pin state. That means depending on the wiring of the LED connected to
LAN2LED0 on the board, the LED may be on without an active link.

To not cause this unwanted behaviour whilst enabling EEE on all boards, set
the internal EEE switch bit on the CORE_PLL_GROUP4 register.

My testing on MT7531 shows a certain amount of traffic loss when EEE is
enabled. That said, I haven't come across a board that enables EEE. So
enable EEE on the switch MACs but disable EEE advertisement on the switch
PHYs. This way, we don't change the behaviour of the majority of the boards
that have this switch. The mediatek-ge PHY driver already disables EEE
advertisement on the switch PHYs but my testing shows that it is somehow
enabled afterwards. Disabling EEE advertisement before the PHY driver
initialises keeps it off.

With this change, EEE can now be enabled using ethtool.

Fixes: 40b5d2f15c09 ("net: dsa: mt7530: Add support for EEE features")
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/20240408-for-net-mt7530-fix-eee-for-mt7531-mt7988-v3-1-84fdef1f008b@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 weeks agosmb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()
Paulo Alcantara [Mon, 8 Apr 2024 21:32:17 +0000 (18:32 -0300)]
smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()

cifs_get_fattr() may be called with a NULL inode, so check for a
non-NULL inode before calling
cifs_mark_open_handles_for_deleted_file().

This fixes the following oops:

  mount.cifs //srv/share /mnt -o ...,vers=3.1.1
  cd /mnt
  touch foo; tail -f foo &
  rm foo
  cat foo

  BUG: kernel NULL pointer dereference, address: 00000000000005c0
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 696 Comm: cat Not tainted 6.9.0-rc2 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  1.16.3-1.fc39 04/01/2014
  RIP: 0010:__lock_acquire+0x5d/0x1c70
  Code: 00 00 44 8b a4 24 a0 00 00 00 45 85 f6 0f 84 bb 06 00 00 8b 2d
  48 e2 95 01 45 89 c3 41 89 d2 45 89 c8 85 ed 0 0 <48> 81 3f 40 7a 76
  83 44 0f 44 d8 83 fe 01 0f 86 1b 03 00 00 31 d2
  RSP: 0018:ffffc90000b37490 EFLAGS: 00010002
  RAX: 0000000000000000 RBX: ffff888110021ec0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000005c0
  RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000200
  FS: 00007f2a1fa08740(0000) GS:ffff888157a00000(0000)
  knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0:
  0000000080050033
  CR2: 00000000000005c0 CR3: 000000011ac7c000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x180/0x490
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? exc_page_fault+0x70/0x230
   ? asm_exc_page_fault+0x26/0x30
   ? __lock_acquire+0x5d/0x1c70
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   lock_acquire+0xc0/0x2d0
   ? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? kmem_cache_alloc+0x2d9/0x370
   _raw_spin_lock+0x34/0x80
   ? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
   cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs]
   cifs_get_fattr+0x24c/0x940 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   cifs_get_inode_info+0x96/0x120 [cifs]
   cifs_lookup+0x16e/0x800 [cifs]
   cifs_atomic_open+0xc7/0x5d0 [cifs]
   ? lookup_open.isra.0+0x3ce/0x5f0
   ? __pfx_cifs_atomic_open+0x10/0x10 [cifs]
   lookup_open.isra.0+0x3ce/0x5f0
   path_openat+0x42b/0xc30
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   do_filp_open+0xc4/0x170
   do_sys_openat2+0xab/0xe0
   __x64_sys_openat+0x57/0xa0
   do_syscall_64+0xc1/0x1e0
   entry_SYSCALL_64_after_hwframe+0x72/0x7a

Fixes: ffceb7640cbf ("smb: client: do not defer close open handles to deleted files")
Reviewed-by: Meetakshi Setiya <msetiya@microsoft.com>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
5 weeks agoDrivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted
Michael Kelley [Mon, 11 Mar 2024 16:15:58 +0000 (09:15 -0700)]
Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted

In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

The VMBus ring buffer code could free decrypted/shared pages if
set_memory_decrypted() fails. Check the decrypted field in the struct
vmbus_gpadl for the ring buffers to decide whether to free the memory.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20240311161558.1310-6-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240311161558.1310-6-mhklinux@outlook.com>

5 weeks agouio_hv_generic: Don't free decrypted memory
Rick Edgecombe [Mon, 11 Mar 2024 16:15:57 +0000 (09:15 -0700)]
uio_hv_generic: Don't free decrypted memory

In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

The VMBus device UIO driver could free decrypted/shared pages if
set_memory_decrypted() fails. Check the decrypted field in the gpadl
to decide whether to free the memory.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20240311161558.1310-5-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240311161558.1310-5-mhklinux@outlook.com>

5 weeks agohv_netvsc: Don't free decrypted memory
Rick Edgecombe [Mon, 11 Mar 2024 16:15:56 +0000 (09:15 -0700)]
hv_netvsc: Don't free decrypted memory

In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

The netvsc driver could free decrypted/shared pages if
set_memory_decrypted() fails. Check the decrypted field in the gpadl
to decide whether to free the memory.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20240311161558.1310-4-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240311161558.1310-4-mhklinux@outlook.com>

5 weeks agoDrivers: hv: vmbus: Track decrypted status in vmbus_gpadl
Rick Edgecombe [Mon, 11 Mar 2024 16:15:55 +0000 (09:15 -0700)]
Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl

In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

In order to make sure callers of vmbus_establish_gpadl() and
vmbus_teardown_gpadl() don't return decrypted/shared pages to
allocators, add a field in struct vmbus_gpadl to keep track of the
decryption status of the buffers. This will allow the callers to
know if they should free or leak the pages.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20240311161558.1310-3-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240311161558.1310-3-mhklinux@outlook.com>

5 weeks agoDrivers: hv: vmbus: Leak pages if set_memory_encrypted() fails
Rick Edgecombe [Mon, 11 Mar 2024 16:15:54 +0000 (09:15 -0700)]
Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails

In CoCo VMs it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

VMBus code could free decrypted pages if set_memory_encrypted()/decrypted()
fails. Leak the pages if this happens.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/20240311161558.1310-2-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240311161558.1310-2-mhklinux@outlook.com>

5 weeks agohv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile format
Shradha Gupta [Fri, 22 Mar 2024 13:46:02 +0000 (06:46 -0700)]
hv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile format

If the network configuration strings are passed as a combination of IPv4
and IPv6 addresses, the current KVP daemon does not handle processing for
the keyfile configuration format.
With these changes, the keyfile config generation logic scans through the
list twice to generate IPv4 and IPv6 sections for the configuration files
to handle this support.

Testcases ran:Rhel 9, Hyper-V VMs
              (IPv4 only, IPv6 only, IPv4 and IPv6 combination)

Co-developed-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Tested-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Link: https://lore.kernel.org/r/1711115162-11629-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1711115162-11629-1-git-send-email-shradhagupta@linux.microsoft.com>

5 weeks agohv: vmbus: Convert sprintf() family to sysfs_emit() family
Li Zhijian [Tue, 19 Mar 2024 03:43:50 +0000 (11:43 +0800)]
hv: vmbus: Convert sprintf() family to sysfs_emit() family

Per filesystems/sysfs.rst, show() should only use sysfs_emit()
or sysfs_emit_at() when formatting the value to be returned to user space.

Coccinelle complains that there are still a couple of functions that use
snprintf(). Convert them to sysfs_emit().

sprintf() and scnprintf() will be converted as well if these files have
such abused cases.

This patch is generated by

make coccicheck M=<path/to/file> MODE=patch \
COCCI=scripts/coccinelle/api/device_attr_show.cocci

No functional change intended.

CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Wei Liu <wei.liu@kernel.org>
CC: Dexuan Cui <decui@microsoft.com>
CC: linux-hyperv@vger.kernel.org
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/20240319034350.1574454-1-lizhijian@fujitsu.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240319034350.1574454-1-lizhijian@fujitsu.com>

5 weeks agoMerge tag 'media/v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Wed, 10 Apr 2024 20:38:35 +0000 (13:38 -0700)]
Merge tag 'media/v6.9-2' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - some fixes for mediatec vcodec encoder/decoder oopses

* tag 'media/v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: mediatek: vcodec: support 36 bits physical address
  media: mediatek: vcodec: adding lock to protect encoder context list
  media: mediatek: vcodec: adding lock to protect decoder context list
  media: mediatek: vcodec: Fix oops when HEVC init fails
  media: mediatek: vcodec: Handle VP9 superframe bitstream with 8 sub-frames

5 weeks agoMerge tag 'hardening-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees...
Linus Torvalds [Wed, 10 Apr 2024 20:31:34 +0000 (13:31 -0700)]
Merge tag 'hardening-v6.9-rc4' of git://git./linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - gcc-plugins/stackleak: Avoid .head.text section (Ard Biesheuvel)

 - ubsan: fix unused variable warning in test module (Arnd Bergmann)

 - Improve entropy diffusion in randomize_kstack

* tag 'hardening-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  randomize_kstack: Improve entropy diffusion
  ubsan: fix unused variable warning in test module
  gcc-plugins/stackleak: Avoid .head.text section

5 weeks agoMerge tag 'turbostat-2024.04.10' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 10 Apr 2024 20:13:27 +0000 (13:13 -0700)]
Merge tag 'turbostat-2024.04.10' of git://git./linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

 - Use of the CPU MSR driver is now optional

 - Perf is now preferred for many counters

 - Non-root users can now execute turbostat, though with limited
   functionality

 - Add counters for some new GFX hardware

 - Minor fixes

* tag 'turbostat-2024.04.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (26 commits)
  tools/power turbostat: v2024.04.10
  tools/power/turbostat: Add support for Xe sysfs knobs
  tools/power/turbostat: Add support for new i915 sysfs knobs
  tools/power/turbostat: Introduce BIC_SAM_mc6/BIC_SAMMHz/BIC_SAMACTMHz
  tools/power/turbostat: Fix uncore frequency file string
  tools/power/turbostat: Unify graphics sysfs snapshots
  tools/power/turbostat: Cache graphics sysfs path
  tools/power/turbostat: Enable MSR_CORE_C1_RES support for ICX
  tools/power turbostat: Add selftests
  tools/power turbostat: read RAPL counters via perf
  tools/power turbostat: Add proper re-initialization for perf file descriptors
  tools/power turbostat: Clear added counters when in no-msr mode
  tools/power turbostat: add early exits for permission checks
  tools/power turbostat: detect and disable unavailable BICs at runtime
  tools/power turbostat: Add reading aperf and mperf via perf API
  tools/power turbostat: Add --no-perf option
  tools/power turbostat: Add --no-msr option
  tools/power turbostat: enhance -D (debug counter dump) output
  tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency read
  tools/power turbostat: Read base_hz and bclk from CPUID.16H if available
  ...

5 weeks agoMerge tag 'platform-drivers-x86-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 10 Apr 2024 20:10:22 +0000 (13:10 -0700)]
Merge tag 'platform-drivers-x86-v6.9-2' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes:

   - intel/hid: Solve spurious hibernation aborts (power button release)

   - toshiba_acpi: Ignore 2 keys to avoid log noise during
     suspend/resume

   - intel-vbtn: Fix probe by restoring VBDL and VGBS evalutation order

   - lg-laptop: Fix W=1 %s null argument warning

  New HW Support:

   - acer-wmi: PH18-71 mode button and fan speed sensor

   - intel/hid: Lunar Lake and Arrow Lake HID IDs"

* tag 'platform-drivers-x86-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: lg-laptop: fix %s null argument warning
  platform/x86: intel-vbtn: Update tablet mode switch at end of probe
  platform/x86: intel-vbtn: Use acpi_has_method to check for switch
  platform/x86: toshiba_acpi: Silence logging for some events
  platform/x86/intel/hid: Add Lunar Lake and Arrow Lake support
  platform/x86/intel/hid: Don't wake on 5-button releases
  platform/x86: acer-wmi: Add support for Acer PH18-71

5 weeks agoBluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit
Archie Pusaka [Thu, 4 Apr 2024 10:50:23 +0000 (18:50 +0800)]
Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit

The bit is set and tested inside mgmt_device_connected(), therefore we
must not set it just outside the function.

Fixes: eeda1bf97bb5 ("Bluetooth: hci_event: Fix not indicating new connection for BIG Sync")
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Manish Mandlik <mmandlik@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
5 weeks agoBluetooth: hci_sock: Fix not validating setsockopt user input
Luiz Augusto von Dentz [Fri, 5 Apr 2024 20:46:50 +0000 (16:46 -0400)]
Bluetooth: hci_sock: Fix not validating setsockopt user input

Check user input length before copying data.

Fixes: 09572fca7223 ("Bluetooth: hci_sock: Add support for BT_{SND,RCV}BUF")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
5 weeks agoBluetooth: ISO: Fix not validating setsockopt user input
Luiz Augusto von Dentz [Fri, 5 Apr 2024 19:56:50 +0000 (15:56 -0400)]
Bluetooth: ISO: Fix not validating setsockopt user input

Check user input length before copying data.

Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Fixes: 0731c5ab4d51 ("Bluetooth: ISO: Add support for BT_PKT_STATUS")
Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
5 weeks agoBluetooth: L2CAP: Fix not validating setsockopt user input
Luiz Augusto von Dentz [Fri, 5 Apr 2024 19:50:47 +0000 (15:50 -0400)]
Bluetooth: L2CAP: Fix not validating setsockopt user input

Check user input length before copying data.

Fixes: 33575df7be67 ("Bluetooth: move l2cap_sock_setsockopt() to l2cap_sock.c")
Fixes: 3ee7b7cd8390 ("Bluetooth: Add BT_MODE socket option")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
5 weeks agoBluetooth: RFCOMM: Fix not validating setsockopt user input
Luiz Augusto von Dentz [Fri, 5 Apr 2024 19:43:45 +0000 (15:43 -0400)]
Bluetooth: RFCOMM: Fix not validating setsockopt user input

syzbot reported rfcomm_sock_setsockopt_old() is copying data without
checking user input length.

BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt_old
net/bluetooth/rfcomm/sock.c:632 [inline]
BUG: KASAN: slab-out-of-bounds in rfcomm_sock_setsockopt+0x893/0xa70
net/bluetooth/rfcomm/sock.c:673
Read of size 4 at addr ffff8880209a8bc3 by task syz-executor632/5064

Fixes: 9f2c8a03fbb3 ("Bluetooth: Replace RFCOMM link mode with security level")
Fixes: bb23c0ab8246 ("Bluetooth: Add support for deferring RFCOMM connection setup")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
5 weeks agoBluetooth: SCO: Fix not validating setsockopt user input
Luiz Augusto von Dentz [Fri, 5 Apr 2024 19:41:52 +0000 (15:41 -0400)]
Bluetooth: SCO: Fix not validating setsockopt user input

syzbot reported sco_sock_setsockopt() is copying data without
checking user input length.

BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset
include/linux/sockptr.h:49 [inline]
BUG: KASAN: slab-out-of-bounds in copy_from_sockptr
include/linux/sockptr.h:55 [inline]
BUG: KASAN: slab-out-of-bounds in sco_sock_setsockopt+0xc0b/0xf90
net/bluetooth/sco.c:893
Read of size 4 at addr ffff88805f7b15a3 by task syz-executor.5/12578

Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option")
Fixes: b96e9c671b05 ("Bluetooth: Add BT_DEFER_SETUP option to sco socket")
Fixes: 00398e1d5183 ("Bluetooth: Add support for BT_PKT_STATUS CMSG data for SCO connections")
Fixes: f6873401a608 ("Bluetooth: Allow setting of codec for HFP offload use case")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>