@@ -1,81 +1,91 @@
From 3106d546d494f2f52ec832e7f7d04f534286e254 Mon Sep 17 00:00:00 2001
Message-ID: <3106d546d494f2f52ec832e7f7d04f534286e254.1777064117.git.lukasz@raczylo.com>
In-Reply-To: <cover.1777064117.git.lukasz@raczylo.com>
References: <cover.1777064117.git.lukasz@raczylo.com>
From 0ee595ef700d4f8dee3efe3b992f31ad8ee9e7af Mon Sep 17 00:00:00 2001
From: Lukasz Raczylo <lukasz@raczylo.com>
Date: Fri, 24 Apr 2026 21:50:55 +0100
Subject: [RFC PATCH net-next 1/3] net: macb: flush PCIe posted write after
TSTART doorbell
Date: Fri, 15 May 2026 13:46:37 +0100
Subject: [PATCH 1/3] net: macb: flush PCIe posted write after TSTART doorbell
(PCIe-only)

macb_start_xmit() and macb_tx_restart() kick transmission by
OR-ing MACB_BIT(TSTART) into NCR. On PCIe-attached macb instances
(BCM2712 + RP1 PCIe south bridge on Raspberry Pi 5 is the setup we
have in front of us), writes to NCR are posted PCIe writes: they
are not guaranteed to reach the device before the issuing CPU
returns. If the TSTART doorbell does not reach the MAC, no TX
begins, no TCOMP completion arrives, and the ring remains
quiescent without any kernel-visible indication.
OR-ing MACB_BIT(TSTART) into NCR. On PCIe-attached macb
instances (BCM2712 + RP1 PCIe south bridge on Raspberry Pi 5 is
the case I have in front of me), writes to NCR are posted PCIe
writes: they are not guaranteed to reach the device before the
issuing CPU returns. If the TSTART doorbell does not reach the
MAC, no TX begins, no TCOMP completion arrives, and the ring
remains quiescent without any kernel-visible indication.

Note that the raspberrypi/linux vendor fork carries a local patch
around the TSTART site (a queue->tx_pending breadcrumb that is
promoted to queue->txubr_pending by the next TCOMP interrupt,
triggering macb_tx_restart()). That workaround makes the loss
recoverable under traffic, but it cannot help if TCOMP itself is
not raised because no TX started -- which is exactly the case we
are targeting here. The handshake is not present in mainline.
Add a read-back of NCR after each TSTART write. The read is an
architected PCIe read barrier for earlier posted writes on the
same path; it ensures the doorbell has reached the MAC before
the function returns.

Add a read-back of NCR after each TSTART write in macb_start_xmit()
and macb_tx_restart(). The read is an architected PCIe read
barrier for earlier posted writes on the same path; it ensures the
doorbell has reached the MAC before the functions return.

We do not yet have direct hardware evidence that TSTART is being
lost on the RP1 path (that would require a PCIe protocol analyser,
or at minimum a before/after counter on queue->tx_stall_last_tail
with and without this patch applied in isolation). This patch is
one of a three-patch series ("candidate fixes for silent TX stall
on BCM2712/RP1"); see the cover letter for context. We have
verified the series compiles and applies cleanly against mainline
HEAD and against raspberrypi/linux rpi-6.18.y @ f2f68e79f16f;
runtime verification is pending.
The cost is one non-posted PCIe read per TSTART. To avoid
imposing this on SoC-integrated macb variants (Atmel, Microchip,
SiFive, Xilinx), where NCR is on-chip MMIO and no fabric
posted-write concern exists, gate the readback behind a new
MACB_CAPS_PCIE_POSTED_WRITES capability set only on
raspberrypi_rp1_config.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
drivers/net/ethernet/cadence/macb_main.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
drivers/net/ethernet/cadence/macb.h | 4 ++++
drivers/net/ethernet/cadence/macb_main.c | 13 ++++++++++++-
2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 0830c4897..bc2225956 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -769,6 +769,10 @@
#define MACB_CAPS_NEED_TSUCLK 0x00000400
#define MACB_CAPS_QUEUE_DISABLE 0x00000800
#define MACB_CAPS_QBV 0x00001000
+/* Register writes are posted on the parent fabric and need a non-posted
+ * read-back to guarantee delivery. Currently set only on RP1.
+ */
+#define MACB_CAPS_PCIE_POSTED_WRITES 0x00002000
#define MACB_CAPS_PCS 0x01000000
#define MACB_CAPS_HIGH_SPEED 0x02000000
#define MACB_CAPS_CLK_HW_CHG 0x04000000
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index a12aa2124..b6cca55ad 100644
index 17d4a3e03..fa80236dd 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1922,6 +1922,13 @@ static void macb_tx_restart(struct macb_queue *queue)
@@ -1807,6 +1807,13 @@ static void macb_tx_restart(struct macb_queue *queue)

spin_lock(&bp->lock);
macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+ /*
+ * Flush the PCIe posted-write queue so the TSTART doorbell
+ * reliably reaches the MAC. Without this, the write can sit
+ * in the fabric and the MAC never advances, causing a silent
+ * TX stall.
+ /* On PCIe-attached parts, flush the posted-write queue so the
+ * TSTART doorbell reliably reaches the MAC. Without this the
+ * write can sit in the fabric and the MAC never advances,
+ * causing a silent TX stall.
+ */
+ (void)macb_readl(bp, NCR);
+ if (bp->caps & MACB_CAPS_PCIE_POSTED_WRITES)
+ (void)macb_readl(bp, NCR);
spin_unlock(&bp->lock);

out_tx_ptr_unlock:
@@ -2560,6 +2567,11 @@ static netdev_tx_t macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -2481,6 +2488,9 @@ static netdev_tx_t macb_start_xmit(struct sk_buff *skb, struct net_device *dev)

spin_lock(&bp->lock);
macb_tx_lpi_wake(bp);
macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+ /*
+ * Flush the PCIe posted-write queue; see the comment in
+ * macb_tx_restart() for the reasoning.
+ */
+ (void)macb_readl(bp, NCR);
+ /* Flush PCIe posted-write queue; see comment in macb_tx_restart(). */
+ if (bp->caps & MACB_CAPS_PCIE_POSTED_WRITES)
+ (void)macb_readl(bp, NCR);
spin_unlock(&bp->lock);

if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
@@ -5474,7 +5484,8 @@ static const struct macb_config versal_config = {
static const struct macb_config raspberrypi_rp1_config = {
.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_CLK_HW_CHG |
MACB_CAPS_JUMBO |
- MACB_CAPS_GEM_HAS_PTP,
+ MACB_CAPS_GEM_HAS_PTP |
+ MACB_CAPS_PCIE_POSTED_WRITES,
.dma_burst_length = 16,
.clk_init = macb_clk_init,
.init = macb_init,
--
2.53.0
2.54.0
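
The gated read-back described in this patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not driver code: struct sim_mac, sim_writel(), sim_readl() and sim_kick_tx() are hypothetical names, SIM_TSTART is an illustrative bit position, and the single-slot "fabric" never drains on its own (real hardware would normally deliver the write eventually; the model exaggerates the delay to make the effect deterministic). Only the capability value 0x00002000 is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: values below are for illustration only. */
#define SIM_CAPS_POSTED_WRITES 0x00002000u /* same value as the new cap */
#define SIM_TSTART             (1u << 9)   /* illustrative bit position */

struct sim_mac {
	uint32_t ncr;        /* NCR as the device currently sees it */
	uint32_t posted;     /* one write still buffered in the fabric */
	bool     has_posted; /* true while that write is undelivered */
	uint32_t caps;       /* capability flags of this instance */
};

/* A posted write returns immediately; the value is only buffered. */
static void sim_writel(struct sim_mac *m, uint32_t val)
{
	m->posted = val;
	m->has_posted = true;
}

/* A read cannot pass earlier posted writes on the same path, so the
 * model drains the buffered write before returning the register.
 */
static uint32_t sim_readl(struct sim_mac *m)
{
	if (m->has_posted) {
		m->ncr = m->posted;
		m->has_posted = false;
	}
	return m->ncr;
}

/* Mirrors the shape of the patched TSTART sites: write the doorbell,
 * then read back only when the capability says writes are posted.
 */
static void sim_kick_tx(struct sim_mac *m)
{
	sim_writel(m, sim_readl(m) | SIM_TSTART);
	if (m->caps & SIM_CAPS_POSTED_WRITES)
		(void)sim_readl(m);
}

/* Has the doorbell actually reached the modelled device? */
static bool sim_device_saw_tstart(const struct sim_mac *m)
{
	return (m->ncr & SIM_TSTART) != 0;
}
```

In this model, calling sim_kick_tx() on an instance with caps == 0 leaves the doorbell buffered, so sim_device_saw_tstart() stays false, while an instance with SIM_CAPS_POSTED_WRITES set sees the doorbell delivered before sim_kick_tx() returns.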

@@ -0,0 +1,60 @@
From a27adeab1b08fac9ff3978d745caa536a458430b Mon Sep 17 00:00:00 2001
From: Lukasz Raczylo <lukasz@raczylo.com>
Date: Fri, 15 May 2026 13:47:20 +0100
Subject: [PATCH 2/3] net: macb: insert PCIe read barrier before TX completion
descriptor check

macb_tx_poll() runs with TCOMP masked, drains the TX ring, then
calls napi_complete_done() and re-enables TCOMP via IER. An
existing comment in the function notes that completions raised
while TCOMP is masked do not re-fire on IER re-enable, and
mitigates this by calling macb_tx_complete_pending(), which
inspects driver-visible ring state (descriptor->ctrl, after
rmb()) and reschedules NAPI if a completion is observable in
memory.

On PCIe-attached parts (BCM2712 + RP1 PCIe south bridge on
Raspberry Pi 5 is the case I have in front of me), the
descriptor DMA write that sets TX_USED may not have retired to
system memory at the point macb_tx_complete_pending() runs. The
rmb() only orders the CPU's own loads against one another; it is
not sufficient to retire an in-flight peripheral DMA write.

Add a side-effect-free MMIO read between the IER write and the
macb_tx_complete_pending() check. The read functions as an
architected PCIe read barrier for earlier peripheral-originated
DMA writes on the same path, so any in-flight TX_USED update
retires to system memory before the descriptor read.

The register chosen is IMR (the read-only interrupt mask
mirror); reading it has no side effects on either read-clear or
W1C ISR silicon (it is not the ISR).

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
drivers/net/ethernet/cadence/macb_main.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index fa80236dd..23120fc7c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1868,6 +1868,13 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
* actions if an interrupt is raised just after enabling them,
* but this should be harmless.
*/
+ /* PCIe read barrier: flush any in-flight peripheral DMA
+ * writes (descriptor TX_USED updates) so the subsequent
+ * macb_tx_complete_pending() check observes them. IMR is
+ * the read-only interrupt mask mirror; the read has no
+ * side effects on either read-clear or W1C ISR silicon.
+ */
+ (void)queue_readl(queue, IMR);
if (macb_tx_complete_pending(queue)) {
queue_writel(queue, IDR, MACB_BIT(TCOMP));
if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
--
2.54.0
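
The ordering argument in this patch can also be sketched as a user-space model. Again this is a hedged illustration rather than driver code: struct sim_path, sim_device_complete_tx(), sim_read_imr() and sim_tx_complete_pending() are hypothetical names, SIM_TX_USED is an illustrative bit position, and the model assumes the descriptor write stays in flight until a read completion on the same path pushes it into memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the bit position is for illustration only. */
#define SIM_TX_USED (1u << 31)

struct sim_desc {
	uint32_t ctrl; /* descriptor control word in system memory */
};

struct sim_path {
	struct sim_desc *ring_slot; /* descriptor the CPU will inspect */
	uint32_t inflight_ctrl;     /* DMA write not yet retired */
	bool     inflight;          /* true while that write is pending */
	uint32_t imr;               /* read-only interrupt mask mirror */
};

/* The device finishes a frame and issues a DMA write that sets
 * TX_USED, but the write has not reached memory yet.
 */
static void sim_device_complete_tx(struct sim_path *p)
{
	p->inflight_ctrl = p->ring_slot->ctrl | SIM_TX_USED;
	p->inflight = true;
}

/* An MMIO read: its completion cannot pass earlier device-originated
 * writes on the same path, so the pending DMA write retires first.
 */
static uint32_t sim_read_imr(struct sim_path *p)
{
	if (p->inflight) {
		p->ring_slot->ctrl = p->inflight_ctrl;
		p->inflight = false;
	}
	return p->imr;
}

/* CPU-side check of the descriptor as it currently sits in memory. */
static bool sim_tx_complete_pending(const struct sim_path *p)
{
	return (p->ring_slot->ctrl & SIM_TX_USED) != 0;
}
```

In the model, checking sim_tx_complete_pending() straight after sim_device_complete_tx() misses the completion; issuing sim_read_imr() first makes the same check observe TX_USED, which is the sequence the patch adds to macb_tx_poll().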

This file was deleted.