Optimize FEL USB performance #234

Open: jameshilliard wants to merge 5 commits into linux-sunxi:master from jameshilliard:toc0-boot

@jameshilliard jameshilliard commented May 14, 2026

Contributor

Implementation details: README.FEL-SPEED.md

With optimizations:

sunxi-fel -v -p uboot u-boot-sunxi-with-spl.bin write 0x49000000 fit.itb
Tracking FEL device by SID 32c05000:0c004808:01075b8a:2c9a1f11
Switching USB to high-speed mode, setting FEL bulk endpoints to 512-byte packets: 0x05100040: 0x00029940 -> 0x00029860
Selecting FEL device 000:002 by SID
Applying SMC workaround via secure-SVC return thunk...  done.
TOC0: wrapped SPL item 49111 bytes from 0x840, load 0x00020060
TOC0: payload starts at 0xe000
Stack pointers: sp_irq=0x00021400, sp=0x00053FFC
MMU is not enabled by BROM
=> Executing the SPL... done.
loading image "ARM Trusted Firmware" (41065 bytes) to 0x40000000
loading image "U-Boot" (917496 bytes) to 0x4f000000
loading DTB "sun50i-h616-whatsminer" (48408 bytes)
Patching BROM RX FIFO copy to use DMA.
100% [================================================] 41082 kB, 30185.6 kB/s 
Starting U-Boot (0x40000000).
Store entry point 0x40000000 to RVBAR 0x09010040, and request warm reset with RMR mode 3... done.

Without optimizations:

sunxi-fel --no-high-speed -v -p uboot u-boot-sunxi-with-spl.bin write 0x49000000 fit.itb
Applying SMC workaround via secure-SVC return thunk...  done.
TOC0: wrapped SPL item 49111 bytes from 0x840, load 0x00020060
TOC0: payload starts at 0xe000
Stack pointers: sp_irq=0x00021400, sp=0x00053FFC
MMU is not enabled by BROM
=> Executing the SPL... done.
loading image "ARM Trusted Firmware" (41065 bytes) to 0x40000000
loading image "U-Boot" (917496 bytes) to 0x4f000000
loading DTB "sun50i-h616-whatsminer" (48408 bytes)
100% [================================================] 41082 kB,  320.0 kB/s 
Starting U-Boot (0x40000000).
Store entry point 0x40000000 to RVBAR 0x09010040, and request warm reset with RMR mode 3... done.

@jameshilliard jameshilliard force-pushed the toc0-boot branch 3 times, most recently from 2f16236 to c7aa218 on May 15, 2026 20:45
@jameshilliard jameshilliard changed the title from "Fix h616 secure FEL uboot with TOC0 images" to "Optimize H616 FEL USB performance" on May 15, 2026
@jameshilliard jameshilliard force-pushed the toc0-boot branch 10 times, most recently from f712eac to df2c0b6 on May 17, 2026 03:04
@jameshilliard jameshilliard changed the title from "Optimize H616 FEL USB performance" to "Optimize FEL USB performance" on May 17, 2026
@jameshilliard jameshilliard force-pushed the toc0-boot branch 4 times, most recently from 12a1b8d to 710b55a on May 24, 2026 04:10
Comment thread soc_info.h
uint32_t rvbar_reg_alt;/* alternative MMIO address of RVBARADDR0_L register */
uint32_t ver_reg; /* MMIO address of "Version Register" */
uint32_t usb_musb_base;/* base address of the USB OTG controller */
uint32_t fel_endpoint_table_addr; /* BROM FEL endpoint table */

Is it possible to read this address from the MUSB controller at runtime, instead of storing it per SoC?

Contributor Author

I don't think this can be read from the MUSB controller itself.

fel_endpoint_table_addr is not a MUSB register address. It points to the
Boot-ROM's in-SRAM FEL endpoint state table. The MUSB registers expose the
hardware controller and endpoint state, but they do not provide a pointer back
to the BROM's software endpoint descriptors.

The table layout seen in the supported BROMs is:

endpoint_table + 0x10: first FEL bulk endpoint entry
endpoint_table + 0x20: second FEL bulk endpoint entry

entry + 0x00: pointer to endpoint object, or zero
entry + 0x08: cached maxpacket value in the table entry
object + 0x04: cached maxpacket value in the endpoint object

The high-speed thunk patches those cached maxpacket fields to 512 before the
USB reconnect. This keeps the BROM FEL command loop's endpoint bookkeeping in
sync with the high-speed bulk packet size. Without that, the controller can be
switched to high-speed while the BROM's software state still describes 64-byte
full-speed bulk packets.
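The table layout above can be sketched on the host as a patch over an SRAM snapshot. This is only an illustration of the offsets listed in the comment, not the device-side thunk from the PR: the function name, the `rd32`/`wr32` helpers, the 32-bit width of the maxpacket fields, and the choice to skip entries whose object pointer is zero are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets taken from the layout described in the comment above. */
#define EP_ENTRY_FIRST   0x10u /* first FEL bulk endpoint entry */
#define EP_ENTRY_SECOND  0x20u /* second FEL bulk endpoint entry */
#define ENTRY_OBJ_PTR    0x00u /* pointer to endpoint object, or zero */
#define ENTRY_MAXPACKET  0x08u /* cached maxpacket in the table entry */
#define OBJ_MAXPACKET    0x04u /* cached maxpacket in the endpoint object */

/* Hypothetical helpers: little-endian 32-bit access into a byte buffer. */
static uint32_t rd32(const uint8_t *mem, uint32_t off)
{
    return (uint32_t)mem[off] | ((uint32_t)mem[off + 1] << 8) |
           ((uint32_t)mem[off + 2] << 16) | ((uint32_t)mem[off + 3] << 24);
}

static void wr32(uint8_t *mem, uint32_t off, uint32_t v)
{
    mem[off] = (uint8_t)v;
    mem[off + 1] = (uint8_t)(v >> 8);
    mem[off + 2] = (uint8_t)(v >> 16);
    mem[off + 3] = (uint8_t)(v >> 24);
}

/*
 * Patch the cached maxpacket fields of both FEL bulk endpoints.
 * `sram` models device SRAM starting at address `sram_base`.
 * Entries with a zero object pointer are left untouched (assumption).
 * Returns the number of endpoints patched.
 */
static int patch_fel_maxpacket(uint8_t *sram, uint32_t sram_len,
                               uint32_t sram_base, uint32_t table_addr,
                               uint32_t maxpacket)
{
    static const uint32_t entries[] = { EP_ENTRY_FIRST, EP_ENTRY_SECOND };
    int patched = 0;

    for (size_t i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
        uint32_t entry = table_addr - sram_base + entries[i];
        assert(entry + ENTRY_MAXPACKET + 4 <= sram_len);

        uint32_t obj = rd32(sram, entry + ENTRY_OBJ_PTR);
        if (obj == 0)
            continue;

        uint32_t obj_off = obj - sram_base;
        assert(obj_off + OBJ_MAXPACKET + 4 <= sram_len);

        wr32(sram, entry + ENTRY_MAXPACKET, maxpacket);
        wr32(sram, obj_off + OBJ_MAXPACKET, maxpacket);
        patched++;
    }
    return patched;
}
```

Calling `patch_fel_maxpacket(snapshot, len, base, table, 512)` on a snapshot mirrors what the comment describes for the high-speed switch; on real hardware the update has to run on the device before the USB reconnect.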

@jameshilliard jameshilliard force-pushed the toc0-boot branch 2 times, most recently from 0cf057c to 498c80f on June 5, 2026 19:07

Separate the runtime SMC-workaround probe from the code that executes
the workaround so later SoCs can choose a different implementation
without mixing that change into the existing direct-SMC path.

Describe the workaround method in SoC data and make every existing
secure-FEL user select the direct-SMC method explicitly. This preserves
the current behaviour because direct SMC remains the only implementation
in this patch.

Also make the old-format thunk header rule pattern-based while it still
only builds the existing SPL thunk header.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>

On H616 with the secure boot fuse set, FEL starts in non-secure
state. The older direct SMC workaround is not sufficient there because
SMC enters monitor mode instead of directly leaving the BROM FEL command
loop in secure SVC state.

Add a secure-SVC SMC thunk for this case and keep the existing global
startup workaround model. The thunk preserves the BROM SRAM workspace
using the same swap-table convention as the SPL thunk, installs a
temporary monitor-mode SMC handler by patching only the SMC vector word,
then issues SMC. The temporary handler restores the original vector word,
clears SCR.NS, clears MVBAR, restores the secure GIC view expected by
the BROM, copies the saved SVC SP/LR into the secure bank, switches to
secure SVC, and returns to the FEL command loop.

After the transition, the normal runtime probe sees secure state and
suppresses repeat application in that sunxi-fel process, so normal SID
reads and SPL execution use the existing code paths.

H616 selects the secure-SVC SMC thunk path and gates the workaround on a
non-zero secure boot status word at SID base + 0xa0. The zero-word
runtime probe still has to match before the thunk is applied, so
non-secure H616 boards and already-transitioned secure-FEL sessions do
not enter the secure path just because one of those checks matches.
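The gating described in this paragraph amounts to a conjunction of two checks. The sketch below only illustrates that decision; the enum, struct, and function names are hypothetical and do not come from the patch, and the probe result is modelled as a plain boolean because the commit message does not spell out how the probe works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of the secure boot status word, per the commit message. */
#define SID_SECURE_STATUS_OFFSET 0xa0u

/* Hypothetical per-SoC workaround selector. */
enum smc_method {
    SMC_NONE,
    SMC_DIRECT,
    SMC_SECURE_SVC_THUNK
};

/* Hypothetical inputs gathered from the device before deciding. */
struct secure_probe {
    uint32_t secure_status_word; /* read at SID base + SID_SECURE_STATUS_OFFSET */
    bool runtime_probe_matches;  /* result of the existing zero-word runtime probe */
};

/*
 * The thunk is applied only when the SoC selects it, the status word is
 * non-zero, AND the runtime probe still matches. One check alone is not
 * enough, which keeps non-secure boards and already-transitioned sessions
 * off the secure path.
 */
static bool needs_secure_svc_thunk(enum smc_method method,
                                   const struct secure_probe *p)
{
    assert(p != 0);
    if (method != SMC_SECURE_SVC_THUNK)
        return false;
    return p->secure_status_word != 0 && p->runtime_probe_matches;
}
```

A session that has already transitioned to secure state would report `runtime_probe_matches == false`, so the function returns false and the workaround is not repeated.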

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>

Secure boot sunxi images use TOC0 containers instead of a bare eGON
SPL header. Teach the spl/uboot command path to recognize a TOC0
image, validate its header and checksum, extract the firmware item, and
wrap it in a synthetic eGON SPL header at the SoC SPL load address.

After the wrapped SPL returns to FEL, pass the payload appended after
the TOC0 container to the existing U-Boot image loader so FIT-based
u-boot-sunxi-with-spl.bin images can be loaded by the normal uboot
command path.
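As a rough illustration of the checksum step, the sketch below verifies a stamp-based 32-bit word sum of the kind used for eGON SPL headers. It assumes the TOC0 checksum follows the same scheme with the stamp value 0x5f0a6c39; the checksum offset is passed in as a parameter rather than asserted as part of the TOC0 layout, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stamp substituted for the checksum field while summing (assumption:
 * same value as the eGON header stamp). */
#define BROM_STAMP_VALUE 0x5f0a6c39u

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Sum all little-endian 32-bit words of the image, treating the checksum
 * field at `csum_off` as if it held the stamp value.
 */
static uint32_t stamp_checksum(const uint8_t *img, size_t len, size_t csum_off)
{
    assert(len % 4 == 0 && csum_off % 4 == 0 && csum_off + 4 <= len);

    uint32_t sum = 0;
    for (size_t off = 0; off < len; off += 4)
        sum += (off == csum_off) ? BROM_STAMP_VALUE : get_le32(img + off);
    return sum;
}

/* True when the stored checksum matches the recomputed one. */
static bool checksum_ok(const uint8_t *img, size_t len, size_t csum_off)
{
    return get_le32(img + csum_off) == stamp_checksum(img, len, csum_off);
}
```

A loader would reject the container before extracting the firmware item whenever `checksum_ok()` returns false, alongside the header checks mentioned above.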

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>

Teach sunxi-fel to switch supported MUSB FEL sessions to high-speed USB.

The USB speed switch disconnects the device, so run the BROM endpoint
maxpacket update and MUSB POWER write from a small device-side thunk.
Do not rely on ordinary FEL commands after the disconnect starts. Treat
the expected disconnect during that EXEC path as success.
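For orientation, the POWER write can be pictured as setting the high-speed enable bit while keeping soft-connect asserted. The bit positions below are the standard MUSB POWER register bits; treating the low byte of the 32-bit word at controller base + 0x40 as POWER is an assumption based on the logged address 0x05100040. The low byte of the logged values (0x40 before, 0x60 after) is consistent with this, but the other bytes of that word also change in the log and are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MUSB POWER register bits. */
#define MUSB_POWER_HSENAB   0x20u /* enable high-speed negotiation */
#define MUSB_POWER_SOFTCONN 0x40u /* soft-connect (D+ pull-up) */

/*
 * Return the 32-bit register word with HSENAB set in its low byte.
 * Assumption: the low byte of the word is the POWER register.
 */
static uint32_t word_enable_high_speed(uint32_t word)
{
    uint32_t power = word & 0xffu;

    power |= MUSB_POWER_HSENAB;
    return (word & ~0xffu) | power;
}

/* Sanity check against the low byte seen in the verbose log. */
static void power_bits_self_check(void)
{
    uint32_t after = word_enable_high_speed(0x00029940u);

    assert((after & 0xffu) == 0x60u);            /* 0x40 | 0x20 */
    assert((after & MUSB_POWER_SOFTCONN) != 0u); /* still connected */
}
```

Because the host loses the device as soon as the speed change takes effect, this write belongs in the device-side thunk rather than in a sequence of ordinary FEL commands.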

Add SID-based device selection and reuse it for re-enumeration
tracking. This lets command sequences reopen the same FEL device after
the speed switch, avoids accidentally continuing on another attached
FEL device, and tolerates transient libusb open and version-probe
failures while the device is reappearing. Keep --no-high-speed as an
escape hatch for the old full-speed path.
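A minimal sketch of the SID matching idea follows. It assumes a 128-bit SID held as four 32-bit words, printed in the same colon-separated form as the verbose log, and a list of candidate devices whose SID may be temporarily unreadable while the device re-enumerates. The struct and function names are hypothetical and no libusb calls are shown.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SID_WORDS   4
#define SID_STR_LEN 36 /* 4 * 8 hex digits + 3 colons + NUL */

/* Format a SID as "xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx". */
static void sid_format(const uint32_t sid[SID_WORDS], char out[SID_STR_LEN])
{
    int n = snprintf(out, SID_STR_LEN,
                     "%08" PRIx32 ":%08" PRIx32 ":%08" PRIx32 ":%08" PRIx32,
                     sid[0], sid[1], sid[2], sid[3]);
    assert(n == SID_STR_LEN - 1);
    (void)n;
}

/* Hypothetical view of one attached FEL device. */
struct fel_candidate {
    int bus;
    int addr;
    uint32_t sid[SID_WORDS];
    int sid_valid; /* 0 when the SID read failed transiently */
};

/*
 * Return the index of the candidate whose SID equals `want`, or -1.
 * Candidates with an unreadable SID are skipped so the caller can retry
 * later instead of continuing on the wrong device.
 */
static int select_by_sid(const struct fel_candidate *c, size_t n,
                         const uint32_t want[SID_WORDS])
{
    for (size_t i = 0; i < n; i++) {
        if (!c[i].sid_valid)
            continue;
        if (memcmp(c[i].sid, want, sizeof(uint32_t) * SID_WORDS) == 0)
            return (int)i;
    }
    return -1;
}
```

A caller would typically loop with a short delay while `select_by_sid()` returns -1, since the device disappears and reappears with a new bus address during the speed switch.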

Apply the secure-state workaround after the optional speed switch, so
it runs once on the final FEL handle. This still runs before commands
on --no-high-speed, already-high-speed, and unsupported high-speed
paths.

Before requesting an RMR warm reset, clear the USB soft-connect bit
when the USB controller is known and mark the handle disconnected so
libusb cleanup does not wait on a device that intentionally left FEL.

Provide USB controller base and BROM endpoint table addresses for SoCs
whose BROM dumps show the compatible MUSB FEL endpoint layout. This
allows the generic high-speed switch and 512-byte endpoint maxpacket
patch paths to run on A33, A64, H3, H5, A63, H6, H616, V853, V5, A523
and A133.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>

@electricworry

My output with the latest push made 23 minutes ago:

sunxi-fel2 -v uboot u-boot/u-boot-sunxi-with-spl.bin
Tracking FEL device by SID 33802000:4c004808:01474788:2c6c22d1
Switching USB to high-speed mode, setting FEL bulk endpoints to 512-byte packets: 0x05100040: 0x00029940 -> 0x00029860
Selecting FEL device 005:091 by SID
found DT name in SPL header: allwinner/sun50i-h618-orangepi-zero3
Stack pointers: sp_irq=0x00021400, sp=0x00053FFC
MMU is not enabled by BROM
=> Executing the SPL... done.
loading image "ARM Trusted Firmware" (53364 bytes) to 0x40000000
loading image "U-Boot" (6829360 bytes) to 0x4a000000
loading DTB "allwinner/sun50i-h618-orangepi-zero3" (32496 bytes)
Starting U-Boot (0x40000000).
Store entry point 0x40000000 to RVBAR 0x08100040, and request warm reset with RMR mode 3... done.

The newer BROM FEL path copies received USB data byte-by-byte from the
MUSB FIFO. Even after switching the controller to high-speed mode, this
leaves large post-SPL DRAM writes far slower than the USB link can
support.

After SPL has initialized DRAM, install a parameterized RX DMA thunk for
writes into the known DRAM window. The thunk builds the MMU remap it
needs, hooks the BROM RX FIFO copy helper, and replaces it with a
MUSB/USBC endpoint-DMA receive path. It routes the active RX endpoint
DRQ through VEND0 before starting the internal DMA channel, so the FEL
wire protocol stays unchanged.

Put the common thunk code in fel_lib.c and install it from the shared
aw_fel_write_buffer() path. This makes normal writes and FIT image
loading use the same post-SPL DRAM write preparation path. Put the
SoC-specific hook, translation-table and shadow-page addresses in the
SoC table. Enable the path for SoCs whose BROM dumps contain the same
FIFO-copy helper ABI and compatible MUSB register layout: A33, A64, H3,
H5, A63, H6, H616, V853, V5, A523 and A133.

Coalesce full high-speed packets into bounded RX DMA requests. Keep each
host write as a single AW_FEL_1_WRITE request, and have the thunk copy
the final short packet with PIO after stopping DMA. This keeps arbitrary
post-SPL DRAM write sizes working without changing the FEL wire protocol.
Use a 192 KiB DMA request cap; 256 KiB requests can time out while the
host is writing.
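The split between DMA and PIO described here can be sketched as simple arithmetic on the write length. The 512-byte packet size and the 192 KiB cap come from the text above; the struct and function names are hypothetical, and the real thunk makes this decision on the device while receiving, not ahead of time on the host.

```c
#include <assert.h>
#include <stdint.h>

#define HS_BULK_MAXPACKET  512u            /* high-speed bulk packet size */
#define RX_DMA_REQUEST_CAP (192u * 1024u)  /* cap per DMA request */

/* Hypothetical summary of how one host write would be received. */
struct rx_plan {
    uint32_t dma_requests; /* number of bounded DMA requests */
    uint32_t dma_bytes;    /* bytes covered by full packets via DMA */
    uint32_t pio_bytes;    /* final short packet copied with PIO */
};

/*
 * Full 512-byte packets go through DMA in requests of at most
 * RX_DMA_REQUEST_CAP bytes; a trailing short packet is left for PIO.
 */
static struct rx_plan plan_rx(uint32_t len)
{
    struct rx_plan p;

    /* The cap must hold a whole number of packets. */
    assert(RX_DMA_REQUEST_CAP % HS_BULK_MAXPACKET == 0);

    p.pio_bytes = len % HS_BULK_MAXPACKET;
    p.dma_bytes = len - p.pio_bytes;
    p.dma_requests = p.dma_bytes / RX_DMA_REQUEST_CAP +
                     (p.dma_bytes % RX_DMA_REQUEST_CAP != 0 ? 1u : 0u);
    return p;
}
```

For example, a 917496-byte write splits into 916992 bytes of full packets across five DMA requests, followed by a 504-byte short packet handled with PIO.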

Add the checked-in DMA thunk header to the sunxi-fel prerequisites and
thunk documentation. This makes the binary rebuild when the generated
header changes without making the sunxi-fel target regenerate thunk
headers.

Signed-off-by: James Hilliard <james.hilliard1@gmail.com>

@electricworry

Latest test on Orange Pi Zero 3 completed almost instantaneously:

sunxi-fel2 -v uboot u-boot/u-boot-sunxi-with-spl.bin
Tracking FEL device by SID 33802000:4c004808:01474788:2c6c22d1
Switching USB to high-speed mode, setting FEL bulk endpoints to 512-byte packets: 0x05100040: 0x00029940 -> 0x00029860
Selecting FEL device 005:036 by SID
found DT name in SPL header: allwinner/sun50i-h618-orangepi-zero3
Stack pointers: sp_irq=0x00021400, sp=0x00053FFC
MMU is not enabled by BROM
=> Executing the SPL... done.
loading image "ARM Trusted Firmware" (53364 bytes) to 0x40000000
Patching BROM RX FIFO copy to use DMA.
loading image "U-Boot" (6829360 bytes) to 0x4a000000
loading DTB "allwinner/sun50i-h618-orangepi-zero3" (32496 bytes)
Starting U-Boot (0x40000000).
Store entry point 0x40000000 to RVBAR 0x08100040, and request warm reset with RMR mode 3... done.
