Optimize FEL USB performance #234
2f16236 to c7aa218
f712eac to df2c0b6
12a1b8d to 710b55a
uint32_t rvbar_reg_alt; /* alternative MMIO address of RVBARADDR0_L register */
uint32_t ver_reg; /* MMIO address of "Version Register" */
uint32_t usb_musb_base; /* base address of the USB OTG controller */
uint32_t fel_endpoint_table_addr; /* BROM FEL endpoint table */
Is it possible to read this address from the MUSB controller at runtime, instead of storing it per SoC?
I don't think this can be read from the MUSB controller itself.
fel_endpoint_table_addr is not a MUSB register address. It points to the
Boot-ROM's in-SRAM FEL endpoint state table. The MUSB registers expose the
hardware controller and endpoint state, but they do not provide a pointer back
to the BROM's software endpoint descriptors.
The table layout seen in the supported BROMs is:
endpoint_table + 0x10: first FEL bulk endpoint entry
endpoint_table + 0x20: second FEL bulk endpoint entry
entry + 0x00: pointer to endpoint object, or zero
entry + 0x08: cached maxpacket value in the table entry
object + 0x04: cached maxpacket value in the endpoint object
The high-speed thunk patches those cached maxpacket fields to 512 before the
USB reconnect. This keeps the BROM FEL command loop's endpoint bookkeeping in
sync with the high-speed bulk packet size. Without that, the controller can be
switched to high-speed while the BROM's software state still describes 64-byte
full-speed bulk packets.
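A minimal host-side sketch of the patch described above, run against a simulated SRAM buffer rather than real BROM memory. The offsets come from the table layout in this reply. The 32-bit width of the maxpacket fields, the choice to patch the table-entry field even when the object pointer is zero, and the names `sram_model` and `fel_patch_maxpacket` are assumptions for illustration only, not the PR's thunk code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a window of BROM SRAM so the sketch runs on a
 * host; base is the device address that maps to mem[0]. */
struct sram_model {
	uint32_t base;
	uint8_t mem[0x400];
};

/* Little-endian 32-bit accessors (the ARM BROM data is little-endian). */
static uint32_t rd32(const struct sram_model *s, uint32_t addr)
{
	uint32_t off = addr - s->base;

	assert(off <= sizeof(s->mem) - 4); /* addr < base wraps and fails too */
	return (uint32_t)s->mem[off] | (uint32_t)s->mem[off + 1] << 8 |
	       (uint32_t)s->mem[off + 2] << 16 | (uint32_t)s->mem[off + 3] << 24;
}

static void wr32(struct sram_model *s, uint32_t addr, uint32_t val)
{
	uint32_t off = addr - s->base;

	assert(off <= sizeof(s->mem) - 4);
	for (int i = 0; i < 4; i++)
		s->mem[off + i] = (uint8_t)(val >> (8 * i));
}

/* Offsets from the endpoint table layout described in the review reply. */
#define FEL_EP_ENTRY_1  0x10u
#define FEL_EP_ENTRY_2  0x20u
#define ENTRY_OBJ_PTR   0x00u
#define ENTRY_MAXPACKET 0x08u
#define OBJ_MAXPACKET   0x04u

/* Write maxpacket into both cached fields of each FEL bulk endpoint.
 * Assumption: fields are 32-bit; the entry field is patched even when the
 * object pointer is zero, the object field only when it is non-zero. */
static void fel_patch_maxpacket(struct sram_model *s, uint32_t table,
				uint32_t maxpacket)
{
	const uint32_t entries[] = { table + FEL_EP_ENTRY_1, table + FEL_EP_ENTRY_2 };

	for (unsigned i = 0; i < 2; i++) {
		uint32_t obj = rd32(s, entries[i] + ENTRY_OBJ_PTR);

		wr32(s, entries[i] + ENTRY_MAXPACKET, maxpacket);
		if (obj != 0)
			wr32(s, obj + OBJ_MAXPACKET, maxpacket);
	}
}
```

Calling `fel_patch_maxpacket(&s, table, 512)` turns every cached 64 into 512 while leaving a zero object pointer untouched, which is the bookkeeping sync the reply describes.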
0cf057c to 498c80f
Separate the runtime SMC-workaround probe from the code that executes the workaround so later SoCs can choose a different implementation without mixing that change into the existing direct-SMC path.
Describe the workaround method in SoC data and make every existing secure-FEL user select the direct-SMC method explicitly. This preserves the current behaviour because direct SMC remains the only implementation in this patch.
Also make the old-format thunk header rule pattern-based while it still only builds the existing SPL thunk header.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
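The split described in this commit (probe answers "is the workaround needed?", SoC data says "which method", a dispatcher executes it) can be sketched as below. All identifiers are hypothetical; the commit only states that the method is described in SoC data and that direct SMC is the sole implementation at this point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; only the structure mirrors the commit message. */
enum smc_workaround_method {
	SMC_WORKAROUND_NONE = 0,   /* no secure-state workaround for this SoC */
	SMC_WORKAROUND_DIRECT_SMC, /* existing path, now selected explicitly */
};

struct soc_info_sketch {
	const char *name;
	enum smc_workaround_method smc_method;
};

/* The probe result only gates; the SoC entry decides which method runs. */
static enum smc_workaround_method
smc_workaround_to_run(const struct soc_info_sketch *soc, bool probe_matched)
{
	assert(soc != NULL);
	if (!probe_matched)
		return SMC_WORKAROUND_NONE;
	return soc->smc_method;
}

/* Dispatch on the method: returns 0 if a workaround ran, -1 otherwise.
 * A later SoC adds an enum value and a case without editing this one. */
static int run_smc_workaround(enum smc_workaround_method m, int *direct_smc_calls)
{
	assert(direct_smc_calls != NULL);
	switch (m) {
	case SMC_WORKAROUND_DIRECT_SMC:
		(*direct_smc_calls)++; /* stand-in for issuing the direct SMC */
		return 0;
	case SMC_WORKAROUND_NONE:
	default:
		return -1;
	}
}
```

Keeping the probe out of the dispatcher is what lets a new method (such as the secure-SVC thunk in the next commit) reuse the same runtime gate.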
On H616 with the secure boot fuse set, FEL starts in non-secure state. The older direct SMC workaround is not sufficient there because SMC enters monitor mode instead of directly leaving the BROM FEL command loop in secure SVC state. Add a secure-SVC SMC thunk for this case and keep the existing global startup workaround model.
The thunk preserves the BROM SRAM workspace using the same swap-table convention as the SPL thunk, installs a temporary monitor-mode SMC handler by patching only the SMC vector word, then issues SMC. The temporary handler restores the original vector word, clears SCR.NS, clears MVBAR, restores the secure GIC view expected by the BROM, copies the saved SVC SP/LR into the secure bank, switches to secure SVC, and returns to the FEL command loop.
After the transition, the normal runtime probe sees secure state and suppresses repeat application in that sunxi-fel process, so normal SID reads and SPL execution use the existing code paths.
H616 selects the secure-SVC SMC thunk path and gates the workaround on a non-zero secure boot status word at SID base + 0xa0. The zero-word runtime probe still has to match before the thunk is applied, so non-secure H616 boards and already-transitioned secure-FEL sessions do not enter the secure path just because one of those checks matches.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
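The two H616 gates and the first steps of the temporary monitor handler can be modelled on plain integers, as a sketch. The function and struct names are hypothetical, the zero-word probe is reduced to a boolean, and the GIC restore, SP/LR copy and mode switch from the commit message are deliberately not modelled. SCR.NS is bit 0 of the ARMv7 Secure Configuration Register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offset from the commit message: H616 secure boot status word. */
#define SID_SECURE_STATUS_OFF 0xa0u

static uint32_t secure_status_addr(uint32_t sid_base)
{
	return sid_base + SID_SECURE_STATUS_OFF;
}

/* Both gates must hold: secure boot is enabled AND the zero-word probe
 * still matches (it stops matching once the session is in secure SVC). */
static bool h616_apply_secure_svc(uint32_t secure_status, bool zero_word_probe_matched)
{
	return secure_status != 0 && zero_word_probe_matched;
}

/* Hypothetical register-level model of the handler's first three steps. */
struct monitor_state {
	uint32_t smc_vector_word; /* patched word in the monitor vector table */
	uint32_t scr;             /* Secure Configuration Register */
	uint32_t mvbar;           /* Monitor Vector Base Address Register */
};

#define SCR_NS (1u << 0) /* 1 = non-secure */

static void monitor_handler_model(struct monitor_state *st, uint32_t saved_vector_word)
{
	assert(st != NULL);
	st->smc_vector_word = saved_vector_word; /* undo the single-word patch */
	st->scr &= ~SCR_NS;                      /* leaving monitor mode now enters Secure state */
	st->mvbar = 0;                           /* clear MVBAR */
}
```

Requiring both checks is what keeps a non-secure H616 (status word zero) and an already-transitioned session (probe no longer matches) off the secure path.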
Secure boot sunxi images use TOC0 containers instead of a bare eGON SPL header. Teach the spl/uboot command path to recognize a TOC0 image, validate its header and checksum, extract the firmware item, and wrap it in a synthetic eGON SPL header at the SoC SPL load address.
After the wrapped SPL returns to FEL, pass the payload appended after the TOC0 container to the existing U-Boot image loader so FIT-based u-boot-sunxi-with-spl.bin images can be loaded by the normal uboot command path.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
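A sketch of the TOC0 header and checksum validation step. The header offsets, the `TOC0.GLH` name, the 0x89119800 magic and the stamp-value checksum scheme are given as we understand them from U-Boot's `tools/sunxi_toc0.c`; treat them as assumptions to verify against that file. The same stamp checksum also covers eGON headers (their `check_sum` field is likewise at offset 0x0C), so one helper could fill in the synthetic eGON header too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BROM_STAMP_VALUE 0x5f0a6c39u
#define TOC0_NAME        "TOC0.GLH"
#define TOC0_MAGIC       0x89119800u
#define TOC0_HDR_SIZE    0x30u
#define TOC0_OFF_MAGIC   0x08u
#define TOC0_OFF_CSUM    0x0cu
#define TOC0_OFF_LENGTH  0x1cu

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Sum of all little-endian words in buf[0..len), with the checksum word
 * itself counted as the stamp value (mod 2^32). */
static uint32_t stamp_checksum(const uint8_t *buf, size_t len, size_t csum_off)
{
	uint32_t sum = 0;

	assert(len % 4 == 0 && csum_off % 4 == 0 && csum_off + 4 <= len);
	for (size_t i = 0; i < len; i += 4)
		sum += (i == csum_off) ? BROM_STAMP_VALUE : get_le32(buf + i);
	return sum;
}

static void stamp_checksum_store(uint8_t *buf, size_t len, size_t csum_off)
{
	put_le32(buf + csum_off, stamp_checksum(buf, len, csum_off));
}

/* Check name, magic, a sane length field and the checksum over that length. */
static bool toc0_header_ok(const uint8_t *buf, size_t buf_len)
{
	uint32_t len;

	if (buf_len < TOC0_HDR_SIZE || memcmp(buf, TOC0_NAME, 8) != 0)
		return false;
	if (get_le32(buf + TOC0_OFF_MAGIC) != TOC0_MAGIC)
		return false;
	len = get_le32(buf + TOC0_OFF_LENGTH);
	if (len < TOC0_HDR_SIZE || len > buf_len || len % 4 != 0)
		return false;
	return get_le32(buf + TOC0_OFF_CSUM) == stamp_checksum(buf, len, TOC0_OFF_CSUM);
}
```

Item-table parsing and firmware extraction are left out; this only covers the "validate its header and checksum" part of the commit.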
Teach sunxi-fel to switch supported MUSB FEL sessions to high-speed USB. The USB speed switch disconnects the device, so run the BROM endpoint maxpacket update and MUSB POWER write from a small device-side thunk. Do not rely on ordinary FEL commands after the disconnect starts. Treat the expected disconnect during that EXEC path as success.
Add SID-based device selection and reuse it for re-enumeration tracking. This lets command sequences reopen the same FEL device after the speed switch, avoids accidentally continuing on another attached FEL device, and tolerates transient libusb open and version-probe failures while the device is reappearing.
Keep --no-high-speed as an escape hatch for the old full-speed path.
Apply the secure-state workaround after the optional speed switch, so it runs once on the final FEL handle. This still runs before commands on --no-high-speed, already-high-speed, and unsupported high-speed paths.
Before requesting an RMR warm reset, clear the USB soft-connect bit when the USB controller is known and mark the handle disconnected so libusb cleanup does not wait on a device that intentionally left FEL.
Provide USB controller base and BROM endpoint table addresses for SoCs whose BROM dumps show the compatible MUSB FEL endpoint layout. This allows the generic high-speed switch and 512-byte endpoint maxpacket patch paths to run on A33, A64, H3, H5, A63, H6, H616, V853, V5, A523 and A133.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
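The MUSB POWER register values involved in the speed switch and the pre-RMR disconnect can be sketched as pure functions. The bit values match the Mentor MUSB definitions in Linux `drivers/usb/musb/musb_regs.h`. The 0x40 register offset (Allwinner's remapped layout, `SUNXI_MUSB_POWER` in Linux `drivers/usb/musb/sunxi.c`) and the disconnect-then-reconnect write order are assumptions about what the thunk does, not taken from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MUSB POWER register bits (Mentor MUSB, as in Linux musb_regs.h). */
#define MUSB_POWER_SOFTCONN 0x40u
#define MUSB_POWER_HSENAB   0x20u
#define MUSB_POWER_HSMODE   0x10u /* read-only status: high speed negotiated */

/* Assumed offset of POWER in the Allwinner MUSB register layout. */
#define SUNXI_MUSB_POWER_OFF 0x40u

static uint32_t musb_power_addr(uint32_t usb_musb_base)
{
	return usb_musb_base + SUNXI_MUSB_POWER_OFF;
}

/* Assumed thunk write order: out[0] drops off the bus; (the maxpacket
 * patch and a delay would go here); out[1] reconnects with HSENAB set.
 * The host loses the device after out[0], so no FEL reply is expected. */
static void hs_switch_power_values(uint8_t current, uint8_t out[2])
{
	uint8_t base = (uint8_t)(current & ~MUSB_POWER_HSMODE); /* don't echo status */

	assert(out != NULL);
	out[0] = (uint8_t)(base & ~MUSB_POWER_SOFTCONN);
	out[1] = (uint8_t)(base | MUSB_POWER_HSENAB | MUSB_POWER_SOFTCONN);
}

/* Before an RMR warm reset only soft-connect is cleared. */
static uint8_t rmr_power_value(uint8_t current)
{
	return (uint8_t)(current & ~(MUSB_POWER_SOFTCONN | MUSB_POWER_HSMODE));
}

static bool musb_is_high_speed(uint8_t power)
{
	return (power & MUSB_POWER_HSMODE) != 0;
}
```

Because the disconnect is part of the sequence, the host side has to treat the EXEC of such a thunk as successful even when the transfer ends with the device gone.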
My output with the latest push made 23 minutes ago:
The newer BROM FEL path copies received USB data byte-by-byte from the MUSB FIFO. Even after switching the controller to high-speed mode, this leaves large post-SPL DRAM writes far slower than the USB link can support.
After SPL has initialized DRAM, install a parameterized RX DMA thunk for writes into the known DRAM window. The thunk builds the MMU remap it needs, hooks the BROM RX FIFO copy helper, and replaces it with a MUSB/USBC endpoint-DMA receive path. It routes the active RX endpoint DRQ through VEND0 before starting the internal DMA channel, so the FEL wire protocol stays unchanged.
Put the common thunk code in fel_lib.c and install it from the shared aw_fel_write_buffer() path. This makes normal writes and FIT image loading use the same post-SPL DRAM write preparation path.
Put the SoC-specific hook, translation-table and shadow-page addresses in the SoC table. Enable the path for SoCs whose BROM dumps contain the same FIFO-copy helper ABI and compatible MUSB register layout: A33, A64, H3, H5, A63, H6, H616, V853, V5, A523 and A133.
Coalesce full high-speed packets into bounded RX DMA requests. Keep each host write as a single AW_FEL_1_WRITE request, and have the thunk copy the final short packet with PIO after stopping DMA. This keeps arbitrary post-SPL DRAM write sizes working without changing the FEL wire protocol.
Use a 192 KiB DMA request cap; 256 KiB requests can time out while the host is writing.
Add the checked-in DMA thunk header to the sunxi-fel prerequisites and thunk documentation. This makes the binary rebuild when the generated header changes without making the sunxi-fel target regenerate thunk headers.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
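The request splitting this commit describes (whole 512-byte packets coalesced into DMA requests of at most 192 KiB, then a short tail copied with PIO) can be sketched as a planner that returns one receive step at a time. The function names and the DRAM-window predicate are hypothetical; only the 512-byte packet size and the 192 KiB cap come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HS_BULK_MAXPACKET 512u
#define RX_DMA_CAP        (192u * 1024u) /* 384 full packets, so chunks stay whole */

/* Size of the next receive step for `remaining` bytes of one AW_FEL_1_WRITE.
 * Full packets go into one DMA request of at most RX_DMA_CAP bytes; once
 * fewer than 512 bytes remain, the short tail is copied with PIO. */
static uint32_t next_rx_request(uint32_t remaining, bool *use_dma)
{
	uint32_t full = remaining - remaining % HS_BULK_MAXPACKET;

	assert(use_dma != NULL);
	if (full == 0) {
		*use_dma = false;
		return remaining; /* short tail, or 0 when the write is done */
	}
	*use_dma = true;
	return full < RX_DMA_CAP ? full : RX_DMA_CAP;
}

/* Only writes that fit entirely inside the DRAM window take the DMA path;
 * written to avoid 32-bit overflow in addr + len. */
static bool in_dram_window(uint32_t addr, uint32_t len,
			   uint32_t win_base, uint32_t win_size)
{
	return addr >= win_base && len <= win_size &&
	       addr - win_base <= win_size - len;
}
```

For a 500000-byte write this yields two 196608-byte DMA requests, one 106496-byte DMA request and a 288-byte PIO tail, all inside a single FEL write.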
Latest test on Orange Pi Zero 3 ran almost instantaneously:
Implementation details: README.FEL-SPEED.md
With optimizations:
Without optimizations: