Conversation
Summary of ChangesHello @zhouguangyuan0718, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates Thin Link-Time Optimization (LTO) and the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request enables ThinLTO and function merging by default, which is a great step for optimizing binary size and performance. The implementation correctly adds the necessary flags for both native and cross-compilation builds.
My main feedback is about code duplication. The LTO-related flags are defined in two separate places (use and UseTarget functions). I've left a comment suggesting to refactor this by extracting the flags into package-level variables to improve maintainability. I also pointed out a misleading comment.
| // For ld.lld linker, also add CPU info to linker flags | ||
| if config.Linker == "ld.lld" { | ||
| ldflags = append(ldflags, | ||
| // Enable ThinLTO, then global DCE will work well, | ||
| // and "merge-functions" can merge the function in whole program. | ||
| "--lto=thin", | ||
| "-mllvm", "-enable-merge-functions", | ||
| "-mllvm", "-mergefunc-use-aliases") | ||
| cflags = append(cflags, "-flto=thin", "-funified-lto") | ||
| ccflags = append(ccflags, "-flto=thin", "-funified-lto") | ||
| } |
There was a problem hiding this comment.
The comment on line 509 is misleading. This block adds LTO and function merging flags, not CPU info. Please update the comment for clarity.
Additionally, these LTO flags and the associated comment are duplicated from the use function (see lines 230-234 and 257-258). To improve maintainability and avoid future inconsistencies, I recommend refactoring this by extracting the different sets of flags into package-level variables. This would make the code cleaner and easier to manage.
For example:
// At package level
var (
ltoCompilerFlags = []string{"-flto=thin", "-funified-lto"}
ltoLinkerFlagsForClangDriver = []string{
// ... comment here ...
"-flto=thin",
"-Wl,-mllvm,-enable-merge-functions",
"-Wl,-mllvm,-mergefunc-use-aliases",
}
ltoLinkerFlagsForLld = []string{
// ... comment here ...
"--lto=thin",
"-mllvm", "-enable-merge-functions",
"-mllvm", "-mergefunc-use-aliases",
}
)Then you could use ltoLinkerFlagsForLld and ltoCompilerFlags here, and the other variables in the use function.
| // For ld.lld linker, also add CPU info to linker flags | |
| if config.Linker == "ld.lld" { | |
| ldflags = append(ldflags, | |
| // Enable ThinLTO, then global DCE will work well, | |
| // and "merge-functions" can merge the function in whole program. | |
| "--lto=thin", | |
| "-mllvm", "-enable-merge-functions", | |
| "-mllvm", "-mergefunc-use-aliases") | |
| cflags = append(cflags, "-flto=thin", "-funified-lto") | |
| ccflags = append(ccflags, "-flto=thin", "-funified-lto") | |
| } | |
| // For ld.lld linker, add LTO and function merging flags. | |
| if config.Linker == "ld.lld" { | |
| ldflags = append(ldflags, | |
| // Enable ThinLTO, then global DCE will work well, | |
| // and "merge-functions" can merge the function in whole program. | |
| "--lto=thin", | |
| "-mllvm", "-enable-merge-functions", | |
| "-mllvm", "-mergefunc-use-aliases") | |
| cflags = append(cflags, "-flto=thin", "-funified-lto") | |
| ccflags = append(ccflags, "-flto=thin", "-funified-lto") | |
| } |
1f80264 to
740d318
Compare
740d318 to
18167cc
Compare
5d6ef59 to
d1cad0b
Compare
40cd0a9 to
5684f3d
Compare
b2dd536 to
c3c991d
Compare
|
Overall solid work enabling ThinLTO and preserving exported symbols. The
See inline comments for details. |
| "riscv32" | ||
| "riscv64" | ||
| "rp2040" | ||
| "nintendoswitch" # undefined symbol under lto, should not work when no-lto |
There was a problem hiding this comment.
Nit: the comment says "should not work when no-lto" but the intent seems to be that it doesn't work with LTO (undefined symbol under LTO). Consider rephrasing, e.g.: # undefined symbol under lto, skip for now.
| "-Wl,--icf=safe", | ||
| // Enable ThinLTO, Using default lto kind(thinlto). | ||
| "-Wl,--lto-O0", | ||
| } |
There was a problem hiding this comment.
--lto-O0 sets the LTO optimization level to zero — this pays the full cost of ThinLTO (bitcode intermediates, slower link) while disabling the optimizations that make LTO valuable (cross-module inlining, interprocedural constant propagation, global dead code elimination). The only benefit surviving at O0 is that the linker can see all symbols, but ICF and linker-level dead stripping work without LTO as well.
Consider --lto-O2 (common ThinLTO default) to actually realize the link-time optimization benefits.
Also, the comment // Enable ThinLTO, Using default lto kind(thinlto). is misleading here — --lto-O0 doesn't enable ThinLTO (that's done by -flto=thin in CCFLAGS). This flag only controls the optimization level during the LTO link step.
Additionally, -Wl,--icf=safe and -Wl,--lto-O0 are ld.lld (ELF) flags. On macOS, -fuse-ld=lld resolves to ld64.lld (Mach-O linker) which uses different flag syntax. These flags may produce linker warnings or errors on macOS (or be silently ignored due to -Wno-unused-command-line-argument). Consider gating them behind a platform check, similar to the OS-specific branch below at line ~268.
| "-Wno-unused-command-line-argument", | ||
| "-flto=thin", | ||
| } | ||
|
|
There was a problem hiding this comment.
In the native compile path, -flto=thin is added only to CCFLAGS but not to CFLAGS. In contrast, the UseTarget path (line ~507) adds -flto=thin to both cflags and ccflags. Without -flto=thin in CFLAGS, object files compiled via that path won't contain LTO bitcode, so ThinLTO can't optimize across those translation units. Is this intentional?
| func (p Package) markLLVMUsed(v llvm.Value) { | ||
| elemTyp := p.Prog.VoidPtr().ll | ||
| p.llvmUsedValues = append(p.llvmUsedValues, llvm.ConstBitCast(v, elemTyp)) | ||
| if !p.llvmUsed.IsNil() { | ||
| p.llvmUsed.EraseFromParentAsGlobal() | ||
| } | ||
| init := llvm.ConstArray(elemTyp, p.llvmUsedValues) | ||
| global := llvm.AddGlobal(p.mod, init.Type(), "llvm.compiler.used") | ||
| global.SetInitializer(init) | ||
| global.SetLinkage(llvm.AppendingLinkage) | ||
| global.SetSection("llvm.metadata") | ||
| p.llvmUsed = global | ||
| } |
There was a problem hiding this comment.
markLLVMUsed erases and recreates the llvm.compiler.used global each time a symbol is added, resulting in O(n^2) total LLVM IR manipulation for n preserved symbols. Consider deferring the construction to a finalization step — accumulate values in llvmUsedValues during compilation and build the llvm.compiler.used global once before the module is emitted. This simplifies the code and avoids the repeated erase/create cycle.
Also, the field and method names reference llvmUsed but the actual global created is llvm.compiler.used. These are semantically different LLVM intrinsics (llvm.used prevents both compiler and linker removal; llvm.compiler.used only prevents compiler removal). The naming could be clearer — e.g., llvmCompilerUsed / llvmCompilerUsedValues.
| p.llvmUsed.EraseFromParentAsGlobal() | ||
| } | ||
| init := llvm.ConstArray(elemTyp, p.llvmUsedValues) | ||
| global := llvm.AddGlobal(p.mod, init.Type(), "llvm.compiler.used") |
There was a problem hiding this comment.
The global created here is "llvm.compiler.used", but cl/_testdata/cpkg/out.ll shows @llvm.used (not @llvm.compiler.used). These are semantically distinct LLVM intrinsics:
llvm.used: prevents both compiler and linker from removing symbolsllvm.compiler.used: prevents only the compiler/optimizer from removing, allowing the linker to strip unused symbols
The commit message says "use llvm.compiler.used ... instead of relying on llvm.used merging behavior," and the unit test in ssa_test.go:132 correctly asserts @llvm.compiler.used. But the out.ll reference file contradicts this. Is the out.ll file correctly regenerated? If LLVM is normalizing the name during llgen/gentests output, that would be worth investigating.
| if pkgPath == "syscall" && goos == "darwin" && (goarch == "arm64" || goarch == "amd64") && | ||
| (strings.HasSuffix(resolved, "RawSyscall") || strings.HasSuffix(resolved, "RawSyscall6")) { | ||
| continue | ||
| } | ||
| keep = append(keep, fn) | ||
| } |
There was a problem hiding this comment.
Minor: The darwin RawSyscall/RawSyscall6 filter lacks an explanatory comment, unlike the sibling Linux filter for rawVforkSyscall above (line 254). Consider adding a brief comment explaining why these assembly functions are excluded on darwin (e.g., provided by runtime via go:linkname, or incompatible with LTO).
592cd90 to
3fa5c27
Compare
use llvm.compiler.used to preserve exported symbols during LTO Signed-off-by: ZhouGuangyuan <zhouguangyuan.xian@gmail.com>
bce6845 to
def4427
Compare
Signed-off-by: ZhouGuangyuan <zhouguangyuan.xian@gmail.com>
def4427 to
6d59d79
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1622 +/- ##
==========================================
- Coverage 92.99% 92.99% -0.01%
==========================================
Files 47 47
Lines 13175 13210 +35
==========================================
+ Hits 12252 12284 +32
- Misses 737 742 +5
+ Partials 186 184 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
No description provided.