[Feat] Enable LTO #1622

Open
zhouguangyuan0718 wants to merge 3 commits into goplus:main from zhouguangyuan0718:main-lto

Conversation

@zhouguangyuan0718
Contributor

No description provided.

@gemini-code-assist

Summary of Changes

Hello @zhouguangyuan0718, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Thin Link-Time Optimization (LTO) and the merge-functions feature into the default build process. This enhancement is designed to improve overall code optimization and potentially reduce the size of the compiled binaries by allowing the linker to perform whole-program analysis and merge identical functions, leading to more efficient and smaller executables.

Highlights

  • ThinLTO Enabled by Default: Thin Link-Time Optimization (LTO) is now enabled by default for both clang and ld.lld linkers, allowing for whole-program optimization.
  • Merge Functions Enabled: The merge-functions feature is now enabled by default, which helps reduce binary size by merging identical functions across the entire program.
  • Unified LTO Flags: The -funified-lto flag has been added to CCFLAGS, cflags, and ccflags to ensure consistent LTO behavior across compilation and linking stages.

Changelog

  • internal/crosscompile/crosscompile.go
    • Added -flto=thin, -Wl,-mllvm,-enable-merge-functions, and -Wl,-mllvm,-mergefunc-use-aliases to ldflags when clangRoot is present, enabling ThinLTO and function merging.
    • Appended -flto=thin and -funified-lto to export.CCFLAGS for clang.
    • Introduced a conditional block to apply --lto=thin, -mllvm -enable-merge-functions, -mllvm -mergefunc-use-aliases to ldflags, and -flto=thin, -funified-lto to cflags and ccflags specifically when ld.lld is configured as the linker.

Activity

  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enables ThinLTO and function merging by default, which is a great step for optimizing binary size and performance. The implementation correctly adds the necessary flags for both native and cross-compilation builds.

My main feedback is about code duplication. The LTO-related flags are defined in two separate places (use and UseTarget functions). I've left a comment suggesting to refactor this by extracting the flags into package-level variables to improve maintainability. I also pointed out a misleading comment.

Comment on lines +509 to +519
// For ld.lld linker, also add CPU info to linker flags
if config.Linker == "ld.lld" {
	ldflags = append(ldflags,
		// Enable ThinLTO, then global DCE will work well,
		// and "merge-functions" can merge the function in whole program.
		"--lto=thin",
		"-mllvm", "-enable-merge-functions",
		"-mllvm", "-mergefunc-use-aliases")
	cflags = append(cflags, "-flto=thin", "-funified-lto")
	ccflags = append(ccflags, "-flto=thin", "-funified-lto")
}

medium

The comment on line 509 is misleading. This block adds LTO and function merging flags, not CPU info. Please update the comment for clarity.

Additionally, these LTO flags and the associated comment are duplicated from the use function (see lines 230-234 and 257-258). To improve maintainability and avoid future inconsistencies, I recommend refactoring this by extracting the different sets of flags into package-level variables. This would make the code cleaner and easier to manage.

For example:

// At package level
var (
	ltoCompilerFlags = []string{"-flto=thin", "-funified-lto"}
	ltoLinkerFlagsForClangDriver = []string{
		// ... comment here ...
		"-flto=thin",
		"-Wl,-mllvm,-enable-merge-functions",
		"-Wl,-mllvm,-mergefunc-use-aliases",
	}
	ltoLinkerFlagsForLld = []string{
		// ... comment here ...
		"--lto=thin",
		"-mllvm", "-enable-merge-functions",
		"-mllvm", "-mergefunc-use-aliases",
	}
)

Then you could use ltoLinkerFlagsForLld and ltoCompilerFlags here, and the other variables in the use function.
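
As a rough sketch of how the ld.lld call site could look after that refactor: applyLldLTO is a hypothetical helper name that does not exist in the PR, and the "-static" starting flag is only a placeholder. The flag values themselves are the ones quoted in the suggestion above.

```go
package main

import "fmt"

// Flag sets taken from the review suggestion above.
var (
	ltoCompilerFlags     = []string{"-flto=thin", "-funified-lto"}
	ltoLinkerFlagsForLld = []string{
		"--lto=thin",
		"-mllvm", "-enable-merge-functions",
		"-mllvm", "-mergefunc-use-aliases",
	}
)

// applyLldLTO is a hypothetical helper (not part of the PR). It appends the
// shared LTO flag sets only when the configured linker is ld.lld, and returns
// the inputs unchanged otherwise.
func applyLldLTO(linker string, ldflags, cflags, ccflags []string) ([]string, []string, []string) {
	if linker != "ld.lld" {
		return ldflags, cflags, ccflags
	}
	ldflags = append(ldflags, ltoLinkerFlagsForLld...)
	cflags = append(cflags, ltoCompilerFlags...)
	ccflags = append(ccflags, ltoCompilerFlags...)
	return ldflags, cflags, ccflags
}

func main() {
	// "-static" is a placeholder for whatever flags were collected earlier.
	ld, c, cc := applyLldLTO("ld.lld", []string{"-static"}, nil, nil)
	fmt.Println(ld)
	fmt.Println(c)
	fmt.Println(cc)
}
```

With one definition per flag set, the native path and the cross-compile path cannot drift apart unnoticed, which is the maintainability concern raised here.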

Suggested change
- // For ld.lld linker, also add CPU info to linker flags
+ // For ld.lld linker, add LTO and function merging flags.
  if config.Linker == "ld.lld" {
  	ldflags = append(ldflags,
  		// Enable ThinLTO, then global DCE will work well,
  		// and "merge-functions" can merge the function in whole program.
  		"--lto=thin",
  		"-mllvm", "-enable-merge-functions",
  		"-mllvm", "-mergefunc-use-aliases")
  	cflags = append(cflags, "-flto=thin", "-funified-lto")
  	ccflags = append(ccflags, "-flto=thin", "-funified-lto")
  }

@zhouguangyuan0718 zhouguangyuan0718 force-pushed the main-lto branch 5 times, most recently from 1f80264 to 740d318 Compare February 13, 2026 16:09
@zhouguangyuan0718 zhouguangyuan0718 changed the title [Feat] Enable LTO and merge-functions default [Feat] Enable LTO Feb 26, 2026
@zhouguangyuan0718 zhouguangyuan0718 force-pushed the main-lto branch 21 times, most recently from 5d6ef59 to d1cad0b Compare March 2, 2026 11:28
@zhouguangyuan0718 zhouguangyuan0718 force-pushed the main-lto branch 3 times, most recently from 40cd0a9 to 5684f3d Compare March 2, 2026 14:29
@zhouguangyuan0718 zhouguangyuan0718 marked this pull request as ready for review March 2, 2026 15:03
@zhouguangyuan0718 zhouguangyuan0718 force-pushed the main-lto branch 2 times, most recently from b2dd536 to c3c991d Compare March 2, 2026 15:44
@xgopilot
Contributor

xgopilot bot commented Mar 2, 2026

Overall solid work enabling ThinLTO and preserving exported symbols. The llvm.compiler.used mechanism and fitIntSize slice fixes are well-structured and properly tested. A few concerns:

  1. --lto-O0 sets LTO optimization to O0, paying ThinLTO overhead (bitcode files, slower link) without its key benefits (cross-module inlining, IPO). Consider --lto-O2.
  2. out.ll shows @llvm.used but source code creates llvm.compiler.used — these have different semantics.
  3. ELF-specific linker flags (--icf=safe, --lto-O0) are used unconditionally on the macOS native path, where ld64.lld is the linker.

See inline comments for details.

"riscv32"
"riscv64"
"rp2040"
"nintendoswitch" # undefined symbol under lto, should not work when no-lto

Nit: the comment says "should not work when no-lto" but the intent seems to be that it doesn't work with LTO (undefined symbol under LTO). Consider rephrasing, e.g.: # undefined symbol under lto, skip for now.

Comment on lines +228 to 231
	"-Wl,--icf=safe",
	// Enable ThinLTO, Using default lto kind(thinlto).
	"-Wl,--lto-O0",
}

--lto-O0 sets the LTO optimization level to zero — this pays the full cost of ThinLTO (bitcode intermediates, slower link) while disabling the optimizations that make LTO valuable (cross-module inlining, interprocedural constant propagation, global dead code elimination). The only benefit surviving at O0 is that the linker can see all symbols, but ICF and linker-level dead stripping work without LTO as well.

Consider --lto-O2 (common ThinLTO default) to actually realize the link-time optimization benefits.

Also, the comment // Enable ThinLTO, Using default lto kind(thinlto). is misleading here — --lto-O0 doesn't enable ThinLTO (that's done by -flto=thin in CCFLAGS). This flag only controls the optimization level during the LTO link step.

Additionally, -Wl,--icf=safe and -Wl,--lto-O0 are ld.lld (ELF) flags. On macOS, -fuse-ld=lld resolves to ld64.lld (Mach-O linker) which uses different flag syntax. These flags may produce linker warnings or errors on macOS (or be silently ignored due to -Wno-unused-command-line-argument). Consider gating them behind a platform check, similar to the OS-specific branch below at line ~268.
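
A minimal sketch of such a platform gate follows. nativeLinkerLTOFlags is a hypothetical helper, not code from the PR; treating "windows" as non-ELF alongside "darwin" is an assumption, and --lto-O2 is simply the level proposed in this review. No Mach-O equivalents are guessed at here: the darwin branch returns no extra flags.

```go
package main

import (
	"fmt"
	"runtime"
)

// nativeLinkerLTOFlags is a hypothetical helper (not part of the PR). It
// returns the ELF-only ld.lld flags for hosts assumed to link ELF, and no
// extra flags for hosts where -fuse-ld=lld selects a different lld flavor.
func nativeLinkerLTOFlags(goos string) []string {
	switch goos {
	case "darwin", "windows":
		// Assumption: Mach-O (ld64.lld) and COFF (lld-link) do not accept the
		// ELF spellings below, so nothing is added on these hosts.
		return nil
	default:
		return []string{
			"-Wl,--icf=safe",
			"-Wl,--lto-O2", // level suggested in the review; the PR uses --lto-O0
		}
	}
}

func main() {
	fmt.Println(runtime.GOOS, nativeLinkerLTOFlags(runtime.GOOS))
}
```

Keeping the decision in one function means that a later addition of Mach-O specific flags touches a single switch case.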

	"-Wno-unused-command-line-argument",
	"-flto=thin",
}

In the native compile path, -flto=thin is added only to CCFLAGS but not to CFLAGS. In contrast, the UseTarget path (line ~507) adds -flto=thin to both cflags and ccflags. Without -flto=thin in CFLAGS, object files compiled via that path won't contain LTO bitcode, so ThinLTO can't optimize across those translation units. Is this intentional?


if config.Linker == "ld.lld" {
	// Enable ThinLTO, Using default lto kind(thinlto).
	ldflags = append(ldflags, "--lto-O0")

Same --lto-O0 concern as the native path — this pays ThinLTO overhead without the optimization benefits. Consider --lto-O2.

Comment on lines +735 to +747
func (p Package) markLLVMUsed(v llvm.Value) {
	elemTyp := p.Prog.VoidPtr().ll
	p.llvmUsedValues = append(p.llvmUsedValues, llvm.ConstBitCast(v, elemTyp))
	if !p.llvmUsed.IsNil() {
		p.llvmUsed.EraseFromParentAsGlobal()
	}
	init := llvm.ConstArray(elemTyp, p.llvmUsedValues)
	global := llvm.AddGlobal(p.mod, init.Type(), "llvm.compiler.used")
	global.SetInitializer(init)
	global.SetLinkage(llvm.AppendingLinkage)
	global.SetSection("llvm.metadata")
	p.llvmUsed = global
}

markLLVMUsed erases and recreates the llvm.compiler.used global each time a symbol is added, resulting in O(n^2) total LLVM IR manipulation for n preserved symbols. Consider deferring the construction to a finalization step — accumulate values in llvmUsedValues during compilation and build the llvm.compiler.used global once before the module is emitted. This simplifies the code and avoids the repeated erase/create cycle.

Also, the field and method names reference llvmUsed but the actual global created is llvm.compiler.used. These are semantically different LLVM intrinsics (llvm.used prevents both compiler and linker removal; llvm.compiler.used only prevents compiler removal). The naming could be clearer — e.g., llvmCompilerUsed / llvmCompilerUsedValues.
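
The deferred construction can be illustrated without the LLVM bindings. In the sketch below, compilerUsed, mark, and finalize are hypothetical names, and symbol names stand in for llvm.Value so the example runs on its own. In the real code, the finalize step would be where llvm.ConstArray and llvm.AddGlobal from the snippet above are called, once per module.

```go
package main

import (
	"fmt"
	"strings"
)

// compilerUsed is a stand-in for the Package fields discussed above. It keeps
// plain symbol names instead of llvm.Value (an assumption made so the sketch
// has no LLVM dependency).
type compilerUsed struct {
	names []string
}

// mark only records the symbol; no global is erased or rebuilt here.
func (u *compilerUsed) mark(name string) {
	u.names = append(u.names, name)
}

// finalize builds the llvm.compiler.used definition once, from everything
// recorded so far. It returns "" when nothing was marked.
func (u *compilerUsed) finalize() string {
	if len(u.names) == 0 {
		return ""
	}
	elems := make([]string, len(u.names))
	for i, n := range u.names {
		elems[i] = "ptr @" + n
	}
	return fmt.Sprintf(
		"@llvm.compiler.used = appending global [%d x ptr] [%s], section \"llvm.metadata\"",
		len(u.names), strings.Join(elems, ", "))
}

func main() {
	var u compilerUsed
	u.mark("foo")
	u.mark("bar")
	fmt.Println(u.finalize())
}
```

Each mark call is a single append, so n preserved symbols cost n appends plus one array construction, instead of n erase-and-recreate cycles.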

		p.llvmUsed.EraseFromParentAsGlobal()
	}
	init := llvm.ConstArray(elemTyp, p.llvmUsedValues)
	global := llvm.AddGlobal(p.mod, init.Type(), "llvm.compiler.used")

The global created here is "llvm.compiler.used", but cl/_testdata/cpkg/out.ll shows @llvm.used (not @llvm.compiler.used). These are semantically distinct LLVM intrinsics:

  • llvm.used: prevents both compiler and linker from removing symbols
  • llvm.compiler.used: prevents only the compiler/optimizer from removing, allowing the linker to strip unused symbols

The commit message says "use llvm.compiler.used ... instead of relying on llvm.used merging behavior," and the unit test in ssa_test.go:132 correctly asserts @llvm.compiler.used. But the out.ll reference file contradicts this. Is the out.ll file correctly regenerated? If LLVM is normalizing the name during llgen/gentests output, that would be worth investigating.

Comment on lines +257 to 262
	if pkgPath == "syscall" && goos == "darwin" && (goarch == "arm64" || goarch == "amd64") &&
		(strings.HasSuffix(resolved, "RawSyscall") || strings.HasSuffix(resolved, "RawSyscall6")) {
		continue
	}
	keep = append(keep, fn)
}

Minor: The darwin RawSyscall/RawSyscall6 filter lacks an explanatory comment, unlike the sibling Linux filter for rawVforkSyscall above (line 254). Consider adding a brief comment explaining why these assembly functions are excluded on darwin (e.g., provided by runtime via go:linkname, or incompatible with LTO).

@zhouguangyuan0718 zhouguangyuan0718 force-pushed the main-lto branch 4 times, most recently from 592cd90 to 3fa5c27 Compare March 3, 2026 15:03
use llvm.compiler.used to preserve exported symbols during LTO

Signed-off-by: ZhouGuangyuan <zhouguangyuan.xian@gmail.com>
@zhouguangyuan0718 zhouguangyuan0718 force-pushed the main-lto branch 4 times, most recently from bce6845 to def4427 Compare March 4, 2026 16:26
Signed-off-by: ZhouGuangyuan <zhouguangyuan.xian@gmail.com>
@codecov

codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.99%. Comparing base (3b3ff41) to head (6d59d79).

| Files with missing lines | Patch % | Lines |
| internal/crosscompile/crosscompile.go | 80.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1622      +/-   ##
==========================================
- Coverage   92.99%   92.99%   -0.01%     
==========================================
  Files          47       47              
  Lines       13175    13210      +35     
==========================================
+ Hits        12252    12284      +32     
- Misses        737      742       +5     
+ Partials      186      184       -2     
