
build: switch package pipeline to bitcode(BC) linking #1673

Open
luoliwoshang wants to merge 10 commits into goplus:main from luoliwoshang:codex/bitcode-lto-inprocess-link
@luoliwoshang (Member) commented Mar 2, 2026

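  • I am quite torn about putting the bitcode inside the .a archive. The bitcode has to be pulled back out at the end for one final compilation, so storing it in a .a adds an extra unpack-and-extract step to the final merge, which does not seem worth it performance-wise.
  • Since this PR already emits BC files, genll mode can simply write its output file directly via ModuleToString.
  • Known issues:
    1. A .bc file written directly from an llvm.Module fails with structural errors when it is parsed back. The current workaround is to write the module out as an .ll file first and then compile that to .bc. This is inelegant and still needs a proper fix.
    2. When the LLDB tests use the merged .bc directly, the debug-symbol tests FAIL. A temporary fallback logic is in place for now.
    3. Because the earlier package builds keep only BC and defer all code generation to the end, the final whole-module CodeGen step is under too much compile pressure. The demos are acceptable (macos-latest build time went from about 2:30 to 4:50), but with llgo test ./... a test list that should finish in seven minutes had still not finished after half an hour.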

Summary

  • switch package build outputs to bitcode-first flow and normalize per-package outputs into merged .bc (+ optional native link inputs for non-bitcode objects)
  • link final program by loading and linking package bitcode modules in-process via LLVM APIs (TinyGo-style), then compiling one linked object for native link
  • update C/cgo/extra-file compilation paths to emit .bc for LTO (assembly stays native .o)
  • update build-cache tests and collect tests for the new bitcode cache layout
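
A minimal sketch of the bitcode/native split the summary describes, assuming inputs can be told apart by file extension. The helper name `splitByExt` and the sample file names are hypothetical, not taken from the PR:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitByExt is a hypothetical helper: it partitions link inputs into
// bitcode files (.bc) and everything else (native .o/.a inputs).
func splitByExt(inputs []string) (bitcode, native []string) {
	for _, in := range inputs {
		if strings.EqualFold(filepath.Ext(in), ".bc") {
			bitcode = append(bitcode, in)
		} else {
			native = append(native, in)
		}
	}
	return bitcode, native
}

func main() {
	// Illustrative file names only.
	bc, native := splitByExt([]string{"pkg.bc", "cgo_wrap.bc", "asm_amd64.o"})
	fmt.Println("bitcode:", bc)   // bitcode: [pkg.bc cgo_wrap.bc]
	fmt.Println("native:", native) // native: [asm_amd64.o]
}
```

The bitcode list would then feed the in-process link, while the native list goes straight to the final native link.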

Cache Behavior (Before vs After)

Before

  • cache key: fingerprint from manifest inputs
  • cache payload: package .a + .manifest
  • cache hit condition: .a exists for fingerprint
  • link input model: final link consumed per-package .a directly

After

  • cache key: unchanged (same fingerprint computation)
  • cache payload: package .bc (required) + .manifest, optional native .a for non-bitcode inputs
  • cache hit condition: .bc exists for fingerprint (optional .a reused when present)
  • link input model: package .bc files are linked in-process to one program module, compiled to one .o, then linked with optional native inputs

Validation

  • go test ./internal/build -count=1
  • /tmp/llgo-new build -x -target esp32c3 -obin -a . in _demo/embed/esp32c3/println (verified in-process bitcode link path, no llvm-link command)
  • /tmp/llgo-new build -size -target esp32c3 -obin -a . in _demo/embed/esp32c3/println

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the build process to leverage bitcode-first compilation and in-process LLVM API linking. The core change involves producing and caching bitcode files as the primary artifact for packages, which are then linked together at a program level before final native compilation. This approach aims to improve optimization opportunities through Link-Time Optimization (LTO) and streamline the linking phase by handling bitcode modules directly within the build system.

Highlights

  • Bitcode-first build flow: The package build pipeline has been switched to a bitcode-first flow, normalizing per-package outputs into merged '.bc' files, with optional native archives for non-bitcode inputs.
  • LLVM API linking: The final program is now linked by loading and linking package bitcode modules in-process via LLVM APIs, similar to TinyGo, before compiling a single linked object for native linking.
  • Updated caching strategy: Existing fingerprint logic is maintained, but package bitcode artifacts ('.bc') are now cached as the primary cache payload.
  • C/cgo/extra-file compilation: Compilation paths for C, cgo, and extra files have been updated to emit '.bc' for Link-Time Optimization (LTO), while assembly files continue to emit native '.o' files.
  • Build cache and collect tests: The build-cache and collect tests have been updated to reflect the new bitcode cache layout and logic.

Changelog
  • internal/build/build.go
    • Added a bitcodeLTO field to the context struct to enable bitcode LTO linking.
    • Refactored package output normalization by replacing normalizeToArchive with new functions (splitBitcodeAndNativeInputs, tempNamePrefix, mergeBitcodeFiles, normalizePackageOutputs, compileLinkedBitcodeToObject) to handle separate bitcode and native inputs.
    • Updated calls to package output normalization to use the new normalizePackageOutputs function.
    • Modified compileExtraFiles to emit bitcode for C-like files and native objects for assembly, returning a mixed list of link inputs.
    • Reworked linkMainPkg to differentiate between bitcode and native link inputs, enabling in-process LLVM API linking for bitcode modules.
    • Changed exportObject to compile LLVM IR (.ll) to bitcode (.bc) instead of native objects (.o).
    • Added a BitcodeFile field to the aPackage struct and clarified descriptions for ObjFiles and ArchiveFile.
    • Initialized the new BitcodeFile field when creating aPackage instances in buildSSAPkgs.
    • Updated clFile to emit bitcode for C-like files and native objects for assembly, aligning with the LTO strategy.
  • internal/build/cache.go
    • Introduced cacheBitcodeExt constant for bitcode file extensions.
    • Added a Bitcode field to the cachePaths struct to store the path to the cached bitcode file and clarified the Archive description.
    • Modified PackagePaths to generate a path for the bitcode file.
    • Updated cacheExists to prioritize checking for the bitcode file in the cache.
    • Adjusted cache listing and statistics functions (listCachedPackages, stats) to count bitcode files as primary cached items instead of archive files.
  • internal/build/cache_test.go
    • Added a test assertion for the newly introduced Bitcode path in TestCacheManager_PackagePaths.
    • Updated TestCacheManager_CacheExists to reflect the change in primary cached artifact from archive to bitcode.
    • Adjusted cache cleaning, listing, and stats tests (TestCacheManager_CleanPackageCache, TestCacheManager_CleanAllCache, TestCacheManager_ListCachedPackages, TestCacheManager_Stats) to interact with bitcode files.
  • internal/build/collect.go
    • Modified tryLoadFromCache to check for the existence of the bitcode file in the cache.
    • Adjusted tryLoadFromCache to load the bitcode file and optionally the native archive from cache.
    • Reworked saveToCache to save the bitcode file as the primary cached artifact, with the native archive as optional.
  • internal/build/collect_test.go
    • Updated TestTryLoadFromCache_ForceRebuild to align with the new bitcode-first caching strategy, using pkg.BitcodeFile and checking for .bc files.
    • Adjusted TestSaveToCache_Success to verify saving and existence of bitcode files in the cache.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and valuable change by switching the package build pipeline to a bitcode-first LTO flow. This will enable more aggressive cross-module optimizations. The changes are extensive, touching the build, caching, and compilation logic across several files. My review identifies a resource leak due to unclosed temporary files and suggests a couple of refactorings to improve code maintainability and performance.

mergedBitcode, err := mergeBitcodeFiles(moduleName, bitcodeFiles)
if err != nil {
	return "", err
}
high

The temporary file mergedBitcode created by mergeBitcodeFiles (when there's more than one input file) is not being cleaned up, leading to a resource leak. You should defer its removal after it's created.

	}
	if len(bitcodeFiles) > 1 {
		defer os.Remove(mergedBitcode)
	}

Comment on lines +598 to +607
func tempNamePrefix(moduleName string) string {
	name := strings.ReplaceAll(moduleName, "/", "_")
	name = strings.ReplaceAll(name, "\\", "_")
	name = strings.ReplaceAll(name, ":", "_")
	name = strings.ReplaceAll(name, ".", "_")
	if name == "" {
		return "llgo"
	}
	return name
}
medium

This function can be made more efficient and readable by using strings.NewReplacer to perform all replacements in a single pass.

func tempNamePrefix(moduleName string) string {
	r := strings.NewReplacer("/", "_", "\\", "_", ":", "_", ".", "_")
	name := r.Replace(moduleName)
	if name == "" {
		return "llgo"
	}
	return name
}
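
As a quick check of the suggested version, here is a small runnable sketch. The sample module path is only an illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// tempNamePrefix mirrors the suggestion above: one Replacer handles all
// separator characters in a single pass.
func tempNamePrefix(moduleName string) string {
	r := strings.NewReplacer("/", "_", "\\", "_", ":", "_", ".", "_")
	name := r.Replace(moduleName)
	if name == "" {
		return "llgo"
	}
	return name
}

func main() {
	fmt.Println(tempNamePrefix("github.com/goplus/llgo/internal/build"))
	// github_com_goplus_llgo_internal_build
	fmt.Println(tempNamePrefix("")) // llgo
}
```

Because a `strings.Replacer` is safe for concurrent use, it could also be built once at package level instead of on every call.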

Comment on lines +1099 to +1101
} else {
	nativeLinkInputs = append(nativeLinkInputs, linkBitcodeInputs...)
}
When bitcodeLTO is false, raw .bc files are appended directly to nativeLinkInputs and passed to the native linker. The native linker cannot consume .bc without LTO support enabled on the clang driver side. Since bitcodeLTO is hardcoded to true (line 366), this is dead code today, but it's a latent correctness bug.

Consider either removing the else branch (and the field) entirely, or compiling each .bc to .o in the fallback path.

Comment on lines +636 to +645
+ out, err := os.CreateTemp("", tempNamePrefix(moduleName)+"-*.bc")
+ if err != nil {
+ 	return "", fmt.Errorf("create merged bitcode file: %w", err)
+ }
+ defer out.Close()
+
+ if err := gllvm.WriteBitcodeToFile(mod, out); err != nil {
+ 	return "", fmt.Errorf("write merged bitcode file: %w", err)
+ }
- archiveFile.Close()
- archivePath := archiveFile.Name()
+ return out.Name(), nil
If WriteBitcodeToFile fails, the temp file created here is leaked — the caller receives "" and has no way to clean it up. Same pattern applies to compileLinkedBitcodeToObject (line 691) where if Compile fails, the temp .o file is left on disk.

Suggested change
out, err := os.CreateTemp("", tempNamePrefix(moduleName)+"-*.bc")
if err != nil {
	return "", fmt.Errorf("create merged bitcode file: %w", err)
}
defer out.Close()
if err := gllvm.WriteBitcodeToFile(mod, out); err != nil {
	os.Remove(out.Name())
	return "", fmt.Errorf("write merged bitcode file: %w", err)
}

Comment on lines +626 to +633
for _, bitcodeFile := range bitcodeFiles[1:] {
	srcMod, err := ctx.ParseBitcodeFile(bitcodeFile)
	if err != nil {
		return "", fmt.Errorf("parse bitcode %s: %w", bitcodeFile, err)
	}
	if err := gllvm.LinkModules(mod, srcMod); err != nil {
		return "", fmt.Errorf("link bitcode module %s: %w", bitcodeFile, err)
	}
If LinkModules fails, the source module srcMod may not have been consumed by LLVM and will be leaked. Consider disposing it on the error path:

Suggested change
for _, bitcodeFile := range bitcodeFiles[1:] {
	srcMod, err := ctx.ParseBitcodeFile(bitcodeFile)
	if err != nil {
		return "", fmt.Errorf("parse bitcode %s: %w", bitcodeFile, err)
	}
	if err := gllvm.LinkModules(mod, srcMod); err != nil {
		srcMod.Dispose()
		return "", fmt.Errorf("link bitcode module %s: %w", bitcodeFile, err)
	}
}

args := []string{"-o", objFile.Name(), "-c", mergedBitcode, "-Wno-override-module"}
if ctx.shouldPrintCommands(verbose) {
	fmt.Fprintf(os.Stderr, "# compiling linked bitcode for %s\n", moduleName)
	fmt.Fprintln(os.Stderr, "clang", args)
fmt.Fprintln(os.Stderr, "clang", args) prints the slice with Go formatting (brackets), e.g. clang [-o file.o -c ...]. Other verbose prints in this file use strings.Join(args, " ") for a copy-pasteable command. Same issue exists in exportObject at line 1417.
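
The difference between the two print styles is easy to see in isolation. In this sketch the helper names `bracketed` and `pasteable` and the file names are illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// bracketed formats the command the way fmt.Fprintln does when handed a slice.
func bracketed(args []string) string {
	return fmt.Sprint("clang ", args)
}

// pasteable joins the arguments with spaces so the line can be copied into a shell.
func pasteable(args []string) string {
	return "clang " + strings.Join(args, " ")
}

func main() {
	args := []string{"-o", "file.o", "-c", "merged.bc", "-Wno-override-module"}
	fmt.Println(bracketed(args)) // clang [-o file.o -c merged.bc -Wno-override-module]
	fmt.Println(pasteable(args)) // clang -o file.o -c merged.bc -Wno-override-module
}
```

Note that a plain join does not quote arguments containing spaces, so paths with spaces would still need shell quoting before the line is pasted.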

@xgopilot (bot, Contributor) commented Mar 2, 2026

Review Summary

The architecture — merging per-package bitcode in-process via LLVM APIs then compiling one native object — is a clean approach. The separation of bitcode vs native inputs and the cache migration are handled well. A few items to address:

  1. Fallback path correctness: The bitcodeLTO=false branch passes raw .bc to the native linker, which won't work. Since the field is hardcoded true, consider removing the dead branch or fixing it.
  2. Resource leaks on error: mergeBitcodeFiles and compileLinkedBitcodeToObject leak temp files when write/compile fails. srcMod may leak if LinkModules fails.
  3. Debug print format: fmt.Fprintln(os.Stderr, "clang", args) prints Go slice formatting instead of a shell-pasteable command.

See inline comments for details and suggestions.

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.97%. Comparing base (9432f9e) to head (dc0dfd3).
⚠️ Report is 60 commits behind head on main.

Files with missing lines | Patch % | Lines
ssa/eh.go                | 60.00%  | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1673      +/-   ##
==========================================
+ Coverage   91.35%   92.97%   +1.61%     
==========================================
  Files          47       47              
  Lines       12681    13185     +504     
==========================================
+ Hits        11585    12259     +674     
+ Misses        906      740     -166     
+ Partials      190      186       -4     


@luoliwoshang changed the title from "build: switch package pipeline to bitcode LTO linking" to "build: switch package pipeline to bitcode(BC) linking" on Mar 2, 2026
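@luoliwoshang (Member, Author)

  • I am quite torn about putting the bitcode inside the .a archive. The bitcode has to be pulled back out at the end for one final compilation, so storing it in a .a adds an extra unpack-and-extract step to the final merge, which does not seem worth it performance-wise.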
