Copy-on-Write AnnotatedTypeMirror prototype#1837
Draft
wmdietl wants to merge 13 commits into
Draft
Conversation
Document the measured-flat result for migrating annotation-name dispatch from value-based String comparison (AnnotationUtils.annotationName) to javac Name identity comparison (annotationNameAsName + ==), framework-wide. The fully-realized form (CFAM caches its Name, annotationNameAsName has a CFAM fast path, isSupportedQualifier uses a companion Set<Name>) is within the run-to-run noise band on both allocation and wall clock across nullness and Value workloads. Root cause: CheckerFrameworkAnnotationMirror already caches the decoded interned name, so the toString()/intern() the change targets is not on the hot path. A ceiling profile on an annotation-saturated workload puts all name-handling frames at 0 samples; the hot leaf that does appear (IdentityHashMap.get -> getQualifierKind) is itself already rejected. Also add the "measure the ceiling on an adversarial workload" habit and a do-not-propose entry to the cf-performance skill. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…caches Replace the placeholder "scope-bounded expression-type cache" direction (#2 under the architectural getAnnotatedType item) with the measured result. Instrumented addComputedTypeAnnotations(Tree, ATM), splitting it at the useFlow boundary and recording per-tree redundancy plus the unsound fraction on all-systems (120 files) and a loop-heavy workload. Both forms are unsound. Caching the pre-flow (2a) type is 25% (all-systems) to 38% (loops) unstable because tree annotators transitively query flow-refined subexpression types. A phase-scoped full cache, restricted to the post-flow checking traversal (analysis.isRunning() false, frozen flowResult), is still 26% to 59% unstable: the same expression tree yields different types across repeated queries even after flow completes. The flow step itself is cheap, so the cost lives in the 2a computation whose result is the unstable thing; the stable trees are the cheap-to-recompute ones. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Extend the rejected getAnnotatedType expression-cache direction with the leaf-subset measurement. The hypothesis was that subexpression-free leaves (identifiers, literals) have a flow-independent prefix and so could be cached even where compound trees cannot. Instrumented, split by kind, checking-phase only. Rejected: in the checking phase the pre-flow leaf type is still unstable 22% (identifiers) / 40% (literals) on all-systems. The cause is context- dependence in which hierarchy's annotations are applied -- the same `1` literal returns `int` vs `@Initialized int` (nullness runs with the Initialization checker) -- so a tree-keyed cache would serve an under-annotated type. The cost is immaterial anyway: safely-cacheable checking-phase leaf 2a is ~1.5% of compile. getAnnotatedType is keyed by call context, not tree identity. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
My two earlier entries rejected the pre-flow ("split") and leaf-subset
getAnnotatedType caches as "25-59% unstable." That was a measurement
artifact: the per-tree signature used AnnotatedTypeMirror.toString(),
which conflates cross-hierarchy completeness (int vs @initialized int =
the Initialization annotation absent vs present) with real within-hierarchy
disagreement, and the comparison indexed getTopAnnotations() (an unstable-
order Set) positionally, misaligning hierarchies across calls.
Re-measured per hierarchy (keyed by top-annotation name; present-disagree
and completeness counted separately): value leaves are 0% unstable / 0%
incomplete over ~70k (all-systems) and ~208k (loops) repeats; compound
expressions ~0.46% / 0%; type-name identifiers ~0.7%. The pre-flow type is
stable; flow-dependence is confined to step 2b, as the split assumed. So
the split cache is a sound candidate (value still pending a built-cache
A/B). Reclassify accordingly and record the per-hierarchy measurement
method in the skill so the toString trap is not repeated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rement My prior correction reached the right conclusion (the pre-flow split cache is a sound candidate) but blamed the wrong cause: "unstable getTopAnnotations iteration order." That is wrong. getTopAnnotations returns a build-once, ArrayList-backed AnnotationMirrorSet over a TreeMap<QualifierKind,...> -- already a stable, deterministic order. The actual confound was the instrumentation: a static per-Tree map shared across the FOUR sub-factories a nullness compile runs (NullnessNoInit, Initialization, InitializationFieldAccess, KeyFor), so it compared one type system's snapshot of a tree against another's. Re-measured with a per-factory (instance) map: value leaves are still 0% disagree / 0% completeness (58,447 / 130,894 repeats); compound 0.69% / 0. The candidate stands; the explanation is now correct. Skill updated with both measurement traps (toString completeness conflation; static Tree-keyed state across factories). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ject it Resolve the "candidate, pending A/B" status of the pre-flow split cache by actually building it and measuring. A working per-factory cache (frozen pre-flow types, populated in addComputedTypeAnnotations, served by an overridden getAnnotatedType that re-applies only the flow step, toggled by -Dcf.preflowcache) was implemented and tested. Rejected for two independent reasons: - Unsound across checkers. IndexTest fails (PredecrementTest: an extra array.access.unsafe.low). UpperBound/LowerBound/Optional/Lock/Signedness override addComputedTypeAnnotations to add flow-dependent annotations after super; intercepting at getAnnotatedType bypasses that subclass logic on a hit. A nullness-only diagnostic diff was clean, which is why nullness validation missed it. - Flat where sound. Deterministic A/B (nullness): all-systems alloc -1.6% / wall ~0; loops alloc +2.0% (worse) / wall -4.7%. The per-hit deepCopy+freeze overhead cancels the saved 2a work; the projected 4-15% did not appear. Update the skill: stability != cacheable; a getAnnotatedType cache must compose with subclass addComputedTypeAnnotations overrides; validate on the override-checkers (Index/Lock/Optional/Signedness), not just nullness. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…index Move the pre-flow split cache from "Open, low-value" to "Closed, do not re-propose" now that it has been built and rejected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-2 cost) Profiled the Nullness Checker on the Guava `guava` module (625 files, 7,814 JFR samples) -- the first hotspot trace of a large heavy-generics codebase (Guava is the 994-work-unit case used only for correctness/budget so far). Leaf self-time generalizes: same flat profile as checkNullness, no CF leaf above 3.85% (IdentityHashMap lookups, ATM traversal, apiComplete, defaulting, AnnotatedTypeCopier, createType). New fact: type-argument inference is ~23% inclusive (methodFromUse 24.6% -> inferTypeArgs 23.0% -> reduceOneStep 14.1%), where it is negligible on all-systems/checkNullness -- but 50% of that is getAnnotatedType and 28% is the already-shipped #1829 incorporation, so it reinforces the Short list rather than opening a new lever. Note Guava (or jspecify-conformance) as the venue for any future inference work. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Editorial pass for review: rewrite the pre-flow split-cache entry and the companion skill lesson to present the final understanding coherently (built & rejected; stability != cacheable; the two measurement traps; getTopAnnotations is already stable), dropping the incremental "prior revision of this entry" framing. No new findings; content-equivalent. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Prototype of copy-on-write that unblocks the immutability allocation win (return shared frozen cache masters instead of deepCopy()), under -Dcf.cow. Design: the 6 post-pipeline caches return cowCopy() (a non-frozen shallow copy sharing the master's frozen children) instead of deepCopy(); the ~13 child accessors and the three fixupBoundAnnotations lazily unshare a frozen child of a non-frozen parent (cowChild/cowChildren); a per-node cowDirty flag keeps the COW scan off the hot path for non-cache types. Validated: ElementTypeCacheWildcardBound, all-systems 269 (identical diagnostics), full Guava nullness build (BUILD SUCCESS, 0 crashes). Measured (all-systems 269): allocation -4.8%, wall +5.3%. Correctness-complete but wall-negative; off by default. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… win Two wall optimizations attempted on the COW prototype: 1. Gate every child accessor with cowActive() (= COW && cowDirty) so non-cache types skip the cowChild call and field write. Removed the field-write cost but did not move wall (the cost was not the write). 2. Flip only elementTypeCache (read-only-majority consumers) instead of all six caches, reverting the full-walk caches (element/fromMember/fromExpression/ fromType/methodAsMemberOf) to deepCopy. Results (all-systems 269): all-six COW = alloc -4.8% / wall +5.3%; elementTypeCache-only = alloc -0.2% / wall +1.8% (noise). No configuration is wall-positive. Structural reasons: the copier is already cheap (~1-2%), so eliminating it cannot beat the global per-accessor cowActive() tax; and the cache consumers fully walk the result (defaulting/annotators), so piecemeal per-child cowCopy is slower than one batched deepCopy and the read-only-skip benefit never materializes. COW is a memory/GC-pressure optimization; for wall clock, the existing deepCopy is already optimal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restore all six cache flips on top of the cowActive() gating, so this is the max-allocation-reduction reference (alloc -4.8%). Tested the "downstream / GC-bound timing win" hypothesis on all-systems 269 across a heap sweep. The wall penalty shrinks as the heap tightens (-Xmx512m +8.1%, 320m +0.5%, 256m +0.1%), BUT a clean GC measurement at 256m shows COW does NOT reduce GC: 0.77s vs 0.76s pauses, 225 collections both. So the gap-closing is measurement variance, not GC savings -- the -4.8% allocation does not convert to fewer collections. CF compilation is CPU-bound (~96% on-CPU, GC <=4% even at scale per the notes), so the allocation reduction has no wall payoff (GC ceiling ~4%, COW captures none of it) and cannot offset the +5% CPU tax. No single-compile or downstream timing win from COW. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This experiment is already described in the performance-notes.md file:
Opening just for completeness, will be abandoned.