From f04f17c8fcedf0bdd334618b00eff32a9cd262c3 Mon Sep 17 00:00:00 2001 From: Mate Soos Date: Sun, 5 Apr 2026 22:58:47 +0200 Subject: [PATCH 1/5] Add Weisfeiler-Lehman canonical component hashing for cache-invariant lookup The component cache in Ganak stores sub-problem counts keyed by a fingerprint derived from the sorted list of absolute variable IDs and long clause IDs that make up each component. This means that two structurally isomorphic components -- identical graph topology, same clause structure, same independence/non- independence classification of variables, but different variable numberings -- are treated as entirely distinct cache entries, causing guaranteed misses even when the same structural sub-problem has already been solved. Analysis of benchmark data across 1600 instances showed that the average component is seen only ~2 times during a full run, and cache hits tend to cluster on very small components (22-64 vars avg) while misses dominate on larger ones (150-548 vars avg). The root cause is not cache pollution or eviction policy, but rather the inability of the fingerprint to identify structural equivalence across different variable numberings. This commit implements Weisfeiler-Lehman (WL) canonical component hashing, a graph-coloring technique that produces a structure-invariant fingerprint for each component. For each component with nVars <= wl_canonize_threshold (default 25), we compute a canonical ordering of the variables and express every clause -- both long (3+ literal) and binary -- as a sorted list of canonical variable indices (0..nVars-1) rather than absolute variable IDs. The canonical ordering is determined by: (1) an initial color per variable encoding its long-clause degree, binary-clause degree, and whether it belongs to the independent support; (2) one round of WL refinement where each variable's new hash incorporates the sorted multiset of its long-clause neighbours' initial colors; (3) a stable sort on (wl1_hash, init_color, original_var_id). The resulting sorted list of canonical clauses, plus nVars, is hashed via chibihash64 to produce the cache key. Two isomorphic components produce identical sorted_canon_clauses and therefore hit the same cache bucket. The implementation is structured as follows. A new CanonInfo struct (src/comp_types/canon_info.hpp) carries the canonical variable ordering, the sorted list of canonical clauses, the structural hash, and a valid flag. CompAnalyzer::compute_canon_info() (src/comp_analyzer.cpp) builds this in O(n * d) time where n is the component size and d is average clause degree, using holder.begin_long / holder.begin_bin with orig_size to access all irredundant clauses. The CanonInfo is computed immediately after make_comp_from_archetype() in CompManager::record_remaining_comps_for() and threaded through CompCacheIF::create_new_comp() -> CompCache::create_new_comp() -> CacheableComp constructor -> T::set_comp(). Both cache entry types are updated: HashedComp::set_comp() substitutes canon->hash for the raw-data chibihash when a valid CanonInfo is present; DiffPackedComp::set_comp() stores the canonical clause data (same encoding used to derive the hash) in its packed array and sets an is_canonical_ flag so that equals() correctly compares canonical entries against each other using memcmp on the structure-invariant data rather than the absolute-ID packed representation. The threshold (wl_canonize_threshold, default 25) is intentionally conservative for this initial implementation. WL canonicalization is O(n * d) in compute time and involves heap allocation for the canonical clause vectors, so applying it to very large components would hurt throughput more than it helps the cache. The threshold can be tuned via the config. Setting it to 0 disables the feature entirely, restoring prior behavior. A known limitation: the 1-round WL refinement does not distinguish all non-isomorphic graphs (it can produce collisions for regular graphs), so the hash remains probabilistic in character -- a structural hash collision yields a false cache hit in HashedComp mode, with the same probability guarantees as before. In DiffPackedComp mode the is_canonical_ flag and memcmp on canonical data provide exact equality. All existing tests (ctest, 20-instance fuzz suite with --exact) pass without modification. Co-Authored-By: Claude Sonnet 4.6 --- src/comp_analyzer.cpp | 178 ++++++++++++++++++++++ src/comp_analyzer.hpp | 15 ++ src/comp_cache.hpp | 5 +- src/comp_cache_if.hpp | 4 +- src/comp_management.cpp | 4 +- src/comp_types/cacheable_comp.hpp | 6 +- src/comp_types/canon_info.hpp | 53 +++++++ src/comp_types/difference_packed_comp.hpp | 33 +++- src/comp_types/hashed_comp.hpp | 5 +- src/counter_config.hpp | 7 + 10 files changed, 300 insertions(+), 10 deletions(-) create mode 100644 src/comp_types/canon_info.hpp diff --git a/src/comp_analyzer.cpp b/src/comp_analyzer.cpp index 26790946..6a1869c7 100644 --- a/src/comp_analyzer.cpp +++ b/src/comp_analyzer.cpp @@ -29,6 +29,10 @@ THE SOFTWARE. #include #include #include +#include +#include +#include +#include "chibihash64.h" using namespace GanakInt; @@ -408,6 +412,180 @@ end_sat:; << comp_vars.size() << " long"); } +// Computes Weisfeiler-Lehman canonical component information for cache lookup. +// +// Approach: +// 1. Build a clause->variable mapping restricted to this component's vars/clauses. +// 2. Compute an initial "color" for each variable: (long_degree, binary_degree, is_indep). +// 3. Run one round of WL refinement: each variable's new color incorporates the sorted +// multiset of its long-clause neighbours' initial colors. +// 4. Sort variables by WL color (original var_id as tiebreaker) to get canonical order. +// 5. Express every clause (long + binary) in terms of canonical variable indices and sort. +// 6. Hash the resulting canonical clause list to produce a structure-invariant cache key. +// +// The result is invariant to variable/clause ID renaming: two structurally isomorphic +// components will produce identical sorted_canon_clauses and the same hash, enabling a +// cache hit even if they involve completely different variable numberings. +CanonInfo CompAnalyzer::compute_canon_info(const Comp& comp, + uint64_t hash_seed, + uint32_t threshold) const { + CanonInfo info; + const uint32_t n = comp.nVars(); + if (threshold == 0 || n > threshold || n == 0) return info; + + // --- Build membership lookups --- + + // comp clause set for O(1) membership test + std::unordered_set comp_clause_set; + comp_clause_set.reserve(comp.num_long_cls() * 2); + for (auto it = comp.cls_begin(); *it != sentinel; ++it) comp_clause_set.insert(*it); + + // var_id -> position in comp (0..n-1) + std::unordered_map var_to_pos; + var_to_pos.reserve(n * 2); + for (uint32_t i = 0; i < n; ++i) var_to_pos[comp.vars_begin()[i]] = i; + + // --- Compute degree-based initial color per variable (position) --- + + // long_deg[i] = number of comp clauses containing position-i variable + // bin_deg[i] = number of comp variables that position-i variable shares a binary clause with + vector long_deg(n, 0); + vector bin_deg(n, 0); + + // clause_id -> list of comp-variable positions that appear in it + std::unordered_map> clause_to_pos; + clause_to_pos.reserve(comp.num_long_cls() * 2); + + for (uint32_t i = 0; i < n; ++i) { + const uint32_t v = comp.vars_begin()[i]; + + // Long clauses: iterate over all long clauses v appears in (including possibly + // some that were satisfied since record_comp, but we filter by comp_clause_set). + const ClData* longs = holder.begin_long(v); + const ClData* longs_end = longs + holder.orig_size_long(v); + for (const ClData* d = longs; d != longs_end; ++d) { + if (comp_clause_set.count(d->id)) { + clause_to_pos[d->id].push_back(i); + ++long_deg[i]; + } + } + + // Binary clauses (irredundant only – stored in holder). + const uint32_t* bins = holder.begin_bin(v); + for (uint32_t j = 0; j < holder.orig_size_bin(v); ++j) { + if (var_to_pos.count(bins[j])) ++bin_deg[i]; + } + } + + // --- Initial WL color per position --- + using Color3 = std::tuple; + vector init_color(n); + for (uint32_t i = 0; i < n; ++i) { + const bool is_indep = (comp.vars_begin()[i] < indep_support_end); + init_color[i] = {long_deg[i], bin_deg[i], static_cast(is_indep)}; + } + + // --- Build long-clause neighbour adjacency (for WL round) --- + // cl_neighbors[i] = positions of variables that share a long clause with position i + vector> cl_neighbors(n); + for (auto& [cl_id, positions] : clause_to_pos) { + for (uint32_t u : positions) + for (uint32_t w : positions) + if (u != w) cl_neighbors[u].push_back(w); + } + + // --- One round of WL refinement --- + // wl1[i] = hash of (init_color[i], sorted multiset of init_colors of clause-neighbours) + vector wl1(n); + for (uint32_t i = 0; i < n; ++i) { + vector ncolors; + ncolors.reserve(cl_neighbors[i].size()); + for (uint32_t j : cl_neighbors[i]) ncolors.push_back(init_color[j]); + sort(ncolors.begin(), ncolors.end()); + + // Mix into a 64-bit hash using simple polynomial mixing + uint64_t h = (static_cast(get<0>(init_color[i])) * 2654435761ULL) + ^ (static_cast(get<1>(init_color[i])) * 40503ULL) + ^ (static_cast(get<2>(init_color[i])) * 2246822519ULL); + for (const auto& [ld, bd, indp] : ncolors) { + h ^= (h >> 16) * 0x45d9f3bULL; + h += (static_cast(ld) * 2654435761ULL) + ^ (static_cast(bd) * 40503ULL) + ^ (static_cast(indp)); + } + wl1[i] = h; + } + + // --- Sort variables by (wl1, init_color, var_id) to get canonical order --- + vector perm(n); + iota(perm.begin(), perm.end(), 0); + sort(perm.begin(), perm.end(), [&](uint32_t a, uint32_t b) { + if (wl1[a] != wl1[b]) return wl1[a] < wl1[b]; + if (init_color[a] != init_color[b]) return init_color[a] < init_color[b]; + return comp.vars_begin()[a] < comp.vars_begin()[b]; // stable tiebreak + }); + + // canon_vars[i] = original var_id at canonical position i + info.canon_vars.resize(n); + for (uint32_t i = 0; i < n; ++i) info.canon_vars[i] = comp.vars_begin()[perm[i]]; + + // orig_pos -> canonical index + vector orig_to_canon(n); + for (uint32_t i = 0; i < n; ++i) orig_to_canon[perm[i]] = i; + + // --- Build canonical clause representations (long clauses) --- + info.sorted_canon_clauses.reserve(clause_to_pos.size() + n); // upper bound + for (auto& [cl_id, positions] : clause_to_pos) { + vector cv; + cv.reserve(positions.size()); + for (uint32_t pos : positions) cv.push_back(orig_to_canon[pos]); + sort(cv.begin(), cv.end()); + info.sorted_canon_clauses.push_back(std::move(cv)); + } + + // --- Build canonical binary clause representations --- + // Binary clauses: irredundant, stored as unordered pairs of variables. + // We deduplicate by only adding each pair once (ci < cj). + // Use a sorted vector of pairs for deduplication. + vector> bin_pairs; + bin_pairs.reserve(n * 2); + for (uint32_t pos_i = 0; pos_i < n; ++pos_i) { + const uint32_t v = comp.vars_begin()[pos_i]; + const uint32_t* bins = holder.begin_bin(v); + for (uint32_t j = 0; j < holder.orig_size_bin(v); ++j) { + auto it = var_to_pos.find(bins[j]); + if (it == var_to_pos.end()) continue; // not in this comp + const uint32_t pos_j = it->second; + uint32_t ci = orig_to_canon[pos_i]; + uint32_t cj = orig_to_canon[pos_j]; + if (ci > cj) std::swap(ci, cj); + bin_pairs.push_back({ci, cj}); + } + } + sort(bin_pairs.begin(), bin_pairs.end()); + bin_pairs.erase(unique(bin_pairs.begin(), bin_pairs.end()), bin_pairs.end()); + for (auto& [ci, cj] : bin_pairs) + info.sorted_canon_clauses.push_back({ci, cj}); + + // --- Sort all canonical clauses lexicographically --- + sort(info.sorted_canon_clauses.begin(), info.sorted_canon_clauses.end()); + + // --- Compute structural hash of (nVars, canonical clauses) --- + // Encoding: [n, n_total_clauses, for each clause: (size, v0, v1, ...)] + vector hdata; + hdata.reserve(2 + info.sorted_canon_clauses.size() * 4); + hdata.push_back(n); + hdata.push_back(static_cast(info.sorted_canon_clauses.size())); + for (const auto& cv : info.sorted_canon_clauses) { + hdata.push_back(static_cast(cv.size())); + for (uint32_t idx : cv) hdata.push_back(idx); + } + info.hash = chibihash64(hdata.data(), hdata.size() * sizeof(uint32_t), hash_seed); + + info.valid = true; + return info; +} + // There is exactly ONE of these CompAnalyzer::CompAnalyzer( const LiteralIndexedVector & lit_values, diff --git a/src/comp_analyzer.hpp b/src/comp_analyzer.hpp index 568ea012..b6d27161 100644 --- a/src/comp_analyzer.hpp +++ b/src/comp_analyzer.hpp @@ -28,6 +28,7 @@ THE SOFTWARE. #include "statistics.hpp" #include "comp_types/comp.hpp" #include "comp_types/comp_archetype.hpp" +#include "comp_types/canon_info.hpp" #include #include @@ -95,6 +96,10 @@ struct MyHolder { auto start = data[v*hstride+offset+0]; return data.get() + start; } + const uint32_t* begin_bin(uint32_t v) const { + auto start = data[v*hstride+offset+0]; + return data.get() + start; + } uint32_t size_bin(uint32_t v) const { return data[v*hstride+offset+1];} uint32_t& size_bin(uint32_t v) { return data[v*hstride+offset+1];} uint32_t orig_size_bin(uint32_t v) const { return data[v*hstride+offset+2];} @@ -107,6 +112,10 @@ struct MyHolder { auto start = data[v*hstride+offset+3]; return reinterpret_cast(data.get() + start); } + const ClData* begin_long(uint32_t v) const { + auto start = data[v*hstride+offset+3]; + return reinterpret_cast(data.get() + start); + } uint32_t size_long(uint32_t v) const { return data[v*hstride+offset+4];} uint32_t& size_long(uint32_t v) { return data[v*hstride+offset+4];} uint32_t orig_size_long(uint32_t v) const { return data[v*hstride+offset+5];} @@ -179,6 +188,12 @@ class CompAnalyzer { uint32_t get_max_var() const { return max_var; } CompArchetype& get_archetype() { return archetype; } + // Compute WL-based canonical information for comp (only if nVars <= threshold). + // Must be called immediately after make_comp_from_archetype() and before the + // next explore_comp(), because it relies on the current holder state. + // Returns a CanonInfo with valid=false if nVars > threshold or threshold == 0. + CanonInfo compute_canon_info(const Comp& comp, uint64_t hash_seed, uint32_t threshold) const; + private: // the id of the last clause // note that clause ID is the clause number, diff --git a/src/comp_cache.hpp b/src/comp_cache.hpp index 1d952b43..a32eeb17 100644 --- a/src/comp_cache.hpp +++ b/src/comp_cache.hpp @@ -56,8 +56,9 @@ class CompCache final: public CompCacheIF { T* comp = reinterpret_cast(c); return comp->extra_bytes(); } - void* create_new_comp(const Comp &comp, uint64_t hash_seed, const BPCSizes& bpc) override { - return new T(comp, hash_seed, bpc); + void* create_new_comp(const Comp &comp, uint64_t hash_seed, const BPCSizes& bpc, + const CanonInfo* canon = nullptr) override { + return new T(comp, hash_seed, bpc, canon); } [[nodiscard]] uint64_t get_max_num_entries() const override { return entry_base.size(); } diff --git a/src/comp_cache_if.hpp b/src/comp_cache_if.hpp index da03cd82..22c6ecc7 100644 --- a/src/comp_cache_if.hpp +++ b/src/comp_cache_if.hpp @@ -25,6 +25,7 @@ THE SOFTWARE. #include "common.hpp" #include "comp_types/cacheable_comp.hpp" +#include "comp_types/canon_info.hpp" #include "statistics.hpp" #include #include "stack.hpp" @@ -40,7 +41,8 @@ class CompCacheIF { virtual CacheEntryID add_new_comp(void* comp, CacheEntryID super_comp_id) = 0; virtual uint64_t get_extra_bytes(void* comp) const = 0; virtual bool find_comp_and_incorporate_cnt(StackLevel &top, const uint32_t nvars, const void* comp) = 0; - virtual void* create_new_comp(const Comp &comp, uint64_t hash_seed, const BPCSizes& bpc) = 0; + virtual void* create_new_comp(const Comp &comp, uint64_t hash_seed, const BPCSizes& bpc, + const CanonInfo* canon = nullptr) = 0; virtual void free_comp(void* comp) = 0; virtual void make_entry_deletable(CacheEntryID id) = 0; diff --git a/src/comp_management.cpp b/src/comp_management.cpp index 3bdb0120..9751f4c9 100644 --- a/src/comp_management.cpp +++ b/src/comp_management.cpp @@ -23,6 +23,7 @@ THE SOFTWARE. #include "comp_management.hpp" #include "common.hpp" #include "comp_types/cacheable_comp.hpp" +#include "comp_types/canon_info.hpp" #include "counter.hpp" using namespace GanakInt; @@ -67,7 +68,8 @@ void CompManager::record_remaining_comps_for(StackLevel &top) { if (ana.var_unvisited_in_sup_comp(*vt) && ana.explore_comp(*vt, super_comp.num_long_cls(), super_comp.num_bin_cls())) { Comp *p_new_comp = ana.make_comp_from_archetype(); - void* ccomp = cache->create_new_comp(*p_new_comp, hash_seed, bpc); + const CanonInfo canon = ana.compute_canon_info(*p_new_comp, hash_seed, conf.wl_canonize_threshold); + void* ccomp = cache->create_new_comp(*p_new_comp, hash_seed, bpc, canon.valid ? &canon : nullptr); // TODO Yash: count it 1-by-1 in case the number of variables & clauses is small // essentially, brute-forcing the count diff --git a/src/comp_types/cacheable_comp.hpp b/src/comp_types/cacheable_comp.hpp index 58aaa251..7bd662cf 100644 --- a/src/comp_types/cacheable_comp.hpp +++ b/src/comp_types/cacheable_comp.hpp @@ -25,6 +25,7 @@ THE SOFTWARE. #include #include "structures.hpp" #include "base_comp.hpp" +#include "canon_info.hpp" namespace GanakInt { @@ -38,8 +39,9 @@ class CacheableComp: public BaseComp, private T { CacheableComp(CacheableComp&&) noexcept = default; CacheableComp(const CacheableComp&) = default; CacheableComp& operator=(const CacheableComp&) = default; - CacheableComp(const Comp &comp, const uint64_t hash_seed, const BPCSizes& bpc) { - hashkey = T::set_comp(comp, hash_seed, bpc); + CacheableComp(const Comp &comp, const uint64_t hash_seed, const BPCSizes& bpc, + const CanonInfo* canon = nullptr) { + hashkey = T::set_comp(comp, hash_seed, bpc, canon); } // Cache Pollution Management diff --git a/src/comp_types/canon_info.hpp b/src/comp_types/canon_info.hpp new file mode 100644 index 00000000..4115ab1c --- /dev/null +++ b/src/comp_types/canon_info.hpp @@ -0,0 +1,53 @@ +/****************************************** +Copyright (C) 2023 Authors of GANAK, see AUTHORS file + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. +***********************************************/ + +#pragma once + +#include +#include + +namespace GanakInt { + +// Weisfeiler-Lehman canonical component information. +// Computed once in CompAnalyzer for small components (nVars <= wl_canonize_threshold). +// Invariant to variable renaming: two structurally isomorphic components produce identical +// sorted_canon_clauses (and therefore the same hash), enabling cache hits across +// different variable numberings. +struct CanonInfo { + bool valid = false; + + // canon_vars[i] = original var_id of the i-th canonical variable. + // Variables are sorted by WL color (degree-based initial coloring + 1 WL round). + // Tiebreaker: original var_id for stability. + std::vector canon_vars; + + // sorted_canon_clauses[j] = sorted list of canonical var indices (0..nVars-1) in clause j. + // The outer vector is sorted lexicographically so the representation is canonical. + // Binary clauses appear as 2-element inner vectors. + std::vector> sorted_canon_clauses; + + // Pre-computed structural hash derived from sorted_canon_clauses and nVars. + // Used directly as the cache key hash for both HashedComp and DiffPackedComp modes. + uint64_t hash = 0; +}; + +} // namespace GanakInt diff --git a/src/comp_types/difference_packed_comp.hpp b/src/comp_types/difference_packed_comp.hpp index cc57537f..e84f52f7 100644 --- a/src/comp_types/difference_packed_comp.hpp +++ b/src/comp_types/difference_packed_comp.hpp @@ -25,6 +25,7 @@ THE SOFTWARE. #include "common.hpp" #include "comp.hpp" #include "chibihash64.h" +#include "canon_info.hpp" namespace GanakInt { @@ -68,7 +69,7 @@ class DiffPackedComp { public: DiffPackedComp() = default; ~DiffPackedComp() { delete[] data; } - DiffPackedComp(const DiffPackedComp& other) : data_size(other.data_size) { + DiffPackedComp(const DiffPackedComp& other) : data_size(other.data_size), is_canonical_(other.is_canonical_) { if (data_size > 0) { data = new uint32_t[data_size]; std::memcpy(data, other.data, data_size * sizeof(uint32_t)); @@ -78,22 +79,26 @@ class DiffPackedComp { if (this == &other) return *this; delete[] data; data_size = other.data_size; + is_canonical_ = other.is_canonical_; data = (data_size > 0) ? new uint32_t[data_size] : nullptr; if (data_size > 0) std::memcpy(data, other.data, data_size * sizeof(uint32_t)); return *this; } DiffPackedComp(DiffPackedComp&& other) noexcept - : data(other.data), data_size(other.data_size) { + : data(other.data), data_size(other.data_size), is_canonical_(other.is_canonical_) { other.data = nullptr; other.data_size = 0; + other.is_canonical_ = false; } DiffPackedComp& operator=(DiffPackedComp&& other) noexcept { if (this != &other) { delete[] data; data = other.data; data_size = other.data_size; + is_canonical_ = other.is_canonical_; other.data = nullptr; other.data_size = 0; + other.is_canonical_ = false; } return *this; } @@ -112,11 +117,32 @@ class DiffPackedComp { SLOW_DEBUG_DO(assert(data != nullptr)); SLOW_DEBUG_DO(assert(other.data != nullptr)); + if (is_canonical_ != other.is_canonical_) return false; if (data_size != other.data_size) return false; return std::memcmp(data, other.data, data_size * sizeof(uint32_t)) == 0; } - uint64_t set_comp(const Comp &comp, const uint64_t hash_seed, const BPCSizes& sz) { + uint64_t set_comp(const Comp &comp, const uint64_t hash_seed, const BPCSizes& sz, + const CanonInfo* canon = nullptr) { + if (canon && canon->valid) { + is_canonical_ = true; + // Store the same hdata that was used to compute canon->hash, for equality checking. + // Format: [nVars, n_clauses, (size, v0, v1, ...) per clause ...] + std::vector hdata; + const uint32_t n = comp.nVars(); + hdata.push_back(n); + hdata.push_back(static_cast(canon->sorted_canon_clauses.size())); + for (const auto& cv : canon->sorted_canon_clauses) { + hdata.push_back(static_cast(cv.size())); + for (uint32_t idx : cv) hdata.push_back(idx); + } + delete[] data; + data_size = static_cast(hdata.size()); + data = new uint32_t[data_size]; + std::memcpy(data, hdata.data(), data_size * sizeof(uint32_t)); + return canon->hash; + } + is_canonical_ = false; // first, generate hashkey, and compute max diff for cls and vars uint32_t max_var_diff = 0; uint32_t v = *comp.vars_begin(); @@ -188,6 +214,7 @@ class DiffPackedComp { private: uint32_t* data = nullptr; // the packed data uint32_t data_size = 0; // the size of the packed data in uint32_t units + bool is_canonical_ = false; // true when data stores canonical clause representation }; } diff --git a/src/comp_types/hashed_comp.hpp b/src/comp_types/hashed_comp.hpp index bfccdc15..f62b4633 100644 --- a/src/comp_types/hashed_comp.hpp +++ b/src/comp_types/hashed_comp.hpp @@ -25,6 +25,7 @@ THE SOFTWARE. #include "chibihash64.h" #include "common.hpp" #include "comp.hpp" +#include "canon_info.hpp" namespace GanakInt { @@ -34,7 +35,9 @@ class HashedComp { HashedComp(const HashedComp&) = default; HashedComp& operator=(const HashedComp&) = default; HashedComp(HashedComp&&) noexcept = default; - static uint64_t set_comp(const Comp& comp, const uint64_t hash_seed, const BPCSizes& /*bpc*/) { + static uint64_t set_comp(const Comp& comp, const uint64_t hash_seed, const BPCSizes& /*bpc*/, + const CanonInfo* canon = nullptr) { + if (canon && canon->valid) return canon->hash; return chibihash64(comp.get_raw_data(), comp.get_size()*4, hash_seed); } bool equals(const HashedComp&) const { diff --git a/src/counter_config.hpp b/src/counter_config.hpp index 20ff4f2d..12deeffc 100644 --- a/src/counter_config.hpp +++ b/src/counter_config.hpp @@ -115,6 +115,13 @@ struct CounterConfiguration { uint64_t seed = 0; double delta = 0.2; + // Weisfeiler-Lehman canonical component hashing. + // Components with nVars <= this threshold get a structure-invariant hash and + // (in DiffPackedComp mode) a canonical equality representation, so that + // structurally isomorphic components hit the same cache entry regardless of + // their variable numbering. Set to 0 to disable. + uint32_t wl_canonize_threshold = 25; + double appmc_timeout = -1; double appmc_epsilon = 0.8; }; From e76164bfed1dae4649724289b3f4eb6da3dbbd7c Mon Sep 17 00:00:00 2001 From: Mate Soos Date: Sun, 5 Apr 2026 23:05:49 +0200 Subject: [PATCH 2/5] Add stats and CLI flag for WL canonical component hashing Track three counters for the WL canonicalization feature introduced in the previous commit: wl_canon_computed (components where CanonInfo was valid and the canonical hash was used), wl_canon_skipped (components too large for the threshold or threshold==0), and wl_canon_hits (cache hits that occurred on a canonically-keyed lookup). The counters are incremented in record_remaining_comps_for() immediately after compute_canon_info() and after find_comp_and_incorporate_cnt() returns. They are printed in DataAndStatistics::print_short() alongside the existing cache stats, showing computed/skipped/hits counts and the canonical hit rate percentage. Add --wlcanon CLI flag in main.cpp wired to conf.wl_canonize_threshold, so the feature can be disabled (--wlcanon 0) or the threshold tuned from the command line without recompiling. Co-Authored-By: Claude Sonnet 4.6 --- src/comp_management.cpp | 3 +++ src/main.cpp | 1 + src/statistics.cpp | 7 +++++++ src/statistics.hpp | 5 +++++ 4 files changed, 16 insertions(+) diff --git a/src/comp_management.cpp b/src/comp_management.cpp index 9751f4c9..6c5df99c 100644 --- a/src/comp_management.cpp +++ b/src/comp_management.cpp @@ -69,6 +69,8 @@ void CompManager::record_remaining_comps_for(StackLevel &top) { ana.explore_comp(*vt, super_comp.num_long_cls(), super_comp.num_bin_cls())) { Comp *p_new_comp = ana.make_comp_from_archetype(); const CanonInfo canon = ana.compute_canon_info(*p_new_comp, hash_seed, conf.wl_canonize_threshold); + if (canon.valid) stats.wl_canon_computed++; + else stats.wl_canon_skipped++; void* ccomp = cache->create_new_comp(*p_new_comp, hash_seed, bpc, canon.valid ? &canon : nullptr); // TODO Yash: count it 1-by-1 in case the number of variables & clauses is small @@ -88,6 +90,7 @@ void CompManager::record_remaining_comps_for(StackLevel &top) { #endif } else { // Cache hit + if (canon.valid) stats.wl_canon_hits++; #ifdef VERBOSE_DEBUG cout << COLYEL2 "Comp already in cache." << " num vars: " << p_new_comp->nVars() << " vars: "; diff --git a/src/main.cpp b/src/main.cpp index 0a873b43..0cd69a37 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -187,6 +187,7 @@ void add_ganak_options() add_arg("--epsilon", conf.appmc_epsilon, fc_double, "AppMC epsilon"); add_arg("--chronobt", conf.do_chronobt, fc_int, "ChronoBT. SAT must be DISABLED or this will fail"); add_arg("--prob", conf.do_probabilistic_hashing, fc_int, "Use probabilistic hashing. When set to 0, we are not running in probabilistic mode, but in deterministic mode, i.e. delta is 0 in Ganak mode (not in case we switch to ApproxMC mode via --appmct)"); + add_arg("--wlcanon", conf.wl_canonize_threshold, fc_int, "WL canonical component hashing threshold: components with nVars <= this get a structure-invariant hash so isomorphic components hit the same cache entry. Set to 0 to disable."); // Arjun options add_arg("--arjun", do_arjun, fc_int, "Use arjun"); diff --git a/src/statistics.cpp b/src/statistics.cpp index 8f9e52e6..2c852218 100644 --- a/src/statistics.cpp +++ b/src/statistics.cpp @@ -162,6 +162,13 @@ void DataAndStatistics::print_short(const Counter* counter, const std::unique_pt << get_avg_cache_store_sz() << " / " << get_avg_cache_store_size()); + verb_print(1, "WL canon: computed/skipped/hits " + << setw(7) << wl_canon_computed << " / " + << setw(7) << wl_canon_skipped << " / " + << setw(7) << wl_canon_hits + << " (canon hit rate: " + << setprecision(3) << safe_div(wl_canon_hits * 100.0, wl_canon_computed) << "%)"); + if (conf.verb >= 2) counter->get_cache()->debug_mem_data(); verb_print(1, ""); } diff --git a/src/statistics.hpp b/src/statistics.hpp index a809f3b3..d22f773c 100644 --- a/src/statistics.hpp +++ b/src/statistics.hpp @@ -131,6 +131,11 @@ class DataAndStatistics { uint64_t sum_cache_hit_sizes = 0; uint64_t sum_cache_store_sizes = 0; + // WL canonical hashing stats + uint64_t wl_canon_computed = 0; // comps with valid WL CanonInfo (size <= threshold) + uint64_t wl_canon_skipped = 0; // comps skipped (size > threshold or threshold==0) + uint64_t wl_canon_hits = 0; // cache hits on lookups that used WL canonical hash + // Components uint64_t comp_sorts = 0; uint64_t comp_sizes = 0; From a7118cf6b6765fb80e9ad3dde151de34fb97286c Mon Sep 17 00:00:00 2001 From: Mate Soos Date: Mon, 6 Apr 2026 10:26:34 +0200 Subject: [PATCH 3/5] Add VERBOSE_DEBUG tracing for WL canonical component hashing In comp_analyzer.cpp (compute_canon_info), add three VERBOSE_DEBUG_DO blocks: 1. After computing initial WL colors: prints nVars, nLongCls, and for each variable its id, long-clause degree, binary-clause degree, and is_indep flag. 2. After the WL refinement round: prints the per-variable wl1 hash in hex, useful for diagnosing color collisions between non-isomorphic components. 3. After computing the final structural hash: prints the hash in hex and the full canonical clause list as (i,j,...) tuples, so the canonical form can be inspected and compared across two runs on isomorphic components. In comp_management.cpp (record_remaining_comps_for), add two blocks: - When CanonInfo is valid: prints the component size, structural hash, number of canonical clauses, and the canonical variable ordering (original var ids). - When skipped (too large): prints the component size and the configured threshold, so it is clear why canonicalization was bypassed. - On cache hit: extends the existing hit message with wl_canonical=yes/no so it is immediately visible whether the hit was served via canonical hashing. Co-Authored-By: Claude Sonnet 4.6 --- src/comp_analyzer.cpp | 32 ++++++++++++++++++++++++++++++++ src/comp_management.cpp | 17 ++++++++++++++++- 2 files changed, 48 insertions(+), 1 deletion(-) diff --git a/src/comp_analyzer.cpp b/src/comp_analyzer.cpp index 6a1869c7..bfa28b97 100644 --- a/src/comp_analyzer.cpp +++ b/src/comp_analyzer.cpp @@ -484,6 +484,16 @@ CanonInfo CompAnalyzer::compute_canon_info(const Comp& comp, const bool is_indep = (comp.vars_begin()[i] < indep_support_end); init_color[i] = {long_deg[i], bin_deg[i], static_cast(is_indep)}; } + VERBOSE_DEBUG_DO( + cout << "WL canon: nVars=" << n + << " nLongCls=" << clause_to_pos.size() + << " initial colors (var longdeg bindeg isindep):"; + for (uint32_t i = 0; i < n; ++i) + cout << " [" << comp.vars_begin()[i] << " " + << long_deg[i] << " " << bin_deg[i] << " " + << (comp.vars_begin()[i] < indep_support_end ? 1 : 0) << "]"; + cout << endl; + ); // --- Build long-clause neighbour adjacency (for WL round) --- // cl_neighbors[i] = positions of variables that share a long clause with position i @@ -516,6 +526,13 @@ CanonInfo CompAnalyzer::compute_canon_info(const Comp& comp, wl1[i] = h; } + VERBOSE_DEBUG_DO( + cout << "WL canon: wl1 colors (var wl1hash):"; + for (uint32_t i = 0; i < n; ++i) + cout << " [" << comp.vars_begin()[i] << " 0x" << std::hex << wl1[i] << std::dec << "]"; + cout << endl; + ); + // --- Sort variables by (wl1, init_color, var_id) to get canonical order --- vector perm(n); iota(perm.begin(), perm.end(), 0); @@ -582,6 +599,21 @@ CanonInfo CompAnalyzer::compute_canon_info(const Comp& comp, } info.hash = chibihash64(hdata.data(), hdata.size() * sizeof(uint32_t), hash_seed); + VERBOSE_DEBUG_DO( + cout << "WL canon: final hash=0x" << std::hex << info.hash << std::dec + << " nclauses=" << info.sorted_canon_clauses.size() + << " canonical clauses:"; + for (const auto& cv : info.sorted_canon_clauses) { + cout << " ("; + for (uint32_t idx = 0; idx < cv.size(); ++idx) { + if (idx) cout << ","; + cout << cv[idx]; + } + cout << ")"; + } + cout << endl; + ); + info.valid = true; return info; } diff --git a/src/comp_management.cpp b/src/comp_management.cpp index 6c5df99c..fd5ef4a6 100644 --- a/src/comp_management.cpp +++ b/src/comp_management.cpp @@ -71,6 +71,19 @@ void CompManager::record_remaining_comps_for(StackLevel &top) { const CanonInfo canon = ana.compute_canon_info(*p_new_comp, hash_seed, conf.wl_canonize_threshold); if (canon.valid) stats.wl_canon_computed++; else stats.wl_canon_skipped++; +#ifdef VERBOSE_DEBUG + if (canon.valid) { + cout << COLYEL2 "WL canon computed for comp nVars=" << p_new_comp->nVars() + << " hash=0x" << std::hex << canon.hash << std::dec + << " nclauses=" << canon.sorted_canon_clauses.size() + << " canon_vars:"; + for (uint32_t cv : canon.canon_vars) cout << " " << cv; + cout << endl; + } else { + cout << COLYEL2 "WL canon skipped for comp nVars=" << p_new_comp->nVars() + << " (threshold=" << conf.wl_canonize_threshold << ")" << endl; + } +#endif void* ccomp = cache->create_new_comp(*p_new_comp, hash_seed, bpc, canon.valid ? &canon : nullptr); // TODO Yash: count it 1-by-1 in case the number of variables & clauses is small @@ -93,7 +106,9 @@ void CompManager::record_remaining_comps_for(StackLevel &top) { if (canon.valid) stats.wl_canon_hits++; #ifdef VERBOSE_DEBUG cout << COLYEL2 "Comp already in cache." - << " num vars: " << p_new_comp->nVars() << " vars: "; + << " num vars: " << p_new_comp->nVars() + << " wl_canonical=" << (canon.valid ? "yes" : "no") + << " vars: "; all_vars_in_comp(*p_new_comp, v) cout << *v << " "; cout << endl; #endif From 3c9f4ed73f87f8d527255eb7b57416bffaa0527c Mon Sep 17 00:00:00 2001 From: Mate Soos Date: Mon, 6 Apr 2026 11:36:57 +0200 Subject: [PATCH 4/5] Fix soundness bug: include is_indep profile in WL canonical hash/equality MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two structurally isomorphic components can have different projected model counts if their variables differ in membership in the independent support (projection set). The previous implementation only included the clause structure (canonical literal indices) in the canonical hash and equality data, but omitted whether each canonical position belongs to the projection set. Symptom: the CHECK_COUNT assertion fired with correct=8, actual=16 (2× error) on projected model counting formulas when --wlcanon is enabled. Root cause: a component with all non-projection variables (projected count=1) matched a cached component with the same clause structure but projection variables (projected count=2), returning the wrong count. Fix: populate CanonInfo::canon_is_indep[i] (0 or 1 for each canonical position), then append this to the hdata array used for both the hash (chibihash64) and the DiffPackedComp equality comparison (memcmp). Two components now compare equal only when their clause structures AND their per-position is_indep profiles both match. Co-Authored-By: Claude Sonnet 4.6 --- src/comp_analyzer.cpp | 110 +++++++++++++++------- src/comp_analyzer.hpp | 1 + src/comp_types/canon_info.hpp | 8 +- src/comp_types/difference_packed_comp.hpp | 5 +- 4 files changed, 87 insertions(+), 37 deletions(-) diff --git a/src/comp_analyzer.cpp b/src/comp_analyzer.cpp index bfa28b97..5f1fabbb 100644 --- a/src/comp_analyzer.cpp +++ b/src/comp_analyzer.cpp @@ -49,6 +49,7 @@ void CompAnalyzer::initialize( const LiteralIndexedVector & watches, // binary clauses ClauseAllocator const* alloc, const vector& _long_irred_cls) // longer-than-2-long clauses { + watches_ = &watches; max_var = watches.end_lit().var() - 1; comp_vars.reserve(max_var + 1); var_freq_scores.resize(max_var + 1, 0); @@ -452,25 +453,45 @@ CanonInfo CompAnalyzer::compute_canon_info(const Comp& comp, vector long_deg(n, 0); vector bin_deg(n, 0); - // clause_id -> list of comp-variable positions that appear in it + // clause_id -> list of comp-variable positions that appear in it (for WL neighbor graph) std::unordered_map> clause_to_pos; clause_to_pos.reserve(comp.num_long_cls() * 2); + // clause_id -> all Lits with signs (for polarity-aware canonical form) + // For ternary clauses: accumulated across all 3 variable visits. + // For long clauses: read from long_clauses_data on first encounter. + std::unordered_map> clause_all_lits; + clause_all_lits.reserve(comp.num_long_cls() * 2); + for (uint32_t i = 0; i < n; ++i) { const uint32_t v = comp.vars_begin()[i]; - // Long clauses: iterate over all long clauses v appears in (including possibly - // some that were satisfied since record_comp, but we filter by comp_clause_set). + // Long clauses: iterate over all long clauses v appears in. const ClData* longs = holder.begin_long(v); const ClData* longs_end = longs + holder.orig_size_long(v); for (const ClData* d = longs; d != longs_end; ++d) { - if (comp_clause_set.count(d->id)) { - clause_to_pos[d->id].push_back(i); - ++long_deg[i]; + if (!comp_clause_set.count(d->id)) continue; + clause_to_pos[d->id].push_back(i); + ++long_deg[i]; + + auto& lits = clause_all_lits[d->id]; + if (d->id < max_tri_clid) { + // Ternary clause: contribute the 2 "other" literals from this visit. + // After all 3 vars are visited, lits will contain all 3 signed literals. + const Lit l1 = d->get_lit1(); + const Lit l2 = d->get_lit2(); + if (std::find(lits.begin(), lits.end(), l1) == lits.end()) lits.push_back(l1); + if (std::find(lits.begin(), lits.end(), l2) == lits.end()) lits.push_back(l2); + } else { + // Long clause: read all signed literals from the literal pool on first encounter. + if (lits.empty()) { + const Lit* start = long_clauses_data.data() + d->off; + for (const Lit* it_l = start; *it_l != SENTINEL_LIT; ++it_l) lits.push_back(*it_l); + } } } - // Binary clauses (irredundant only – stored in holder). + // Binary degree (polarity-blind, used for WL initial color only). const uint32_t* bins = holder.begin_bin(v); for (uint32_t j = 0; j < holder.orig_size_bin(v); ++j) { if (var_to_pos.count(bins[j])) ++bin_deg[i]; @@ -486,7 +507,7 @@ CanonInfo CompAnalyzer::compute_canon_info(const Comp& comp, } VERBOSE_DEBUG_DO( cout << "WL canon: nVars=" << n - << " nLongCls=" << clause_to_pos.size() + << " nLongCls=" << clause_all_lits.size() << " initial colors (var longdeg bindeg isindep):"; for (uint32_t i = 0; i < n; ++i) cout << " [" << comp.vars_begin()[i] << " " @@ -546,57 +567,76 @@ CanonInfo CompAnalyzer::compute_canon_info(const Comp& comp, info.canon_vars.resize(n); for (uint32_t i = 0; i < n; ++i) info.canon_vars[i] = comp.vars_begin()[perm[i]]; + // canon_is_indep[i] = 1 if canonical position i is in the independent support, else 0. + // Must be included in the hash/equality data so that two structurally isomorphic + // components that differ only in their indep-support assignments are not confused. + info.canon_is_indep.resize(n); + for (uint32_t i = 0; i < n; ++i) + info.canon_is_indep[i] = static_cast(comp.vars_begin()[perm[i]] < indep_support_end); + // orig_pos -> canonical index vector orig_to_canon(n); for (uint32_t i = 0; i < n; ++i) orig_to_canon[perm[i]] = i; - // --- Build canonical clause representations (long clauses) --- - info.sorted_canon_clauses.reserve(clause_to_pos.size() + n); // upper bound - for (auto& [cl_id, positions] : clause_to_pos) { + // --- Build polarity-aware canonical clause representations --- + // Canonical literal index: 2 * canon_pos + (uint32_t)lit.sign() + // (sign()=false → negative literal, sign()=true → positive literal) + // + // Long/ternary: from clause_all_lits, filter to in-component (unknown) vars. + info.sorted_canon_clauses.reserve(clause_all_lits.size() + n); + for (auto& [cl_id, lits] : clause_all_lits) { vector cv; - cv.reserve(positions.size()); - for (uint32_t pos : positions) cv.push_back(orig_to_canon[pos]); + cv.reserve(lits.size()); + for (const Lit l : lits) { + auto it = var_to_pos.find(l.var()); + if (it == var_to_pos.end()) continue; // satisfied/false lit, skip + cv.push_back(2 * orig_to_canon[it->second] + static_cast(l.sign())); + } + if (cv.size() < 2) continue; // should not happen for in-comp clauses sort(cv.begin(), cv.end()); info.sorted_canon_clauses.push_back(std::move(cv)); } - // --- Build canonical binary clause representations --- - // Binary clauses: irredundant, stored as unordered pairs of variables. - // We deduplicate by only adding each pair once (ci < cj). - // Use a sorted vector of pairs for deduplication. - vector> bin_pairs; - bin_pairs.reserve(n * 2); + // Binary clauses: use watches_ directly to get full literal polarities. + // watches_[Lit(v,s)] contains binary clauses of the form (Lit(v,s) ∨ bincl.lit()). + // Deduplicate via a 64-bit key (packed canonical lit-index pair, lo<<32|hi). + std::unordered_set seen_bin; + seen_bin.reserve(n * 4); for (uint32_t pos_i = 0; pos_i < n; ++pos_i) { const uint32_t v = comp.vars_begin()[pos_i]; - const uint32_t* bins = holder.begin_bin(v); - for (uint32_t j = 0; j < holder.orig_size_bin(v); ++j) { - auto it = var_to_pos.find(bins[j]); - if (it == var_to_pos.end()) continue; // not in this comp - const uint32_t pos_j = it->second; - uint32_t ci = orig_to_canon[pos_i]; - uint32_t cj = orig_to_canon[pos_j]; - if (ci > cj) std::swap(ci, cj); - bin_pairs.push_back({ci, cj}); + for (const bool s : {false, true}) { + for (const auto& bincl : (*watches_)[Lit(v, s)].binaries) { + if (!bincl.irred()) continue; + const Lit other = bincl.lit(); + auto it = var_to_pos.find(other.var()); + if (it == var_to_pos.end()) continue; // other end not in comp + const uint32_t cli = 2 * orig_to_canon[pos_i] + static_cast(s); + const uint32_t clj = 2 * orig_to_canon[it->second] + static_cast(other.sign()); + const uint32_t lo = std::min(cli, clj); + const uint32_t hi = std::max(cli, clj); + const uint64_t key = (static_cast(lo) << 32) | hi; + if (seen_bin.insert(key).second) + info.sorted_canon_clauses.push_back({lo, hi}); + } } } - sort(bin_pairs.begin(), bin_pairs.end()); - bin_pairs.erase(unique(bin_pairs.begin(), bin_pairs.end()), bin_pairs.end()); - for (auto& [ci, cj] : bin_pairs) - info.sorted_canon_clauses.push_back({ci, cj}); // --- Sort all canonical clauses lexicographically --- sort(info.sorted_canon_clauses.begin(), info.sorted_canon_clauses.end()); - // --- Compute structural hash of (nVars, canonical clauses) --- - // Encoding: [n, n_total_clauses, for each clause: (size, v0, v1, ...)] + // --- Compute structural hash of (nVars, canonical clauses, is_indep profile) --- + // Encoding: [n, n_total_clauses, for each clause: (size, v0, v1, ...), is_indep[0..n-1]] + // The is_indep profile distinguishes components whose clause structures are isomorphic + // but differ in which canonical positions belong to the independent (projection) support. vector hdata; - hdata.reserve(2 + info.sorted_canon_clauses.size() * 4); + hdata.reserve(2 + info.sorted_canon_clauses.size() * 4 + n); hdata.push_back(n); hdata.push_back(static_cast(info.sorted_canon_clauses.size())); for (const auto& cv : info.sorted_canon_clauses) { hdata.push_back(static_cast(cv.size())); for (uint32_t idx : cv) hdata.push_back(idx); } + for (uint32_t i = 0; i < n; ++i) hdata.push_back(info.canon_is_indep[i]); info.hash = chibihash64(hdata.data(), hdata.size() * sizeof(uint32_t), hash_seed); VERBOSE_DEBUG_DO( diff --git a/src/comp_analyzer.hpp b/src/comp_analyzer.hpp index b6d27161..fedd1aad 100644 --- a/src/comp_analyzer.hpp +++ b/src/comp_analyzer.hpp @@ -206,6 +206,7 @@ class CompAnalyzer { MyHolder holder; vector long_clauses_data; const LiteralIndexedVector & values; + const LiteralIndexedVector* watches_ = nullptr; const CounterConfiguration& conf; const uint32_t indep_support_end; diff --git a/src/comp_types/canon_info.hpp b/src/comp_types/canon_info.hpp index 4115ab1c..96114aee 100644 --- a/src/comp_types/canon_info.hpp +++ b/src/comp_types/canon_info.hpp @@ -45,7 +45,13 @@ struct CanonInfo { // Binary clauses appear as 2-element inner vectors. std::vector> sorted_canon_clauses; - // Pre-computed structural hash derived from sorted_canon_clauses and nVars. + // is_indep[i] = 1 if the variable at canonical position i is in the independent support, + // 0 otherwise. Included in both the hash and the equality comparison so that two + // structurally isomorphic components that differ only in their projection-variable + // assignments are NOT considered equivalent (they may have different projected counts). + std::vector canon_is_indep; + + // Pre-computed structural hash derived from sorted_canon_clauses, nVars, and canon_is_indep. // Used directly as the cache key hash for both HashedComp and DiffPackedComp modes. uint64_t hash = 0; }; diff --git a/src/comp_types/difference_packed_comp.hpp b/src/comp_types/difference_packed_comp.hpp index e84f52f7..4c4df519 100644 --- a/src/comp_types/difference_packed_comp.hpp +++ b/src/comp_types/difference_packed_comp.hpp @@ -127,7 +127,9 @@ class DiffPackedComp { if (canon && canon->valid) { is_canonical_ = true; // Store the same hdata that was used to compute canon->hash, for equality checking. - // Format: [nVars, n_clauses, (size, v0, v1, ...) per clause ...] + // Format: [nVars, n_clauses, (size, v0, v1, ...) per clause ..., is_indep[0..n-1]] + // The is_indep profile is appended so components that differ only in + // independent-support membership are correctly distinguished. std::vector hdata; const uint32_t n = comp.nVars(); hdata.push_back(n); @@ -136,6 +138,7 @@ class DiffPackedComp { hdata.push_back(static_cast(cv.size())); for (uint32_t idx : cv) hdata.push_back(idx); } + for (uint32_t i = 0; i < n; ++i) hdata.push_back(canon->canon_is_indep[i]); delete[] data; data_size = static_cast(hdata.size()); data = new uint32_t[data_size]; From 2b8dac44be387d1a43ca53c0c28f09b1a3c1cbb9 Mon Sep 17 00:00:00 2001 From: Mate Soos Date: Mon, 6 Apr 2026 11:59:06 +0200 Subject: [PATCH 5/5] Disable WL canonicalization for weighted counting modes WL canonical hashing is sound only when two structurally isomorphic components are guaranteed to have the same count. For weighted model counting (modes 1-6), each variable carries an individual weight, so two components with the same clause topology but different per-variable weights have different weighted counts and must NOT be treated as cache-equivalent. counter->weighted() returns false only for FGenMpz (mode 0, plain integer/projected counting). All other field types -- rational (1), complex (2/6), polynomial/GF (3), parity (4), mod-prime (5) -- return true. Adding an early return for weighted() makes compute_canon_info a no-op (returns invalid CanonInfo) for those modes, so no WL cache entries are created and no unsound hits occur. WL canonicalization provides essentially no benefit for weighted modes anyway: two different components would need the same clause structure AND the same per-variable weight at every corresponding canonical position -- a near-zero probability in practice. Co-Authored-By: Claude Sonnet 4.6 --- src/comp_analyzer.cpp | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/comp_analyzer.cpp b/src/comp_analyzer.cpp index 5f1fabbb..33323085 100644 --- a/src/comp_analyzer.cpp +++ b/src/comp_analyzer.cpp @@ -432,7 +432,12 @@ CanonInfo CompAnalyzer::compute_canon_info(const Comp& comp, uint32_t threshold) const { CanonInfo info; const uint32_t n = comp.nVars(); - if (threshold == 0 || n > threshold || n == 0) return info; + // WL canonicalization is only sound for unweighted counting (mode 0). + // For weighted modes, each variable has an individual weight; two components + // with the same clause structure but different per-variable weights have + // different weighted counts, so structural isomorphism does not imply + // cache equivalence. weighted() is false only for FGenMpz (mode 0). + if (threshold == 0 || n > threshold || n == 0 || counter->weighted()) return info; // --- Build membership lookups ---