
HIVE-29281: Make proactive cache eviction work with catalog #6379

Open
Neer393 wants to merge 1 commit into apache:master from Neer393:HIVE-29281

Conversation

@Neer393
Contributor

@Neer393 Neer393 commented Mar 19, 2026

What changes were proposed in this pull request?

Made proactive cache eviction catalog-aware by making changes in the ProactiveEviction and CacheTag files.

Why are the changes needed?

Proactive cache eviction should be catalog-aware; otherwise, tables with the same name under different catalogs may cause false cache hits or misses.
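To illustrate the failure mode with a toy sketch (the tag format here is hypothetical, not Hive's actual CacheTag implementation): without a catalog component, identically named tables under different catalogs map to the same tag, so an eviction or lookup for one can falsely match the other.

```java
// Toy illustration of the collision. Without a catalog prefix, "db.table"
// is ambiguous across catalogs; prefixing with the catalog makes the tags
// distinct. Names and format are illustrative only.
public class TagCollisionDemo {
    public static String tagWithoutCatalog(String db, String table) {
        return db + "." + table;
    }

    public static String tagWithCatalog(String catalog, String db, String table) {
        return catalog + "." + db + "." + table;
    }

    public static void main(String[] args) {
        // Same db/table name under two different catalogs.
        System.out.println(tagWithoutCatalog("sales", "orders"));          // sales.orders (ambiguous)
        System.out.println(tagWithCatalog("hive", "sales", "orders"));     // hive.sales.orders
        System.out.println(tagWithCatalog("spark", "sales", "orders"));    // spark.sales.orders
    }
}
```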

Does this PR introduce any user-facing change?

No. Users are not exposed to proactive cache eviction, so there are no user-facing changes.

How was this patch tested?

Added unit tests covering both the with-catalog and without-catalog cases; all of them pass. I am not sure how to test proactive cache eviction manually, so it was verified via unit tests only.

@Neer393
Contributor Author

Neer393 commented Mar 20, 2026

@zhangbutao I need a review here. I looked at all the merged PRs under HIVE-22820 and made the corresponding changes to make this catalog-aware. Please let me know if I missed anything. Thanks


Copilot AI left a comment


Pull request overview

This PR makes LLAP proactive cache eviction catalog-aware by propagating catalog names through cache tags and eviction requests, preventing collisions when identical db/table names exist across catalogs.

Changes:

  • Extend LLAP proactive eviction request structure to include catalog scoping (catalog → db → table → partitions).
  • Introduce catalog tracking on TableDesc/PartitionDesc and update cache-tag generation to include catalog-qualified names.
  • Update LLAP cache metadata serialization and unit tests to reflect catalog-qualified cache tags.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

Summary per file:

  • storage-api/src/java/org/apache/hadoop/hive/common/io/CacheTag.java — Updates cache tag semantics/docs and parent-tag derivation to preserve the catalog prefix.
  • ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java — Adds a catalogName field and updates constructors/clone to carry the catalog without polluting EXPLAIN.
  • ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java — Exposes the catalog name via PartitionDesc based on TableDesc, with a default fallback.
  • ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java — Makes eviction requests catalog-scoped and includes the catalog in proto requests and tag matching.
  • llap-common/src/protobuf/LlapDaemonProtocol.proto — Adds a catalog name field to EvictEntityRequestProto.
  • ql/src/java/org/apache/hadoop/hive/llap/LlapHiveUtils.java — Prefixes cache tags with the catalog when deriving metrics tags.
  • llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java — Adjusts eviction debug logging for the catalog+db structure.
  • llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapCacheMetadataSerializer.java — Adds backward-compatible handling for cache tags missing a catalog during decode.
  • llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcFileEstimateErrors.java — Updates the synthetic tag to include the default catalog prefix.
  • ql/src/java/org/apache/hadoop/hive/ql/ddl/** — Ensures eviction builders are invoked with the catalog where available.
  • ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java — Ensures TableDesc created from Table carries the catalog name.
  • ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java — Updates TableDesc construction call sites for the new signature.
  • Various test files — Update existing tests and add new coverage for catalog-aware eviction and proto round-trips.


Comment on lines +337 to 344
/**
* Add a partition of a table scoped to the given catalog.
*/
public Builder addPartitionOfATable(String catalog, String db, String tableName,
LinkedHashMap<String, String> partSpec) {
ensureTable(catalog, db, tableName);
entities.get(catalog).get(db).get(tableName).add(partSpec);
return this;

Copilot AI Mar 23, 2026


Request.Builder claims the catalog key defaults to Warehouse.DEFAULT_CATALOG_NAME, but the builder currently stores the catalog parameter as-is. If a caller passes null (or an empty string), this will create a null key and later NPE in toProtoRequests() when calling toLowerCase(). Normalize catalog (and arguably db/table) at the builder boundary, e.g. default null/blank catalog to the default catalog and enforce non-null keys.
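Applying that suggestion could look roughly like this standalone sketch ("hive" stands in for Warehouse.DEFAULT_CATALOG_NAME; the class and method names are hypothetical, not the actual Hive builder):

```java
// Sketch of normalizing the catalog key at the builder boundary so a null
// or blank catalog defaults instead of becoming a null map key that would
// later NPE in toProtoRequests() when calling toLowerCase().
public class CatalogKeyNormalizer {
    static final String DEFAULT_CATALOG_NAME = "hive";

    public static String normalizeCatalog(String catalog) {
        if (catalog == null || catalog.isBlank()) {
            // Fall back to the default catalog for null/blank input.
            return DEFAULT_CATALOG_NAME;
        }
        return catalog.toLowerCase();
    }
}
```

The builder would call this once in `ensureTable`/`addDb`-style methods, so every downstream map key is guaranteed non-null and lowercase.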

Contributor Author


None of the callers ever pass null here.

@Neer393
Contributor Author

Neer393 commented Mar 25, 2026

@zhangbutao resolved Copilot's review comments

@zhangbutao
Contributor

> @zhangbutao resolved Copilot's review comments

Thanks for pinging me. I will do the code review later.

@Neer393
Contributor Author

Neer393 commented Apr 2, 2026

Made the requested changes.
Need approval here @zhangbutao @deniskuzZ

@sonarqubecloud

sonarqubecloud bot commented Apr 3, 2026

@Neer393
Contributor Author

Neer393 commented Apr 3, 2026

Fixed the SonarQube issues as well. All good for approval and merge @zhangbutao @deniskuzZ

Contributor

@zhangbutao zhangbutao left a comment


+1
I think most of this PR is fine. Thanks.


Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.



Comment on lines +222 to +228
public String getCatalogName() {
return catalogName;
}

public void setCatalogName(String catalogName) {
this.catalogName = catalogName == null ? Warehouse.DEFAULT_CATALOG_NAME : catalogName;
}

Copilot AI Apr 7, 2026


TableDesc.catalogName can remain null for instances built via the no-arg constructor + setters, even though the new constructor/setter normalize null to Warehouse.DEFAULT_CATALOG_NAME. Consider initializing catalogName eagerly (field initializer or in TableDesc()) so getCatalogName() never returns null. Also, equals()/hashCode() currently ignore catalogName, which can cause different-catalog descriptors to compare equal and collide in hash-based collections; include catalogName in both (or document why it must be excluded).
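The two fixes the review suggests — eagerly defaulting `catalogName` and including it in `equals()`/`hashCode()` — can be sketched on a slimmed-down, hypothetical stand-in for TableDesc (this is not the real Hive class):

```java
import java.util.Objects;

// Hypothetical slimmed-down TableDesc: the catalog is normalized in the
// constructor so getCatalogName() never returns null, and it participates
// in equals()/hashCode() so same-named tables in different catalogs no
// longer compare equal or collide in hash-based collections.
public class TableDescSketch {
    private static final String DEFAULT_CATALOG_NAME = "hive";

    private final String catalogName;
    private final String tableName;

    public TableDescSketch(String catalogName, String tableName) {
        // Normalize eagerly rather than only in the setter.
        this.catalogName = catalogName == null ? DEFAULT_CATALOG_NAME : catalogName;
        this.tableName = tableName;
    }

    public String getCatalogName() {
        return catalogName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TableDescSketch)) {
            return false;
        }
        TableDescSketch other = (TableDescSketch) o;
        return Objects.equals(catalogName, other.catalogName)
            && Objects.equals(tableName, other.tableName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(catalogName, tableName);
    }
}
```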

Comment on lines +169 to +170
* the same DB name, and that getSingleCatalogName/getSingleDbName return null when multiple
* catalog-DB pairs are present.

Copilot AI Apr 7, 2026


This test’s Javadoc mentions getSingleCatalogName/getSingleDbName, but those methods no longer exist (they were replaced by hasCatalogName/hasDatabaseName). Update the comment to reflect the current API/behavior to avoid confusion for future maintainers.

Suggested change
* the same DB name, and that getSingleCatalogName/getSingleDbName return null when multiple
* catalog-DB pairs are present.
* the same DB name, and that requests spanning multiple catalog-DB pairs are not treated as
* having a single catalog or database; callers should use hasCatalogName/hasDatabaseName with
* explicit values instead.


public static CacheTag cacheTagBuilder(String dbAndTable, String... partitions) {
String[] parts = dbAndTable.split("\\.");
if(parts.length < 3) {
Member


nit: space

partDescs.put("mutli=one", "one=/1");
partDescs.put("mutli=two/", "two=2");
tag = CacheTag.build("math.rules", partDescs);
tag = CacheTag.build(Warehouse.DEFAULT_CATALOG_NAME + ".math.rules", partDescs);
Member

@deniskuzZ deniskuzZ Apr 7, 2026


why not handle DEFAULT_CATALOG_NAME inside CacheTag.build when size == 2?
then you wouldn't need CacheTag.build(Warehouse.DEFAULT_CATALOG_NAME, "math.rules") either
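What the suggestion amounts to, sketched standalone (the real CacheTag.build signature differs; "hive" stands in for Warehouse.DEFAULT_CATALOG_NAME):

```java
// Sketch of defaulting the catalog inside a build(...)-style method when
// the qualified name only has two parts ("db.table"), so call sites never
// need to prepend the default catalog themselves.
public class DefaultCatalogSketch {
    static final String DEFAULT_CATALOG_NAME = "hive";

    /** Return a fully qualified "catalog.db.table" name. */
    public static String qualify(String name) {
        String[] parts = name.split("\\.");
        if (parts.length == 2) {
            // No explicit catalog given: fall back to the default.
            return DEFAULT_CATALOG_NAME + "." + name;
        }
        return name;
    }
}
```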

if (part == null) {
return CacheTag.build(LlapUtil.getDbAndTableNameForMetrics(path, includeParts));
return CacheTag.build(
Warehouse.DEFAULT_CATALOG_NAME, LlapUtil.getDbAndTableNameForMetrics(path, includeParts));
Member

@deniskuzZ deniskuzZ Apr 7, 2026


should we handle path with catalog?
seems that we always go with default catalog when partitionDesc is null. would it ever be non-null for unpartitioned tables?

return CacheTag.build(catalogName, part.getTableName());
} else {
return CacheTag.build(part.getTableName(), part.getPartSpec());
return CacheTag.build(catalogName + '.' + part.getTableName(), part.getPartSpec());
Member


method signatures are not consistent
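One way to read this: the two branches should call build overloads with the same parameter shape, rather than one passing the catalog as a separate argument and the other a pre-concatenated "catalog.table" string. A hypothetical sketch (not the actual CacheTag API):

```java
import java.util.Map;

// Hypothetical consistent overloads: the catalog is always a separate first
// argument, whether or not a partition spec is present.
public class ConsistentBuildSketch {
    public static String build(String catalog, String tableName) {
        return catalog + "." + tableName;
    }

    public static String build(String catalog, String tableName,
                               Map<String, String> partSpec) {
        StringBuilder sb = new StringBuilder(build(catalog, tableName));
        // Append partition key/value pairs in iteration order.
        partSpec.forEach((k, v) -> sb.append('/').append(k).append('=').append(v));
        return sb.toString();
    }
}
```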

// Holds a hierarchical structure of catalogs, DBs, tables and partitions such as:
// { testcatalog : { testdb : { testtab0 : [], testtab1 : [ {pk0 : p0v0, pk1 : p0v1} ] }, testdb2 : {} } }
// The catalog key defaults to Warehouse.DEFAULT_CATALOG_NAME ("hive") when no explicit catalog is given.
private final Map<String, Map<String, Map<String, Set<PartitionSpec>>>> entities;
Member

@deniskuzZ deniskuzZ Apr 7, 2026


can we get rid of this nesting by using compound key?

Map<CatalogDb, Map<String, Set<PartitionSpec>>> entities;

record CatalogDb(String catalog, String database) {}

Member


Subject: [PATCH] patch
---
Index: llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
===================================================================
diff --git a/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java b/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
--- a/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java	(revision 5e82676aa447c5066283bead8553dfc682053297)
+++ b/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java	(date 1775575270784)
@@ -324,11 +324,10 @@
     if (LOG.isDebugEnabled()) {
       StringBuilder sb = new StringBuilder();
       sb.append(markedBytes).append(" bytes marked for eviction from LLAP cache buffers that belong to table(s): ");
-      request.getEntities().forEach((catalog, dbs) ->
-          dbs.forEach((db, tables) ->
-              tables.forEach((table, partitions) ->
-                  sb.append(catalog + "." + db + "." + table).append(" "))
-          )
+      request.getEntities().forEach((catalogDb, tables) ->
+          tables.forEach((table, partitions) ->
+              sb.append(catalogDb.catalog()).append(".").append(catalogDb.database())
+                  .append(".").append(table).append(" "))
       );
       sb.append(" Duration: ").append(time).append(" ms");
       LOG.debug(sb.toString());
Index: ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java
===================================================================
diff --git a/ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java b/ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java
--- a/ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java	(revision 5e82676aa447c5066283bead8553dfc682053297)
+++ b/ql/src/java/org/apache/hadoop/hive/llap/ProactiveEviction.java	(date 1775575734238)
@@ -157,16 +157,23 @@
   public static final class Request {
 
     public record PartitionSpec(Map<String, String> spec) {}
-    // Holds a hierarchical structure of catalogs, DBs, tables and partitions such as:
-    // { testcatalog : { testdb : { testtab0 : [], testtab1 : [ {pk0 : p0v0, pk1 : p0v1} ] }, testdb2 : {} } }
-    // The catalog key defaults to Warehouse.DEFAULT_CATALOG_NAME ("hive") when no explicit catalog is given.
-    private final Map<String, Map<String, Map<String, Set<PartitionSpec>>>> entities;
+
+    public record CatalogDb(String catalog, String database) {
+      public static CatalogDb of(String catalog, String database) {
+        return new CatalogDb(catalog, database);
+      }
+    }
 
-    private Request(Map<String, Map<String, Map<String, Set<PartitionSpec>>>> entities) {
+    // Holds a hierarchical structure of (catalog, DB) -> tables -> partitions such as:
+    // { (testcatalog, testdb) : { testtab0 : [], testtab1 : [ {pk0 : p0v0, pk1 : p0v1} ] } }
+    // The catalog defaults to Warehouse.DEFAULT_CATALOG_NAME ("hive") when no explicit catalog is given.
+    private final Map<CatalogDb, Map<String, Set<PartitionSpec>>> entities;
+
+    private Request(Map<CatalogDb, Map<String, Set<PartitionSpec>>> entities) {
       this.entities = entities;
     }
 
-    public Map<String, Map<String, Map<String, Set<PartitionSpec>>>> getEntities() {
+    public Map<CatalogDb, Map<String, Set<PartitionSpec>>> getEntities() {
       return entities;
     }
 
@@ -174,57 +181,44 @@
       return entities.isEmpty();
     }
 
-    public boolean hasCatalogName(String catalogName) {
-      return entities.containsKey(catalogName);
-    }
-
-    public boolean hasDatabaseName(String catalogName, String dbName) {
-      return hasCatalogName(catalogName) && entities.get(catalogName).containsKey(dbName);
-    }
-
     /**
      * Translate to Protobuf requests.
      * @return list of request instances ready to be sent over protobuf.
      */
     public List<LlapDaemonProtocolProtos.EvictEntityRequestProto> toProtoRequests() {
       return entities.entrySet().stream()
-          .flatMap(catalogEntry -> {
-            String catalogName = catalogEntry.getKey();
-            Map<String, Map<String, Set<PartitionSpec>>> dbs = catalogEntry.getValue();
-
-            return dbs.entrySet().stream().map(dbEntry -> {
-              String dbName = dbEntry.getKey();
-              Map<String, Set<PartitionSpec>> tables = dbEntry.getValue();
+          .map(entry -> {
+            CatalogDb catalogDb = entry.getKey();
+            Map<String, Set<PartitionSpec>> tables = entry.getValue();
 
-              LlapDaemonProtocolProtos.EvictEntityRequestProto.Builder requestBuilder =
-                  LlapDaemonProtocolProtos.EvictEntityRequestProto.newBuilder();
+            LlapDaemonProtocolProtos.EvictEntityRequestProto.Builder requestBuilder =
+                LlapDaemonProtocolProtos.EvictEntityRequestProto.newBuilder();
 
-              requestBuilder.setCatalogName(catalogName.toLowerCase());
-              requestBuilder.setDbName(dbName.toLowerCase());
+            requestBuilder.setCatalogName(catalogDb.catalog().toLowerCase());
+            requestBuilder.setDbName(catalogDb.database().toLowerCase());
 
-              tables.forEach((tableName, partitions) -> {
-                LlapDaemonProtocolProtos.TableProto.Builder tableBuilder =
-                    LlapDaemonProtocolProtos.TableProto.newBuilder();
+            tables.forEach((tableName, partitions) -> {
+              LlapDaemonProtocolProtos.TableProto.Builder tableBuilder =
+                  LlapDaemonProtocolProtos.TableProto.newBuilder();
 
-                tableBuilder.setTableName(tableName.toLowerCase());
+              tableBuilder.setTableName(tableName.toLowerCase());
 
-                Set<String> partitionKeys = null;
+              Set<String> partitionKeys = null;
 
-                for (PartitionSpec partitionSpec : partitions) {
-                  if (partitionKeys == null) {
-                    partitionKeys = new LinkedHashSet<>(partitionSpec.spec().keySet());
-                    tableBuilder.addAllPartKey(partitionKeys);
-                  }
-                  for (String partKey : tableBuilder.getPartKeyList()) {
-                    tableBuilder.addPartVal(partitionSpec.spec().get(partKey));
-                  }
-                }
-                // For a given table the set of partition columns (keys) should not change.
-                requestBuilder.addTable(tableBuilder.build());
-              });
+              for (PartitionSpec partitionSpec : partitions) {
+                if (partitionKeys == null) {
+                  partitionKeys = new LinkedHashSet<>(partitionSpec.spec().keySet());
+                  tableBuilder.addAllPartKey(partitionKeys);
+                }
+                for (String partKey : tableBuilder.getPartKeyList()) {
+                  tableBuilder.addPartVal(partitionSpec.spec().get(partKey));
+                }
+              }
+              // For a given table the set of partition columns (keys) should not change.
+              requestBuilder.addTable(tableBuilder.build());
+            });
 
-              return requestBuilder.build();
-            });
+            return requestBuilder.build();
           })
           .toList();
     }
@@ -253,18 +247,12 @@
       // getTableName() returns "catalog.db.table"; TableName.fromString handles 3-part names.
       TableName tagTableName = TableName.fromString(cacheTag.getTableName(), null, null);
 
-      // Check that the tag's catalog is present in the eviction request.
-      if (!entities.containsKey(catalog)) {
+      CatalogDb key = CatalogDb.of(catalog, db);
+      if (!entities.containsKey(key)) {
         return false;
       }
 
-      // Check that the tag's DB is present in the eviction request for this catalog.
-      Map<String, Map<String, Set<PartitionSpec>>> catalogEntities = entities.getOrDefault(catalog, Map.of());
-      if (!catalogEntities.containsKey(db)) {
-        return false;
-      }
-
-      Map<String, Set<PartitionSpec>> tables = catalogEntities.getOrDefault(db, Map.of());
+      Map<String, Set<PartitionSpec>> tables = entities.getOrDefault(key, Map.of());
 
       // If true, must be a drop DB event and this cacheTag matches.
       if (tables.isEmpty()) {
@@ -311,7 +299,7 @@
      */
     public static final class Builder {
 
-      private final Map<String, Map<String, Map<String, Set<PartitionSpec>>>> entities;
+      private final Map<CatalogDb, Map<String, Set<PartitionSpec>>> entities;
 
       private Builder() {
         this.entities = new HashMap<>();
@@ -327,7 +315,7 @@
       public Builder addPartitionOfATable(String catalog, String db, String tableName,
                                           Map<String, String> partSpec) {
         ensureTable(catalog, db, tableName);
-        entities.get(catalog).get(db).get(tableName).add(new PartitionSpec(partSpec));
+        entities.get(CatalogDb.of(catalog, db)).get(tableName).add(new PartitionSpec(partSpec));
         return this;
       }
 
@@ -342,7 +330,7 @@
        * Add a database scoped to the given catalog.
        */
       public Builder addDb(String catalog, String db) {
-        ensureDb(catalog, db);
+        entities.computeIfAbsent(CatalogDb.of(catalog, db), k -> new HashMap<>());
         return this;
       }
 
@@ -372,18 +360,10 @@
         return new Request(entities);
       }
 
-      private void ensureCatalog(String catalogName) {
-        entities.computeIfAbsent(catalogName, k -> new HashMap<>());
-      }
-
-      private void ensureDb(String catalogName, String dbName) {
-        ensureCatalog(catalogName);
-        entities.get(catalogName).computeIfAbsent(dbName, k -> new HashMap<>());
-      }
-
       private void ensureTable(String catalogName, String dbName, String tableName) {
-        ensureDb(catalogName, dbName);
-        entities.get(catalogName).get(dbName).computeIfAbsent(tableName, k -> new HashSet<>());
+        CatalogDb key = CatalogDb.of(catalogName, dbName);
+        entities.computeIfAbsent(key, k -> new HashMap<>())
+            .computeIfAbsent(tableName, k -> new HashSet<>());
       }
 
       /**
@@ -396,6 +376,7 @@
         String catalogName = protoRequest.getCatalogName().toLowerCase();
         String dbName = protoRequest.getDbName().toLowerCase();
 
+        CatalogDb key = CatalogDb.of(catalogName, dbName);
         Map<String, Set<PartitionSpec>> entitiesInDb = new HashMap<>();
         List<LlapDaemonProtocolProtos.TableProto> tables = protoRequest.getTableList();
 
@@ -425,9 +406,7 @@
             entitiesInDb.put(dbAndTableName, partitions);
           }
         }
-        Map<String, Map<String, Set<PartitionSpec>>> entitiesInCatalog = new HashMap<>();
-        entitiesInCatalog.put(dbName, entitiesInDb);
-        entities.put(catalogName, entitiesInCatalog);
+        entities.put(key, entitiesInDb);
         return this;
       }
     }

* {@code Table}).
*/
public String getCatalogName() {
if (tableDesc != null && tableDesc.getCatalogName() != null) {
Member


can it be null? a partition without a link to a table doesn't exist
