Skip to content

[Core] Introduce StructuredZStream#392

Merged
george-zubrienko merged 24 commits into
mainfrom
structured-stream
May 13, 2026
Merged

[Core] Introduce StructuredZStream#392
george-zubrienko merged 24 commits into
mainfrom
structured-stream

Conversation

@george-zubrienko
Copy link
Copy Markdown
Contributor

@george-zubrienko george-zubrienko commented May 8, 2026

Closes #393
Prepares for #352 #262

This PR removes batch grouping by schema and instead changes SoDP / SDP API to emit a stream of (rowStream, schema) pairs, thus guaranteeing schema consistency within the stream.

All affiliated code is updated, including:

  • Schema migration moved to a separate processor and will result in no-op for all batches in each substream except the very first one, if there is a schema change at all.
  • Maintenance moved to a separate processor and batch counting logic has been isolated to that processor.

Base automatically changed from 23-prep to main May 11, 2026 09:11
@george-zubrienko george-zubrienko changed the title Structured stream [Core] Introduce StructuredZStream May 13, 2026
@github-actions
Copy link
Copy Markdown

File Coverage
All files 69%
logging/ZIOLogAnnotations.scala 82%
models/app/BaseStreamContext.scala 32%
models/app/PluginStreamContext.scala 0%
models/app/PluginStreamContext.scala 0%
models/app/PluginStreamContext.scala 0%
models/batches/BlobBatchCommons.scala 70%
models/batches/SqlServerChangeTracking.scala 0%
models/batches/UpsertBlob.scala 0%
models/batches/SynapseLink.scala 79%
models/batches/SynapseLink.scala 68%
models/batches/SynapseLink.scala 0%
models/batches/SynapseLink.scala 0%
models/batches/SqlServerChangeTracking.scala 0%
models/batches/SqlServerChangeTracking.scala 79%
models/batches/UpsertBlob.scala 78%
models/batches/UpsertBlob.scala 0%
models/batches/UpsertBlob.scala 0%
models/cdm/SimpleCdmModel.scala 88%
models/cdm/CdmParser.scala 90%
models/cdm/CdmParser.scala 97%
models/ddl/CreateTableRequest.scala 80%
models/maintenance/JdbcAnalyzeRequest.scala 30%
models/maintenance/JdbcOptimizationRequest.scala 50%
models/maintenance/JdbcOrphanFilesExpirationRequest.scala 50%
models/maintenance/JdbcSnapshotExpirationRequest.scala 50%
models/queries/MergeQuerySegment.scala 90%
models/queries/MergeQuerySegment.scala 0%
models/queries/MergeQuery.scala 80%
models/queries/MergeQuerySegment.scala 94%
models/schemas/ArcaneSchema.scala 84%
models/schemas/ArcaneSchema.scala 51%
models/schemas/ArcaneSchema.scala 62%
models/schemas/DataCell.scala 86%
models/schemas/ArcaneSchema.scala 60%
models/schemas/DataCell.scala 0%
models/schemas/ArcaneSchema.scala 0%
models/schemas/DataCell.scala 0%
models/schemas/ArcaneSchema.scala 71%
models/serialization/ZIODurationRW.scala 0%
models/serialization/FlowRateRW.scala 68%
models/serialization/AdlsStoragePathRW.scala 0%
models/serialization/JavaDurationRW.scala 0%
models/serialization/S3RegionRW.scala 0%
models/serialization/UriRW.scala 0%
models/settings/FieldSelectionRuleSettings.scala 96%
models/settings/TableNaming.scala 60%
models/settings/TablePropertiesSettings.scala 86%
models/settings/azure/AzureStorageConnectionSettings.scala 66%
models/settings/azure/AzureStorageConnectionSettings.scala 66%
models/settings/azure/AzureStorageConnectionSettings.scala 83%
models/settings/database/JdbcConnectionExtensions.scala 69%
models/settings/iceberg/DefaultIcebergStagingSettings.scala 0%
models/settings/mssql/MsSqlServerConnectionSettings.scala 0%
models/settings/sink/DefaultIcebergSinkSettings.scala 0%
models/settings/sources/blob/JsonBlobSourceSettings.scala 0%
models/settings/sources/blob/ParquetBlobSourceSettings.scala 0%
models/settings/staging/JdbcCredentialType.scala 0%
models/settings/staging/JdbcMergeServiceClientSettings.scala 0%
models/settings/staging/JdbcMergeServiceClientSettings.scala 0%
models/settings/staging/JdbcCredentialType.scala 48%
models/settings/staging/JdbcMergeServiceClientSettings.scala 71%
models/settings/streaming/ThroughputSettings.scala 89%
services/app/PosixStreamLifetimeService.scala 0%
services/app/GenericStreamRunnerService.scala 88%
services/app/StreamRunnerServiceImpl.scala 0%
services/blobsource/UpsertBlobBackfillOverwriteBatchFactory.scala 0%
services/blobsource/DefaultS3Reader.scala 0%
services/blobsource/providers/BlobSourceDataProvider.scala 81%
services/blobsource/providers/BlobSourceStreamingDataProvider.scala 69%
services/blobsource/readers/listing/BlobListingJsonSource.scala 83%
services/blobsource/readers/listing/BlobListingCsvSource.scala 0%
services/blobsource/readers/listing/BlobListingParquetSource.scala 61%
services/blobsource/versioning/BlobSourceWatermark.scala 92%
services/blobsource/versioning/UpsertBlobStagedBatchFactory.scala 0%
services/bootstrap/DefaultStreamBootstrapper.scala 62%
services/filters/ColumnSummaryFieldsFilteringService.scala 86%
services/filters/FieldsFilteringService.scala 85%
services/iceberg/SchemaConversions.scala 0%
services/iceberg/SchemaConversions.scala 16%
services/iceberg/IcebergCatalogCredential.scala 60%
services/iceberg/SchemaConversions.scala 86%
services/iceberg/IcebergEntityManager.scala 50%
services/iceberg/SchemaConversions.scala 87%
services/iceberg/IcebergS3CatalogWriter.scala 95%
services/iceberg/IcebergTablePropertyManager.scala 80%
services/iceberg/IcebergCatalogFactory.scala 99%
services/iceberg/interop/JsonScanner.scala 82%
services/iceberg/interop/ParquetScanner.scala 80%
services/iceberg/interop/ParquetInputStreamAdapter.scala 75%
services/merging/JdbcMergeServiceClient.scala 33%
services/merging/cleanup/CatalogDisposeServiceClient.scala 0%
services/metrics/GlobalMetricTagProvider.scala 47%
services/metrics/DataDog.scala 0%
services/mssql/MsSqlStagedBatchFactory.scala 0%
services/mssql/MsSqlDataProvider.scala 49%
services/mssql/MsSqlBackfillOverwriteBatchFactory.scala 0%
services/mssql/MsSqlExtensions.scala 74%
services/mssql/MsSqlStreamingDataProvider.scala 69%
services/mssql/QueryProvider.scala 74%
services/mssql/base/MsSqlReader.scala 85%
services/mssql/query/ScalarQueryResult.scala 65%
services/mssql/query/LazyQueryResult.scala 84%
services/mssql/query/LazyQueryResult.scala 65%
services/storage/models/azure/AzureModelConversions.scala 98%
services/storage/models/azure/AdlsStoragePath.scala 82%
services/storage/models/base/StoredBlob.scala 85%
services/storage/models/s3/S3StoragePath.scala 80%
services/storage/models/s3/S3ClientSettings.scala 83%
services/storage/services/azure/AzureBlobStorageReader.scala 78%
services/storage/services/s3/S3BlobStorageReader.scala 70%
services/streaming/base/DefaultSourceDataProvider.scala 65%
services/streaming/base/GenericBackfillStreamingMergeDataProvider.scala 80%
services/streaming/base/StreamGraphBuilder.scala 0%
services/streaming/base/SourceWatermark.scala 0%
services/streaming/base/SourceWatermark.scala 0%
services/streaming/base/DefaultStreamDataProvider.scala 84%
services/streaming/graph_builders/GenericStreamingGraphBuilder.scala 85%
services/streaming/graph_builders/backfill/GenericBackfillMergeGraphBuilder.scala 81%
services/streaming/graph_builders/backfill/GenericBackfillOverwriteGraphBuilder.scala 54%
services/streaming/processors/batch_processors/WatermarkProcessingExtensions.scala 19%
services/streaming/processors/batch_processors/backfill/BackfillDisposeBatchProcessor.scala 0%
services/streaming/processors/batch_processors/backfill/BackfillOverwriteWatermarkProcessor.scala 0%
services/streaming/processors/batch_processors/backfill/BackfillApplyBatchProcessor.scala 85%
services/streaming/processors/batch_processors/maintenance/TargetMaintenanceProcessor.scala 77%
services/streaming/processors/batch_processors/streaming/SchemaMigrationProcessor.scala 19%
services/streaming/processors/batch_processors/streaming/MergeBatchProcessor.scala 98%
services/streaming/processors/batch_processors/streaming/DisposeBatchProcessor.scala 79%
services/streaming/processors/transformers/StagingProcessor.scala 91%
services/streaming/throughput/MemoryBoundShaper.scala 74%
services/streaming/throughput/StaticShaper.scala 0%
services/streaming/throughput/base/ThroughputShaperBuilder.scala 50%
services/synapse/SynapseLinkStreamingDataProvider.scala 69%
services/synapse/SynapseBatchFactory.scala 0%
services/synapse/SynapseEntitySchemaProvider.scala 84%
services/synapse/SynapseBackfillOverwriteBatchFactory.scala 0%
services/synapse/base/SynapseLinkReader.scala 60%
services/synapse/base/SynapseLinkDataProvider.scala 51%
services/synapse/versioning/SynapseWatermark.scala 83%
utils/SqlUtils.scala 0%
utils/SqlUtils.scala 48%
utils/CollectionUtils.scala 0%

Minimum allowed coverage is 40%

Generated by 🐒 cobertura-action against 639b9e0

@george-zubrienko george-zubrienko marked this pull request as ready for review May 13, 2026 08:19
@george-zubrienko george-zubrienko merged commit 1f6039a into main May 13, 2026
2 checks passed
@george-zubrienko george-zubrienko deleted the structured-stream branch May 13, 2026 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Schema Management] Eliminate schema grouping

2 participants