feat: ColliderML reader #5546
Draft: benjaminhuth wants to merge 101 commits into acts-project:main from benjaminhuth:feature/collider-ml-reader-arrow
Changes from 96 commits
Commits
6ede096 (paulgessinger) feat: add parent id to existing SimParticle EDM
eb9835a (paulgessinger) feat: Make ScopedTimer threadsafe
37b3ec6 (paulgessinger) particle docs fixes
06005ef (paulgessinger) clang-format
59290dd (paulgessinger) MERGE
32d6228 (paulgessinger) feat: Initial arrow/parquet support
970f19d (paulgessinger) experiment with arrow object library
d2080b8 (paulgessinger) clean up symbol visibility in wrapper target
f331c3b (paulgessinger) make the isolated arrow absorption optional
3ce451b (paulgessinger) add parquet option to full chain odd
99478a4 (paulgessinger) updated particle arrow schema based on colliderml
c28a93c (paulgessinger) particle arrow converter writes parent id
5fbc174 (paulgessinger) use row indices as particle ids
5801bf5 (paulgessinger) add edm4hep to parquet conversion script
299d69e (paulgessinger) update output converters to produce proper nulls
74cd1ea (paulgessinger) add sim hit output converter + connect to track hit_ids
c6b9587 (paulgessinger) update detector resolver
a163863 (paulgessinger) add jobs arg to full chain odd
a824336 (paulgessinger) drop separate generated particles output
bcd8891 (paulgessinger) add plan for edm4hep input perf opt
4bbc029 (paulgessinger) clang-format
2de4baf (paulgessinger) initial calo conversion
c421bc2 (paulgessinger) validated calo output
8c72b08 (paulgessinger) optimization for calo hits and averaging timers
fddfd47 (paulgessinger) some timing for edm4hepsiminput
fed4480 (paulgessinger) add proper detector encoding, speedup
ffbd6a8 (paulgessinger) restore pythia script (?)
fac62a5 (paulgessinger) use acts units more
b15e10c (paulgessinger) dataset system shards files
5865ab1 (paulgessinger) address large number of propagation to perigee failures
d240655 (paulgessinger) test updates, fix init bug
33d01b4 (paulgessinger) lint / ci fixes
2240629 (paulgessinger) arrow schema (evolution) story follow up
5c1e993 (paulgessinger) add C-API / ABI safe interop with python
8bafc7c (paulgessinger) bridge arrow schema into python cleanly
5d16d1a (paulgessinger) add schema validation to ParquetWriter
c817d51 (paulgessinger) drop calo hit code
e94fd31 (paulgessinger) make misconfig a hard error
5648de8 (paulgessinger) clang-format
d890ddd (paulgessinger) reduce diff
4720f3d (paulgessinger) feat: Initial arrow/parquet support
3a09ce1 (paulgessinger) experiment with arrow object library
ebdde33 (paulgessinger) clean up symbol visibility in wrapper target
230727b (paulgessinger) make the isolated arrow absorption optional
080bf22 (paulgessinger) add parquet option to full chain odd
2159d6f (paulgessinger) updated particle arrow schema based on colliderml
c444056 (paulgessinger) particle arrow converter writes parent id
18fb443 (paulgessinger) use row indices as particle ids
e07bac0 (paulgessinger) add edm4hep to parquet conversion script
308778b (paulgessinger) update output converters to produce proper nulls
f75fa18 (paulgessinger) add sim hit output converter + connect to track hit_ids
e3ee308 (paulgessinger) update detector resolver
683da75 (paulgessinger) add jobs arg to full chain odd
4396547 (paulgessinger) drop separate generated particles output
ce2c5c9 (paulgessinger) add plan for edm4hep input perf opt
b4b4971 (paulgessinger) clang-format
527780c (paulgessinger) initial calo conversion
fc28cba (paulgessinger) validated calo output
8b7a984 (paulgessinger) optimization for calo hits and averaging timers
a940d6b (paulgessinger) some timing for edm4hepsiminput
38811b1 (paulgessinger) add proper detector encoding, speedup
4603d0c (paulgessinger) restore pythia script (?)
8e836f4 (paulgessinger) use acts units more
ed599b5 (paulgessinger) dataset system shards files
0cfde9a (paulgessinger) address large number of propagation to perigee failures
7786cd1 (paulgessinger) test updates, fix init bug
a94812a (paulgessinger) lint / ci fixes
a3ccceb (paulgessinger) arrow schema (evolution) story follow up
1cea9ae (paulgessinger) add C-API / ABI safe interop with python
94f045a (paulgessinger) bridge arrow schema into python cleanly
6824502 (paulgessinger) add schema validation to ParquetWriter
0c382f2 (paulgessinger) drop calo hit code
552a7d8 (paulgessinger) make misconfig a hard error
ee33a72 (paulgessinger) clang-format
ce2ab37 (paulgessinger) reduce diff
421fa0d (paulgessinger) cmake lint
5f14a9a (benjaminhuth) Merge remote-tracking branch 'paulgessinger/feat/arrow-plugin+convers…
7811dc8 (benjaminhuth) add ColliderML → ACTS EDM converter and Kalman truth tracking demo
1651eb4 (benjaminhuth) fix geo map matching and simhit/measurement correspondence; switch to…
cc89ed9 (benjaminhuth) store geo map as Parquet; absorb ColliderMLInputConverter into Arrow …
77000b5 (benjaminhuth) refactor: ColliderML scripts to run-function style; separate slides
019c026 (benjaminhuth) fix: serialize histograms to numpy arrays before pickling
a76eed0 (benjaminhuth) fix: handle all three histogram wrapper types in _serialize_hists
33b2fc8 (benjaminhuth) fix: clean up performance plot style
0e34a38 (benjaminhuth) feat: timestamp PDF filename in performance plots script
1821e68 (benjaminhuth) fix: geo map matching + hard errors for g2l/unknown-geoId
9b1230f (benjaminhuth) fix: reliable geo map generation + hard errors in converter
b905684 (benjaminhuth) feat: proto-track efficiency evaluation + seeding vs KF comparison
44c837f (benjaminhuth) fix: evaluate efficiency against truth_seeded_particles denominator
355ec56 (benjaminhuth) feat: title slide + step-function plot style
e8fc04d (benjaminhuth) Merge upstream/main into feature/collider-ml-reader-arrow
1b7c564 (benjaminhuth) update
134fa89 (benjaminhuth) update unused files
a9391d2 (benjaminhuth) lint
5fce52a (benjaminhuth) Merge remote-tracking branch 'upstream/main' into feature/collider-ml…
494f445 (benjaminhuth) remove unrelated stuff
b1cf5ed (benjaminhuth) update
a3e6abd (benjaminhuth) restore odd.py
4e995fc (benjaminhuth) address PR review comments: schema validation, cleanup
2636986 (benjaminhuth) remove redundant schema validation from execute()
476a802 (benjaminhuth) update
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -131,6 +131,7 @@ | |
| ".yml", | ||
| ".xml", | ||
| ".sh", | ||
| ".parquet", | ||
| ) | ||
|
|
||
|
|
||
|
|
||
Binary file not shown.
148 changes: 148 additions & 0 deletions
Examples/Io/Parquet/include/ActsExamples/Io/Parquet/ColliderMLInputConverter.hpp
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,148 @@ | ||
| // This file is part of the ACTS project. | ||
| // | ||
| // Copyright (C) 2016 CERN for the benefit of the ACTS project | ||
| // | ||
| // This Source Code Form is subject to the terms of the Mozilla Public | ||
| // License, v. 2.0. If a copy of the MPL was not distributed with this | ||
| // file, You can obtain one at https://mozilla.org/MPL/2.0/. | ||
|
|
||
| #pragma once | ||
|
|
||
| #include "Acts/Geometry/GeometryIdentifier.hpp" | ||
| #include "Acts/Utilities/Logger.hpp" | ||
| #include "ActsExamples/Digitization/DigitizationConfig.hpp" | ||
| #include "ActsExamples/EventData/Measurement.hpp" | ||
| #include "ActsExamples/EventData/SimHit.hpp" | ||
| #include "ActsExamples/EventData/SimParticle.hpp" | ||
| #include "ActsExamples/EventData/TruthMatching.hpp" | ||
| #include "ActsExamples/Framework/DataHandle.hpp" | ||
| #include "ActsExamples/Framework/IAlgorithm.hpp" | ||
| #include "ActsPlugins/Arrow/ArrowUtil.hpp" | ||
|
|
||
| #include <cstdint> | ||
| #include <filesystem> | ||
| #include <memory> | ||
| #include <string> | ||
| #include <unordered_map> | ||
|
|
||
| namespace Acts { | ||
| class TrackingGeometry; | ||
| } | ||
|
|
||
| namespace ActsExamples { | ||
|
|
||
| /// Packed ColliderML geometry key: det(8b)|vol(8b)|layer(16b)|surface(32b). | ||
| inline std::uint64_t colliderMLGeoKey(std::uint8_t detector, | ||
| std::uint8_t volume, std::uint16_t layer, | ||
| std::uint32_t surface) { | ||
| return (static_cast<std::uint64_t>(detector) << 56) | | |
| (static_cast<std::uint64_t>(volume) << 48) | | |
| (static_cast<std::uint64_t>(layer) << 32) | | |
| static_cast<std::uint64_t>(surface); | ||
| } | ||
|
|
||
| /// Load a ColliderML geometry ID map from a CSV file. | ||
| /// | ||
| /// Expected columns (with header): detector, volume, layer, surface, | ||
| /// acts_geo_id. The @c acts_geo_id value is parsed as a hex or decimal | ||
| /// unsigned 64-bit integer (e.g. @c 0x1000000000010001). | ||
| /// | ||
| /// @param path Path to the CSV file. | ||
| /// @return Map from packed ColliderML key to @c Acts::GeometryIdentifier. | ||
| std::unordered_map<std::uint64_t, Acts::GeometryIdentifier> | ||
| loadColliderMLGeoIdMap(const std::filesystem::path& path); | ||
|
|
||
| /// Convert ColliderML Arrow tables to ACTS EDM types. | ||
| /// | ||
| /// Reads two Arrow tables placed on the whiteboard by @c ParquetReader — | ||
| /// one for particles and one for tracker hits — and emits any combination | ||
| /// of @c SimParticleContainer, @c SimHitContainer, and | ||
| /// @c MeasurementContainer depending on which output keys are non-empty. | ||
| /// | ||
| /// Modes (controlled by which output keys are set in the config): | ||
| /// - Particles only: set @c outputParticles, leave hits keys empty. | ||
| /// - + SimHits: also set @c outputSimHits. | ||
| /// - + Measurements: also set @c outputMeasurements; requires | ||
| /// @c trackingGeometry and @c digiConfig. | ||
| /// | ||
| /// @note SimHit momentum fields are zero-filled; ColliderML does not | ||
| /// record per-hit momentum. | ||
| class ColliderMLInputConverter : public IAlgorithm { | ||
| public: | ||
| struct Config { | ||
| /// Whiteboard key for the particles Arrow table (from ParquetReader). | ||
| std::string inputParticlesTable; | ||
| /// Whiteboard key for the tracker-hits Arrow table (from ParquetReader). | ||
| std::string inputHitsTable; | ||
|
|
||
| /// Output key for @c SimParticleContainer. Empty = skip. | ||
| std::string outputParticles; | ||
| /// Output key for @c SimHitContainer. Empty = skip. | ||
| std::string outputSimHits; | ||
| /// Output key for @c MeasurementContainer. Empty = skip. | ||
| std::string outputMeasurements; | ||
| /// Output key for @c MeasurementSubset covering all measurements (required | ||
| /// by CKF / SpacePointMaker). Empty = skip. | ||
| std::string outputMeasurementSubset; | ||
| /// Output key for @c MeasurementSimHitsMap. Empty = skip. | ||
| std::string outputMeasSimHitsMap; | ||
| /// Output key for @c MeasurementParticlesMap (measurement→particle). | ||
| /// Empty = skip. | ||
| std::string outputMeasParticlesMap; | ||
| /// Output key for @c ParticleMeasurementsMap (particle→measurements). | ||
| /// Required by TruthTrackFinder / TruthSeeding. Empty = skip. | ||
| std::string outputParticleMeasurementsMap; | ||
|
|
||
| /// Required when @c outputMeasurements is non-empty. | ||
| std::shared_ptr<const Acts::TrackingGeometry> trackingGeometry; | ||
|
|
||
| /// Digitisation config used to determine subspace and covariance per | ||
| /// surface. Required when @c outputMeasurements is non-empty. | ||
| /// Load with @c readDigiConfigFromJson (acts.examples.json module). | ||
| /// The hierarchy map is queried per hit with full fallback | ||
| /// (sensitive → layer → volume). | ||
| DigiConfigContainer digiConfig; | ||
|
|
||
| /// ColliderML (det, vol, layer, surf) → ACTS GeometryIdentifier. | ||
| /// Optional. When empty, geometry IDs are constructed directly from the | ||
| /// ColliderML (volume, layer, surface) fields — only correct when the data | ||
| /// was produced from a geometry whose ID scheme matches the current build. | ||
| /// Load with @c loadColliderMLGeoIdMap(). | ||
| std::unordered_map<std::uint64_t, Acts::GeometryIdentifier> geoIdMap; | ||
| }; | ||
|
|
||
| ColliderMLInputConverter(const Config& cfg, | ||
| std::unique_ptr<const Acts::Logger> logger); | ||
|
|
||
| ColliderMLInputConverter(const Config& cfg, Acts::Logging::Level level); | ||
|
|
||
| ~ColliderMLInputConverter() override; | ||
|
|
||
| ProcessCode execute(const AlgorithmContext& ctx) const final; | ||
|
|
||
| const Config& config() const { return m_cfg; } | ||
|
|
||
| private: | ||
| Config m_cfg; | ||
|
|
||
| ReadDataHandle<ActsPlugins::ArrowUtil::ArrowTable> m_inputParticles{ | ||
| this, "InputParticles"}; | ||
| ReadDataHandle<ActsPlugins::ArrowUtil::ArrowTable> m_inputHits{this, | ||
| "InputHits"}; | ||
|
|
||
| WriteDataHandle<SimParticleContainer> m_outputParticles{this, | ||
| "OutputParticles"}; | ||
| WriteDataHandle<SimHitContainer> m_outputSimHits{this, "OutputSimHits"}; | ||
| WriteDataHandle<MeasurementContainer> m_outputMeasurements{ | ||
| this, "OutputMeasurements"}; | ||
| WriteDataHandle<MeasurementSubset> m_outputMeasurementSubset{ | ||
| this, "OutputMeasurementSubset"}; | ||
| WriteDataHandle<MeasurementSimHitsMap> m_outputMeasSimHitsMap{ | ||
| this, "OutputMeasSimHitsMap"}; | ||
| WriteDataHandle<MeasurementParticlesMap> m_outputMeasParticlesMap{ | ||
| this, "OutputMeasParticlesMap"}; | ||
| WriteDataHandle<ParticleMeasurementsMap> m_outputParticleMeasurementsMap{ | ||
| this, "OutputParticleMeasurementsMap"}; | ||
| }; | ||
|
|
||
| } // namespace ActsExamples | ||
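The doc comment on `colliderMLGeoKey` describes the packed key as det(8b)|vol(8b)|layer(16b)|surface(32b). A minimal standalone sketch of that layout follows, assuming each field gets its own non-overlapping bit range within 64 bits. `ColliderMLGeoFields`, `packGeoKey`, and `unpackGeoKey` are hypothetical names used only for illustration; they are not part of this PR.

```cpp
#include <cstdint>

// Hypothetical helper struct for illustration; not part of the PR.
struct ColliderMLGeoFields {
  std::uint8_t detector;
  std::uint8_t volume;
  std::uint16_t layer;
  std::uint32_t surface;
};

// Pack the four fields following the documented layout
// det(8b)|vol(8b)|layer(16b)|surface(32b), most significant field first.
constexpr std::uint64_t packGeoKey(const ColliderMLGeoFields& f) {
  return (static_cast<std::uint64_t>(f.detector) << 56) |
         (static_cast<std::uint64_t>(f.volume) << 48) |
         (static_cast<std::uint64_t>(f.layer) << 32) |
         static_cast<std::uint64_t>(f.surface);
}

// Inverse of packGeoKey: recover the individual fields from a packed key.
constexpr ColliderMLGeoFields unpackGeoKey(std::uint64_t key) {
  return {static_cast<std::uint8_t>(key >> 56),
          static_cast<std::uint8_t>((key >> 48) & 0xFF),
          static_cast<std::uint16_t>((key >> 32) & 0xFFFF),
          static_cast<std::uint32_t>(key & 0xFFFFFFFF)};
}

// Compile-time check that the fields land in the expected bit ranges.
static_assert(packGeoKey({1, 2, 3, 4}) == 0x0102000300000004ULL);
```

With disjoint bit ranges the key round-trips, so two different (detector, volume, layer, surface) tuples can never collide in the `geoIdMap` lookup.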
Is this the volume mapping? Should that be parquet? I think there's benefit in having this be ASCII, no? How large is it?
I had it in CSV and thought I'd put it in Parquet because it's a CSV with tens of thousands of lines of samples... but I can do CSV as well.
Okay, the CSV is roughly 500 KB vs. 131 KB for the Parquet file, so CSV is acceptable.
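For the CSV option discussed in this thread, a loader along the lines of the `loadColliderMLGeoIdMap` doc comment could look roughly like the sketch below. It is not the PR's implementation and makes several assumptions: it reads from a `std::istream` rather than a path, stores the raw 64-bit `acts_geo_id` value instead of an `Acts::GeometryIdentifier`, expects unpadded cells, and packs the key as det(8b)|vol(8b)|layer(16b)|surface(32b). `parseU64` and `loadGeoIdMapSketch` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Parse an unsigned 64-bit integer written as hex ("0x...") or decimal.
inline std::uint64_t parseU64(const std::string& text) {
  const bool isHex = text.size() > 2 && text[0] == '0' &&
                     (text[1] == 'x' || text[1] == 'X');
  std::size_t used = 0;
  const std::uint64_t value = std::stoull(text, &used, isHex ? 16 : 10);
  if (used != text.size()) {
    throw std::invalid_argument("trailing characters in '" + text + "'");
  }
  return value;
}

// Read rows of: detector, volume, layer, surface, acts_geo_id (with header).
// Returns packed ColliderML key -> raw geometry id value.
inline std::unordered_map<std::uint64_t, std::uint64_t> loadGeoIdMapSketch(
    std::istream& in) {
  std::unordered_map<std::uint64_t, std::uint64_t> map;
  std::string line;
  if (!std::getline(in, line)) {
    return map;  // empty input: no header, no rows
  }
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back();  // tolerate CRLF line endings
    }
    if (line.empty()) {
      continue;
    }
    std::istringstream row(line);
    std::string cell;
    std::uint64_t v[5] = {};
    for (auto& x : v) {
      if (!std::getline(row, cell, ',')) {
        throw std::runtime_error("expected 5 columns: " + line);
      }
      x = parseU64(cell);
    }
    // Reject values that do not fit the assumed field widths.
    if (v[0] > 0xFF || v[1] > 0xFF || v[2] > 0xFFFF || v[3] > 0xFFFFFFFF) {
      throw std::out_of_range("field out of range: " + line);
    }
    const std::uint64_t key = (v[0] << 56) | (v[1] << 48) | (v[2] << 32) | v[3];
    if (!map.emplace(key, v[4]).second) {
      throw std::runtime_error("duplicate key: " + line);
    }
  }
  return map;
}
```

Malformed rows, out-of-range fields, and duplicate keys throw here rather than being skipped. That choice is meant to mirror the hard-error behaviour named in the commit messages, but it remains an assumption of this sketch.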