Add checkpointing to metagraph build #197
base: master
Changes from 16 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -64,13 +64,15 @@ std::vector<std::string> SortedSetDiskBase<T>::files_to_merge() { | |
| } | ||
||
| template <typename T> | ||
| - void SortedSetDiskBase<T>::clear(const std::filesystem::path &tmp_path) { | ||
| + void SortedSetDiskBase<T>::clear(const std::filesystem::path &tmp_path, bool remove_files) { | ||
| std::unique_lock<std::mutex> exclusive_lock(mutex_); | ||
| std::unique_lock<std::shared_timed_mutex> multi_insert_lock(multi_insert_mutex_); | ||
| is_merging_ = false; | ||
| - // remove the files that have not been requested to merge | ||
| - for (const auto &chunk_file : get_file_names()) { | ||
| - std::filesystem::remove(chunk_file); | ||
| + if (remove_files) { | ||
| + // remove the files that have not been requested to merge | ||
| + for (const auto &chunk_file : get_file_names()) { | ||
| + std::filesystem::remove(chunk_file); | ||
| + } | ||
| } | ||
| chunk_count_ = 0; | ||
| l1_chunk_count_ = 0; | ||
@@ -91,7 +93,7 @@ void SortedSetDiskBase<T>::start_merging_async() { | |
| async_worker_.enqueue([file_names, this]() { | ||
| std::function<void(const T &)> on_new_item | ||
| = [this](const T &v) { merge_queue_.push(v); }; | ||
| - merge_files(file_names, on_new_item); | ||
| + merge_files(file_names, on_new_item, false); | ||
Member
Who is responsible for cleaning these files up? Does this mean that the old temp files are never removed until all the k-mers are sorted, so disk usage grows considerably?
Contributor
Author
The only files affected by this change are the original collected chunks. These will indeed be deleted after checkpoint 5 instead of after checkpoint 2. The timespan between the files being deleted and the merge queue being emptied is quite short, so the probability of this happening is low, but at the same time I cannot leave this flaw in the code with a clear conscience. Since
| merge_queue_.shutdown(); | ||
| }); | ||
| } | ||
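The ownership question raised in the thread above (who removes the chunk files once `clear` no longer does) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption for illustration: `ChunkStore`, `add_chunk`, and `chunk_count` are hypothetical names, not part of this PR, and returning the kept file list from `clear` is one possible way to hand cleanup responsibility to the caller, not necessarily what the PR does.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the chunk bookkeeping in SortedSetDiskBase.
// The class and its methods are illustrative only, not the project's API.
class ChunkStore {
  public:
    explicit ChunkStore(std::filesystem::path dir) : dir_(std::move(dir)) {
        std::filesystem::create_directories(dir_);
    }

    // Writes a small chunk file and remembers its path.
    void add_chunk(const std::string &contents) {
        std::filesystem::path file = dir_ / ("chunk_" + std::to_string(next_id_++));
        std::ofstream out(file);
        out << contents;
        files_.push_back(file);
    }

    // Resets the in-memory state. Chunk files are deleted only if
    // remove_files is true; otherwise their paths are returned so that
    // the caller explicitly owns the later cleanup.
    std::vector<std::filesystem::path> clear(bool remove_files = true) {
        std::vector<std::filesystem::path> kept;
        if (remove_files) {
            for (const auto &file : files_) {
                std::filesystem::remove(file);
            }
        } else {
            kept = files_;
        }
        files_.clear();
        return kept;
    }

    std::size_t chunk_count() const { return files_.size(); }

  private:
    std::filesystem::path dir_;
    std::vector<std::filesystem::path> files_;
    std::size_t next_id_ = 0;
};
```

With `clear(false)`, the files stay on disk until whoever consumes them (for example, a checkpointed merge step) removes them, which is the source of the extra disk usage the reviewer asks about.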