Commit 90e3c0e
ingester: cache list of files, speedup on large queue
Before: every 5-second cycle calls os.scandir() on the 2M-entry directory.
Each scan enumerates all entries via readdir(), which is extremely slow
on a flat directory that large: a single scan might take 20-30 seconds.
After:
1. scandir() runs once, caching all .json entries
2. Each cycle pops a chunk of up to INGEST_CYCLE_BATCH_SIZE (default 50,000) files from the cache and processes them
3. Re-scan only happens when the cache is fully drained
4. On scandir error, cache is cleared → forces re-scan next cycle
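The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the class name `FileQueueCache`, its attributes, and the `scans` counter are hypothetical, and only `os.scandir()` and the 50,000 default come from the message.

```python
import os


class FileQueueCache:
    """Hypothetical sketch: scan once, then hand out cached paths in chunks."""

    def __init__(self, spool_dir, batch_size=50_000):
        self.spool_dir = spool_dir
        self.batch_size = batch_size  # assumed to be a positive integer
        self._pending = []            # cached .json paths not yet handed out
        self.scans = 0                # counts scandir() attempts, for illustration

    def next_batch(self):
        # Re-scan only when the cache is fully drained.
        if not self._pending:
            self.scans += 1
            try:
                with os.scandir(self.spool_dir) as entries:
                    self._pending = [
                        e.path for e in entries
                        if e.name.endswith(".json") and e.is_file()
                    ]
            except OSError:
                # On a scan error, clear the cache so the next cycle re-scans.
                self._pending = []
                return []
        # Take up to batch_size paths from the end of the list (cheap to remove).
        batch = self._pending[-self.batch_size:]
        del self._pending[-self.batch_size:]
        return batch
```

Each call to `next_batch()` stands in for one ingest cycle. In this sketch the files stay on disk, so a drained cache simply triggers a fresh scan on the next call.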
With 2M files: one scan instead of ~40 (2M / 50K = 40 chunks, each processed without re-scanning the directory).
If each scandir of 2M entries takes ~30 seconds, skipping the other ~39 scans saves roughly 20 minutes of pure directory enumeration overhead.
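The estimate can be checked with a few lines of arithmetic. The figures are the ones quoted in this message, taking the upper end of the 20-30 second scan time; the variable names are illustrative only.

```python
total_files = 2_000_000
batch_size = 50_000
scan_seconds = 30  # upper end of the 20-30 s estimate above

cycles = -(-total_files // batch_size)  # ceiling division: chunks to drain the cache
scans_avoided = cycles - 1              # one scan is still needed to fill the cache
minutes_saved = scans_avoided * scan_seconds / 60

print(cycles, scans_avoided, minutes_saved)  # prints: 40 39 19.5
```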
INGEST_CYCLE_BATCH_SIZE (default 50,000) is configurable via an environment variable if you want to tune the chunk size.
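One way such a setting might be read is sketched below. This assumes the environment variable shares the constant's name, which the message does not state. The helper `read_batch_size` and its fallback on invalid values are hypothetical, not taken from the commit.

```python
import os


def read_batch_size(default=50_000):
    """Return the chunk size from the environment, or the default if unset or invalid."""
    raw = os.environ.get("INGEST_CYCLE_BATCH_SIZE")  # assumed variable name
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    # A zero or negative chunk size would make no progress; fall back to the default.
    return value if value > 0 else default
```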
This also fixes a latent bug in the old code: json_files retained stale data from the previous
iteration if scandir raised an exception (the except: pass did not reset it).
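The stale-data pattern can be reproduced in isolation. The two functions below are a hypothetical reconstruction of the bug's shape, not the original code. Only the name json_files and the except: pass behaviour come from the message.

```python
import os


def list_json_buggy(directories):
    """Old pattern: a failed scan silently reuses the previous iteration's list."""
    results = []
    json_files = []
    for directory in directories:
        try:
            with os.scandir(directory) as entries:
                json_files = [e.name for e in entries if e.name.endswith(".json")]
        except OSError:
            pass  # json_files still holds the previous iteration's data
        results.append(list(json_files))
    return results


def list_json_fixed(directories):
    """Fixed pattern: a failed scan resets the list."""
    results = []
    json_files = []
    for directory in directories:
        try:
            with os.scandir(directory) as entries:
                json_files = [e.name for e in entries if e.name.endswith(".json")]
        except OSError:
            json_files = []  # reset so stale entries are never reprocessed
        results.append(list(json_files))
    return results
```

Given one readable directory followed by one that cannot be scanned, the buggy version reports the first directory's files twice, while the fixed version reports an empty list for the second.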
Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
1 parent df9a5f7 commit 90e3c0e
File tree
5 files changed, +118 -75 lines changed
- backend/kernelCI_app
- constants
- management/commands
- helpers
- tests
- performanceTests
- unitTests/commands/monitorSubmissions
Diffs (line contents not captured), lines changed per file in page order:
- 8 additions & 0 deletions
- 9 additions & 9 deletions
- 69 additions & 30 deletions
- 20 additions & 22 deletions
- 12 additions & 14 deletions