Hello! I am downloading many thousands of galaxy cutouts from DES DR2, and the process currently takes multiple days. This is even after parallelizing the downloads by spawning several Python processes in a shell-script loop: each batch of 20 images, with one Python process per image, takes on the order of a couple of minutes. I have confirmed that neither CPU nor RAM is the bottleneck. Is there some way to optimize this procedure through Data Lab? I am aware of pre-made datasets such as those from Galaxy Zoo, but I explicitly need to work with FITS files, which is why I have avoided those sets.
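One general point, independent of any Data Lab-specific answer: cutout downloads are I/O-bound, so a single Python process with a thread pool can usually sustain far more concurrent requests than a shell loop spawning one process per image, at much lower overhead. Below is a minimal sketch; the `fetch` callable is an assumption standing in for whatever actually retrieves one cutout (e.g. an HTTP request to the cutout service followed by writing the FITS file), and `max_workers` should be tuned to what the server tolerates.

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, max_workers=32):
    """Run `fetch(url)` for every URL concurrently using a thread pool.

    Threads (not processes) are appropriate here because the work is
    network-bound: each thread spends most of its time waiting on I/O,
    so the GIL is not a limiting factor.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and propagates exceptions
        # when results are consumed.
        return list(pool.map(fetch, urls))
```

For example, `download_all(cutout_urls, my_fetch_and_save)` with `max_workers=32` would keep 32 requests in flight from one process, instead of paying Python-interpreter startup cost for every batch of 20.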

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.