Best way to pull large subsets of the data?

Question

1 Answer

Answer 1 · 2024-04-19T21:20:00+0000

Sure, here is a quick example:

# IDs of likely point sources (class_star > 0.9) with more than 10 detections
sub_query = """
SELECT id FROM nsc_dr2.object
    WHERE class_star > 0.9
    AND ndet > 10
    LIMIT 10000
"""

# All individual measurements for those objects, plus their class_star
query = f"""
SELECT m.objectid, m.ra, m.dec, m.exposure, m.mjd, m.mag_auto, m.magerr_auto, m.filter, o.class_star
    FROM nsc_dr2.meas AS m, nsc_dr2.object AS o
    WHERE m.objectid IN ({sub_query}) AND m.objectid = o.id
"""
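Since the goal is to raise the LIMIT to around 10^7 or drop it entirely, it may help to build the query from a parameter rather than editing the string by hand. A minimal sketch: `build_meas_query` is a hypothetical helper (not part of the Data Lab client), and passing `limit=None` simply omits the LIMIT clause. The `int()` cast ensures only an integer is spliced into the SQL text.

```python
def build_meas_query(limit=None):
    """Return the meas/object query; limit=None omits the LIMIT clause.

    Hypothetical helper for illustration, not part of the Data Lab client.
    """
    # int() ensures only an integer ends up in the SQL text.
    limit_clause = f"\n    LIMIT {int(limit)}" if limit is not None else ""
    sub_query = (
        "SELECT id FROM nsc_dr2.object\n"
        "    WHERE class_star > 0.9\n"
        "    AND ndet > 10"
        + limit_clause
    )
    return (
        "SELECT m.objectid, m.ra, m.dec, m.exposure, m.mjd, m.mag_auto, "
        "m.magerr_auto, m.filter, o.class_star\n"
        "    FROM nsc_dr2.meas AS m, nsc_dr2.object AS o\n"
        f"    WHERE m.objectid IN ({sub_query}) AND m.objectid = o.id"
    )


if __name__ == "__main__":
    # Same query at three scales: 10^4 objects, 10^7 objects, and no limit.
    for n in (10_000, 10**7, None):
        print(n, "LIMIT" in build_meas_query(n))
```

Assuming `id` is unique in `nsc_dr2.object`, the unlimited version's `IN` subquery selects the same rows as filtering on `o.class_star` and `o.ndet` directly in the join, which the database may be able to plan more efficiently.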

This takes about 12 seconds to run synchronously. When I run it asynchronously, however, loading the results after the job completes takes a very long time, possibly forever (I haven't waited long enough to find out). Eventually I would like to increase the LIMIT to something like 10^7, or remove it altogether, but I can't run a query that large synchronously, and I can't even load the results for 10^4 objects asynchronously.
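To see whether the time goes into the asynchronous job itself or into loading its results, one option is to time the two phases separately. The sketch below assumes three caller-supplied callables: `submit` returns a job ID, `get_status` returns a phase string, and `fetch` returns the results. With the Data Lab client these would presumably wrap `queryClient.query(sql=..., async_=True)`, `queryClient.status(jobid)` and `queryClient.results(jobid)`, but treat that wiring and the phase names (`COMPLETED`, `ERROR`, `ABORTED`) as assumptions to check against your client version. `run_async_and_time` is a hypothetical helper.

```python
import time


def run_async_and_time(submit, get_status, fetch, poll_interval=5.0,
                       timeout=4 * 3600.0, sleep=time.sleep,
                       clock=time.monotonic):
    """Submit an async job, wait for it, then fetch results, timing each phase.

    Hypothetical helper: submit() -> job id, get_status(job_id) -> phase
    string, fetch(job_id) -> results. Phase names are assumed, not verified.
    Returns (results, {"query_s": ..., "download_s": ...}).
    """
    t0 = clock()
    job_id = submit()
    deadline = t0 + timeout
    # Poll until the job reaches a terminal phase or the timeout expires.
    while True:
        phase = str(get_status(job_id)).upper()
        if phase == "COMPLETED":
            break
        if phase in ("ERROR", "ABORTED"):
            raise RuntimeError(f"job {job_id} ended in phase {phase}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {phase} after {timeout} s")
        sleep(poll_interval)
    t1 = clock()
    # Time the result download on its own.
    results = fetch(job_id)
    t2 = clock()
    return results, {"query_s": t1 - t0, "download_s": t2 - t1}


if __name__ == "__main__":
    # Offline demo with fake callables in place of a real query service.
    phases = iter(["QUEUED", "EXECUTING", "COMPLETED"])
    rows, timings = run_async_and_time(
        submit=lambda: "job-123",
        get_status=lambda job_id: next(phases),
        fetch=lambda job_id: ["row1", "row2"],
        sleep=lambda s: None,
    )
    print(rows, timings)
```

If `download_s` dominates, the bottleneck is in loading the results rather than in the query itself.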

commented Apr 20, 2024 by adriansh (170 points)
edited Apr 20, 2024 by 0

Hi adriansh,

We have a datalab branch that solves this problem. We haven't pushed it to the main branch yet because we want to get more feedback from users.

https://github.com/astro-datalab/datalab/tree/query_async_download

You can check out that branch with:

git clone -b query_async_download https://github.com/astro-datalab/datalab

cd datalab

and then install that version by running:

python setup.py install

I tested it by running this query:

SELECT m.objectid, m.ra, m.dec, m.exposure, m.mjd, m.mag_auto, m.magerr_auto, m.filter, o.class_star
    FROM nsc_dr2.meas AS m, nsc_dr2.object AS o
    WHERE m.objectid IN (
        SELECT id FROM nsc_dr2.object
        WHERE class_star > 0.9
        AND ndet > 10
        LIMIT 100000
    ) AND m.objectid = o.id

Note that the subquery's LIMIT is 100k, and the query returned almost 5 million rows. The async query took 16 minutes, and downloading the results took about 60 seconds.
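As a rough guide to what raising the LIMIT to 10^7 might mean, the figures from this test can be extrapolated. This is a naive linear scaling under stated assumptions: "almost 5 million" is rounded up to 5 million rows (about 50 measurements per object), and both query and download time are assumed to grow in proportion to size. Real query plans and network throughput may behave very differently.

```python
# Figures from the test run above; "almost 5 million" rounded up to 5e6.
subquery_limit = 100_000
rows_returned = 5_000_000
query_minutes = 16
download_seconds = 60

rows_per_object = rows_returned / subquery_limit    # ~50 measurements per object
download_rate = rows_returned / download_seconds    # rows per second

target_objects = 10**7                              # LIMIT the question aims for
projected_rows = rows_per_object * target_objects

# Naive linear extrapolation (assumption): time scales with object/row count.
projected_query_hours = query_minutes * (target_objects / subquery_limit) / 60
projected_download_minutes = projected_rows / download_rate / 60

print(f"{rows_per_object:.0f} rows/object, {projected_rows:.1e} rows")
print(f"query ~{projected_query_hours:.1f} h, download ~{projected_download_minutes:.0f} min")
```

By this estimate the full result would be around 5 × 10^8 rows, which with nine columns is likely tens of gigabytes, so writing the results to disk rather than holding them in memory may be worth considering.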

Let us know if this helps.

Thank you,

Igor

commented Apr 22, 2024 by isuarezsola (890 points)
