0 votes
20 views

I am attempting to retrieve stars near the sun from Gaia data. Logged into the Astro datalab in a browser.

query = """

SELECT parallax

FROM gaia_dr3.gaia_source

WHERE parallax > 500.0 and parallax != 'nan'

LIMIT 200

"""

Successfully retrieves the expected two stars' parallaxes in about 26 seconds.

query="""

SELECT parallax, ra, dec

    FROM gaia_dr3.gaia_source

    WHERE parallax > 500.0

    LIMIT 200

"""

Retrieves the two stars parallax and ra, dec info but also many "NaN" entries.

query="""

SELECT parallax, ra, dec

    FROM gaia_dr3.gaia_source

    WHERE parallax > 500.0 and parallax != 'nan"

    LIMIT 200

"""

This times out after 300 seconds. The first query got the data in 26 seconds so I wouldn't expect this query to take much more than that.

What am I doing wrong?

Thanks,

Mike Burns

by mikeburns (150 points) | 20 views

1 Answer

0 votes
Best answer
Hi Mike,

Thanks for the question.  The timing difference comes down to the request for the extra (ra,dec) values in the query.  In the first case, the database is doing an "Index Only" scan for the query and retrieves the value directly from the index.  When you add the (ra,dec) values it still has to do the same filtering on the parallax but then access the full table for the additional columns but then has to process a check for NaN on each value.

The fastest way I found to exclude the NaN is to instead set an upper bound on the parallax as a real value, e.g. "where parallax > 500.0 and parallax < 1000.", this still permits the index-only scan of the table and returns almost instantly.  I hope this helps, let us know if you still have problems or questions.

Cheers,

-Mike
by datalab (21.5k points)
selected by 0
Excellent. I now understand and have a solution to my query attempt.

THanks,
Mike

440 questions

456 answers

462 comments

656 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories