0 votes
562 views

Hi, 

I am trying to do the following query  in Astro Data Lab jupyter notebook. 

query = """
select TOP 10 a.designation, a.ra, a.dec, a.sigra, a.sigdec, 
a.w1mpro, a.w1sigmpro, a.w1snr, a.w2mpro, a.w2sigmpro, a.w2snr, a.w3mpro, a.w3sigmpro, a.w3snr, a.w4mpro,a.w4sigmpro, a.w4snr,
a.ext_flg, a.var_flg as var_flag, a.ph_qual

from allwise.source a, mydb://wise_new_objs u
where a.designation = u.designation
"""

result = qc.query (sql=query, timeout=600)

I get the following error.

ReadTimeout: HTTPSConnectionPool(host='datalab.noirlab.edu', port=443): Read timed out. (read timeout=600)

When I only select the top five rows in the table wise_new_objs, there is no such error and the query works perfectly. Selecting more than five rows results in the timeout error as above. 

wise_new_objs table consists of 200k rows. I want to select 1000s of rows or all 200k rows at once instead of just five rows to X-match with allwise.source. What is the most effective way to do so? 

by | 562 views

1 Answer

0 votes

Hi, thanks for reaching out. The reason for the slowness and timeouts is that the matching column you are using, 'designation', is not indexed in our database. Indexed columns can be orders of magnitudes faster.

You can check which columns are indexed in our schema browser. All column names printed in bold font are indexed:
https://datalab.noirlab.edu/query.php?name=allwise.source

Would another column work for your case? For instance 'source_id' ?

query = """select ...
where a.source_id = u.source_id"""

This runs in under 5 seconds for a mydb table that is 200K rows long.

Please let us know if this was helpful,

Robert

by robertdemo (8.9k points)
Great!! I can download the data very fast using the 'source_id' column.
Thank you for your help.

449 questions

463 answers

473 comments

658 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories