0 votes


I'm working on a catalog for 4MOST and I need to do an all-sky sample containing crossmatched Legacy Survey, DELVE, VHS and CatWISE (for simplicity I'm using the NSC DR2 match that already exists on the Noirlab database). This is too large task for synchronous queries, so I use asynchronous ones. However, most of the asynchronous queries, not even allsky, but for a small area of the sky (which should be done quickly) do not give any result or error for several days and freeze on "EXECUTING" status. All queries are done through the Jupyter notebook on the Data Lab website. 

Here's an example of one of the "frozen" queries for a small area of the sky (I ran a similar one two days ago and by today nothing has happened, then I aborted and restarted it in the hope that it would change something). I also tried a lot of similar queries for other (and bigger) parts of the sky and/or with left joins.

sql = '''SELECT 
ls.ls_id, ls.ra, ls.dec, ls.flux_g, ls.flux_ivar_g, ls.flux_r, ls.flux_ivar_r, ls.flux_z, ls.flux_ivar_z,
delve.quick_object_id, delve.mag_auto_g, delve.magerr_auto_g, delve.mag_auto_i, 
delve.magerr_auto_i, delve.mag_auto_r, delve.magerr_auto_r, delve.mag_auto_z, delve.magerr_auto_z,
vhs.sourceid AS sourceid_vhs, vhs.ypetromag, vhs.ypetromagerr, vhs.jpetromag, vhs.jpetromagerr, 
vhs.hpetromag, vhs.hpetromagerr, vhs.kspetromag, vhs.kspetromagerr, 
cat.source_id AS source_id_catwise, cat.w1mpro, cat.w1sigmpro, cat.w2mpro, cat.w2sigmpro
FROM nsc_dr2.object AS nsc
INNER JOIN nsc_dr2.x1p5__object__ls_dr9__tractor AS a ON nsc.id=a.id1
INNER JOIN nsc_dr2.x1p5__object__delve_dr2__objects AS b ON nsc.id=b.id1
INNER JOIN nsc_dr2.x1p5__object__vhs_dr5__vhs_cat_v3 AS f ON nsc.id=f.id1 
INNER JOIN nsc_dr2.x1p5__object__catwise2020__main AS j ON nsc.id=j.id1
INNER JOIN ls_dr9.tractor AS ls ON a.id2=ls.ls_id
INNER JOIN delve_dr2.objects AS delve ON b.id2=delve.quick_object_id
INNER JOIN vhs_dr5.vhs_cat_v3 AS vhs ON f.id2=vhs.sourceid
INNER JOIN catwise2020.main AS cat ON j.id2=cat.source_name 
WHERE (nsc.ra>0 OR nsc.ra<1) AND nsc.dec<20'''
jobid13 = qc.query(sql=sql, async_=True, timeout=86400, fmt='csv', 

CPU times: user 25.3 ms, sys: 4.29 ms, total: 29.6 ms
Wall time: 362 ms


JobID                 Start             End               Status              
-----                 -----             ---               ------              
oszigbl8r0r5bgvi      10/28 08:01:08                      EXECUTING

Is there any way to solve this problem? Maybe there are mistakes in my query, or I need to use some other service instead of jupyter notebook for such queries, or it takes longer to execute such tasks? (how to estimate the amount of time needed then?)

asked Oct 28 by vtoptun (340 points) | 25 views

1 Answer

+1 vote
Best answer
Thanks for reaching out. I think your query overall is alright, except that without chunking it might be too big to execute in one go. And when you tried to run a small chunk of ra, you used OR in

WHERE (nsc.ra>0 OR nsc.ra<1)

while it should have been AND, correct?

With AND, you query above ran to completion for me in about 10 minutes, and the resulting file was created on vospace. It is 137MB in size.

So you might want to break up the big query into RA slices indeed, and, if appropriate, use additional constraints in the WHERE clause to reduce the data volume (maybe constraints on SNR, color, etc.) Just jeep in mind that if you specify constraints you want to ensure that they are on indexed columns only. With non-indexed columns in WHERE the query will be much much slower (practically impossible to finish). Indexed columns are the ones shown in bold font on our website Table Browser.

Please give it another try and let us know if you are still seeing problems.

answered Oct 28 by robertdemo (5,120 points)
selected Nov 14 by vtoptun

349 questions

363 answers


2,368 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.