0 votes
1.2k views

I'm trying to run a large crossmatch (~100 000 objects) between a user uploaded table of eBOSS objects and the ls_dr7.tractor table. Submitting the query normally results in a "504 Gateway Time-out" error, so I tried submitting the same job with 'async_=True'. When I check the status using the jobId and queryClient.status() it returns "Error", but trying to check the cause using queryClient.error() returns "Error: The job has not yet completed".

How can I see what is going wrong with the query? Thanks.

by mchapman (130 points) | 1.2k views
Hi!

Thank you for contacting us!

Can you try your query again and comment back here when you start it?  This way we can observe it running?  Your query should have just given you an error as  jds.coadd_object_id is not a column in the inline jds table.  

Thanks again!
I just started it again, with that correction.
It has finished now, and the output appears to be correct. Some of the jobs I started when I made my previous comment also finished over the weekend, but that took considerably longer than expected.
Glad it worked!  Do you have an example query of one of the jobs that took longer than expected over the weekend?
I believe the first query was:

reduced_des_nn_query = '''SELECT eB.eboss_target_id,
                  eB.ra as eboss_ra, eB.dec as eboss_dec, jds.ra as des_ra, jds.dec as des_dec,
                  jds.flux_auto_g, jds.flux_auto_r, jds.flux_auto_z,
                  (q3c_dist(eB.ra,eB.dec,jds.ra,jds.dec)*3600) as dist_arcsec
                  FROM mydb://eboss22_elg_v4 AS eB
                  INNER JOIN LATERAL (
                      SELECT ds.ra, ds.dec, ds.flux_auto_g, ds.flux_auto_r, ds.flux_auto_z
                         FROM
                             mydb://reduced_des_dr1_galaxies AS ds
                         WHERE
                             q3c_join(eB.ra, eB.dec, ds.ra, ds.dec, 0.001388889)
                         ORDER BY q3c_dist(eB.ra, eB.dec, ds.ra, ds.dec)
                         ASC LIMIT 1
                  ) as jds ON true;'''

Where reduced_des_dr1_galaxies was a table I had made using an ra and dec cut on des_dr1.galaxies to reduce the number of objects. Is it possible that this job was blocking the other ones from running while it completed?
Thank you for that.  The longest executed query in the last week against reduced_des_dr1_galaxies took duration: 52085.396 ms or about 52 seconds.

It is possible a job will be blocked.  A query against a MyDB table has to finish before another can run against it.

In our next sprint we start today, we will develop a new feature in MyDB (with no interruption of service) that will allow multiple queries at the same time against the same MyDB table.

Hope this helps you.
That's good to know, I will keep that in mind when submitting queries. Thank you for all your help.

1 Answer

0 votes
Hi there,

Can you try running the query again?  For a reason that we don't quite understand, the query wasn't taking advantage of the Q3C spatial index, so was doing a scan of the full LS DR7 table.  But our DB architect has made a change to the database that should make sure that the index is used for this query.
by kolsen (2.7k points)
Yesterday at approximately 5:30pm (EST) I resubmitted the query, this time crossmatching using a sample of only 1000 of my eBOSS objects. I submitted it first as an asynchronous query, with the same strange error report detailed in my initial question, and afterwards as a regular query, which again received a "504 Gateway Time-out" error.

As of 11pm (EST) last night there were no results. This morning at around 9:30am (EST) the regular query had finished, while when trying to check the status of the asynchronous query the response is "Unable to complete your request: not found: not found: dm4vko7s1efwh06p" (where the final part is the jobId).

Do you know if the database change was successful, and I should expect similar run times when trying to complete the rest of the crossmatch, or if this was only the result of the smaller sample?
So I just ran a test of crossmatching a 100,000 object table in mydb against a table in Data Lab (my query was 100,000 objects from smash_dr1.object vs. all of gaia_dr2.gaia_source), and the all match query completed in 19 seconds, while the nearest neighbor query in ~5 seconds.  If the Q3C index is getting properly used, you should expect performance of that order of magnitude.  For the problems with your asynchronous query, we found old jobs running in the database, which we have terminated.  These may have been causing the trouble.  In any case, there shouldn't be a need to run an asynchronous job for this crossmatch query.
Thank you, it appears to be working properly now for the full nearest neighbour crossmatch.
Glad to hear it!
The next part of my analysis is to perform the same sort of crossmatch on DES DR1 data, but this query is taking a large amount of time again. I noticed in the Query Interface column information for des_dr1 that the descriptions for the ra and dec columns say they are optimized to be indexed, but they aren't bolded to show this is the case. Are they not in fact indexed, and that's why the query is taking longer?

I've tried running it against des_dr1.main, and against a table I have stored in mydb which is a cut in RA and DEC of the des_dr1.galaxies table to limit it to my area of interest (reduces number for rows by about factor of 10), to try to speed it up by just running it against a smaller table.
So the Query Interface index information is sometimes not captured correctly, and this seems to be one of those cases.  We'll look at why that is.  But the query should run quickly.  I tried this query:

sql = '''SELECT  o.id, gg.coadd_object_id, (q3c_dist(o.ra,o.dec,gg.ra,gg.dec)*3600.0) as dist_arcsec
         FROM mydb://smash100000 AS o
         LEFT JOIN LATERAL (
               SELECT g.*
                    FROM
                        des_dr1.main AS g
                    WHERE
                        q3c_join(o.ra, o.dec, g.ra, g.dec, 0.01)
                    ORDER BY
                        q3c_dist(o.ra,o.dec,g.ra,g.dec)
                    ASC LIMIT 1
               ) as gg ON true;'''


which takes a 100,000 row table in my mydb and does a nearest neighbor search against des_dr1.main, and it runs in 1.5 seconds.  Could you post your query here?
My query is:

des_nn_query = '''SELECT eB.eboss_target_id, jds.coadd_object_id,
                  eB.ra as eboss_ra, eB.dec as eboss_dec, jds.ra as des_ra, jds.dec as des_dec,
                  jds.flux_auto_g, jds.flux_auto_r, jds.flux_auto_z,
                  (q3c_dist(eB.ra,eB.dec,jds.ra,jds.dec)*3600) as dist_arcsec
                  FROM mydb://eboss22_elg_v4 AS eB
                  INNER JOIN LATERAL (
                      SELECT ds.ra, ds.dec, ds.flux_auto_g, ds.flux_auto_r, ds.flux_auto_z
                         FROM
                             des_dr1.main AS ds
                         WHERE
                             q3c_join(eB.ra, eB.dec, ds.ra, ds.dec, 0.001388889)
                         ORDER BY q3c_dist(eB.ra, eB.dec, ds.ra, ds.dec)
                         ASC LIMIT 1
                  ) as jds ON true;'''

which I run using the line:

des_nn_job = qc.query(sql = des_nn_query, out="mydb://eboss22_full_des_nn_xmatch")

The first time I ran it the second table 'des_dr1.main AS ds' was instead 'mydb://reduced_des_dr1_galaxies AS ds', and I didn't save the coadd_object_id, as I explained above.

It seems identical to your query, excepting of course the different columns saved and that I used an INNER JOIN to only keep the matched rows. Is there some additional step or clean up I'm missing?

449 questions

463 answers

473 comments

658 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories