0 votes
1 view

Hello Data Lab team,

I am trying to crossmatch three user-uploaded coordinate tables against ls_dr10.tractor using async/MyDB. I would appreciate your advice on whether this workflow is supported, what chunk size/concurrency limits I should use, and whether there may currently be a TAP async or MyDB metadata issue affecting my account.

Input MyDB tables:

mlspecz_north_trainval              395,097 rows
mlspecz_edfn_layered              1,045,139 rows
mlspecz_wide_north_application    9,918,626 rows

Columns: 
object_id, ra, dec, sample_name, xmatch_chunk

Target catalog: ls_dr10.tractor

The full async crossmatch query for the smallest table failed immediately with "502 Bad Gateway". No async job ID was returned. This suggests the failure occurred during request handling, query planning, or async job creation, rather than during normal job execution.

The query was:

SELECT
  u.object_id,
  u.ra AS input_ra,
  u.dec AS input_dec,
  u.xmatch_chunk,
  q3c_dist(u.ra, u.dec, t.ra, t.dec) * 3600.0 AS sep_arcsec,
  t.ls_id,
  t.ra,
  t.dec,
  t.type,
  t.brick_primary,
  t.release,
  t.brickid,
  t.objid,
  t.maskbits,
  t.fitbits,
  t.flux_g,
  t.flux_r,
  t.flux_i,
  t.flux_z,
  t.flux_w1,
  t.flux_w2,
  t.flux_w3,
  t.flux_w4,
  t.flux_ivar_g,
  t.flux_ivar_r,
  t.flux_ivar_i,
  t.flux_ivar_z,
  t.flux_ivar_w1,
  t.flux_ivar_w2,
  t.flux_ivar_w3,
  t.flux_ivar_w4,
  t.wisemask_w1,
  t.wisemask_w2
FROM mydb://mlspecz_north_trainval AS u
JOIN ls_dr10.tractor AS t
  ON q3c_join(u.ra, u.dec, t.ra, t.dec, 0.0002777777777777778)
WHERE t.brick_primary = 1

The crossmatch radius is 1 arcsec. I also tried rewriting the join using a WHERE-style q3c condition, but got the same 502 error.

To reduce query size, I added object-count chunking using xmatch_chunk and attempted to create MyDB indexes with:

qc.mydb_index(table, 'xmatch_chunk', async_=True)
qc.mydb_index(table, q3c='ra,dec', cluster=True, async_=True)

However, my first chunked notebook submitted many async jobs quickly, which may have overloaded or congested the service. I have now aborted all jobs submitted by me and stopped submitting new chunk batches.

I then checked: https://datalab.noirlab.edu/tap/async

There are still many jobs in QUEUED status, but they appear to be mostly from other users. There are currently no jobs with status EXECUTING. Whenever I submit a new job now, it remains in QUEUED.


There is also a separate MyDB/Data Explorer issue. 

After I tried to remove/re-upload or clean up my uploaded MyDB tables, the MyDB panel in the Data Explorer web interface became empty. The MyDB section is visible, but no table names are listed. However, I can still access the tables from Jupyter notebooks, so the tables do not appear to be lost.

Also, I previously removed this table:

mlspecz_lsdr10_north_trainval

but this table name still appears in qc.mydb.list(), while the table itself is no longer accessible. This looks like a Data Explorer/MyDB metadata listing issue, stale cache, or possible MyDB registry inconsistency.

Could you please advise on the following?

1. Is a MyDB-to-ls_dr10.tractor Q3C crossmatch of this size, with user tables larger than 10^5 rows, expected to work through Data Lab async?

2. If the full-table query is not feasible, what chunk size and async-job concurrency limit do you recommend?

3. Can you check whether the TAP async queue is currently stalled or not dispatching jobs?

4. Can you check whether my MyDB metadata is inconsistent, given that my tables are accessible from notebooks but no longer appear in the Data Explorer MyDB panel?

For now I have paused the full workflow to avoid submitting more jobs until I know the recommended limits and whether the current queue/MyDB behaviour is normal.

Thank you very much.

by rudolffu (120 points) | 1 view
---------------------------------------------------------------------------
queryClientError                          Traceback (most recent call last)
Cell In[60], line 1
----> 1 job0 = qc.query(adql=query1, async_=True, fmt='votable')

File /data0/sw/anaconda3/lib/python3.10/site-packages/dl/Util.py:84, in MultiMethod.__call__(self, *args, **kw)
     82     return function(self.obj, *args, **kw)
     83 else:
---> 84     return function(*args, **kw)

File /data0/sw/anaconda3/lib/python3.10/site-packages/dl/queryClient.py:627, in query(token, adql, sql, fmt, out, async_, drop, profile, **kw)
    495 @multimethod('qc',0,False)
    496 def query(token=None, adql=None, sql=None, fmt='csv', out=None, async_=False, drop=False,
    497            profile='default', **kw):
    498     '''Send an SQL or ADQL query to the database or TAP service.
    499
    500     Usage::
   (...)
    625           314.1519366421579,37.33533842266872
    626     '''
--> 627     return qc_client._query (token=def_token(token), adql=adql, sql=sql,
    628                              fmt=fmt, out=out, async_=async_, drop=drop,
    629                              profile=profile,
    630                              **kw)

File /data0/sw/anaconda3/lib/python3.10/site-packages/dl/queryClient.py:2185, in queryClient._query(self, token, adql, sql, fmt, out, async_, drop, profile, **kw)
   2183 r = requests.get (dburl, headers=headers, timeout=timeout)
   2184 if r.status_code != 200:
-> 2185     raise queryClientError (r.text)
   2187 # N.B. Previously we converted the response to string from, presumably,
   2188 # byte string, here, but sometimes the response content is a file
   2189 # in byte format that can't be necessarily converted to string. So
   2190 # now the conversion happens downstream on an as-needed basis.
   2192 resp = r.content

queryClientError: Error: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.28.0</center>
</body>
</html>

1 Answer

0 votes

Hi, thanks for your question. I ran several tests using our async service with the query you provided and encountered the same behavior, either timeouts or jobs that run indefinitely. I’m not sure what is currently causing the issue with async queries, but I was able to find a workaround that seems to work well.

The approach is to first use the X-match Service tool to retrieve only the minimum required columns:

  • object_id from mlspecz_north_trainval
  • ls_id, ra, and dec from ls_dr10.tractor

Save that X-match result to your MyDB, then in a Jupyter notebook run a double JOIN query between:

  1. the X-match table,
  2. ls_dr10.tractor, and
  3. your original mlspecz_north_trainval table

That lets you pull in the rest of the columns much more efficiently.

Example query:

q = """
SELECT
  m.t1_object_id,
  m.dist_arcsec,
  t.ls_id,
  t.ra,
  t.dec,
  t.type,
  t.brick_primary,
  t.release,
  t.brickid,
  t.objid,
  t.maskbits,
  t.fitbits,
  t.flux_g,
  t.flux_r,
  t.flux_i,
  t.flux_z,
  t.flux_w1,
  t.flux_w2,
  t.flux_w3,
  t.flux_w4,
  t.flux_ivar_g,
  t.flux_ivar_r,
  t.flux_ivar_i,
  t.flux_ivar_z,
  t.flux_ivar_w1,
  t.flux_ivar_w2,
  t.flux_ivar_w3,
  t.flux_ivar_w4,
  t.wisemask_w1,
  t.wisemask_w2,
  m0.sample_name,
  m0.xmatch_chunk
FROM mydb://mlspecz_north_trainval_xmatch AS m
JOIN ls_dr10.tractor AS t
  ON m.ls_id = t.ls_id
JOIN mydb://mlspecz_north_trainval AS m0
  ON m0.object_id = m.t1_object_id
WHERE t.brick_primary = 1
"""

I tested this with a MyDB table of about 400,000 rows. The X-match step finished in under 5 minutes, and the double JOIN query took less than a minute.

Give that workflow a try and see if it works for you.

by ajacques (2.0k points)

544 questions

556 answers

517 comments

723 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories