0 votes
1 view

Hello Data Lab team,

I am trying to crossmatch three user-uploaded coordinate tables against ls_dr10.tractor using async/MyDB. I would appreciate your advice on whether this workflow is supported, what chunk size/concurrency limits I should use, and whether there may currently be a TAP async or MyDB metadata issue affecting my account.

Input MyDB tables:

mlspecz_north_trainval              395,097 rows
mlspecz_edfn_layered              1,045,139 rows
mlspecz_wide_north_application    9,918,626 rows

Columns: 
object_id, ra, dec, sample_name, xmatch_chunk

Target catalog: ls_dr10.tractor

The full async crossmatch query for the smallest table failed immediately with "502 Bad Gateway". No async job ID was returned. This suggests the failure occurred during request handling, query planning, or async job creation, rather than during normal job execution.

The query was:

SELECT
  u.object_id,
  u.ra AS input_ra,
  u.dec AS input_dec,
  u.xmatch_chunk,
  q3c_dist(u.ra, u.dec, t.ra, t.dec) * 3600.0 AS sep_arcsec,
  t.ls_id,
  t.ra,
  t.dec,
  t.type,
  t.brick_primary,
  t.release,
  t.brickid,
  t.objid,
  t.maskbits,
  t.fitbits,
  t.flux_g,
  t.flux_r,
  t.flux_i,
  t.flux_z,
  t.flux_w1,
  t.flux_w2,
  t.flux_w3,
  t.flux_w4,
  t.flux_ivar_g,
  t.flux_ivar_r,
  t.flux_ivar_i,
  t.flux_ivar_z,
  t.flux_ivar_w1,
  t.flux_ivar_w2,
  t.flux_ivar_w3,
  t.flux_ivar_w4,
  t.wisemask_w1,
  t.wisemask_w2
FROM mydb://mlspecz_north_trainval AS u
JOIN ls_dr10.tractor AS t
  ON q3c_join(u.ra, u.dec, t.ra, t.dec, 0.0002777777777777778)
WHERE t.brick_primary = 1

The crossmatch radius is 1 arcsec. I also tried rewriting the join using a WHERE-style q3c condition, but got the same 502 error.

To reduce query size, I added object-count chunking using xmatch_chunk and attempted to create MyDB indexes with:

qc.mydb_index(table, 'xmatch_chunk', async_=True)
qc.mydb_index(table, q3c='ra,dec', cluster=True, async_=True)

However, my first chunked notebook submitted many async jobs quickly, which may have overloaded or congested the service. I have now aborted all jobs submitted by me and stopped submitting new chunk batches.

I then checked: https://datalab.noirlab.edu/tap/async

There are still many jobs in QUEUED status, but they appear to be mostly from other users. There are currently no jobs with status EXECUTING. Whenever I submit a new job now, it remains in QUEUED.


There is also a separate MyDB/Data Explorer issue. 

After I tried to remove/re-upload or clean up my uploaded MyDB tables, the MyDB panel in the Data Explorer web interface became empty. The MyDB section is visible, but no table names are listed. However, I can still access the tables from Jupyter notebooks, so the tables do not appear to be lost.

Also, I previously removed this table:

mlspecz_lsdr10_north_trainval

but this table name still appears in qc.mydb.list(), while the table itself is no longer accessible. This looks like a Data Explorer/MyDB metadata listing issue, stale cache, or possible MyDB registry inconsistency.

Could you please advise on the following?

1. Is a MyDB-to-ls_dr10.tractor Q3C crossmatch of this size, with user tables larger than 10^5 rows, expected to work through Data Lab async?

2. If the full-table query is not feasible, what chunk size and async-job concurrency limit do you recommend?

3. Can you check whether the TAP async queue is currently stalled or not dispatching jobs?

4. Can you check whether my MyDB metadata is inconsistent, given that my tables are accessible from notebooks but no longer appear in the Data Explorer MyDB panel?

For now I have paused the full workflow to avoid submitting more jobs until I know the recommended limits and whether the current queue/MyDB behaviour is normal.

Thank you very much.

ago by rudolffu (160 points) | 1 view
---------------------------------------------------------------------------
queryClientError                          Traceback (most recent call last)
Cell In[60], line 1
----> 1 job0 = qc.query(adql=query1, async_=True, fmt='votable')

File /data0/sw/anaconda3/lib/python3.10/site-packages/dl/Util.py:84, in MultiMethod.__call__(self, *args, **kw)
     82     return function(self.obj, *args, **kw)
     83 else:
---> 84     return function(*args, **kw)

File /data0/sw/anaconda3/lib/python3.10/site-packages/dl/queryClient.py:627, in query(token, adql, sql, fmt, out, async_, drop, profile, **kw)
    495 @multimethod('qc',0,False)
    496 def query(token=None, adql=None, sql=None, fmt='csv', out=None, async_=False, drop=False,
    497            profile='default', **kw):
    498     '''Send an SQL or ADQL query to the database or TAP service.
    499
    500     Usage::
   (...)
    625           314.1519366421579,37.33533842266872
    626     '''
--> 627     return qc_client._query (token=def_token(token), adql=adql, sql=sql,
    628                              fmt=fmt, out=out, async_=async_, drop=drop,
    629                              profile=profile,
    630                              **kw)

File /data0/sw/anaconda3/lib/python3.10/site-packages/dl/queryClient.py:2185, in queryClient._query(self, token, adql, sql, fmt, out, async_, drop, profile, **kw)
   2183 r = requests.get (dburl, headers=headers, timeout=timeout)
   2184 if r.status_code != 200:
-> 2185     raise queryClientError (r.text)
   2187 # N.B. Previously we converted the response to string from, presumably,
   2188 # byte string, here, but sometimes the response content is a file
   2189 # in byte format that can't be necessarily converted to string. So
   2190 # now the conversion happens downstream on an as-needed basis.
   2192 resp = r.content

queryClientError: Error: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.28.0</center>
</body>
</html>

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
Anti-spam verification:
To avoid this verification in future, please log in or register.

544 questions

549 answers

517 comments

717 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories