Hello,
I have a large query (see below) that works on a limited subset of the table but breaks with async_=True on the same subset. The full query takes too long for the shorter timeout of a blocking (synchronous) query, so I need to run it asynchronously. I am running on the query_async_download branch of datalab.
The query:
wsquery = f'''
WITH RECURSIVE contiguous_subsets AS (
SELECT
mjd,
objectid,
ROW_NUMBER() OVER (PARTITION BY objectid ORDER BY mjd) AS rn
FROM
nsc_dr2.meas
WHERE
objectid IN (SELECT id
FROM nsc_dr2.object
WHERE ndet > 20
AND class_star > 0.9
LIMIT 1000)
),
recursive_cte AS (
SELECT
mjd,
objectid,
rn,
1 AS subset_num,
1 AS subset_count,
mjd AS subset_start
FROM
contiguous_subsets
WHERE
rn = 1
UNION ALL
SELECT
cs.mjd,
cs.objectid,
cs.rn,
CASE
WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_num
ELSE rc.subset_num + 1
END AS subset_num,
CASE
WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_count + 1
ELSE 1
END AS subset_count,
CASE
WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_start
ELSE cs.mjd
END AS subset_start
FROM
contiguous_subsets cs
JOIN
recursive_cte rc ON cs.objectid = rc.objectid AND cs.rn = rc.rn + 1
)
SELECT
objectid,
MIN(mjd) AS subset_start,
MAX(mjd) AS subset_end,
COUNT(*) AS subset_count
FROM
recursive_cte
GROUP BY
objectid, subset_num
HAVING
MAX(mjd) - MIN(mjd) >= 50
AND COUNT(*) >= 5;
'''
Run synchronously, the query completes in 495 ms:
result = qc.query(sql=wsquery, fmt='pandas', timeout=600)
But it fails when submitted asynchronously:
bigjob = qc.query(sql=wsquery, async_=True, timeout=10000, fmt='pandas')
with the following error:
...
queryClientError: Error: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.26.0</center>
</body>
</html>
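For reference, the SQL itself seems fine (it completes synchronously, as above). The gap-splitting logic of the recursive CTE can be verified offline with a self-contained sketch using Python's bundled sqlite3 (SQLite >= 3.25 for window functions); the table and column names mirror nsc_dr2.meas, but the data here are invented:

```python
# Offline check of the gap-splitting logic from the query above.
# One object, three observing runs separated by gaps of more than 10 days.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE meas (objectid TEXT, mjd REAL)")
mjds = [100.0, 101.0, 103.0, 108.0, 109.0,   # run 1
        200.0, 202.0, 205.0,                 # run 2 (gap of 91 days)
        251.0, 255.0]                        # run 3 (gap of 46 days)
con.executemany("INSERT INTO meas VALUES (?, ?)",
                [("obj1", m) for m in mjds])

sql = """
WITH RECURSIVE contiguous_subsets AS (
    SELECT mjd, objectid,
           ROW_NUMBER() OVER (PARTITION BY objectid ORDER BY mjd) AS rn
    FROM meas
),
recursive_cte AS (
    -- anchor: first measurement of each object starts subset 1
    SELECT mjd, objectid, rn, 1 AS subset_num
    FROM contiguous_subsets
    WHERE rn = 1
    UNION ALL
    -- step: same subset if the gap to the previous row is <= 10 days
    SELECT cs.mjd, cs.objectid, cs.rn,
           CASE WHEN cs.mjd - rc.mjd <= 10.0
                THEN rc.subset_num ELSE rc.subset_num + 1 END
    FROM contiguous_subsets cs
    JOIN recursive_cte rc
      ON cs.objectid = rc.objectid AND cs.rn = rc.rn + 1
)
SELECT objectid, subset_num,
       MIN(mjd) AS subset_start, MAX(mjd) AS subset_end,
       COUNT(*) AS subset_count
FROM recursive_cte
GROUP BY objectid, subset_num
ORDER BY subset_num
"""
rows = list(con.execute(sql))
for row in rows:
    print(row)
# -> ('obj1', 1, 100.0, 109.0, 5)
#    ('obj1', 2, 200.0, 205.0, 3)
#    ('obj1', 3, 251.0, 255.0, 2)
```

So the recursive grouping behaves as intended; the problem appears to be in the async submission path, not the SQL.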