Hello,
I have a large query (see below) that works on a limited subset of the table but breaks with async_=True on the same subset. The full query takes too long for the shorter timeout of a blocking (synchronous) query, so I need to run it asynchronously. I am running on the query_async_download branch of datalab.
The query:
wsquery = f'''
WITH RECURSIVE contiguous_subsets AS (
SELECT
mjd,
objectid,
ROW_NUMBER() OVER (PARTITION BY objectid ORDER BY mjd) AS rn
FROM
nsc_dr2.meas
WHERE
objectid IN (SELECT id
FROM nsc_dr2.object
WHERE ndet > 20
AND class_star > 0.9
LIMIT 1000)
),
recursive_cte AS (
SELECT
mjd,
objectid,
rn,
1 AS subset_num,
1 AS subset_count,
mjd AS subset_start
FROM
contiguous_subsets
WHERE
rn = 1
UNION ALL
SELECT
cs.mjd,
cs.objectid,
cs.rn,
CASE
WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_num
ELSE rc.subset_num + 1
END AS subset_num,
CASE
WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_count + 1
ELSE 1
END AS subset_count,
CASE
WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_start
ELSE cs.mjd
END AS subset_start
FROM
contiguous_subsets cs
JOIN
recursive_cte rc ON cs.objectid = rc.objectid AND cs.rn = rc.rn + 1
)
SELECT
objectid,
MIN(mjd) AS subset_start,
MAX(mjd) AS subset_end,
COUNT(*) AS subset_count
FROM
recursive_cte
GROUP BY
objectid, subset_num
HAVING
MAX(mjd) - MIN(mjd) >= 50
AND COUNT(*) >= 5;
'''
Run synchronously, the query completes in 495 ms:
result = qc.query(sql=wsquery, fmt='pandas', timeout=600)
But it fails when submitted asynchronously:
bigjob = qc.query(sql=wsquery, async_=True, timeout=10000, fmt='pandas')
with the following error:
...
queryClientError: Error: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.26.0</center>
</body>
</html>
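For reference, the SQL itself seems fine (it completes synchronously, as above). The gap-splitting logic of the recursive CTE can be verified offline with a self-contained sketch using Python's bundled sqlite3 (SQLite >= 3.25 for window functions); the table and column names mirror nsc_dr2.meas, but the data here are invented:

```python
# Offline check of the gap-splitting logic from the query above.
# One object, three observing runs separated by gaps of more than 10 days.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE meas (objectid TEXT, mjd REAL)")
mjds = [100.0, 101.0, 103.0, 108.0, 109.0,   # run 1
        200.0, 202.0, 205.0,                 # run 2 (gap of 91 days)
        251.0, 255.0]                        # run 3 (gap of 46 days)
con.executemany("INSERT INTO meas VALUES (?, ?)",
                [("obj1", m) for m in mjds])

sql = """
WITH RECURSIVE contiguous_subsets AS (
    SELECT mjd, objectid,
           ROW_NUMBER() OVER (PARTITION BY objectid ORDER BY mjd) AS rn
    FROM meas
),
recursive_cte AS (
    -- anchor: first measurement of each object starts subset 1
    SELECT mjd, objectid, rn, 1 AS subset_num
    FROM contiguous_subsets
    WHERE rn = 1
    UNION ALL
    -- step: same subset if the gap to the previous row is <= 10 days
    SELECT cs.mjd, cs.objectid, cs.rn,
           CASE WHEN cs.mjd - rc.mjd <= 10.0
                THEN rc.subset_num ELSE rc.subset_num + 1 END
    FROM contiguous_subsets cs
    JOIN recursive_cte rc
      ON cs.objectid = rc.objectid AND cs.rn = rc.rn + 1
)
SELECT objectid, subset_num,
       MIN(mjd) AS subset_start, MAX(mjd) AS subset_end,
       COUNT(*) AS subset_count
FROM recursive_cte
GROUP BY objectid, subset_num
ORDER BY subset_num
"""
rows = list(con.execute(sql))
for row in rows:
    print(row)
# -> ('obj1', 1, 100.0, 109.0, 5)
#    ('obj1', 2, 200.0, 205.0, 3)
#    ('obj1', 3, 251.0, 255.0, 2)
```

So the recursive grouping behaves as intended; the problem appears to be in the async submission path, not the SQL.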