0 votes
82 views

Hello, 

I have a large query (see below) which works on a limited subset of the table, but breaks with async_=True on the same subset. The query is too long for the smaller timeout of the blocking query, so I need it to run async. I am running on the query_async_download branch of datalab.

The query:

wsquery = f'''

WITH RECURSIVE contiguous_subsets AS (

    SELECT

        mjd,

        objectid,

        ROW_NUMBER() OVER (PARTITION BY objectid ORDER BY mjd) AS rn

    FROM

        nsc_dr2.meas

    WHERE

        objectid IN (SELECT id 

                     FROM nsc_dr2.object 

                     WHERE ndet > 20

                        AND class_star > 0.9

                        LIMIT 1000)

),

recursive_cte AS (

    SELECT

        mjd,

        objectid,

        rn,

        1 AS subset_num,

        1 AS subset_count,

        mjd AS subset_start

    FROM

        contiguous_subsets

    WHERE

        rn = 1

    UNION ALL

    SELECT

        cs.mjd,

        cs.objectid,

        cs.rn,

        CASE 

            WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_num 

            ELSE rc.subset_num + 1

        END AS subset_num,

        CASE 

            WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_count + 1

            ELSE 1

        END AS subset_count,

        CASE 

            WHEN cs.mjd - rc.mjd <= 10.0 THEN rc.subset_start

            ELSE cs.mjd

        END AS subset_start

    FROM

        contiguous_subsets cs

    JOIN

        recursive_cte rc ON cs.objectid = rc.objectid AND cs.rn = rc.rn + 1

)

SELECT

    objectid,

    MIN(mjd) AS subset_start,

    MAX(mjd) AS subset_end,

    COUNT(*) AS subset_count

FROM

    recursive_cte

GROUP BY

    objectid, subset_num

HAVING

    MAX(mjd) - MIN(mjd) >= 50

    AND COUNT(*) >= 5;

'''

This runs in 495ms with:

 result = qc.query(sql=wsquery, fmt='pandas', timeout=600)

And fails for this:

bigjob = qc.query(sql=wsquery, async_=True, timeout=10000, fmt='pandas')

with the following error:

...
queryClientError: Error: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.26.0</center>
</body>
</html>

by | 82 views

1 Answer

0 votes
Hi Duncan,

CASE / WHEN / ELSE clauses are currently not supported by ADQL (which is what is used for async queries). Maybe you can try to re-write your complex query into a number of smaller, more specific ones? You can store any intermediate results as tables in your MyDB for instance.
by robertdemo (8.5k points)
Hi Robert, thanks for your response. I changed my query to avoid the CASE clauses. Unfortunately it seems both async and mydb functionality are down. Even before these issues started, though, async failed to write out to mydbs.

I tried to test it using query's 'aqdl' option, and got this error:

queryClientError: Error: IllegalArgumentException: net.sf.jsqlparser.JSQLParserException JSQLParserException:  ParseException: Encountered "contiguous_subsets" at line 1, column 16.
Was expecting one of:
    "AS" ...
    "(" ...

441 questions

456 answers

462 comments

655 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories