0 votes
517 views

This is the code I am trying to run and it has worked before:

%%time
# Call query manager
response = qc.query(token, adql=query, fmt='csv')

# Reformat output into a table
result = Table.read(response, format='csv')  #dictionary
print("Number of Objects: ")
print(len(result))

print('Time')

This is part of the error code I am receiving:

queryClientError: Error: IllegalArgumentException: net.sf.jsqlparser.JSQLParserException JSQLParserException:  ParseException: Encountered "DISTINCT" at line 1, column 20.
Was expecting one of:
    "NULL" ...
    "CASE" ...
    "REPLACE" ...
    <S_DOUBLE> ...
    <S_INTEGER> ...
    <S_IDENTIFIER> ...
    <S_CHAR_LITERAL> ...
    <S_QUOTED_IDENTIFIER> ...
    "(" ...
    "*" ...
    "?" ...
    "+" ...
    "-" ...
    "{d" ...
    "{t" ...
    "{ts" ...
    "{fn" ...
by kayliglidic (160 points) | 517 views

3 Answers

0 votes
Hi, can you please also post here the exact query string you are supplying? Thank you.
by datalab (20.7k points)
query = ("""SELECT DISTINCT
            L.ra, L.dec, L.type,
            S.z, S.plug_ra, S.plug_dec, S.class, S.specobjid,
            P.dered_mag_g, P.dered_mag_r, P.dered_mag_w1, P.dered_mag_w2, P.dered_mag_w3,
            P.dered_mag_w4, P.dered_mag_z
            FROM ls_dr7.x_sdss_dr14_specobj_1p5 as X
            JOIN ls_dr7.dr7_specobj_dr14 as L ON L.decals_id = X.decals_id
            JOIN sdss_dr14.specobj as S ON S.specobjid = X.specobjid
            JOIN ls_dr7.tractor_primary as P ON P.decals_id = X.decals_id
            WHERE S.z > 0 and S.z < 0.02 and S.class != 'STAR' and L.ra BETWEEN %s and %s and L.dec
            BETWEEN %s and %s limit 2000000""") %(100,270,-5,30)
0 votes
Your query begins with "SELECT TOP 2000000 DISTINCT L.ra, L.dec, L.type,........".  However, DISTINCT is a function in ADQL that takes an argument (e.g. "SELECT DISTINCT(id) FROM ....".  Is it possible that earlier you had submitted the query using the 'sql' parameter instead of 'adql'?
by datalab (20.7k points)
No, I have always submitted it with adql. I tried changing adql to sql and got the same error.
The problem seems to be with the query manager.

The initial error the code given is:

---------------------------------------------------------------------------
queryClientError                          Traceback (most recent call last)
<ipython-input-9-0012da64cbeb> in <module>()
----> 1 get_ipython().run_cell_magic(u'time', u'', u'# Call query manager\nresponse = qc.query(token, adql=query, fmt=\'csv\')\n\n# Reformat output into a table\nresult = Table.read(response, format=\'csv\')  #dictionary\nprint("Number of Objects: ")\nprint(len(result))\n\nprint(\'Time\')')

/data0/sw/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2115             magic_arg_s = self.var_expand(line, stack_depth)
   2116             with self.builtin_trap:
-> 2117                 result = fn(magic_arg_s, cell)
   2118             return result
   2119
From https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook, I think the ADQL syntax is correct.  I wonder if the ADQL parser has changed?  I sat down with Kayla and Dara, and there are other problems as well:
- running as SQL instead of ADQL, the line:
            JOIN sdss_dr14.specobj as S ON S.specobjid = X.specobjid
  barfs because specobjid is type CHAR in ls_dr7.x_sdss_dr14_specobj_1p5, but type BIGINT in sdss_dr14.specobj
- After CASTing the specobjid to BIGINT, it barfs because plug_ra and plug_dec don't exist in sdss_dr14.specobj.
- After changing to ra, dec instead, query times out in 600 seconds.  Running with async=True produced another error, so the user is currently stuck.
The ADQL parser has not changed in a loooong time.  What may be new in the last few months is the automatic re-write of "limit" to 'top' when submitting ADQL, and perhaps also the contents of especially the crossmatch tables.  Does an ADQL query without the limit show the same cast/colname problems?
0 votes

In a release of new code several months ago, the Query Manager was updated to support asynchronous SQL queries.  It (currently) does this by rewriting the SQL to ADQL (e.g. converting 'LIMIT' to 'TOP'), but it also does this for all input strings so that valid SQL or ADQL is passed to the service either way.  

Your query string contains a LIMIT clause which is being converted to 

    SELECT TOP 2000000 DISTINCT .....

which is invalid ADQL.  Proper syntax would be:

    SELECT DISTINCT TOP 2000000 ......

We've fixed the code but until it is deployed the simple workaround is to rewrite your query string using the "SELECT DISTINCT TOP 2000000....." syntax rather than using the LIMIT clause (which is allowed by our version of the parser but isn't strictly allowed by ADQL)

by datalab (20.7k points)

416 questions

434 answers

440 comments

636 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories