0 votes
160 views

My student and I are trying to run code in a Jupyter notebook that previously had no errors, however now the join query is failing.  We have diagnosed the problem to be a CHANGE in the type of the specobjid parameter in one of the catalog tables we are using.   we begin with pulling FROM ls_dr7.x_sdss_dr14_specobj_1p5 as X. In data lab, this dataset's column name specobjid is under datatype as CHAR. Then we try to JOIN sdss_dr14.specobj as S ON S.specobjid = X.specobjid. However, in this dataset, has the column name specobjid is under datatype as BIGINT. This difference in datatype doesn't allow these two datasets to be joined.  We have tried to include in thee query changing the specobjid parameter from CHAR to BIGINT during the query, but this added step times out the query (also when run in the background).   Thus because of this recent change to the catalogs (also with no warning), we are unable to complete the analysis that was started in the last semester.  

Is there a work around that would allow us to use the above tables as are (I.e., with mismatched types) that will not time out the query ?  OR can the types in the 2 catalogs for the SAME PARAMETER be made to be the same?  

The search is reproduced below:

Code for the Query (problems highlighted in red):

query = ("""SELECT DISTINCT
            L.ra, L.dec, L.type,
            S.z, S.plug_ra, S.plug_dec, S.class, S.specobjid,
            P.dered_mag_g, P.dered_mag_r, P.dered_mag_w1, P.dered_mag_w2,                P.dered_mag_w3, P.dered_mag_w4, P.dered_mag_z
            FROM ls_dr7.x_sdss_dr14_specobj_1p5 as X
            JOIN ls_dr7.dr7_specobj_dr14 as L ON L.decals_id = X.decals_id
            JOIN sdss_dr14.specobj as S ON S.specobjid = X.specobjid
            JOIN ls_dr7.tractor_primary as P ON P.decals_id = X.decals_id
            WHERE S.z > 0 and S.z < 0.02 and S.class != 'STAR' and L.ra                            BETWEEN %s and %s and L.dec BETWEEN %s and %s
            limit 2000000""") %(100,270,-5,30)  #Large region
 

by | 160 views

1 Answer

0 votes
Hi, thanks for reaching out.

The specobjid columns in the SDSS surveys are originally of "CHAR' datatype. It's unfortunate because this is inefficient. At Data Lab we have created a bunch of x-match tables, where the the specobjid columns are of type BIGINT. We failed to propagate that back into the original tables, and are doing so now.

Thanks for bringing it to our attention. We'll report back here once this is done.

Best regards,

Robert for the Data Lab team
by robertdemo (8.0k points)

416 questions

434 answers

440 comments

636 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories