Why have the data types changed recently for 'specobjid'?

Question

4.2k views

My student and I are trying to run code in a Jupyter notebook that previously had no errors, however now the join query is failing. We have diagnosed the problem to be a CHANGE in the type of the specobjid parameter in one of the catalog tables we are using. we begin with pulling FROM ls_dr7.x_sdss_dr14_specobj_1p5 as X. In data lab, this dataset's column name specobjid is under datatype as CHAR. Then we try to JOIN sdss_dr14.specobj as S ON S.specobjid = X.specobjid. However, in this dataset, has the column name specobjid is under datatype as BIGINT. This difference in datatype doesn't allow these two datasets to be joined. We have tried to include in thee query changing the specobjid parameter from CHAR to BIGINT during the query, but this added step times out the query (also when run in the background). Thus because of this recent change to the catalogs (also with no warning), we are unable to complete the analysis that was started in the last semester.

Is there a work around that would allow us to use the above tables as are (I.e., with mismatched types) that will not time out the query ? OR can the types in the 2 catalogs for the SAME PARAMETER be made to be the same?

The search is reproduced below:

Code for the Query (problems highlighted in red):

query = ("""SELECT DISTINCT
L.ra, L.dec, L.type,
S.z, S.plug_ra, S.plug_dec, S.class, S.specobjid,
P.dered_mag_g, P.dered_mag_r, P.dered_mag_w1, P.dered_mag_w2, P.dered_mag_w3, P.dered_mag_w4, P.dered_mag_z
FROM ls_dr7.x_sdss_dr14_specobj_1p5 as X
JOIN ls_dr7.dr7_specobj_dr14 as L ON L.decals_id = X.decals_id
JOIN sdss_dr14.specobj as S ON S.specobjid = X.specobjid
JOIN ls_dr7.tractor_primary as P ON P.decals_id = X.decals_id
WHERE S.z > 0 and S.z < 0.02 and S.class != 'STAR' and L.ra BETWEEN %s and %s and L.dec BETWEEN %s and %s
limit 2000000""") %(100,270,-5,30) #Large region

asked Sep 30, 2019 by Dara | 4.2k views

1 Answer

Answer 1 · 2019-11-11T19:17:06+0000

Hi, thanks for reaching out.

The specobjid columns in the SDSS surveys are originally of "CHAR' datatype. It's unfortunate because this is inefficient. At Data Lab we have created a bunch of x-match tables, where the the specobjid columns are of type BIGINT. We failed to propagate that back into the original tables, and are doing so now.

Thanks for bringing it to our attention. We'll report back here once this is done.

Best regards,

Robert for the Data Lab team

Why have the data types changed recently for 'specobjid'?

Please log in or register to add a comment.

Your answer

1 Answer

Please log in or register to add a comment.

Categories