My student and I are trying to run code in a Jupyter notebook that previously had no errors, however now the join query is failing. We have diagnosed the problem to be a CHANGE in the type of the specobjid parameter in one of the catalog tables we are using. we begin with pulling FROM ls_dr7.x_sdss_dr14_specobj_1p5 as X. In data lab, this dataset's column name specobjid is under datatype as CHAR. Then we try to JOIN sdss_dr14.specobj as S ON S.specobjid = X.specobjid. However, in this dataset, has the column name specobjid is under datatype as BIGINT. This difference in datatype doesn't allow these two datasets to be joined. We have tried to include in thee query changing the specobjid parameter from CHAR to BIGINT during the query, but this added step times out the query (also when run in the background). Thus because of this recent change to the catalogs (also with no warning), we are unable to complete the analysis that was started in the last semester.
Is there a work around that would allow us to use the above tables as are (I.e., with mismatched types) that will not time out the query ? OR can the types in the 2 catalogs for the SAME PARAMETER be made to be the same?
The search is reproduced below:
Code for the Query (problems highlighted in red):
query = ("""SELECT DISTINCT
L.ra, L.dec, L.type,
S.z, S.plug_ra, S.plug_dec, S.class, S.specobjid,
P.dered_mag_g, P.dered_mag_r, P.dered_mag_w1, P.dered_mag_w2, P.dered_mag_w3, P.dered_mag_w4, P.dered_mag_z
FROM ls_dr7.x_sdss_dr14_specobj_1p5 as X
JOIN ls_dr7.dr7_specobj_dr14 as L ON L.decals_id = X.decals_id
JOIN sdss_dr14.specobj as S ON S.specobjid = X.specobjid
JOIN ls_dr7.tractor_primary as P ON P.decals_id = X.decals_id
WHERE S.z > 0 and S.z < 0.02 and S.class != 'STAR' and L.ra BETWEEN %s and %s and L.dec BETWEEN %s and %s
limit 2000000""") %(100,270,-5,30) #Large region