0 votes
Hi there! I have a list of specific galaxies with known DESI targetids (from desi_edr.zpix) that I want to cross-match with the ls_dr9.apflux table (I'm doing some aperture correction). The naive way I thought to do this was to query both the ls_dr9.apflux and desi_edr.photometry tables with queryClient in order to match the 'targetid' field to the 'ls_id' field in desi_edr.photometry, and use that 'ls_id' to retrieve the proper rows in 'ls_dr9.apflux'. However, my queries are timing out when I attempt to retrieve the ls_dr9.apflux table, so this isn't working. I was thinking that the timeout wouldn't happen if I was able to retrieve entries corresponding to my desired targetids or ls_ids directly (it's only a few thousand galaxies). Is there a good way to do this? Apologies, I'm relatively inexperienced working with large databases like this.
by amizener (120 points)
edited by 0 | 72 views

1 Answer

0 votes

Hello amizener, thanks for contacting us but apologies for the delay in responding. We have tested a method that worked with an example case of 10,000 targetid's selected from desi_edr.zpix and we include the steps below:

1) Read or convert the sample of interest into a pandas dataframe, making sure there is a `targetid` column as it will be used for joining later.

df_user = [your sample here]

2) Place (import) the dataframe to a local database table, where you can replace the name of the table and the dataframe created above:

qc.mydb_import('desi_user_table', df_user)

3) Write a query with two JOIN statements using your small table as the "left" table for speed:

query = '''SELECT m.targetid, p.ls_id, 

                  a.apflux_r_1, a.apflux_r_2, a.apflux_r_3

           FROM mydb://desi_user_table AS m

           LEFT JOIN desi_edr.photometry as p

           ON m.targetid = p.targetid

           LEFT JOIN ls_dr9.apflux as a

           ON p.ls_id = a.ls_id'''

Above, you can add your columns of interest for each table ("m" for your table after it was saved to mydb, "p" for the photometry table, "a" for the aperture apflux table).

The above example query ran in less than 4 seconds for a table with 10,000 targetid's from desi_edr.zpix. I ran it and saved the output as an Astropy table like so:

result = qc.query(sql=query, fmt='table')


427 questions

441 answers


644 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.