Thank you for reaching out. May I offer a few general suggestions / best practices, and also a few specific ones to your proposed workflow. Maybe some, or a combination of them, will help in achieving your goal.
Cross-matches of entire large surveys are very compute-intensive. For most science use cases they are also not the first step (I think). How many des_dr2 objects do you have to start with? How many sources are approx. in each des_patch1 ?
* Is there a way where you could reduce significantly the number of objects you want to crossmatch? Maybe a pre-selection of DES DR2 objects based on colors, SNR, magnitude limits? (your science case drives this of course). The goal is not to have to crossmatch each and every object in two or three giant tables...
* If such a pre-selection can be done, one way to proceed is to save this pre-selected table to your MyDB, e.g. using:
query = """SELECT list,of,the,columns,i,need
FROM des_dr.main as d
WHERE d.mag_auto_g BETWEEN 23 and 24.5
AND d.snr_g > 30
AND any-other-conditions"""
result = qc.query(query,out="mydb://des_dr2_mytable")
You can then later access that smaller table, e.g. using
result = qc.query("SELECT * from mydb://des_dr2_mytable")
Ideally, your science use case can also dictate similar quality cuts on the other two tables (VHS, Catwise) to also reduce their initial sizes significantly, before proceeding to crossmatch them.
* As you see in the pseudo-example, it's good to only ask for the columns one really needs (which you are already specifying in your queries). Asking for all columns using '*' just explodes the data volume enormously.
* This is important: don't crossmatch by looping over every single source. Crossmatch using an entire sample at once (maybe the sample in a single patch, or maybe the entire table -- though in this case this would likely not succeed due to the sheer sizes of the initial tables). Our notebook "How to crossmatch tables" shows some examples:
https://github.com/astro-datalab/notebooks-latest/blob/master/04_HowTos/CrossmatchTables/How_to_crossmatch_tables.ipynb
(in section "Run the crossmatch"),
* The notebook also shows an example how you can return just the single nearest neighbor using SQL (since in your workflow you seem to be only keeping the nearest neighbor.) Specifying this in the SQL query prevents all the matching rows from having to be transferred back to your script.
* Note that you can run queries synchronously (that's the default), with up to 600 seconds timeout (default is 300s; set with qc.query(....timeout=600)), or asynchronously, with a runtime up to 24 hours (if memory serves). Handling async queries is a bit different though. Please see for instance the last section in this notebook: https://github.com/astro-datalab/notebooks-latest/blob/master/01_GettingStartedWithDataLab/01_JupyterPythonSQL101.ipynb
Various other notebooks in that repository also use async queries.
* For q3c functions to be more efficient (and this can make a big difference): if one of the two tables being crossmatched is much larger, and has already a q3c index computed on its ra/dec columns (like all object tables hosted at Data Lab do), then that large&indexed table should come second in many q3c functions (q3c_radial_query). See the Q3C documentation for more details: https://github.com/segasai/q3c
And also examples in the DL User Manual: https://datalab.noirlab.edu/docs/manual/Appendices/ExampleQueries/ExampleQueries.html
* If you are looping (maybe over patches or whatever), take the identical stuff outside the loop. For instance in your example (and using Pandas), code like this can be run just once, after the loop has finished:
sources['dec_comp'] = sources['dec2000'] - dec0
If 'dec0' is instead a column in the table, the one-shot line is then:
sources['dec_comp'] = sources['dec2000'] - sources['dec0']
This would do it for an entire column an once. And so on for other similar commands.
* Finally: if you like working with Pandas dataframes, and the returned table from a qc.query() call is not many millions/billions of rows lon, you can ask qc.query() to return a dataframe directly:
query = "SELECT ..."
df = qc.query(query,fmt='pandas')
Please let us know whether these hints were helpful in your quest, and please don't hesitate to contact us if you have further questions.