Hi, thanks for reaching out.
The slowness is most likely due to the multi-column JOIN (your example is not actually a crossmatch, but a JOIN of two tables).
You have likely seen that the ls_dr8.tractor table has a column 'ls_id', which combines release, brickid, and objid into a single unique ID. The solution is to compute such a column for your own 3-million-row table as well, and then do a simple JOIN on it against ls_dr8.tractor.ls_id.
The formal prescription to generate ls_id from release,brickid,objid is:
(release << 40) | (brickid << 16) | (objid)
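In plain Python, the same bit packing looks like this (a minimal sketch; the sample values are made up for illustration):

```python
def make_ls_id(release, brickid, objid):
    # Pack the three IDs into one 64-bit integer,
    # mirroring (release << 40) | (brickid << 16) | objid
    return (release << 40) | (brickid << 16) | objid

# Example with made-up values:
ls_id = make_ls_id(8000, 123456, 42)
```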
If you want to generate it by running a query on our DB, you can use something like this (in this example I'm only creating 1000 rows):
from dl import queryClient as qc
q = """SELECT release, brickid, objid, ra, dec, random_id, ls_id,
( (CAST(release AS BIGINT) << 40) |
(CAST(brickid AS BIGINT) << 16) |
(CAST(objid AS BIGINT))
) AS my_ls_id
FROM ls_dr8.tractor LIMIT 1000
But you can also generate the my_ls_id column locally and add it to your table. Then upload the new table and, crucially, index the my_ls_id column.
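A minimal local sketch with pandas (the column names release/brickid/objid are assumed to match your table; the mydb upload/index calls at the end are illustrative, so please check the queryClient documentation for the exact names in your client version):

```python
import pandas as pd

# Your local table (toy data here for illustration; yours has ~3 million rows)
df = pd.DataFrame({
    'release': [8000, 8001],
    'brickid': [123456, 654321],
    'objid':   [42, 7],
})

# Compute my_ls_id with the same bit packing as ls_dr8.tractor.ls_id;
# cast to int64 first so the shifts don't overflow 32-bit integers.
df['my_ls_id'] = ((df['release'].to_numpy('int64') << 40)
                  | (df['brickid'].to_numpy('int64') << 16)
                  | df['objid'].to_numpy('int64'))

# Then upload and index (illustrative calls; verify against your client docs):
# qc.mydb_import('mytable', df)
# qc.mydb_index('mytable', 'my_ls_id')
```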
Now you can do the JOIN:
q = """SELECT m.legacy_id,
l.ra, l.dec, l.mag_w1, l.mag_w2, l.mag_w3,
l.psfsize_z, l.fracdev, l.shapedev_r,
l.shapedev_e1, l.shapedev_e2, l.shapeexp_r, l.shapeexp_e1, l.shapeexp_e2
FROM ls_dr8.tractor AS l
JOIN mydb://mytable AS m
ON l.ls_id = m.my_ls_id"""
res = qc.query(sql=q, fmt='csv', out='vos://dr2_allls')
I've tried this with example tables of up to 100K rows in mydb, and it's much faster: about 7 seconds for a 100K-row table.
We hope this helps,