Hi, thanks for reaching out.
If you want to work in a notebook (NB) on the Data Lab Jupyter notebook server, please first upload your CSV file, e.g. to the notebook space. The file should have one header line, followed by the data lines, e.g.:
ra,dec,source_id
2.503706331038843,53.16659819583169,396011034259836416
2.5165612839280316,53.171095728972396,396011072920766848
2.519798375886651,53.17298647736945,396011072916668800
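If you are building the CSV yourself, a minimal Python sketch of producing a file in that shape (the coordinate tuples here are just the example values above; adjust the column names to your data):

```python
import csv

# Example input: (ra, dec, source_id) tuples you already have in memory.
rows = [
    (2.503706331038843, 53.16659819583169, 396011034259836416),
    (2.5165612839280316, 53.171095728972396, 396011072920766848),
]

with open("mycsvfile.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ra", "dec", "source_id"])  # single header line
    writer.writerows(rows)                       # one line per data row

# Quick check: one header line, then the data lines.
with open("mycsvfile.csv") as f:
    lines = f.read().splitlines()
print(lines[0])  # -> ra,dec,source_id
```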
Then, in the notebook server, load the table into your MyDB (the user relational database). The reason to do so is that you can then run the crossmatch directly in the database, which is faster and should work for 20M rows without having to split the file. Here's how:
from dl import queryClient as qc
qc.mydb_import('mytable','./mycsvfile.csv') # table name in mydb, csv file name on local disk
How to write the xmatch queries is shown e.g. in this DL example NB:
https://github.com/astro-datalab/notebooks-latest/blob/master/04_HowTos/CrossmatchTables/How_to_crossmatch_tables.ipynb
(in the section "Run the crossmatch"). The details of the SQL string will depend on whether you want to keep all matches within the 1 arcsec radius or just the nearest one, and on whether you want to keep rows that have no counterpart in the gaia_edr3.gaia_source table.
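To give you a feel for the two variants, here is a sketch of the SQL strings, assuming your MyDB table is called mytable (referenced as mydb://mytable in queries) with columns ra, dec, source_id, and using the q3c functions shown in the example notebook; the nearest-match version relies on PostgreSQL's DISTINCT ON. Treat these as templates to adapt, not ready-made answers:

```python
# 1 arcsec search radius, expressed in degrees (q3c works in degrees).
radius_deg = 1.0 / 3600.0

# Variant 1: keep ALL matches within the radius. Note this inner join drops
# input rows with no Gaia counterpart.
all_matches_sql = f"""
SELECT m.ra, m.dec, m.source_id, g.source_id AS gaia_source_id
FROM mydb://mytable AS m, gaia_edr3.gaia_source AS g
WHERE q3c_join(m.ra, m.dec, g.ra, g.dec, {radius_deg})
"""

# Variant 2: keep only the NEAREST match per input row, using DISTINCT ON
# plus an ORDER BY on the match distance.
nearest_sql = f"""
SELECT DISTINCT ON (m.source_id)
       m.ra, m.dec, m.source_id, g.source_id AS gaia_source_id,
       q3c_dist(m.ra, m.dec, g.ra, g.dec) AS dist_deg
FROM mydb://mytable AS m, gaia_edr3.gaia_source AS g
WHERE q3c_join(m.ra, m.dec, g.ra, g.dec, {radius_deg})
ORDER BY m.source_id, dist_deg
"""
```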
Once you have written an xmatch query, you can run it in the NB and write out the result to a new CSV file like this:
xmatchsql = "SELECT ..." # your xmatch query SQL, see the example NB
qc.query(xmatchsql,timeout=600,out='./myxmatch.csv')
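After the query finishes, it is worth sanity-checking the exported file locally (header present, plausible row count). A small sketch with the standard csv module; the stand-in file written here just makes the check runnable, since in practice ./myxmatch.csv comes from the qc.query() call above, and the column names are hypothetical:

```python
import csv

# Stand-in for the real query output, written only so the check below runs;
# in practice ./myxmatch.csv is produced by qc.query(..., out='./myxmatch.csv').
with open("./myxmatch.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["ra", "dec", "source_id", "gaia_source_id"])  # hypothetical columns
    w.writerow([2.5037, 53.1666, 396011034259836416, 396011034259836416])

# Sanity check: read the header and count the data rows.
with open("./myxmatch.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    n_rows = sum(1 for _ in reader)
print(header, n_rows)
```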
BTW, I tried these steps with a 20M-row CSV file (drawn from gaia_edr3 itself): the qc.mydb_import() step took 22 minutes for me, and a crossmatch query against gaia_edr3.gaia_source took 13 minutes.
Please don't hesitate to contact us again if needed.
Robert