There are a couple of issues here that would explain what you are seeing.
First, your MyDB table name should not contain a ".csv" extension, that makes it invalid.
Second, the query uses the 'mergedclass' column as a constraint, however this column is not indexed in the table and so a scan of the full 2TB table (1.4B rows) is required and is inherently slow. For the moment, I've added an index to that column to try to speed things up but it doesn't solve the problem completely. That still means that your query result would have ~207M rows, even though it's a small number of columns.
When you submit a job through TOPCAT you are talking directly to the TAP service. A large part of the time required is taken by simply serializing those 207M rows into the XML VOTable that gets returned directly to the client (TOPCAT) , that is assuming the TAP service doesn't fail due to an out-of-memory because it is also servicing other jobs. When submit an ASync job from the queryClient, however, the TAP service still has to serialize the VOTable but that's passed back to our querymanager service which then has to de-serialize it into the CSV returned to the client or then uploaded into the MyDB . That brings up a number of timeout issues as well as memory constraints.
We are aware that work needs to be done to better handle large results sets such as this, but in the meantime the recommended solution is to break the query up into smaller pieces, e.g. stripes of Declination, to limit the size of each result. This can also be done such that the stripes can be run as Sync queries -- the advantage here is that is bypasses the VOTable serialization and works directly in the database but the downside is that you then have to deal with multiple output tables.
For example, the following code runs in ~45min and queries the complete VHS catalog:
from dl import queryClient as qc
from datetime import datetime
# Format the query into a 3deg Dec strips based on the input value and
# create a unique table name.
query = '''SELECT ra2000, dec2000, japermag3, mergedclass
WHERE japermag3 between 18 and 19
AND (mergedclass=-1 OR mergedclass=-2)
AND (dec2000 < %g and dec2000 >= %g)
''' % (float(d),float(max(-90,d-3)))
tname = 'mydb://vhs_%c%02d' % ('m' if d < 0.0 else 'p',abs(d))
# Submit the query synchronously. Dec strip size chosen to complete
# within the timeout based on earlier tests. Select row count to validate
# query completed properly, log the progress.
print(str(datetime.now()) + ' DEC beteen: %d and %d' % (d,d-3))
result = qc.query(sql=query, out=tname, verbose=1, timeout=600)
nrows = qc.query(sql='select count(*) from %s' % tname).split()
print(str(datetime.now()) + ' %s nrows = %s' % (tname, nrows))
# Loop over the VHS catalog in 3deg strips. Max Dec is about +1.5, so move
# southward in each strip.
for d in range (2,-90,-3):
It's clear we need to add an 'append' option to the query to allow output tables to be appended in this type of processing, but or the moment you'll at least have result tables you can query individually or concatenate yourself.
As for your other questions: If you use the 'wait' option then the query() remains active so closing the tab or stopping the kernel is not recommended. However, without the wait option the query() returns a 'jobID' value that can be used to check status or retrieve results even if you've stopped the notebook.
Hope this helps, post back if you still have questions.