0 votes

I can't find a method/function in the API documentation for finding out the size of a myDB table or how many rows it holds. Is there some kind of tbl_stat for myDB tables? I tried

from dl import queryClient as qc   # Data Lab query client

query = "SELECT * FROM tbl_stat WHERE schema='mydb' AND tbl_name='mytable'"
info = qc.query(sql=query)
print(info)

but it doesn't work. And since the table is very long, a "SELECT COUNT(*) …" times out. Have I missed something?

by martinkb (720 points) | 195 views

3 Answers

0 votes
Best answer
Thanks for reaching out. The tbl_stat table is approximate, and also not guaranteed to be up to date at any given time; in some cases a particular entry might even be full of zeros. That is, your queries and analysis should not depend on it. And, as you noticed, there is no tbl_stat for user mydb tables.

The count() function often runs faster when run on a single column, e.g. count(ra) versus count(*).

Finally, a count() query can also be run asynchronously. Async queries allow much longer timeouts (whereas sync queries are limited to 10 minutes).
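
Something along these lines should work (a minimal sketch; the keyword is async_ in recent versions of the Python query client, and status()/results() may also accept a token argument, so check your client version):

from dl import queryClient as qc
import time

# Count a single column asynchronously; async jobs allow much longer
# timeouts than the 10-minute synchronous limit.
jobid = qc.query(sql="SELECT count(ra) FROM mydb://mytable", async_=True)

while qc.status(jobid) not in ("COMPLETED", "ERROR"):
    time.sleep(10)   # poll until the job finishes
print(qc.results(jobid))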

Hope this is helpful,
Robert
by datalab (20.7k points)
> The count() function often runs faster when run on a single column …

> Finally, a count() query can be also run asynchronously. …

Thanks for these tips. :)

I just copy-and-pasted the count solution from a forum discussion, and it timed out because it was run synchronously. Didn't realize back then that it could be done asynchronously as well. D'oh. Same goes for limiting the count to just one column.
0 votes

At the present time there is no equivalent of the 'tbl_stat' table for MyDB tables, although it would be a reasonable thing to add to enhance the output of the mydb_list() method. For the moment, you'll need to do a 'select count(*) ...' on the table to get the number of rows. This may be slower than for some published tables, but it should work in all cases. If you encounter a timeout, you may need to submit the query as an async job.
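
For instance, a quick sketch (assuming you're already logged in, so no token needs to be passed explicitly):

from dl import queryClient as qc

# List the tables currently in your MyDB (names only, no row counts).
print(qc.mydb_list())

# Explicit row count for one table; counting a single column (here 'ra',
# as an example) is often faster than count(*).
print(qc.query(sql="SELECT count(ra) FROM mydb://mytable"))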

If the mydb table in question here is 'catwxvhs', the number of rows is 29,145,358.

by datalab (20.7k points)
> If the mydb table in question here is 'catwxvhs' …

It is, so thanks for your foresight and helping out. :)
0 votes

How would I find out the size in (giga)bytes of the table 'catwxvhs'? The first email from 'Data Lab Admin' states that the "Personal MyDB database storage" quota is 250 GB.

> If the mydb table in question here is 'catwxvhs', the number of rows is 29,145,358

I want to crossmatch this myDB table (4 columns) with VHS DR5 (1,374,207,485 rows), and I have no sense of whether 29 million rows is a ridiculously high number that is guaranteed to time out in a crossmatch with a table as large as VHS DR5.

Should I maybe split up the myDB table based on ra values (e.g. 0-90, 90-180, …), or do you have additional tips on how to successfully crossmatch with q3c_radial_query()?

by martinkb (720 points)
Sorry, somehow I only saw your comment just now.

First, the 250 GB quota is soft, i.e. I wouldn't worry about it (for now ;-))

Second, the best way to save space is to keep only the columns you really want in the crossmatch query (i.e. not all the columns).

Finally, if you can apply constraints (with WHERE) during or before the crossmatch query, that can reduce the number of rows, potentially significantly. For instance, maybe you are interested only in rows with enough SNR, or only in objects that are redder than some value?
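
As a rough sketch, using q3c_join as the crossmatch function (the VHS table and column names here, vhs_dr5.vhs_cat_v3 / ra2000 / dec2000, as well as the 1-arcsec radius and the magnitude cut, are assumptions; adjust them to the actual schema and your science case):

from dl import queryClient as qc

# Keep only the needed columns and filter inside the crossmatch itself.
# Table/column names and cuts below are placeholders, not the real schema.
query = """
SELECT m.ra, m.dec, v.japermag3
FROM mydb://catwxvhs AS m, vhs_dr5.vhs_cat_v3 AS v
WHERE q3c_join(m.ra, m.dec, v.ra2000, v.dec2000, 1.0/3600.0)
  AND v.japermag3 < 20.0
"""
jobid = qc.query(sql=query, async_=True)   # a join this size should run async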

As to the question "how big will my table be?": if you're counting columns, you could assign 4 bytes per value for each small int and each float, and 8 bytes for doubles and big ints. Varchars are hard to estimate. I think a good but quick upper bound is Ncols x 8 bytes x Nrows. (Note that if you were to write out a table to e.g. a CSV file, the resulting file size is unknown, since much depends on details such as encoding, presence of char-valued fields, etc.)
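
As a quick worked example (example values; the row count is the one mentioned above):

# Upper-bound estimate: Ncols x 8 bytes x Nrows.
ncols, nrows = 5, 29_145_358           # example: 5 columns, 29,145,358 rows
print(ncols * 8 * nrows / 1e9, "GB")   # -> ~1.17 GB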

Best,
Robert
> the best way to save space is to keep only the columns you really want …

Thanks for the tip. I intend to save only one column from VHS. That makes five columns total with the mydb table.

The four 'catwxvhs' columns are all double_precision and VHS japermag3 is listed as 'REAL'. So Ncols x 8 bytes x Nrows seems to fit very well. In the worst case of all 29,145,358 rows of 'catwxvhs' ending up in the crossmatch (very unlikely), this would mean 5 columns * 8 bytes * 29,145,358 rows = 1,165,814,320 bytes (~1.17 GB), right?

> …if you can apply constraints (with WHERE) during or before the crossmatch query, that can reduce the number of rows, potentially significantly.

I do indeed have multiple constraints, which I've concatenated with AND (WHERE … AND … AND …). Did I understand correctly that you're suggesting moving the q3c_radial_query() call to the end of the line?
