0 votes
2 views
Today I issued the following with timeout of 300 seconds:

select ra,dec,av,pmracosd,pmdec,vrad,mu0,label as evol_label,logte as logT,logg,logl as logL,z as Z,umag,gmag,rmag,imag,zmag,ymag from lsst_sim.simdr2 where ring256 in (611564).

It returned successfully in less than a minute with about 2.4 million rows. Then I tried

select ra,dec,av,pmracosd,pmdec,vrad,mu0,label as evol_label,logte as logT,logg,logl as logL,z as Z,umag,gmag,rmag,imag,zmag,ymag from lsst_sim.simdr2 where ring256 in (613612, 611564)

It took perhaps a little longer than a minute and returned about 3.9 million rows.  Finally I increased the timeout value to 590 seconds and issued

select ra,dec,av,pmracosd,pmdec,vrad,mu0,label as evol_label,logte as logT,logg,logl as logL,z as Z,umag,gmag,rmag,imag,zmag,ymag from lsst_sim.simdr2 where ring256 in (613612, 612588, 612587, 611564)

The result should have been about 8 million rows. This raised an exception with the Gateway Time-out error in 3 or 4 minutes, well under the timeout value I supplied.

I can split the query into two smaller queries, but why do I need to? Is the total amount of data to be returned (about 8 million rows by 18 columns) the issue?
by jrbogart (160 points) | 2 views

1 Answer

0 votes
Hi, if the query process dies before the timeout, it's usually one of a few reasons: DB query crashed, or one of the intermediate servers crashed. If related to your specific query, this usually indicates that the returned set size was very large (Nrows, Ncolumns). If the server(s) was/were already overloaded (potentially by other users), this can compound.
We work to minimize such occurrences, but they do happen sometimes.
Please let us know if you were able to run your full query to completion since then.
ago by datalab (22.1k points)
Since I could not reliably run my query, I broke it up into several queries.   The initial query had a spatial condition so this wasn't especially difficult to do, but I have to put heuristics in my code to guess at the partitioning: fine enough that all queries would complete successfully but not much finer, since  using finer partitioning than necessary leads to some inefficiency.   It would help if error reporting made it clear that the problem had to do with returned data size.

500 questions

511 answers

498 comments

681 users

Welcome to Data Lab Help Desk, where you can ask questions and receive answers from other members of the community.

Categories