Why am I only getting snr_g = 3.00000* from ls_r7ls.tractor_primary?

Question

1 Answer

Answer 1 · 2021-06-18T23:08:58+0000

Hi Bob, thanks for reaching out. We are very happy that Data Lab has proven useful to you!

A query like the one in the notebook you mentioned simply returns whichever 400K rows the database happens to return first. Those rows only need to satisfy the constraints given in the query (here, the snr constraints), and they need not be the same every time the query is run.

If you want a random sample, we recommend adding a constraint on a column called random_id. It holds a floating point number between 0 and 100, uniformly distributed over all rows of the table. The query in question would then look like this, for instance:

query = """
        SELECT dered_mag_g as gmag, dered_mag_r as rmag, dered_mag_z as zmag, 
               dered_mag_w1 as w1mag, dered_mag_w2 as w2mag, type,
               snr_g, snr_r, snr_z, snr_w1, ra, dec 
        FROM ls_dr7.tractor_primary
        WHERE (snr_g>3 and snr_r>3 and snr_z>3)
        AND random_id BETWEEN 20 AND 30
        LIMIT 400000

"""

The "AND random_id BETWEEN 20 AND 30" clause restricts the query to rows whose random_id falls within that range, and the LIMIT then returns 400K of them (provided at least 400K rows in that range satisfy the other constraints, which is the case for ls_dr7.tractor_primary).
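
To illustrate why a window on random_id behaves like a random sample, here is a small self-contained simulation. It does not contact Data Lab at all: the table is a made-up list of rows whose random_id is drawn uniformly between 0 and 100, as described above. The names make_fake_table and select_window are hypothetical and exist only for this sketch.

```python
import random


def make_fake_table(n_rows, seed=0):
    """Build a stand-in table: each row gets a uniform random_id between 0 and 100."""
    rng = random.Random(seed)
    return [{"row": i, "random_id": rng.uniform(0.0, 100.0)} for i in range(n_rows)]


def select_window(table, lo, hi):
    """Mimic `random_id BETWEEN lo AND hi` (inclusive on both ends)."""
    return [r for r in table if lo <= r["random_id"] <= hi]


table = make_fake_table(100_000)
sample = select_window(table, 20, 30)
fraction = len(sample) / len(table)
print(f"{len(sample)} of {len(table)} rows selected ({fraction:.1%})")
```

A window 10 units wide keeps roughly 10% of the rows, regardless of where they sit in the table. The LIMIT is therefore only reached if the rows matching the other constraints number at least about ten times the limit.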

If you want another random sample with guaranteed different rows, change the range, e.g. "AND random_id BETWEEN 40 AND 50".
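
As a minimal sketch of this idea, the query text can be generated for several non-overlapping random_id windows. The helper names build_sample_query and windows_are_disjoint are hypothetical, not part of any Data Lab API. Because BETWEEN includes both endpoints, the check below requires a strict gap between windows. The code only builds the SQL strings; submitting them (for example through the Data Lab query client in a notebook) is left out and is an assumption about your setup.

```python
def build_sample_query(lo, hi, limit=400000):
    """Return the SQL text for one random_id window [lo, hi]."""
    if not 0 <= lo < hi <= 100:
        raise ValueError("expected 0 <= lo < hi <= 100")
    return f"""
        SELECT dered_mag_g as gmag, dered_mag_r as rmag, dered_mag_z as zmag,
               dered_mag_w1 as w1mag, dered_mag_w2 as w2mag, type,
               snr_g, snr_r, snr_z, snr_w1, ra, dec
        FROM ls_dr7.tractor_primary
        WHERE (snr_g>3 and snr_r>3 and snr_z>3)
        AND random_id BETWEEN {lo} AND {hi}
        LIMIT {limit}
"""


def windows_are_disjoint(windows):
    """True if no two inclusive [lo, hi] windows share any value."""
    ordered = sorted(windows)
    return all(
        prev_hi < next_lo
        for (_, prev_hi), (next_lo, _) in zip(ordered, ordered[1:])
    )


# Two windows taken from the answer above; each yields an independent sample.
windows = [(20, 30), (40, 50)]
if not windows_are_disjoint(windows):
    raise ValueError("random_id windows overlap; samples could share rows")

queries = [build_sample_query(lo, hi) for lo, hi in windows]
print(queries[1])
```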

Your helpdesk request makes a good point that we should update this notebook, and possibly others, to include the random_id constraint by default.

Hope this is helpful,
Robert
