I'm running StarGalaxyQSOExample.ipynb. It specifies that the snr for g, r and z be > 5, but snr_g is wacky: it's 3.000001 for the first 50 or so rows, then becomes 2, then 3, and so on, switching after every 50 iterations or so. Is that an issue with the data set? It doesn't appear to be a true snr_g. While I'm here, thank you, thank you, thank you for this site and for all the wonderful tutorials and examples! This makes learning so much more fun and efficient. I really appreciate what you folks have done.
asked Jun 16 by Bob Abel | 188 views

1 Answer


Hi Bob, thanks for reaching out. We are very happy that Data Lab has proven useful to you!

A query like the one in the notebook you mentioned simply returns whichever 400K rows the database happens to deliver. Those rows only have to satisfy the constraints given in the query (here, the snr constraints); they are not a random sample, and they need not even be the same each time the same query is run.

If you want a random sample, we recommend adding a constraint on a column called random_id. It holds a floating-point number between 0 and 100, uniformly distributed across all rows of the table. The query in question would then look like this, for instance:

query = """
        SELECT dered_mag_g AS gmag, dered_mag_r AS rmag, dered_mag_z AS zmag,
               dered_mag_w1 AS w1mag, dered_mag_w2 AS w2mag, type,
               snr_g, snr_r, snr_z, snr_w1, ra, dec
        FROM ls_dr7.tractor_primary
        WHERE (snr_g>3 AND snr_r>3 AND snr_z>3)
        AND random_id BETWEEN 20 AND 30
        LIMIT 400000
"""

The "AND random_id BETWEEN 20 AND 30" clause restricts the query to rows whose random_id falls within that range, and the LIMIT then returns 400K of them (provided the range contains at least 400K matching rows, which is the case for ls_dr7.tractor_primary).
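To illustrate why a cut on random_id behaves like a random subsample, here is a small local simulation. It does not touch the database: the random_id values are generated with Python's random module, and the function name and row count are made up for the example. The only thing taken from the description above is that random_id is uniform between 0 and 100.

```python
import random


def simulate_random_id_cut(n_rows, lo, hi, seed=0):
    """Return the fraction of simulated rows with lo <= random_id <= hi.

    Assumption: random_id is uniform on [0, 100], as described above.
    The rows here are simulated; nothing is read from the database.
    """
    rng = random.Random(seed)
    random_ids = [rng.uniform(0.0, 100.0) for _ in range(n_rows)]
    kept = sum(1 for value in random_ids if lo <= value <= hi)
    return kept / n_rows


# A window of width 10 out of 100 should keep roughly 10% of the rows.
frac = simulate_random_id_cut(100_000, 20, 30)
print(f"fraction of rows kept: {frac:.3f}")
```

Because the kept fraction is about (30 - 20) / 100, the window has to be wide enough that the expected number of matching rows exceeds the LIMIT; otherwise fewer than 400K rows come back.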

If you want another random sample with guaranteed different rows, change the range, e.g. "AND random_id BETWEEN 40 AND 50", etc.
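If you need several such samples, the queries can be generated programmatically. The following is only a sketch: QUERY_TEMPLATE, disjoint_windows and build_queries are hypothetical names, not part of Data Lab, and only the columns and table are taken from the query above. It uses half-open ranges (>= lo, < hi) instead of BETWEEN, because BETWEEN includes both endpoints, so adjacent windows such as 20 to 30 and 30 to 40 would otherwise share any row sitting exactly on the boundary.

```python
# Hypothetical helpers; only the SQL columns and table come from the query above.
QUERY_TEMPLATE = """
SELECT dered_mag_g AS gmag, dered_mag_r AS rmag, dered_mag_z AS zmag,
       dered_mag_w1 AS w1mag, dered_mag_w2 AS w2mag, type,
       snr_g, snr_r, snr_z, snr_w1, ra, dec
FROM ls_dr7.tractor_primary
WHERE (snr_g>3 AND snr_r>3 AND snr_z>3)
  AND random_id >= {lo} AND random_id < {hi}
LIMIT {limit}
"""


def disjoint_windows(n, width=10.0, start=0.0):
    """Return n adjacent, non-overlapping (lo, hi) windows of random_id."""
    windows = [(start + i * width, start + (i + 1) * width) for i in range(n)]
    # random_id only spans 0 to 100, so windows beyond that would be empty.
    if windows and windows[-1][1] > 100.0:
        raise ValueError("windows extend past random_id = 100")
    return windows


def build_queries(n, width=10.0, start=0.0, limit=400000):
    """Build one query string per window; each selects a different set of rows."""
    return [
        QUERY_TEMPLATE.format(lo=lo, hi=hi, limit=limit)
        for lo, hi in disjoint_windows(n, width, start)
    ]


queries = build_queries(3, start=20.0)
for q in queries:
    print(q)
```

In a Data Lab notebook, each string could then be submitted the same way the notebook submits its original query.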

Your helpdesk request makes a good point that we should update this notebook, and possibly others, to include the random_id constraint by default.

Hope this is helpful,
Robert

answered Jun 19 by robertdemo (2,340 points)
