As suggested by my esteemed colleague Dino, one can use the random_id column to help generate sub-samples of a catalog. Since the source density has a strong gradient with Galactic latitude, it can be helpful to break the query into low- and high-latitude parts. But something like this:
rmag > 17 AND rmag < 17.3
AND (galb > 30 OR galb < -30)
AND random_id < 50
generates 700k stars, and I can easily trim that further to make the distribution more uniform.
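
One way that trimming could be done, as a minimal sketch only: bin the returned rows on some coordinate (galb here, though that choice is my assumption) and keep the same number of randomly chosen rows from each bin. The function name thin_to_uniform and its parameters are hypothetical, not part of any catalog tool, and the code uses only the Python standard library.

```python
import random
from collections import defaultdict


def thin_to_uniform(values, lo, hi, n_bins, per_bin=None, seed=0):
    """Return sorted indices of a subsample that is flat across bins.

    values  : one number per row, e.g. the galb of each star
    lo, hi  : range to bin over; rows outside [lo, hi) are dropped
    n_bins  : number of equal-width bins
    per_bin : rows to keep per bin; defaults to the size of the
              smallest non-empty bin (empty bins are ignored)
    """
    rng = random.Random(seed)  # fixed seed so the subsample is repeatable
    width = (hi - lo) / n_bins

    # Group row indices by bin.
    bins = defaultdict(list)
    for i, v in enumerate(values):
        if lo <= v < hi:
            # Clamp guards against rounding up to n_bins just below hi.
            b = min(int((v - lo) / width), n_bins - 1)
            bins[b].append(i)

    if not bins:
        return []
    if per_bin is None:
        per_bin = min(len(idx) for idx in bins.values())

    # Draw the same number of rows (without replacement) from each bin.
    keep = []
    for idx in bins.values():
        keep.extend(rng.sample(idx, min(per_bin, len(idx))))
    return sorted(keep)
```

For the northern half of the query above, a call might look like thin_to_uniform(galb_values, 30.0, 90.0, 6), where galb_values is the list of galb values from the result. Binning on sin(galb) instead of galb would give equal-area bands on the sky, which may be closer to what "uniform" should mean here.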