1.1.3. Survey datasets

Most Data Lab workflows begin with a question whose answer first requires querying one or more survey catalog datasets. In the Data Lab, catalogs are stored in databases, with each catalog consisting of a number of separate but linked tables. These tables are accessed via Structured Query Language (SQL) or its variant, Astronomical Data Query Language (ADQL). From the beginning, users are thus presented with a set of challenges:

  • Learning what measurements the tables from a given survey dataset contain and what they are named

  • Learning how to construct a database query that will retrieve all of the measurements needed for a given question

  • If measurements from more than one table or more than one survey are needed, learning how to join tables in such a way that all of the information is retrieved

  • For complex questions in particular, learning how to optimize the database query for performance

For many users, the first step in answering a question through the Data Lab will thus be to learn about the particular datasets that it contains.
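The query-construction and performance challenges listed above can be illustrated with a minimal sketch. It uses a local in-memory SQLite database as a stand-in for the Data Lab database, so it runs without a network connection; the table name object, its columns, and the index name idx_object_ra are hypothetical and are not taken from any Data Lab dataset. The sketch requests only the columns it needs, constrains the rows in the query itself, and compares the query plan before and after an index exists on the constrained column.

```python
import sqlite3

# Local SQLite stand-in for a survey database (assumption: this is NOT the
# Data Lab service; table, column, and index names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object (id INTEGER PRIMARY KEY, ra REAL, dec REAL, gmag REAL)"
)
rows = [(i, i * 0.36, -30.0 + (i % 60), 15.0 + (i % 10)) for i in range(1000)]
conn.executemany("INSERT INTO object VALUES (?, ?, ?, ?)", rows)

# Select only the needed columns and constrain the rows inside the query,
# rather than downloading the whole table and filtering afterwards.
query = "SELECT id, ra, dec FROM object WHERE ra BETWEEN 10.0 AND 11.0"


def plan(connection, sql):
    """Return the planner's description of how it would execute the query."""
    cursor = connection.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in cursor)


# Without an index on ra, the planner has to scan every row of the table.
plan_without_index = plan(conn, query)

# With an index on the constrained column, it can search the index instead.
conn.execute("CREATE INDEX idx_object_ra ON object (ra)")
plan_with_index = plan(conn, query)

matches = conn.execute(query).fetchall()
print("before index:", plan_without_index)
print("after index: ", plan_with_index)
print(len(matches), "rows returned")
```

The same reasoning carries over to a hosted catalog: constraints placed on columns that the database has indexed are generally much cheaper than constraints on unindexed columns, although which columns are indexed depends on the dataset.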

1.1.3.1. What kinds of datasets does the Data Lab contain?

1.1.3.1.1. Core datasets

These are large, high-value datasets served by the Data Lab, possibly providing value-added data such as pre-computed columns or crossmatches with external tables. Tables are optimized and indexed to support the most common science cases. Examples of current and upcoming datasets are DECaLS and the DESI Targeting Surveys, DES, the DESI survey, and the pixel data contained in NOIRLab’s Astro Data Archive.

1.1.3.1.2. Hosted datasets

These are smaller-scale, Survey Team, or PI datasets for which a collection of high-level data products is delivered by users who want to share the data via Data Lab services. They are relatively static in terms of release frequency and versions, but require some level of Data Lab operational support in order to be made available to community users. Examples are SMASH, GOGREEN and GCLASS, and GNIRS-DQS.

1.1.3.1.3. Reference datasets

These are large external datasets, mirrored through the Data Lab because of their value as photometric, spectroscopic, or astrometric references. Examples are SDSS, AllWISE, unWISE, Gaia, and USNO A/B.

1.1.3.2. Guidance on understanding table schema

Given the variety of datasets available through the Data Lab database, learning how to identify the tables and table columns of interest can be a challenge. There are several tools to help with this:

  • The Data Lab query webpage contains a schema browser through which you can browse the available datasets, their tables, and the column descriptions.

  • The datalab command has a schema method that will display the schema and table descriptions.

  • The Survey Data webpage contains full dataset descriptions and links to survey documentation.
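The idea behind schema browsing, whichever of the tools above is used, is to list the tables in a dataset and then the columns in a table of interest. The following minimal sketch mirrors that idea against a local in-memory SQLite database; it does not call the Data Lab schema browser or the datalab command, and the tables, columns, and helper functions list_tables and list_columns are hypothetical.

```python
import sqlite3

# Local SQLite stand-in (assumption: not the Data Lab service; the tables
# and columns below are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE object (id INTEGER PRIMARY KEY, ra REAL, dec REAL, gmag REAL);
    CREATE TABLE exposure (expnum INTEGER PRIMARY KEY, mjd REAL, airmass REAL);
    """
)


def list_tables(connection):
    """Return the names of all tables in the database, sorted by name."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cursor]


def list_columns(connection, table):
    """Return (column name, declared type) pairs for one table."""
    if table not in list_tables(connection):
        raise ValueError(f"unknown table: {table}")
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    cursor = connection.execute(f"PRAGMA table_info({table})")
    return [(row[1], row[2]) for row in cursor]


for table in list_tables(conn):
    print(table)
    for name, declared_type in list_columns(conn, table):
        print("   ", name, declared_type)
```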

In general, the survey datasets hosted by the Data Lab contain a few kinds of tables:

  • Overview: These are tables that provide summary information about the survey, such as the spatial organization of the catalog data. These tables generally have far fewer rows than the main catalog tables, as they do not contain individual objects.

  • Object: These are typically the main catalog tables, and contain aggregated information for the astronomical objects identified by the survey. There are often views of these main tables that apply a constraint to yield subsets of objects with similar properties, e.g. the specobj view of SDSS DR17. The object tables are sometimes broken into several tables, each with different columns of information but linked by a unique object identifier.

  • Measurement: These are typically tables containing time-stamped individual measurements of objects in the main catalog tables, generally organized with one row for every epoch of every object. While these tables typically have fewer columns than the object tables, they can have many more rows, and thus care should be exercised when pulling data from them.

  • Neighbors: These are specialized tables that contain information on all the internal spatial matches within a specified radius of all objects in the object table. Depending on the density of the objects on the sky and the matching radius, these tables can be very large.

  • Crossmatch: These tables typically contain the spatially matched cross-identifications of the main object table with object catalogs from one or more external surveys.

  • Exposure: These tables typically contain metadata, such as calibration information, airmass, etc., for every individual exposure taken during the survey. By joining these data through the measurement and object tables, users can assign these metadata values to their objects of interest.

  • Chip: These tables are similar to the Exposure tables, but contain metadata relevant to the individual chips in the mosaics that make up the exposures, e.g. chip-dependent photometric calibration information.
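The relationship between the Object, Measurement, and Exposure tables described above can be sketched with a three-table join. This is a minimal illustration against a local in-memory SQLite database, not a query against any Data Lab dataset; the table names, column names, and values are all hypothetical. The measurement table holds one row per epoch per object, and joining through it attaches per-exposure metadata (here the observation date and airmass) to a chosen object.

```python
import sqlite3

# Local SQLite stand-in (assumption: not the Data Lab service; all table
# names, column names, and values are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE object (id INTEGER PRIMARY KEY, ra REAL, dec REAL);
    CREATE TABLE exposure (expnum INTEGER PRIMARY KEY, mjd REAL, airmass REAL);
    CREATE TABLE measurement (id INTEGER, expnum INTEGER, mag REAL);

    INSERT INTO object VALUES (1, 10.0, -30.0), (2, 10.5, -30.2);
    INSERT INTO exposure VALUES (100, 57000.1, 1.05), (101, 57001.2, 1.30);
    INSERT INTO measurement VALUES
        (1, 100, 18.20), (1, 101, 18.25), (2, 100, 19.10);
    """
)

# Object -> measurement via the object identifier, then measurement ->
# exposure via the exposure number. The WHERE clause keeps the result to a
# single object, since measurement tables can have very many rows.
query = """
SELECT o.id, o.ra, o.dec, m.mag, e.mjd, e.airmass
FROM object AS o
JOIN measurement AS m ON m.id = o.id
JOIN exposure AS e ON e.expnum = m.expnum
WHERE o.id = 1
ORDER BY e.mjd
"""

epochs = conn.execute(query).fetchall()
for row in epochs:
    print(row)
```

Each returned row is one epoch of the selected object, carrying the object's coordinates together with the metadata of the exposure in which that measurement was made.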