Section author: Robert Nikutta <robert.nikutta@noirlab.edu>, Alice Jacques <alice.jacques@noirlab.edu>

Version: 20211119

1.7.1. The datalab Command

1.7.1.1. Data Lab Command Line Client

The Data Lab Command Line Client (DCLC) is a Python-based package that provides an alternate way to interact with the various Data Lab services. It can be installed with:

pip install --ignore-installed --no-cache-dir noaodatalab

It is invoked via the datalab command. The list of supported Data Lab tasks can be obtained via datalab help:

> datalab help

Usage:

   % datalab <task> [task_options]

where <task> is one of:

              cp - copy a file in Data Lab
             get - get a file from Data Lab
              ln - link a file in Data Lab
           login - Login to the Data Lab
          logout - Logout of the Data Lab
              ls - list a location in Data Lab
           mkdir - create a directory in Data Lab
              mv - move a file in Data Lab
       mydb_copy - Rename a user MyDB table
     mydb_create - Create a user MyDB table
       mydb_drop - Drop a user MyDB table
     mydb_import - Import data into a user MyDB table
      mydb_index - Index data in a MyDB table
     mydb_insert - Insert data into a user MyDB table
       mydb_list - List the user MyDB tables
     mydb_rename - Rename a user MyDB table
   mydb_truncate - Truncate a user MyDB table
        profiles - List the available Query Manager profiles
             put - Put a file into Data Lab
        qresults - Get the async query results
         qstatus - Get an async query job status
           query - Query a remote data service in the Data Lab
              rm - delete a file in Data Lab
           rmdir - delete a directory in Data Lab
          schema - Print data service schema info
        services - Print available data services
          status - Report on the user status
        svc_urls - Print service URLs in use
             tag - tag a file in Data Lab
         version - Print task version
          whoami - Print the current active user

All subcommands take the optional arguments:

debug - print debug level log messages [optional]

verbose - print verbose level log messages [optional]

warning - print warning level log messages [optional]

If a required argument is not specified on the command line, you will be prompted for it. To specify an argument on the command line, put two dashes (--) in front of the argument name and an equals sign (=) before its value:

--argument=value
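
When the datalab command is driven from a script, the argument convention can be applied mechanically. The following is a minimal sketch, not part of the DCLC: build_command is a hypothetical helper that only assembles the argument list and does not contact Data Lab.

```python
def build_command(task, options):
    """Return an argv list such as ['datalab', 'cp', '--from=...', '--to=...'].

    `task` is a datalab task name and `options` maps argument names to values.
    This helper is hypothetical; it only formats arguments.
    """
    argv = ["datalab", task]
    for name, value in options.items():
        # Two dashes before the name, an equals sign before the value.
        argv.append(f"--{name}={value}")
    return argv


cmd = build_command("mkdir", {"name": "vos://test"})
print(" ".join(cmd))
```

With the client installed, the resulting list could be handed to subprocess.run(cmd). Passing a list rather than a single string avoids shell quoting problems for values that contain spaces, such as ADQL statements.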

1.7.1.2. Referencing files in Data Lab

When you want to refer to a file in the Data Lab (also called a remote file), you need to put a 'vos://' prefix before it so that the DCLC knows it is a remote file you are referring to. If you want to be really precise, you can use the full identifier for the file (also known as the VOSpace identifier) which would be something like:

vos://datalab.noirlab.edu!vospace/nodes/sarah/data/table1.vot

However, you can also use just the location within your virtual storage area, in this case 'vos://data/table1.vot', and the DCLC will translate it into the full form for you. Note that to identify a file within someone else’s virtual storage, e.g., a data file that a collaborator is sharing with you, you need to use the full VOSpace identifier.
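
To illustrate the relationship between the short and the full form, here is a small sketch. The DCLC performs this translation itself; the rule used here is only inferred from the single example identifier above, and both VOS_ROOT and expand_vos are hypothetical names introduced for illustration.

```python
# Assumption: root copied from the example identifier shown in the text.
VOS_ROOT = "vos://datalab.noirlab.edu!vospace/nodes"


def expand_vos(name, user):
    """Expand a short 'vos://path' name into a full identifier for `user`."""
    prefix = "vos://"
    if not name.startswith(prefix):
        raise ValueError(f"not a remote (vos://) name: {name}")
    if "!" in name:
        # Already looks like a full VOSpace identifier; leave it unchanged.
        return name
    path = name[len(prefix):].lstrip("/")
    return f"{VOS_ROOT}/{user}/{path}"


print(expand_vos("vos://data/table1.vot", "sarah"))
```

A collaborator's file would be addressed by passing the collaborator's user name instead of your own, which is why the short form cannot be used in that case.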

1.7.1.3. Task reference

• cp

This copies a remote file between the two specified locations. (The get and put commands are used to copy files between Data Lab virtual storage and local storage.) It takes the following additional arguments:

from - Source location in Data Lab [required]

to - Destination location in Data Lab [required]

> datalab cp --from=vos://dbs/test.vot --to=vos://results/test.vot

• get

This retrieves the specified remote file and optionally saves it to a local file. It takes the following additional arguments:

fr - Remote Data Lab file name [required]

to - Local disk file name [optional]

> datalab get --fr=vos://data/test.vot --to=test.vot

• ln

This creates a (soft) link to the specified file at the given location. It takes the following additional arguments:

fr - location in Data Lab to link from [required]

to - location the link points to [required]

> datalab ln --fr=vos://dbs --to=http://some/data/file

• login

This logs a user into a Data Lab session and optionally mounts their remote storage space (see mount). It takes the following additional arguments:

user - username of account in Data Lab [required]

password - password for account in Data Lab [required]

mount - mountpoint of remote Virtual Storage [optional]

> datalab login --user=sarah --password=herr1ng --mount=/tmp/vospace
Welcome to the Data Lab, sarah
Initializing mount

• logout

This logs the user out of a Data Lab session and optionally unmounts their remote storage space. It takes the following additional argument:

unmount - mountpoint of remote Virtual Storage [optional]

> datalab logout --unmount=/tmp/vospace
Unmounting remote space
You are now logged out of the Data Lab

• ls

This lists a remote directory. It takes the following additional arguments:

name - Location in Data Lab to list [optional]

format - Format for listing (ascii|csv|raw) [optional]

> datalab ls --name=...

• mkdir

This creates the specified directory. It takes the following additional argument:

name - directory in Data Lab to create [required]

> datalab mkdir --name=vos://test

• mv

This moves the specified remote file/directory between the two locations. It takes the following additional arguments:

from - location in Data Lab to move from [required]

to - location in Data Lab to move to [required]

> datalab mv --from=vos://data/test.vot --to=vos://work/test.vot

• mydb_copy

This copies a MyDB table that currently exists in a user’s MyDB database. It takes the following additional arguments:

source - name of table to copy [required]

target - name of new table with copied data from source table [required]

> datalab mydb_copy --source="test" --target="test_copy"

• mydb_create

This creates a new empty MyDB table with a user-provided schema file. It takes the following additional arguments:

table - name of the new MyDB table to create [required]

schema - location and name of the schema definition file for the table [required]

The schema definition is stored in a CSV-formatted text file that contains a column name and its (Postgres) data type, one row per column. The general format is:

Columnname1,datatype1\nColumnname2,datatype2\nColumnname3,datatype3

> datalab mydb_create --table="createdtable" --schema="./schema.txt"
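
A schema definition file in this format can be generated with a few lines of Python. This is a sketch under stated assumptions: format_schema and write_schema are hypothetical helpers, and the column names and types are illustrative values borrowed from the mydb_list example in this section.

```python
import csv
import io


def format_schema(columns):
    """Return schema text with one 'name,datatype' row per column."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(columns)
    return buf.getvalue()


def write_schema(path, columns):
    """Write the schema definition for `columns` to the file at `path`."""
    with open(path, "w", newline="") as fh:
        fh.write(format_schema(columns))


# Illustrative columns (assumed, not prescribed by the DCLC).
columns = [
    ("id", "integer"),
    ("gmag", "double precision"),
    ("imag", "double precision"),
    ("rmag", "double precision"),
]
print(format_schema(columns), end="")
```

Calling write_schema("./schema.txt", columns) would produce a file suitable for the --schema argument shown above.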

• mydb_drop

This removes a MyDB table from a user’s MyDB database. It takes the following additional argument:

table - name of the table we wish to remove from MyDB database [required]

> datalab mydb_drop --table="test"

• mydb_import

This imports data saved on the local computer or in VOSpace into a new MyDB table. The data must be either a CSV file or a pandas DataFrame object in order to be loaded into MyDB. It takes the following additional arguments:

table - name of the new MyDB table to create with the imported data [required]

data - location and name of the data to import [required]

> datalab mydb_import --table="result_table" --data="./result_table.csv"

• mydb_insert

This inserts data saved on the local computer or in VOSpace into a pre-existing MyDB table. The data must be either a CSV file or a pandas DataFrame object in order to be loaded into MyDB. It takes the following additional arguments:

table - name of the pre-existing MyDB table in which to insert the data [required]

data - location and name of the data to insert into the table [required]

> datalab mydb_insert --table="result_table" --data="./other_results.csv"

• mydb_list

This lists all of the MyDB tables currently in a user’s database. It can also list the column names and data types of a specified MyDB table. It takes the following additional argument:

table - name of the MyDB table [optional]

> datalab mydb_list
result_table,created:2020-11-25 13:39:15 MST
usno_objects,created:2020-09-16 14:22:23 MST

> datalab mydb_list --table="result_table"
id,integer,
gmag,double precision,
imag,double precision,
rmag,double precision,
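
The column listing above is easy to consume from a script. The sketch below parses text in exactly the layout shown (name, data type, and a trailing field that is empty in this example and is ignored here because its meaning is not documented in this section). parse_columns is a hypothetical helper, not part of the DCLC.

```python
def parse_columns(listing):
    """Parse 'name,datatype,' lines into a list of (name, datatype) tuples."""
    columns = []
    for line in listing.splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Skip blank or malformed lines that lack a name and a type.
        if len(fields) >= 2 and fields[0]:
            columns.append((fields[0], fields[1]))
    return columns


sample = "id,integer,\ngmag,double precision,\nimag,double precision,\nrmag,double precision,\n"
for name, datatype in parse_columns(sample):
    print(name, "->", datatype)
```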

• mydb_rename

This renames a MyDB table. It takes the following additional arguments:

old - name of table to rename [required]

new - new name of table [required]

> datalab mydb_rename --old="result_table" --new="new_result_table"

• mydb_truncate

This truncates a MyDB table, i.e., drops all rows but keeps the table definition (schema). It takes the following additional argument:

table - name of the MyDB table to truncate [required]

> datalab mydb_truncate --table="new_result_table"

• profiles

This lists the available Query Manager and Storage Manager profiles.

> datalab profiles

Query Manager Profiles:
-----------------------
   GALEX-DR6   GALEX DR6 TAP service (29 Tables, TAP Only)
        GAVO   GAVO Data Center TAP service (149 Tables, TAP Only)
     HEASARC   HEASARC Xamin TAP Service (921 Tables, TAP Only)
        IRSA   IRSA TAP Service (478 Tables, TAP Only)
    SDSS-DR9   SDSS DR9 TAP service (92 Tables, TAP Only)
      SIMBAD   SIMBAD TAP service (47 Tables, TAP Only)
STScI-RegTAP   STScI Registry TAP service (18 Tables, TAP Only)
      Vizier   TAP Vizier query engine (34381 Tables, TAP Only)

• put

This uploads a local file to the remote storage space. It takes the following additional arguments:

fr - Local disk file name [required]

to - Remote Data Lab file name [required]

> datalab put --fr=/home/sarah/simulations/run5.txt --to=vos://dbs/simul1.dat

• qresults

This returns the results of an asynchronous query. It takes the following additional argument:

jobId - Query Job ID [required]

> datalab qresults --jobId=...

• qstatus

This returns the status of an asynchronous query job. It takes the following additional argument:

jobId - Query Job ID [required]

> datalab qstatus --jobId=...
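
An asynchronous job is typically checked repeatedly until it finishes, after which qresults can be called. The following is a generic polling sketch, not DCLC code. The status strings are assumptions (this section does not document what qstatus prints), so the terminal states are a parameter, and get_status is a caller-supplied function, for example one that runs datalab qstatus and returns its trimmed output.

```python
import time


def wait_for_job(get_status, done_states=("COMPLETED", "ERROR"),
                 interval=2.0, max_polls=30, sleep=time.sleep):
    """Call get_status() until it returns one of done_states.

    The default done_states are assumed names, not confirmed by this
    document. Raises TimeoutError after max_polls unsuccessful checks.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in done_states:
            return status
        sleep(interval)  # wait before checking again
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Demonstration with a fake status source instead of a real job.
fake = iter(["QUEUED", "EXECUTING", "COMPLETED"])
print(wait_for_job(lambda: next(fake), sleep=lambda seconds: None))
```

Injecting get_status and sleep keeps the loop independent of how the status is obtained and makes it testable without a Data Lab account.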

• query

This runs a query either against the database directly (synchronous) or via the TAP service (asynchronous). It takes the following additional arguments:

adql - ADQL statement [optional]

sql - input SQL filename [optional]

fmt - requested output format (csv|ascii|votable) [optional]

out - output filename [required]

async - asynchronous query [optional]

profile - Service profile to use [optional]

timeout - Requested query timeout [optional]

Note that tables within your MyDB need to be identified with a mydb:// prefix in either the query or the output argument.

> datalab query --adql="select cphotid, ra, dec, g_tot, r_tot from gogreen_dr1.photo limit 100" --fmt="csv" --out="vos://green.csv"
> datalab query --adql="select cphotid, ra, dec, g_tot, r_tot from gogreen_dr1.photo limit 100" --out="mydb://table1"
> datalab query --adql="select id, g_i, i_z from lsdr2.stars l, mydb://table1 m where l.id = m.id" --fmt="csv"
> datalab query --sql=complex.sql --async=true --fmt='csv'
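
The examples above send results to three kinds of destination, distinguished only by the prefix of the out value. A script that assembles query commands may want to check which kind it is producing. This sketch uses a hypothetical helper, split_target; treating an unprefixed name as a local file is an assumption based on the description of out as an output filename.

```python
def split_target(out):
    """Return (kind, name) for an --out value.

    kind is 'mydb' for a MyDB table, 'vospace' for virtual storage,
    and 'local' (assumed) for anything without a recognized prefix.
    """
    for prefix, kind in (("mydb://", "mydb"), ("vos://", "vospace")):
        if out.startswith(prefix):
            return kind, out[len(prefix):]
    return "local", out


for target in ("vos://green.csv", "mydb://table1", "results.csv"):
    print(target, "->", split_target(target))
```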

• rm

This deletes a remote file. It takes the following additional argument:

name - file in Data Lab to delete [required]

> datalab rm --name=vos://dbs/test.vot

• rmdir

This deletes a remote directory. It takes the following additional argument:

name - directory in Data Lab to delete [required]

> datalab rmdir --name=vos://dbs

• schema

This prints data service schema information. It takes the following additional arguments:

val - Value to list ([[<schema>][.<table>][.<col>]]) [optional]

format - Output format (csv|text|json) [optional]

profile - Service profile [optional]

> datalab schema

Schema Name   Description
 -----------   -----------
     allwise   The AllWISE data release
     dad_dr1   DECam Asteroid Database -- Data Release 1
     dad_dr2   DECam Asteroid Database -- Data Release 2
  decaps_dr1   DECaPS Survey Data Release 1
   delve_dr1   First data release of the DELVE survey (Drlica-Wagner et a
               l. 2020)
     des_dr1   Dark Energy Survey (DES) Data Release 1
     des_dr2   Dark Energy Survey (DES) Data Release 2
    des_sva1   DES SVA1 Data Products
      dltest
    gaia_dr1   GAIA Data Release 1
    gaia_dr2   Gaia Data Release 2 -- April 25, 2018
   gaia_edr3   The early installment of the third Gaia data release
   ...

• services

This lists the available data services for particular tables. “scs” = simple cone search, “sia” = simple image access, “vos” = virtual storage.

> datalab services

• status

This shows the status of the current user: whether they are logged in, and the list of their current jobs/queries.

> datalab status
User sarah is logged into the Data Lab

• svc_urls

This shows the service URLs in use.

> datalab svc_urls
    Auth Mgr:  https://datalab.noao.edu/auth
   Query Mgr:  https://datalab.noao.edu/query
 Storage Mgr:  https://datalab.noao.edu/storage
Resource Mgr:  https://datalab.noao.edu/res

• tag

This tags a remote file with a user-specified label. It takes the following additional arguments:

file - file in Data Lab to tag [required]

tag - tag to add to file [required]

> datalab tag --file=vos://dbs/votable.vot --tag="A crucial data file"

• version

This shows what task version is currently running on a user’s computer.

> datalab version
Task Version:  2.19.1

• whoami

This shows the current active user:

> datalab whoami
sarah