Section author: Robert Nikutta <robert.nikutta@noirlab.edu>, Alice Jacques <alice.jacques@noirlab.edu>
Version: 20240618
1.7.1. The datalab Command
1.7.1.1. Data Lab Command Line Client
The Data Lab Command Line Client (DCLC) is a Python-based package that provides an alternate way to interact with the various Data Lab services. It can be installed with:
pip install --ignore-installed --no-cache-dir astro-datalab
It is invoked via the datalab command. The list of supported Data Lab tasks can be obtained via datalab --help:
> datalab --help
Usage:
% datalab <task> [task_options]
where <task> is one of:
cp - copy a file in Data Lab
get - get a file from Data Lab
ln - link a file in Data Lab
login - Login to the Data Lab
logout - Logout of the Data Lab
ls - list a location in Data Lab
mkdir - create a directory in Data Lab
mv - move a file in Data Lab
mydb_copy - Rename a user MyDB table
mydb_create - Create a user MyDB table
mydb_drop - Drop a user MyDB table
mydb_import - Import data into a user MyDB table
mydb_index - Index data in a MyDB table
mydb_insert - Insert data into a user MyDB table
mydb_list - List the user MyDB tables
mydb_rename - Rename a user MyDB table
mydb_truncate - Truncate a user MyDB table
profiles - List the available Query Manager profiles
put - Put a file into Data Lab
qresults - Get the async query results
qstatus - Get an async query job status
query - Query a remote data service in the Data Lab
rm - delete a file in Data Lab
rmdir - delete a directory in Data Lab
schema - Print data service schema info
services - Print available data services
status - Report on the user status
svc_urls - Print service URLs in use
tag - tag a file in Data Lab
version - Print task version
whoami - Print the current active user
All subcommands take the optional arguments:
debug - print debug level log messages [optional]
verbose - print verbose level log messages [optional]
warning - print warning level log messages [optional]
If a required argument is not specified on the command line, you will be prompted for it. To specify an argument on the command line, put two dashes (--) in front of the argument name and an equals sign between the name and the value:
--argument=value
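When the client is driven from a script, the --argument=value convention described above can be assembled programmatically. The following is a minimal sketch, not part of the DCLC itself: the helper name build_datalab_command is hypothetical, and it only builds the argument list without running anything.

```python
import shlex


def build_datalab_command(task, **options):
    """Return an argv list of the form: datalab <task> --name=value ...

    Hypothetical helper for illustration; it does not validate task names
    or option names against what the datalab client actually accepts.
    """
    argv = ["datalab", task]
    for name, value in options.items():
        # Two dashes before the name, an equals sign before the value.
        argv.append(f"--{name}={value}")
    return argv


# 'from' is a Python keyword, so it is passed via dict unpacking.
cmd = build_datalab_command(
    "cp", **{"from": "vos://dbs/test.vot", "to": "vos://results/test.vot"}
)

# shlex.join renders the list as a single shell-safe command line.
print(shlex.join(cmd))
```

Passing the list form to subprocess.run avoids shell quoting problems when a value contains spaces, such as an ADQL statement.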
1.7.1.2. Referencing files in Data Lab
To refer to a file in the Data Lab (also called a remote file), put a 'vos://' prefix before it so that the DCLC knows it is a remote file. To be fully precise, you can use the full identifier for the file (also known as the VOSpace identifier), which looks something like:
vos://datalab.noirlab.edu!vospace/nodes/sarah/data/table1.vot
However, you can also use just the location within your own virtual storage area, in this case 'vos://data/table1.vot', and the DCLC will translate it into the proper form for you. Note that to identify a file within someone else’s virtual storage, e.g., a data file that a collaborator is sharing with you, you must use the full VOSpace identifier.
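The relationship between the short and the full form can be illustrated with a small sketch. This is not the DCLC's actual translation code: the function names are hypothetical, and the root string is an assumption inferred from the single example identifier shown above.

```python
VOS_PREFIX = "vos://"

# ASSUMPTION: root of full identifiers, inferred from the example
# 'vos://datalab.noirlab.edu!vospace/nodes/sarah/data/table1.vot'.
VOS_ROOT = "vos://datalab.noirlab.edu!vospace/nodes"


def is_remote(path):
    """Return True if the path carries the 'vos://' prefix."""
    return path.startswith(VOS_PREFIX)


def expand_vos_path(path, user):
    """Expand a short 'vos://...' path into a full identifier for `user`.

    Paths that already start with the assumed root are returned unchanged.
    """
    if not is_remote(path):
        raise ValueError(f"not a remote (vos://) path: {path!r}")
    if path.startswith(VOS_ROOT + "/"):
        return path
    relative = path[len(VOS_PREFIX):].lstrip("/")
    return f"{VOS_ROOT}/{user}/{relative}"


full = expand_vos_path("vos://data/table1.vot", "sarah")
print(full)
```

A collaborator's file would be addressed by writing the full identifier directly, since the short form is only resolved against your own storage area.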
1.7.1.3. Task reference
• cp
This copies a remote file between the two specified locations. (The get and put commands are used to copy files between Data Lab virtual storage and local storage.) It takes the following additional arguments:
from - Source location in Data Lab [required]
to - Destination location in Data Lab [required]
> datalab cp --from=vos://dbs/test.vot --to=vos://results/test.vot
• get
This retrieves the specified remote file and optionally saves it to a local file. It takes the following additional arguments:
fr - Remote Data Lab file name [required]
to - Local disk file name [optional]
> datalab get --fr=vos://data/test.vot --to=test.vot
• ln
This creates a (soft) link to the specified file at the given location. It takes the following additional arguments:
fr - location in Data Lab to link from [required]
to - location the link points to [required]
> datalab ln --fr=vos://dbs --to=http://some/data/file
• login
This logs a user into a Data Lab session and optionally mounts their remote storage space (see mount). It takes the following additional arguments:
user - username of account in Data Lab [required]
password - password for account in Data Lab [required]
mount - mountpoint of remote Virtual Storage [optional]
> datalab login --user=sarah --password=herr1ng --mount=/tmp/vospace
Welcome to the Data Lab, sarah
Initializing mount
• logout
This logs out the user from a Data Lab session and optionally unmounts their remote storage space. It takes the following additional arguments:
unmount - mountpoint of remote Virtual Storage [optional]
> datalab logout --unmount=/tmp/vospace
Unmounting remote space
You are now logged out of the Data Lab
• ls
This lists a remote directory. It takes the following additional arguments:
name - Location in Data Lab to list [optional]
format - Format for listing (ascii|csv|raw) [optional]
> datalab ls --name=...
• mkdir
This creates the specified directory. It takes the following additional arguments:
name - directory in Data Lab to create [required]
> datalab mkdir --name=vos://test
• mv
This moves the specified remote file/directory between the two locations. It takes the following additional arguments:
from - location in Data Lab to move from [required]
to - location in Data Lab to move to [required]
> datalab mv --from=vos://data/test.vot --to=vos://work/test.vot
• mydb_copy
This copies a MyDB table that currently exists in a user’s MyDB database. It takes the following additional arguments:
source - name of table to copy [required]
target - name of new table with copied data from source table [required]
> datalab mydb_copy --source="test" --target="test_copy"
• mydb_create
This creates a new empty MyDB table with a user-provided schema file. It takes the following additional arguments:
table - name of the new MyDB table to create [required]
schema - location and name of the schema definition to be in the table [required]
The schema definition is stored in a text file. The schema definition file is a CSV-formatted file that contains column name and (Postgres) data type, one row per column. The general format is:
Columnname1,datatype1\nColumnname2,datatype2\nColumnname3,datatype3
> datalab mydb_create --table="createdtable" --schema="./schema.txt"
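A schema definition in the format described above (one "column name,data type" row per column) can be generated from a list of column definitions. The sketch below is illustrative only: the function name and the example columns are assumptions, and the trailing newline after the last row is a choice of this sketch rather than a documented requirement.

```python
import csv
import io


def schema_csv(columns):
    """Render (column name, Postgres data type) pairs as CSV text,
    one row per column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, datatype in columns:
        writer.writerow([name, datatype])
    return buffer.getvalue()


# Hypothetical columns, chosen for illustration.
columns = [
    ("id", "integer"),
    ("gmag", "double precision"),
    ("rmag", "double precision"),
]

schema_text = schema_csv(columns)
print(schema_text, end="")
```

Saving schema_text to a file such as ./schema.txt yields an input suitable for the --schema argument of mydb_create.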
• mydb_drop
This removes a MyDB table from a user’s MyDB database. It takes the following additional argument:
table - name of the table we wish to remove from MyDB database [required]
> datalab mydb_drop --table="test"
• mydb_import
This imports data saved on a local computer or imports data from VOSpace into a MyDB data table. The data must be in the form of either a CSV file or Pandas Dataframe object in order to load it into MyDB. It takes the following additional arguments:
table - name of the new MyDB table to create with the imported data [required]
data - location and name of the data to import [required]
> datalab mydb_import --table="result_table" --data="./result_table.csv"
• mydb_insert
This inserts data saved on a local computer or inserts data from VOSpace into a pre-existing MyDB table. The data must be in the form of either a CSV file or Pandas Dataframe object in order to load it into MyDB. It takes the following additional arguments:
table - name of the pre-existing MyDB table in which to insert the data [required]
data - location and name of the data to insert into the table [required]
> datalab mydb_insert --table="result_table" --data="./other_results.csv"
• mydb_list
This lists all of the MyDB tables currently in a user’s database. If a table name is given, it instead lists the column names and data types of that table. It takes the following additional argument:
table - name of the MyDB table [optional]
> datalab mydb_list
result_table,created:2020-11-25 13:39:15 MST
usno_objects,created:2020-09-16 14:22:23 MST
> datalab mydb_list --table="result_table"
id,integer,
gmag,double precision,
imag,double precision,
rmag,double precision,
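Both listings are comma-separated and therefore easy to post-process in a script. The following sketch parses the two sample outputs shown above; it assumes the output always follows exactly these layouts, which the text does not guarantee, and the function names are hypothetical.

```python
def parse_table_list(output):
    """Map table name -> creation timestamp from `datalab mydb_list` output.

    Assumes lines of the form 'name,created:<timestamp>'.
    """
    tables = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition(",")
        if rest.startswith("created:"):
            rest = rest[len("created:"):]
        tables[name] = rest
    return tables


def parse_columns(output):
    """Return (column name, data type) pairs from
    `datalab mydb_list --table=...` output.

    Assumes lines of the form 'name,type,' with a trailing comma.
    """
    columns = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) >= 2 and fields[0]:
            columns.append((fields[0], fields[1]))
    return columns


tables = parse_table_list(
    "result_table,created:2020-11-25 13:39:15 MST\n"
    "usno_objects,created:2020-09-16 14:22:23 MST\n"
)
columns = parse_columns("id,integer,\ngmag,double precision,\n")
print(tables)
print(columns)
```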
• mydb_rename
This renames a MyDB table. It takes the following additional arguments:
old - name of table to rename [required]
new - new name of table [required]
> datalab mydb_rename --old="result_table" --new="new_result_table"
• mydb_truncate
This truncates a MyDB table, i.e., drops all rows but keeps the table definition (schema). It takes the following additional argument:
table - name of the MyDB table to truncate [required]
> datalab mydb_truncate --table="new_result_table"
• profiles
This lists the available Query Manager and Storage Manager profiles.
> datalab profiles
Query Manager Profiles:
-----------------------
GALEX-DR6 GALEX DR6 TAP service (29 Tables, TAP Only)
GAVO GAVO Data Center TAP service (149 Tables, TAP Only)
HEASARC HEASARC Xamin TAP Service (921 Tables, TAP Only)
IRSA IRSA TAP Service (478 Tables, TAP Only)
SDSS-DR9 SDSS DR9 TAP service (92 Tables, TAP Only)
SIMBAD SIMBAD TAP service (47 Tables, TAP Only)
STScI-RegTAP STScI Registry TAP service (18 Tables, TAP Only)
Vizier TAP Vizier query engine (34381 Tables, TAP Only)
• put
This uploads a local file to the remote storage space. It takes the following additional arguments:
fr - Local disk file name [required]
to - Remote Data Lab file name [required]
> datalab put --fr=/home/sarah/simulations/run5.txt --to=vos://dbs/simul1.dat
• qresults
This returns the results of an asynchronous query. It takes the following additional argument:
jobId - Query Job ID [required]
> datalab qresults --jobId=...
• qstatus
This returns the status of an asynchronous query job. It takes the following additional argument:
jobId - Query Job ID [required]
> datalab qstatus --jobId=...
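A script that submits an asynchronous query typically checks qstatus repeatedly before calling qresults. The sketch below shows one way to do that. It rests on assumptions the text does not confirm: that qstatus prints a bare status word, and that the terminal states are named COMPLETED and ERROR. The function names are hypothetical, and the status lookup is injectable so the loop can be exercised without the datalab client.

```python
import subprocess
import time

# ASSUMPTION: names of the terminal job states; not stated in the text.
DONE_STATES = {"COMPLETED", "ERROR"}


def datalab_qstatus(job_id):
    """Run `datalab qstatus --jobId=<job_id>` and return its trimmed output.

    Requires the datalab client to be installed and a logged-in session.
    """
    result = subprocess.run(
        ["datalab", "qstatus", f"--jobId={job_id}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_for_job(job_id, get_status=datalab_qstatus, interval=5.0, max_polls=60):
    """Poll the job status until it reaches a terminal state.

    Returns the final status, or raises TimeoutError after max_polls checks.
    """
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in DONE_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Demonstration with a stand-in status source instead of the real client.
fake_states = iter(["QUEUED", "EXECUTING", "COMPLETED"])
final = wait_for_job(
    "hypothetical-job-id", get_status=lambda _job: next(fake_states), interval=0
)
print(final)
```

Once the job reports completion, the results can be fetched with the qresults task using the same job ID.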
• query
This runs a query either directly against the database (synchronous) or via the TAP service (asynchronous). It takes the following additional arguments:
adql - ADQL statement [optional]
sql - input SQL filename [optional]
fmt - requested output format [optional] - ‘csv’, ‘ascii’, ‘votable’
out - output filename [required]
async_ - asynchronous query [optional]
profile - Service profile to use [optional]
timeout - Requested query timeout [optional]
Note that tables within your MyDB need to be identified with a mydb:// prefix in either the query or the output argument.
> datalab query --adql="select cphotid, ra, dec, g_tot, r_tot from gogreen_dr1.photo limit 100" --fmt="csv" --out="vos://green.csv"
> datalab query --adql="select cphotid, ra, dec, g_tot, r_tot from gogreen_dr1.photo limit 100" --out="mydb://table1"
> datalab query --adql="select id, g_i, i_z from lsdr2.stars l, mydb://table1 m where l.id = m.id" --fmt="csv"
> datalab query --sql=complex.sql --async_=true --fmt='csv'
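The examples above send results to two kinds of destination, distinguished only by the prefix of the out value. The sketch below makes that rule explicit. It is an illustration, not the client's own logic: the function name is hypothetical, and treating an unprefixed value as a local file name is an assumption based on the argument's description as an output filename.

```python
def output_destination(out):
    """Classify the value of the query task's 'out' argument by its prefix."""
    if out.startswith("vos://"):
        return "virtual storage"
    if out.startswith("mydb://"):
        return "mydb table"
    # ASSUMPTION: anything without a recognized prefix is a local file name.
    return "local file"


for target in ("vos://green.csv", "mydb://table1", "green.csv"):
    print(target, "->", output_destination(target))
```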
• rm
This deletes a remote file. It takes the following additional arguments:
name - file in Data Lab to delete [required]
> datalab rm --name=vos://dbs/test.vot
• rmdir
This deletes a remote directory. It takes the following additional arguments:
name - directory in Data Lab to delete [required]
> datalab rmdir --name=vos://dbs
• schema
This prints data service schema information. It takes the following additional arguments:
val - Value to list ([[<schema>][.<table>][.<col>]]) [optional]
format - Output format (csv|text|json) [optional]
profile - Service profile [optional]
> datalab schema
Schema Name Description
----------- -----------
allwise The AllWISE data release
dad_dr1 DECam Asteroid Database -- Data Release 1
dad_dr2 DECam Asteroid Database -- Data Release 2
decaps_dr1 DECaPS Survey Data Release 1
delve_dr1 First data release of the DELVE survey (Drlica-Wagner et al. 2020)
des_dr1 Dark Energy Survey (DES) Data Release 1
des_dr2 Dark Energy Survey (DES) Data Release 2
des_sva1 DES SVA1 Data Products
dltest
gaia_dr1 GAIA Data Release 1
gaia_dr2 Gaia Data Release 2 -- April 25, 2018
gaia_edr3 The early installment of the third Gaia data release
...
• services
This lists the available data services for particular tables. “scs” = simple cone search, “sia” = simple image access, “vos” = virtual storage.
> datalab services
• status
This shows the status of the current user: whether they are logged in, and the list of their current jobs/queries.
> datalab status
User sarah is logged into the Data Lab
• svc_urls
This shows the service URLs in use.
> datalab svc_urls
Auth Mgr: https://datalab.noirlab.edu/auth
Query Mgr: https://datalab.noirlab.edu/query
Storage Mgr: https://datalab.noirlab.edu/storage
Resource Mgr: https://datalab.noirlab.edu/res
• tag
This tags a remote file with a user-specified label. It takes the following additional arguments:
file - file in Data Lab to tag [required]
tag - tag to add to file [required]
> datalab tag --file=vos://dbs/votable.vot --tag="A crucial data file"
• version
This shows what task version is currently running on a user’s computer.
> datalab version
Task Version: 2.19.1
• whoami
This shows the current active user:
> datalab whoami
sarah