1.5.5. DL Interface

This document describes the Data Lab (DL) Python Interface, which is a user-friendly way of interacting with the various Data Lab services.

A full “How To” Jupyter notebook is available as

More on Jupyter notebooks.

Below are the main types of interactions with the DL services and some example commands.

1.5.5.1. Overview

In this era of large astronomical data, Data Lab provides services for working with those large data sets. Currently, the main DL services are authentication, database queries, virtual storage (VOSpace) and a personal database (MyDB). In the future, more advanced features such as VO “capabilities” and job management will be added.

1.5.5.2. Help

The most important thing to know when starting to use a new application or program is how to get help when you are stuck. The DL Interface has a lot of documentation built-in. Just type dl.help() to get a list of all the commands.

> dl.help()
The Data Lab python interface.

The available commands are:

dl.help()      - Helpful information
Use dl.help(<command>) for specific help on a command.

-- Login and authentication --
dl.login()          - Login to the Data Lab
dl.logout()         - Logout of the Data Lab
dl.status()         - Report on the user status
dl.whoami()         - Print the current active user
dl.servicestatus()  - Report on the status of the DL services

-- File system operations --
dl.ls()        - List a location in Data Lab VOSpace
dl.get()       - Get a file from Data Lab VOSpace
dl.put()       - Put a file into Data Lab VOSpace
dl.cp()        - Copy a file in Data Lab VOSpace
dl.mv()        - Move a file in Data Lab VOSpace
dl.rm()        - Delete a file in Data Lab VOSpace
dl.mkdir()     - Create a directory in Data Lab VOSpace
dl.rmdir()     - Delete a directory in Data Lab VOSpace
dl.ln()        - Link a file in Data Lab VOSpace
dl.load()      - Load data from a local or VOSpace
dl.save()      - Save data to a local or VOSpace file
dl.copyurl()   - Copy a file from a URL to Data Lab VOSpace

-- Query and database operations --
dl.query()          - Query a remote data service in the Data Lab
dl.queryhistory()   - List history of queries made
dl.queryresults()   - Get the async query results
dl.querystatus()    - Get an async query job status
dl.queryprofiles()  - List the available query profiles
dl.schema()         - Get information on database schemas
dl.droptable()      - Drop a user MyDB table
dl.exporttable()    - Copy a user MyDB table to a VOSpace CSV file
dl.listdb()         - List the user MyDB tables
dl.siaquery()       - Query a SIA service in the Data Lab

To get more documentation on how a specific command works just type dl.help(<COMMAND>).

> dl.help('cp')
Help on method cp in module dl.dlinterface:

cp(self, source=None, destination=None, verbose=True) method of dl.dlinterface.Dlinterface instance
    Copy a file in Data Lab VOSpace.

    Parameters
    ----------
    source : str
         The name of the file in VOSpace to copy, e.g. ``file1.txt``.

    destination : str
         The new name of the file in VOSpace, e.g. ``newfile1.txt``.

    Example
    -------

    Copy the file ``file.txt`` to ``newfile.txt``.

    .. code-block:: python

        dl.ls()
        file1.txt

        dl.cp('file1.txt','newfile.txt')

        dl.ls()
        file1.txt, newfile.txt

You can also use the standard Python methods to get help using help(dl.cp) or dl.cp?.

1.5.5.3. Database Queries

Data Lab hosts many large datasets that can be easily accessed via the query tools. The dl.query() command will execute the query (via the DL Query service) and return the results directly to Python in whatever format you specify. A simple query looks like this:

> res = dl.query('select * from smash_dr1.object limit 1000',fmt='pandas')
Returning Pandas dataframe
> len(res)
1000

In this case, the results of 1000 records/rows of the SMASH DR1 object table are returned as a Pandas data frame. The basic SQL syntax is SELECT <columns> FROM <catalog>.<table> WHERE <constraints> LIMIT <limit> (see SQL tutorial for more help). Regular SQL (default) and ADQL are supported (specify with qtype='sql|adql').

To see what catalogs are available use the dl.schema() to print them out:

> dl.schema()


  Schema Name   Description
  -----------   -----------
     gaia_dr1   GAIA Data Release 1
         ivoa   IVOA ObsCore tables
     des_sva1   DES SVA1 Data Products
   tap_schema   TAP Schema Tables
         usno   USNO Astrometry Catalogs
    sdss_dr13
      neo_dr1   NEO Survey Data Release 1
    smash_dr1   SMASH Data Release 1
       ls_dr3   The DECam Legacy Survey Data Release 3

The available output formats are csv, ascii (tab-delimited), numpy array, numpy structured array (structarray), pandas, Astropy table, and Astropy votable.

It’s also possible to save query results directly to your VOSpace or MyDB. That is done with the out= option. Use a vos:// prefix for VOSpace and mydb:// for MyDB. For VOSpace output the fits and hdf5 formats are also available.

> dl.query('select * from smash_dr1.object limit 1000',fmt='fits',out='vos://smashtest.fits')

> dl.query('select * from smash_dr1.object limit 1000',out='mydb://smashtest')

For longer queries use an async query that will run in the background. Just set async=True and the returned value will be the JobID. The status can be checked with dl.querystatus(JobID) and the results can be retrived with res=dl.queryresults(JobID).

> jobid = dl.query('select * from smash_dr1.object limit 8000',async=True,fmt='table')
Asynchronous query JobID = vc7qn7h3fokzpysy

> dl.querystatus(jobid)
COMPLETED

> tab = dl.queryresults(jobid)
Returning Astropy Table

You can also see all of your queries (from this session) using dl.queryhistory().

> dl.queryhistory()
-------------------------------------------------------------------------------------------------------------------
QID          DATE        Type  A/SYNC  Format       Status          JobID                   Query
-------------------------------------------------------------------------------------------------------------------
1    2017-06-03 12:37:49  sql   SYNC   pandas       1000        --                  'select * from smash_dr1.object limit 1000'
2    2017-06-03 13:40:29  sql   ASYNC  table        COMPLETED   vc7qn7h3fokzpysy    'select * from smash_dr1.object limit 8000'
-------------------------------------------------------------------------------------------------------------------

The QID query ID can also be used to retreive results (instead of JobID) or to rerun a previous query.

1.5.5.4. Simple Image Access Queries:

The images in the Science Data Archive can also be searched using dl.siaquery() for a point in the sky (ra, dec) and a search radius.

> itab = dl.siaquery(0.5,10.0,0.1)
The image list contains 6 entries

The search returns an Astropy table with information on the images including the URL (access_url) for downloading the file. You can download a file with a URL to VOSpace with dl.copyurl().

> dl.copyurl(itab['access_url'][0],'im1.fits')

This file can be loaded into memory with the dl.load() command.

> im,head = dl.load('vos://im1.fits')

1.5.5.5. MyDB Tools

The tables in MyDB can also be queried the same way as the standard DL catalogs. Just add a mydb:// prefix to the table in the SQL query.

> tab = dl.query('select * from mydb://smashtest where ra > 100',fmt='table')
Returning Astropy Table

There are also a handful of commands for manipulating MyDB tables: listdb, droptable, and exporttable.

> dl.listdb()
test2 smashtest test5

The exporttable is handy for downloading a table from MyDB to VOSpace. The supported output formats are csv, fits and hdf5.

> dl.exporttable('smashtest','smashtest.fits')

1.5.5.6. Virtual Storage

There are several unix-like commands for manipulating files in VOSpace: ls, cp, mv, rm, mkdir, rmdir, and ln.

> dl.ls(verbose=True)
-rw-rw----  dnidever    7.2M  03 Jun 2017 14:19:01  im1.fits
drw-rw----  dnidever    512B  02 Feb 2017 00:00:00  public/
-rw-rw-r--  dnidever  250.1K  03 Jun 2017 12:35:15  test.fits
-rw-rw----  dnidever  163.1K  02 Jun 2017 15:21:29  test1.fits
-rw-rw-r--  dnidever  256.7K  03 Jun 2017 12:38:02  test2.fits
drw-rw----  dnidever    512B  02 Feb 2017 00:00:00  tmp/

You can easily copy files from your local file system to VOSpace or vice versa with the put and get commands.

# Copy a file to VOSpace
> dl.put('test1.fits','vos://')
(1 / 1) test1.fits -> vos://test1.fits

# Get a file from VOSpace
dl.get('vos://test1.fits','localtest1.fits')
!ls localtest1.fits
(1/1) [====================] [ 163.1K] test1.fits
localtest1.fits

It’s easy to load data into python from a local file or a file in VOSpace using the load command. The same output formats as are available in the query command are supported.

# Load file vos://output.fits into a pandas data frame
df = dl.load('vos://test10.fits',fmt='pandas')
type(df)
pandas.core.frame.DataFrame

# Load the FITS image file im1.fits
im,head = dl.load('vos://im1.fits')

Finally, to save any data object to a local file or a file in VOSpace using the save command.

# Save a pandas data frame 'df' to file vos://results.hdf5
dl.save(df, 'vos://results2.hdf5')