1.5.5. DL Interface¶
This document describes the Data Lab (DL) Python Interface, which is a user-friendly way of interacting with the various Data Lab services.
A full “How To” Jupyter notebook is available as
More on Jupyter notebooks.
Below are the main types of interactions with the DL services and some example commands.
1.5.5.1. Overview¶
In this era of large astronomical data, Data Lab provides services for working with those large data sets. Currently, the main DL services are authentication, database queries, virtual storage (VOSpace) and a personal database (MyDB). In the future, more advanced features such as VO “capabilities” and job management will be added.
1.5.5.2. Help¶
The most important thing to know when starting to use a new application or program is how to
get help when you are stuck. The DL Interface has a lot of documentation built-in. Just
type dl.help()
to get a list of all the commands.
> dl.help()
The Data Lab python interface.
The available commands are:
dl.help() - Helpful information
Use dl.help(<command>) for specific help on a command.
-- Login and authentication --
dl.login() - Login to the Data Lab
dl.logout() - Logout of the Data Lab
dl.status() - Report on the user status
dl.whoami() - Print the current active user
dl.servicestatus() - Report on the status of the DL services
-- File system operations --
dl.ls() - List a location in Data Lab VOSpace
dl.get() - Get a file from Data Lab VOSpace
dl.put() - Put a file into Data Lab VOSpace
dl.cp() - Copy a file in Data Lab VOSpace
dl.mv() - Move a file in Data Lab VOSpace
dl.rm() - Delete a file in Data Lab VOSpace
dl.mkdir() - Create a directory in Data Lab VOSpace
dl.rmdir() - Delete a directory in Data Lab VOSpace
dl.ln() - Link a file in Data Lab VOSpace
dl.load() - Load data from a local or VOSpace
dl.save() - Save data to a local or VOSpace file
dl.copyurl() - Copy a file from a URL to Data Lab VOSpace
-- Query and database operations --
dl.query() - Query a remote data service in the Data Lab
dl.queryhistory() - List history of queries made
dl.queryresults() - Get the async query results
dl.querystatus() - Get an async query job status
dl.queryprofiles() - List the available query profiles
dl.schema() - Get information on database schemas
dl.droptable() - Drop a user MyDB table
dl.exporttable() - Copy a user MyDB table to a VOSpace CSV file
dl.listdb() - List the user MyDB tables
dl.siaquery() - Query a SIA service in the Data Lab
To get more documentation on how a specific command works just type dl.help(<COMMAND>)
.
> dl.help('cp')
Help on method cp in module dl.dlinterface:
cp(self, source=None, destination=None, verbose=True) method of dl.dlinterface.Dlinterface instance
Copy a file in Data Lab VOSpace.
Parameters
----------
source : str
The name of the file in VOSpace to copy, e.g. ``file1.txt``.
destination : str
The new name of the file in VOSpace, e.g. ``newfile1.txt``.
Example
-------
Copy the file ``file.txt`` to ``newfile.txt``.
.. code-block:: python
dl.ls()
file1.txt
dl.cp('file1.txt','newfile.txt')
dl.ls()
file1.txt, newfile.txt
You can also use the standard Python methods to get help using help(dl.cp)
or dl.cp?
.
1.5.5.3. Database Queries¶
Data Lab hosts many large datasets that can be easily accessed via the query tools.
The dl.query()
command will execute the query (via the DL Query service) and return the
results directly to Python in whatever format you specify. A simple query looks like this:
> res = dl.query('select * from smash_dr1.object limit 1000',fmt='pandas')
Returning Pandas dataframe
> len(res)
1000
In this case, the results of 1000 records/rows of the SMASH DR1 object
table are returned as a Pandas data frame. The basic SQL syntax is SELECT <columns> FROM <catalog>.<table> WHERE
<constraints> LIMIT <limit>
(see SQL tutorial for more help). Regular
SQL (default) and ADQL are supported (specify with qtype='sql|adql'
).
To see what catalogs are available use the dl.schema()
to print them out:
> dl.schema()
Schema Name Description
----------- -----------
gaia_dr1 GAIA Data Release 1
ivoa IVOA ObsCore tables
des_sva1 DES SVA1 Data Products
tap_schema TAP Schema Tables
usno USNO Astrometry Catalogs
sdss_dr13
neo_dr1 NEO Survey Data Release 1
smash_dr1 SMASH Data Release 1
ls_dr3 The DECam Legacy Survey Data Release 3
The available output formats are csv
, ascii
(tab-delimited), numpy array
, numpy structured array
(structarray
), pandas
, Astropy table
, and Astropy votable
.
It’s also possible to save query results directly to your VOSpace or MyDB. That is done with the out=
option.
Use a vos://
prefix for VOSpace and mydb://
for MyDB. For VOSpace output the fits
and hdf5
formats
are also available.
> dl.query('select * from smash_dr1.object limit 1000',fmt='fits',out='vos://smashtest.fits')
> dl.query('select * from smash_dr1.object limit 1000',out='mydb://smashtest')
For longer queries use an async
query that will run in the background. Just set async=True
and the returned
value will be the JobID
. The status can be checked with dl.querystatus(JobID)
and the results can be
retrived with res=dl.queryresults(JobID)
.
> jobid = dl.query('select * from smash_dr1.object limit 8000',async=True,fmt='table')
Asynchronous query JobID = vc7qn7h3fokzpysy
> dl.querystatus(jobid)
COMPLETED
> tab = dl.queryresults(jobid)
Returning Astropy Table
You can also see all of your queries (from this session) using dl.queryhistory()
.
> dl.queryhistory()
-------------------------------------------------------------------------------------------------------------------
QID DATE Type A/SYNC Format Status JobID Query
-------------------------------------------------------------------------------------------------------------------
1 2017-06-03 12:37:49 sql SYNC pandas 1000 -- 'select * from smash_dr1.object limit 1000'
2 2017-06-03 13:40:29 sql ASYNC table COMPLETED vc7qn7h3fokzpysy 'select * from smash_dr1.object limit 8000'
-------------------------------------------------------------------------------------------------------------------
The QID
query ID can also be used to retreive results (instead of JobID) or to rerun a previous query.
1.5.5.4. Simple Image Access Queries:¶
The images in the Science Data Archive can also be searched using dl.siaquery()
for a point in the sky (ra, dec) and a search
radius.
> itab = dl.siaquery(0.5,10.0,0.1)
The image list contains 6 entries
The search returns an Astropy table with information on the images including the URL (access_url
) for downloading
the file. You can download a file with a URL to VOSpace with dl.copyurl()
.
> dl.copyurl(itab['access_url'][0],'im1.fits')
This file can be loaded into memory with the dl.load()
command.
> im,head = dl.load('vos://im1.fits')
1.5.5.5. MyDB Tools¶
The tables in MyDB can also be queried the same way as the standard DL catalogs. Just add a mydb://
prefix to the
table in the SQL query.
> tab = dl.query('select * from mydb://smashtest where ra > 100',fmt='table')
Returning Astropy Table
There are also a handful of commands for manipulating MyDB tables: listdb
, droptable
, and exporttable
.
> dl.listdb()
test2 smashtest test5
The exporttable
is handy for downloading a table from MyDB to VOSpace. The supported output formats are
csv
, fits
and hdf5
.
> dl.exporttable('smashtest','smashtest.fits')
1.5.5.6. Virtual Storage¶
There are several unix-like commands for manipulating files in VOSpace: ls
, cp
, mv
, rm
,
mkdir
, rmdir
, and ln
.
> dl.ls(verbose=True)
-rw-rw---- dnidever 7.2M 03 Jun 2017 14:19:01 im1.fits
drw-rw---- dnidever 512B 02 Feb 2017 00:00:00 public/
-rw-rw-r-- dnidever 250.1K 03 Jun 2017 12:35:15 test.fits
-rw-rw---- dnidever 163.1K 02 Jun 2017 15:21:29 test1.fits
-rw-rw-r-- dnidever 256.7K 03 Jun 2017 12:38:02 test2.fits
drw-rw---- dnidever 512B 02 Feb 2017 00:00:00 tmp/
You can easily copy files from your local file system to VOSpace or vice versa with the put
and get
commands.
# Copy a file to VOSpace
> dl.put('test1.fits','vos://')
(1 / 1) test1.fits -> vos://test1.fits
# Get a file from VOSpace
dl.get('vos://test1.fits','localtest1.fits')
!ls localtest1.fits
(1/1) [====================] [ 163.1K] test1.fits
localtest1.fits
It’s easy to load data into python from a local file or a file in VOSpace using the load
command. The same
output formats as are available in the query
command are supported.
# Load file vos://output.fits into a pandas data frame
df = dl.load('vos://test10.fits',fmt='pandas')
type(df)
pandas.core.frame.DataFrame
# Load the FITS image file im1.fits
im,head = dl.load('vos://im1.fits')
Finally, to save any data object to a local file or a file in VOSpace using the save
command.
# Save a pandas data frame 'df' to file vos://results.hdf5
dl.save(df, 'vos://results2.hdf5')