.. _sec_DLInterface: ************ DL Interface ************ This document describes the Data Lab (DL) Python Interface, which is a user-friendly way of interacting with the various Data Lab services. A full "How To" Jupyter notebook is available as * `active notebook on Data Lab's notebook server `_ * `rendered HTML view `_ More on :ref:`Jupyter notebooks `. Below are the main types of interactions with the DL services and some example commands. .. _sec_DLInterface_overview: Overview ======== In this era of large astronomical data, Data Lab provides services for working with those large data sets. Currently, the main DL services are authentication, database queries, virtual storage (VOSpace) and a personal database (MyDB). In the future, more advanced features such as VO "capabilities" and job management will be added. .. _sec_DLInterface_help: Help ==== The most important thing to know when starting to use a new application or program is how to get help when you are stuck. The DL Interface has a lot of documentation built-in. Just type ``dl.help()`` to get a list of all the commands. :: > dl.help() The Data Lab python interface. The available commands are: dl.help() - Helpful information Use dl.help() for specific help on a command. -- Login and authentication -- dl.login() - Login to the Data Lab dl.logout() - Logout of the Data Lab dl.status() - Report on the user status dl.whoami() - Print the current active user dl.servicestatus() - Report on the status of the DL services -- File system operations -- dl.ls() - List a location in Data Lab VOSpace dl.get() - Get a file from Data Lab VOSpace dl.put() - Put a file into Data Lab VOSpace dl.cp() - Copy a file in Data Lab VOSpace dl.mv() - Move a file in Data Lab VOSpace dl.rm() - Delete a file in Data Lab VOSpace dl.mkdir() - Create a directory in Data Lab VOSpace dl.rmdir() - Delete a directory in Data Lab VOSpace dl.ln() - Link a file in Data Lab VOSpace dl.load() - Load data from a local or VOSpace dl.save() - Save data to a local or VOSpace file dl.copyurl() - Copy a file from a URL to Data Lab VOSpace -- Query and database operations -- dl.query() - Query a remote data service in the Data Lab dl.queryhistory() - List history of queries made dl.queryresults() - Get the async query results dl.querystatus() - Get an async query job status dl.queryprofiles() - List the available query profiles dl.schema() - Get information on database schemas dl.droptable() - Drop a user MyDB table dl.exporttable() - Copy a user MyDB table to a VOSpace CSV file dl.listdb() - List the user MyDB tables dl.siaquery() - Query a SIA service in the Data Lab To get more documentation on how a specific command works just type ``dl.help()``. :: > dl.help('cp') Help on method cp in module dl.dlinterface: cp(self, source=None, destination=None, verbose=True) method of dl.dlinterface.Dlinterface instance Copy a file in Data Lab VOSpace. Parameters ---------- source : str The name of the file in VOSpace to copy, e.g. ``file1.txt``. destination : str The new name of the file in VOSpace, e.g. ``newfile1.txt``. Example ------- Copy the file ``file.txt`` to ``newfile.txt``. .. code-block:: python dl.ls() file1.txt dl.cp('file1.txt','newfile.txt') dl.ls() file1.txt, newfile.txt You can also use the standard Python methods to get help using ``help(dl.cp)`` or ``dl.cp?``. .. _sec_DLInterface_queries: Database Queries ================ Data Lab hosts many large datasets that can be easily accessed via the query tools. The ``dl.query()`` command will execute the query (via the DL Query service) and return the results directly to Python in whatever format you specify. A simple query looks like this: :: > res = dl.query('select * from smash_dr1.object limit 1000',fmt='pandas') Returning Pandas dataframe > len(res) 1000 In this case, the results of 1000 records/rows of the `SMASH DR1 `_ object table are returned as a Pandas data frame. The basic SQL syntax is ``SELECT FROM . WHERE LIMIT `` (see `SQL tutorial `_ for more help). Regular SQL (default) and ADQL are supported (specify with ``qtype='sql|adql'``). To see what catalogs are available use the ``dl.schema()`` to print them out: :: > dl.schema() Schema Name Description ----------- ----------- gaia_dr1 GAIA Data Release 1 ivoa IVOA ObsCore tables des_sva1 DES SVA1 Data Products tap_schema TAP Schema Tables usno USNO Astrometry Catalogs sdss_dr13 neo_dr1 NEO Survey Data Release 1 smash_dr1 SMASH Data Release 1 ls_dr3 The DECam Legacy Survey Data Release 3 The available output formats are ``csv``, ``ascii`` (tab-delimited), numpy ``array``, numpy structured array (``structarray``), ``pandas``, Astropy ``table``, and Astropy ``votable``. It's also possible to save query results directly to your VOSpace or MyDB. That is done with the ``out=`` option. Use a ``vos://`` prefix for VOSpace and ``mydb://`` for MyDB. For VOSpace output the ``fits`` and ``hdf5`` formats are also available. :: > dl.query('select * from smash_dr1.object limit 1000',fmt='fits',out='vos://smashtest.fits') > dl.query('select * from smash_dr1.object limit 1000',out='mydb://smashtest') For longer queries use an ``async`` query that will run in the background. Just set ``async=True`` and the returned value will be the ``JobID``. The status can be checked with ``dl.querystatus(JobID)`` and the results can be retrived with ``res=dl.queryresults(JobID)``. :: > jobid = dl.query('select * from smash_dr1.object limit 8000',async=True,fmt='table') Asynchronous query JobID = vc7qn7h3fokzpysy > dl.querystatus(jobid) COMPLETED > tab = dl.queryresults(jobid) Returning Astropy Table You can also see all of your queries (from this session) using ``dl.queryhistory()``. :: > dl.queryhistory() ------------------------------------------------------------------------------------------------------------------- QID DATE Type A/SYNC Format Status JobID Query ------------------------------------------------------------------------------------------------------------------- 1 2017-06-03 12:37:49 sql SYNC pandas 1000 -- 'select * from smash_dr1.object limit 1000' 2 2017-06-03 13:40:29 sql ASYNC table COMPLETED vc7qn7h3fokzpysy 'select * from smash_dr1.object limit 8000' ------------------------------------------------------------------------------------------------------------------- The ``QID`` query ID can also be used to retreive results (instead of JobID) or to rerun a previous query. .. _sec_DLInterface_siaquery: Simple Image Access Queries: ============================ The images in the Science Data Archive can also be searched using ``dl.siaquery()`` for a point in the sky (ra, dec) and a search radius. :: > itab = dl.siaquery(0.5,10.0,0.1) The image list contains 6 entries The search returns an Astropy table with information on the images including the URL (``access_url``) for downloading the file. You can download a file with a URL to VOSpace with ``dl.copyurl()``. :: > dl.copyurl(itab['access_url'][0],'im1.fits') This file can be loaded into memory with the ``dl.load()`` command. :: > im,head = dl.load('vos://im1.fits') .. _sec_DLInterface_mydb: MyDB Tools ========== The tables in MyDB can also be queried the same way as the standard DL catalogs. Just add a ``mydb://`` prefix to the table in the SQL query. :: > tab = dl.query('select * from mydb://smashtest where ra > 100',fmt='table') Returning Astropy Table There are also a handful of commands for manipulating MyDB tables: ``listdb``, ``droptable``, and ``exporttable``. :: > dl.listdb() test2 smashtest test5 The ``exporttable`` is handy for downloading a table from MyDB to VOSpace. The supported output formats are ``csv``, ``fits`` and ``hdf5``. :: > dl.exporttable('smashtest','smashtest.fits') .. _sec_DLInterface_storage: Virtual Storage =============== There are several unix-like commands for manipulating files in VOSpace: ``ls``, ``cp``, ``mv``, ``rm``, ``mkdir``, ``rmdir``, and ``ln``. :: > dl.ls(verbose=True) -rw-rw---- dnidever 7.2M 03 Jun 2017 14:19:01 im1.fits drw-rw---- dnidever 512B 02 Feb 2017 00:00:00 public/ -rw-rw-r-- dnidever 250.1K 03 Jun 2017 12:35:15 test.fits -rw-rw---- dnidever 163.1K 02 Jun 2017 15:21:29 test1.fits -rw-rw-r-- dnidever 256.7K 03 Jun 2017 12:38:02 test2.fits drw-rw---- dnidever 512B 02 Feb 2017 00:00:00 tmp/ You can easily copy files from your local file system to VOSpace or vice versa with the ``put`` and ``get`` commands. :: # Copy a file to VOSpace > dl.put('test1.fits','vos://') (1 / 1) test1.fits -> vos://test1.fits # Get a file from VOSpace dl.get('vos://test1.fits','localtest1.fits') !ls localtest1.fits (1/1) [====================] [ 163.1K] test1.fits localtest1.fits It's easy to load data into python from a local file or a file in VOSpace using the ``load`` command. The same output formats as are available in the ``query`` command are supported. :: # Load file vos://output.fits into a pandas data frame df = dl.load('vos://test10.fits',fmt='pandas') type(df) pandas.core.frame.DataFrame # Load the FITS image file im1.fits im,head = dl.load('vos://im1.fits') Finally, to save any data object to a local file or a file in VOSpace using the ``save`` command. :: # Save a pandas data frame 'df' to file vos://results.hdf5 dl.save(df, 'vos://results2.hdf5')