.. sectionauthor:: Mike Fitzpatrick

.. index:: single: exception handling

.. _sec_ExceptionHandlinginDataLabServices:

#######################################
Exception Handling in Data Lab Services
#######################################

.. note:: This document is currently in a DRAFT stage.

This document describes the recommended handling of errors and exceptions in the Python-based Data Lab middleware (i.e. the various 'manager' services). The Java-based servlets (i.e. VOSpace and DALServer) implement IVOA protocols that specify the required exception handling for those protocols. These are not covered in detail, except to describe when and how those protocol exceptions should be handled by the Data Lab server and client code.

Exceptions vs. return values
============================

The two philosophies of error handling are sometimes described as *EAFP* (`Easier to Ask for Forgiveness than Permission`) and *LBYL* (`Look Before You Leap`). Under the LBYL model, the application using the interface, the interface itself and the service must each anticipate *all* potential error conditions (by catching system library exceptions and/or writing code to check in advance that parameters are valid, files exist, etc.) and respond with an error code. Under the EAFP model, exceptions can be handled, ignored, or dealt with at a more appropriate level in the application. The EAFP model also has readability benefits and so is the model adopted for Data Lab.

Exceptions are, by definition, unexpected behavior of the code, but they can also be raised in response to improper use of the client or service (e.g. missing or invalid input). We require that the docstrings of all interface methods (client and server) describe the calling arguments as well as what (if anything) is returned by the method or service.

.....
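As a minimal illustration of the difference between the two styles, the sketch below looks up a service profile in both ways. The function names, the ``profiles`` dictionary and the URL are hypothetical and are not part of any Data Lab interface.

```python
def get_profile_lbyl(profiles, name):
    """LBYL: validate everything first and report failure via a return code."""
    if not isinstance(name, str) or name not in profiles:
        return None, 'ERROR: unknown profile'
    return profiles[name], 'OK'


def get_profile_eafp(profiles, name):
    """EAFP: attempt the lookup and let an exception describe the failure."""
    try:
        return profiles[name]
    except KeyError:
        raise ValueError("Unknown profile '%s'" % name)


# Hypothetical data used only for this illustration.
profiles = {'default': 'https://example.org/svc'}
```

With the LBYL version every caller must inspect the returned status string before using the value, whereas with the EAFP version a caller can use the result directly and leave the ``ValueError`` to whichever level of the application is best placed to handle it.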
Service Architecture Overview
=============================

The existing Data Lab service and client code all share a similar structure; however, because they were developed at different times in the project, a number of inconsistencies remain between the services. This section describes the target structure we wish all code to have following a thorough code review. Notes are used to identify known problems to be addressed in the current release.

Server-side Code
================

Data Lab services are implemented using the Python `Flask `_ microframework. These services define a middleware layer that clients (web-based, programmatic or command-line) access from their interfaces. As middleware, these services may, depending on their function, in turn call lower-level services (e.g. VOSpace, TAP, SIA, etc.) or access a resource such as a database directly. The return value of each service is documented in the service implementation docstring.

.. note:: The code review process is intended to discover those docstrings that don't yet provide the required service documentation.

As an example, a simple 'echo' service might look something like:

.. code-block:: python

    from flask import Flask, request

    app = Flask(__name__)

    @app.route('/echo')
    def echo():
        ''' ECHO - A simple echo service endpoint

        Parameters
        ----------
        arg : str
            The argument to be echoed (passed as a query parameter).

        Returns
        -------
        (Status 200) A string that echoes the argument
        (Status 400) An error message string
        '''
        arg = request.args.get('arg')       # None if 'arg' was not supplied
        if arg is not None:
            return "Hello %s!\n" % arg
        else:
            raise Exception('Missing "arg" parameter')

In this example, simply raising the generic ``Exception`` will cause the service to return a 500 (Internal Server Error) to the caller. In order to return a specific error message we need to define an ``errorhandler()`` method for the Flask application. For example,

..
.. code-block:: python

    @app.errorhandler(Exception)
    def handle_invalid_request(error):
        # str(error) is used because generic exceptions have no
        # 'message' attribute under Python 3.
        return app.make_response(('Error: ' + str(error), 400))

This is better in that it returns a proper HTTP response with the specified error message, but the status code is fixed at 400 (or whatever value is chosen). The solution is to create an exception subclass in the server code (and an associated Flask ``errorhandler()``) that allows us to set the message, status code, and optionally an error payload:

.. code-block:: python

    class dlInvalidRequest(Exception):
        def __init__(self, message, status_code=None, payload=None):
            Exception.__init__(self, message)
            self.message = message
            self.status_code = (status_code if status_code is not None else 400)
            self.payload = payload

        def to_dict(self):
            """ Return the error as a dict suitable for JSON formatting. """
            rv = dict(self.payload or ())
            rv['message'] = self.message
            rv['code'] = self.status_code
            return rv

    @app.errorhandler(dlInvalidRequest)
    def handle_invalid_request(error):
        return app.make_response(('Error: ' + error.message, error.status_code))

The service code then looks like:

.. code-block:: python

    if arg is not None:
        return "Hello %s!\n" % arg
    else:
        raise dlInvalidRequest('Missing argument', 400)

When no argument is provided, the service will return a status 400 response with a specific error message useful to the client.

Service Return Codes
--------------------

Because these are RESTful web services, the standard set of `HTTP return codes `_ is available to communicate status back to the calling client in addition to any returned data. This provides the flexibility needed to return error messages that explain why the service failed when a standard status message may be ambiguous.

Exceptions in the server code should follow a few simple guidelines. Services will return:

Status 200 (Successful)
    When the service performs the requested action without error.
Status 400 (Bad Request)
    When the call fails due to missing or invalid input to the service. If a backend service returns an error status, that status should be returned to the client when it provides a more detailed explanation of the error.

Status 403 (Forbidden)
    When the client does not present the identity token required to access or modify a requested resource.

Status 404 (Not Found)
    When the service cannot access a requested `static` resource (e.g. a VOSpace URI).

Status 503 (Service Unavailable)
    When the service requires access to a backend resource that cannot be reached (e.g. a database or storage system), preventing the entire service from executing as required.

Backend services (e.g. TAP, VOSpace, database) may have return codes specified by the protocols they use, but these are handled by the service when determining whether the call succeeded. For example, when deleting a file from VOSpace, the protocol requires a 204 status code response indicating that the file was deleted; however, the service should return a 200 status to the client because the ``storeClient.rm()`` method succeeded. In the event of an error in the VOSpace service that returns a non-204 code, the ``rm()`` service can handle or ignore the error, or else return a 400 (Bad Request) error along with the specific error message from VOSpace. Similarly, a query that requires a user MyDB table that doesn't exist will return the error message from the database that identifies the missing table, without requiring that every possible database message map to a corresponding HTTP status code.

By limiting the number of trapped error status codes, the client has fewer specific exceptions to catch explicitly and can raise the error to the calling method more easily.

Client-side Code
================

The basic layout of a Data Lab *Client* interface is something like the following (using the `AuthManager` as an example):

..
.. code-block:: python

    import requests

    # 'svc_url' and 'hdrs' stand in for the service URL and request headers.

    def login(user, password):              # Module method
        return ac_client.login(user, password)

    class authClient(object):               # AuthManager Object class
        def __init__(self):
            pass

        def login(self, user, password):    # Class method
            resp = requests.get(svc_url, headers=hdrs)
            return resp

    def getClient():                        # Get a new instance of the authClient
        return authClient()

    ac_client = getClient()                 # Create a default client object

.. note:: The use of MultiMethod signatures is ignored here for brevity.

This structure lets applications access the *module methods* directly when using the default client instance created by the ``import`` of the module, while also allowing them to create additional clients when necessary (e.g. when using DEV instances of Data Lab services, or when using different service profiles without having to reset the profile before each use in the default client). For example,

.. code-block:: python

    from dl import queryClient as qc        # standard import
    gp04 = qc.getClient(profile='gp04')     # get new client instance with
                                            # 'gp04' profile
    res1 = qc.query(sql='....')             # query default service
    res2 = gp04.query(sql='....')           # query 'gp04' service

.. note:: As of this writing, the use of MultiMethod signatures prevents new client instances (e.g. the 'gp04' client above) from working correctly. The solution is understood and will be implemented as part of the MultiMethod docstring work to come.

Client-side exception handling
------------------------------

You can see from the client code example that the module methods are simply calls to the default client's class method, which performs all the work of the client interface. Our goal is to catch and/or handle exceptions in this class method and simply raise them to the calling procedure. When writing a default client *module method*, the code may look something like:

..
.. code-block:: python

    def login(user, password):
        try:
            resp = ac_client.login(user, password)
        except Exception:
            raise                           # re-raise unchanged to the caller
        return resp

In this way, an exception either returned by the service or raised by the class method is simply passed to the caller. On success, the normal return value of the method is returned.

The *class method* that actually calls the service should use an appropriate ``try-except`` block to raise exceptions that occur in the client code or are returned by the service. Data Lab client interfaces use the ``requests`` module to make service calls. All exceptions that Requests raises inherit from the ``requests.exceptions.RequestException`` class, making it possible to trap the specific errors returned by a service individually as well as HTTP connection-related issues. For example, a class method might look something like:

.. code-block:: python

    def login(self, user, passwd):
        resp = None
        try:
            resp = requests.get(url, params={'username': user, 'password': passwd})
            resp.raise_for_status()
        except requests.exceptions.RequestException as err:
            if resp is None:
                raise Exception(str(err))       # connection error
            else:
                raise Exception(resp.content)   # service error
        return resp.content

There are a few things to note in this example:

- ``raise_for_status()`` is used to raise an ``HTTPError``, which inherits from the ``RequestException`` class used by the ``requests`` module, whenever the response carries an error status. Service errors (e.g. TimeOut) are returned using HTTP status codes and can be caught in the same block.

- When ``resp`` is still ``None`` in the ``except`` block, the request itself failed, so the exception message is raised to indicate the specific connection error. If instead we have a valid ``resp`` object, the error was generated by the server, and we raise the error message in the response content to pass it back from the service. ``resp`` is initialized to ``None`` so that we can differentiate the two cases and handle HTTP connection problems and service problems differently when needed.
- Here we raise the generic ``Exception``, but in production code we generally create a dedicated exception class to be raised. For Py2/Py3 compatibility this allows us to ensure the 'string' type on the error message that legacy code currently assumes; it can be removed once a full transition to Py3 is complete.

Client code of course does other processing, e.g. validating input parameters, processing return values and so on. These steps may themselves use ``try-except`` blocks to do additional error handling and should follow similar concepts when determining which exceptions are raised.

.. note:: The ``requests`` call is so common to each client method that a utility method should be implemented to avoid code duplication and provide a central location for *all* service-related exception handling. Similar utility methods could be envisioned for parameter validation, ensuring standard string-type conversion, etc.

Coding Style: Examples vs. Applications
=======================================

In ``HowTo`` notebooks, science examples and general documentation, the intent is often to convey an example use of an interface or service, implicitly assuming the example will always succeed. Application code, on the other hand, should be written to catch exceptions that might be raised or that risk aborting a task entirely.

As an example, consider the ``authClient.login()`` method that returns an authorization token for the user. In `example code` this might be written as

.. code-block:: python

    token = authClient.login('foobar', getpass())

where we assume a valid user and password are entered and the ``token`` variable then contains a valid auth token for the application. In `application code` we would want to protect this call to catch a login error with something like:

..
.. code-block:: python

    try:
        token = authClient.login('foobar', getpass())
    except Exception as e:
        print('Login Error: ' + str(e))
    else:
        print("Logged in as user '%s'" % token.split('.')[0])

The added ``try-except`` block in this code snippet, however, distracts from the example use of ``login()`` being demonstrated. We therefore recommend limiting its use to examples that explicitly show how errors are to be handled and to production-quality code.

.. note::

Client API Return Values
------------------------

In many cases existing client API code now returns a mix of an 'OK' string, an error message, or the (valid) return data from the service. We wish to have *all* methods raise exceptions on an error and return either nothing or valid data from the service.

Client API methods may return objects of various types. Two issues still to be settled are:

- proper handling of boolean return values, i.e. ensuring that the Python True/False type is returned and not strings

- proper handling of string return types, i.e. whether to enforce 'string' or allow the return of 'byte' types under Py3 that may require decoding.
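One possible way to address both open issues is a pair of small conversion helpers applied to every service reply. The following is only a sketch under stated assumptions: the names ``to_str`` and ``to_bool`` are hypothetical (they are not existing Data Lab functions), Python 3 is assumed, and the choice to accept only the spellings 'true' and 'false' is illustrative.

```python
def to_str(value, encoding='utf-8'):
    """Return `value` as text, decoding bytes such as `resp.content`."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def to_bool(value):
    """Convert a service reply such as b'True' or 'false' to a Python bool.

    Raises ValueError for anything that is not recognizably a boolean,
    so that an unexpected reply is never silently treated as True or False.
    """
    if isinstance(value, bool):
        return value
    text = to_str(value).strip().lower()
    if text == 'true':
        return True
    if text == 'false':
        return False
    raise ValueError("Not a boolean value: '%s'" % text)
```

Raising ``ValueError`` for unrecognized replies follows the EAFP approach adopted above: a legacy 'OK' string, for example, would surface as an exception instead of being passed along as data.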