Section author: Mike Fitzpatrick <mike.fitzpatrick@noirlab.edu>
3.4. Data Lab Python Style Guide
Note
This document was derived from version 6.0 of the LSST/DM Python Coding Standard (https://github.com/lsst-dm/dm_dev_guide/blob/master/python/style.rst). Changes have been made to reflect the coding styles allowed within Data Lab, which are to be used as the basis for code reviews. External documents referenced in the original LSST/DM document have been partially imported as needed for clarity, or else now reference similarly modified Data Lab documents. Similar style guides for other languages will be prepared.
3.4.1. Introduction
3.4.1.1. Stringency Levels
In this style guide we use RFC-2119-style vocabulary to rank the importance of conforming to a specific recommendation.
- REQUIRED
The Rule is an absolute requirement of the specification. Explicit approval to contravene the Rule must be sought by the developer before proceeding and the exception must be noted in code documentation and unit tests.
- MUST and SHALL
Mean that there may exist valid reasons in particular circumstances to ignore a particular Rule, but the full implications must be understood and carefully weighed before choosing a different course. Explicit approval to contravene the Rule must be sought by the developer before proceeding and the exception must be noted in code documentation and unit tests.
- SHOULD, RECOMMENDED and MAY
There are valid reasons in particular circumstances to ignore a particular Rule. The developer is expected to personally consider the full implications before choosing a different course.
- PROHIBITED
The opposite of REQUIRED.
- MUST NOT and SHALL NOT
The opposites of MUST and SHALL.
- SHOULD NOT, NOT RECOMMENDED and MAY NOT
The opposites of SHOULD, RECOMMENDED and MAY.
3.4.2. 0. Python Version
3.4.2.1. All Data Lab Python client-side code MUST work with Python 3
All Python client-side code (application interfaces and command-line tools) written by Data Lab MUST be runnable using Python 3 (Py3). Where possible, support for Python 2 (Py2) SHOULD be provided; otherwise, Py3-only features should be clearly documented, and the code should be structured so that they do not cripple an entire interface or application (e.g. isolate Py3-only import statements or methods in separate files, or guard them with tests of sys.version_info.major).
The current baseline versions for client code are Python 3.6 and Python 2.7.
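A minimal sketch of guarding a version-specific import with sys.version_info.major, as suggested above. It relies on the standard-library fact that urlencode lives in urllib under Py2 and in urllib.parse under Py3; the buildQuery helper is hypothetical and exists only to show that the rest of the module can stay version-neutral.

```python
import sys

# Choose the import location at runtime so the module loads under Py2 or Py3.
if sys.version_info.major >= 3:
    from urllib.parse import urlencode  # Python 3 location
else:
    from urllib import urlencode  # Python 2 location


def buildQuery(params):
    """Return a URL query string for a dict of parameters (hypothetical)."""
    # Sort the items so the output does not depend on dict ordering.
    return urlencode(sorted(params.items()))


print(buildQuery({'ra': 10.5, 'dec': -3.2}))  # dec=-3.2&ra=10.5
```

Only the import differs between versions; code that calls buildQuery never needs its own version test.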
3.4.2.2. Data Lab Python server-side code SHOULD work with Python 3
All new service code MUST be written to be runnable under Python 3 and avoid Py2-only dependency packages. Where possible, support for Python 2 SHOULD be provided, but it is not required if the service can be deployed using Python 3.
The current baseline versions for service code are Python 3.6 and Python 2.7.
3.4.3. 1. PEP 8 is the Baseline Coding Style
Data Lab’s Python Coding Style is based on the PEP 8 Style Guide for Python Code with modifications and exceptions specified in this document.
PEP 8 is used throughout the Python community and should feel familiar to Python developers. Deviations from PEP 8 style are intended to permit flexibility in personal coding styles, so not all of the PEP 8 guidelines are enforced. Additional guidelines are included in this document to address specific requirements of Data Lab code as needed.
3.4.3.1. Exceptions to PEP 8
The following table summarizes all PEP 8 guidelines that are not followed by this style guide. These exceptions are organized by error codes that may be ignored by the flake8 linter (see Code MAY be validated with flake8).
- E133
Closing bracket is missing indentation. This pycodestyle error (via flake8) is not part of PEP 8.
- E211
Whitespace before ‘(’. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
- E221
Multiple whitespaces before operator. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
- E223
Tab before operator. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
- E226
Missing whitespace around arithmetic operator. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
- E228
Missing whitespace around bitwise or shift operator. See Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %].
- Maximum line length
Line length MUST be less than or equal to 79 columns.
The style checker in pycodestyle also provides warnings that can be used to request a specific style that is ambiguous in PEP 8. These codes should be ignored to choose the Data Lab preferred style:
- W504
Line break after binary operator. Disabling this enables W503 that checks that line breaks do not occur before binary operators.
Additionally, packages should disable the following rules:
- N802
Function name should be lowercase.
- N803
Argument name should be lowercase.
- N806
Variable in function should be lowercase.
3.4.3.2. Code MAY be validated with flake8
The flake8 tool may be used to validate Python source code against the portion of PEP 8 adopted by Data Lab. Additionally, flake8 statically checks Python for code errors. The separate pep8-naming plugin validates names according to this style guide.
Note
Flake8 only validates code against PEP 8 specifications. This style guide includes additional guidelines that are not automatically linted.
3.4.3.2.1. Flake8 installation
Linters are installable with pip:
pip install flake8
pip install pep8-naming
3.4.3.2.2. Flake8 command line invocation
flake8 --ignore=E133,E211,E221,E223,E226,E228,W504 .
This command lints all Python files in the current directory and its subdirectories. Alternatively, individual files can be specified in place of the "." argument.
The ignored error codes are explained above. N802, N803, and N806 can be added to this list for some packages.
3.4.3.2.3. Flake8 configuration files
flake8 can be invoked without arguments when a configuration file is present. The following configuration, placed in a setup.cfg file at the root of a code repository, is consistent with this style guide:
[flake8]
max-line-length = 79
ignore = E133, E211, E221, E223, E226, E228, N802, N803, N806, W504
exclude =
bin,
doc,
lib,
tests,
**/*/__init__.py,
**/*/__version__.py,
**/*/version.py
The exclude field lists paths that are not usefully linted by flake8 in Data Lab code repositories. Auto-generated Python should not be linted. We also discourage linting __init__.py modules due to the abundance of PEP 8 exceptions typically involved.
3.4.3.3. Lines that intentionally deviate from Data Lab’s application of PEP 8 MUST include a noqa comment
Lines of code may intentionally deviate from our application of PEP 8 because of limitations in flake8. In such cases, authors must append a # noqa comment, naming the specific error code being ignored, to the line (see the flake8 documentation for details). This prevents the line from triggering false flake8 warnings for other developers, while still reporting any unexpected errors on that line.
For example, to import a name without using it (to build a namespace, as in an __init__.py file):
from .module import AClass # noqa: F401
3.4.3.4. autopep8 MAY be used to fix PEP 8 compliance
Many PEP 8 issues in existing code can be fixed with autopep8 version 1.2 or newer:
autopep8 . --in-place --recursive \
--ignore E133,E211,E221,E223,E226,E228,N802,N803,N806,W504
The "." specifies the current directory. Together with --recursive, the full tree of Python files will be processed by autopep8. Alternatively, a single file can be specified in place of the "." argument.
autopep8’s changes must always be validated before committing.
Style changes SHOULD be encapsulated in a distinct commit.
Note
autopep8 only fixes PEP 8 issues and does not address other guidelines listed here.
3.4.4. 2. Layout
See also
Guidelines for the layout of docstrings.
3.4.4.1. Service Configuration Information MUST be stored externally
Service code that requires configuration information MUST use an external file to store any data specific to the deployment of the service. This is intended to allow code to be distributed from public repositories without releasing local configuration details.
A default configuration (or path to a configuration file) SHOULD be specified in the source file to prevent errors resulting from undefined values; however, these defaults MUST be overridden by configuration data read from the external file when the code is deployed if the default parameters are insufficient to define a fully working service. Service code MAY allow alternate configuration files to be specified at runtime, e.g. when used in a development environment.
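A minimal sketch of in-source defaults overridden by an external file, using the standard-library configparser module (named ConfigParser under Py2). The default values, the /opt/datalab/etc/service.conf path, the DL_SERVICE_CONFIG environment variable and the loadConfig function are all hypothetical; the point is only that deployment-specific values live outside the source.

```python
import configparser
import os

# Hypothetical defaults; used only if the external file omits a value.
DEFAULTS = {'host': 'localhost', 'port': '7001'}

# Hypothetical default location; DL_SERVICE_CONFIG may point elsewhere,
# e.g. in a development environment.
DEFAULT_CONFIG_PATH = '/opt/datalab/etc/service.conf'


def loadConfig(path=None):
    """Return service settings, overriding defaults from an external file."""
    if path is None:
        path = os.environ.get('DL_SERVICE_CONFIG', DEFAULT_CONFIG_PATH)
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    parser.read(path)  # a missing file is silently skipped
    if not parser.has_section('service'):
        parser.add_section('service')  # lookups then fall back to DEFAULTS
    return {'host': parser.get('service', 'host'),
            'port': parser.getint('service', 'port')}
```

Because a missing file is skipped, the service still starts with the in-source defaults; a deployment that needs different values supplies them only in the external file.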
3.4.4.2. Client-side Configuration Information
Client-side code requiring configuration information MUST have the configuration parameters set automatically by the package installer. User-configurable data (e.g. Preferences) SHALL be allowed provided an adequate description of each option is included with the software documentation.
3.4.4.3. Packages SHOULD import only what is necessary
In order to minimize startup times, Data Lab code SHOULD import only the necessary modules from 3rd-party packages. For example,
Yes:
from bigpackage.utils import convert
No:
import bigpackage
3.4.4.4. Exportable Python packages MUST contain a standard license file
A Data Lab copyright and license file MUST be included with every code repository or exportable package. License information should be in a file called LICENSE; if separate copyright information is required, it should be in a file called COPYRIGHT.
The text of the default Data Lab license is:
TBD
3.4.4.5. Docstring and comment line length MUST be less than or equal to 79 columns
Limit all docstring and comment lines to a maximum of 79 characters.
This differs from the PEP 8 recommendation of 72 characters and the numpydoc recommendation of 75 characters, but it maintains readability and compatibility with default terminal widths while making the most of the available space.
3.4.5. 3. Whitespace
Follow the PEP 8 whitespace style guidelines, with the following adjustments.
3.4.5.1. The minimum number of parentheses needed for correctness and readability SHOULD be used
Yes:
a = b(self.config.nSigmaToGrow*sigma + 0.5)
Less readable:
a = b((self.config.nSigmaToGrow*sigma) + 0.5)
3.4.5.2. Binary operators SHOULD be surrounded by a single space except for [*, /, **, //, %]
Always surround these binary operators with a single space on either side; this helps the reader see where one token ends and another begins:
- assignment (=),
- augmented assignment (+=, -=, etc.),
- comparisons (==, <, >, !=, <=, >=, in, not in, is, is not),
- Booleans (and, or, not).
Use spaces around these arithmetic operators when they improve readability:
- addition (+),
- subtraction (-),
- multiplication (*),
- division (/).
Never surround these binary arithmetic operators with whitespace:
- exponentiation (**),
- floor division (//),
- modulus (%).

Note that a single space must always surround % when used for string formatting.
For example:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a + b)*(a - b)
print('Hello %s' % 'world!')
This deviates from PEP 8, which allows whitespace around these arithmetic operators if they appear alone. Error codes: E226 and E228.
3.4.7. 5. Documentation Strings (docstrings)
Use Numpydoc to format the content of all docstrings. The page Documenting Python API with Docstrings authoritatively describes this format. Its guidelines should be treated as an extension of this Python Style Guide.
See also
The ReStructured Text Style Guide—and the RestructuredText Formatting Conventions section in particular—provide guidelines on reStructuredText in general.
3.4.7.1. Docstrings SHOULD be written for all public modules, functions, classes, and methods
Write docstrings for all public modules, functions, classes, and methods. See Documenting Python API with Docstrings.
Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line.
3.4.8. 6. Naming Conventions
We follow PEP 8’s naming conventions, with exceptions listed here.
All Data Lab Python source code follows these naming conventions:
- class names are CamelCase with leading uppercase (as in PEP 8),
- method names are camelCase with leading lowercase (a deviation from PEP 8, which uses lowercase_with_underscores),
- module variables used as module global constants are UPPERCASE_WITH_UNDERSCORES (as in PEP 8).
Packages developed prior to this document may not fully adhere to PEP 8. As part of the code-review process and normal development, PEP 8 standards will be introduced in future releases.
Naming consistency within a package MUST be maintained. Within these stated constraints new packages SHOULD use PEP 8 naming conventions.
Names MAY be decorated with leading and/or trailing underscores.
Class Attribute Names SHOULD be camelCase with leading lowercase (Error code: N803).
Module methods (free functions) SHOULD be camelCase with leading lowercase (Error code: N802).
Compound variable names SHOULD be camelCase with leading lowercase (Error code: N806).
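The naming rules above, applied together in one hypothetical snippet (QueryClient, submitQuery and formatRowCount are illustrative names only). Each comment notes which rule the name follows.

```python
MAX_RETRIES = 3  # module global constant: UPPERCASE_WITH_UNDERSCORES


class QueryClient(object):  # class name: CamelCase, leading uppercase
    """Hypothetical class used only to illustrate naming."""

    def __init__(self, serviceUrl):
        self.serviceUrl = serviceUrl  # class attribute: camelCase
        self.retryCount = 0

    def submitQuery(self, sqlText):  # method and argument: camelCase
        self.retryCount = min(self.retryCount + 1, MAX_RETRIES)
        return sqlText.strip()


def formatRowCount(rowList):  # free function: camelCase
    numRows = len(rowList)  # compound local variable: camelCase
    return '%d rows' % numRows
```

With N802, N803 and N806 in the flake8 ignore list, pep8-naming accepts these camelCase names.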
3.4.8.1. Modules which contain class definitions SHOULD be named after the class name
Modules which contain class definitions should be named after the class name (one module per class).
3.4.8.2. User-defined names SHOULD NOT shadow Python built-in functions
Names which shadow a Python built-in function may confuse readers of the code. A more specific identifier is suggested to avoid collisions. For example, in the case of filter, filter_name may be appropriate; for filter objects, something like filter_obj might be appropriate. In other cases, a leading or trailing underscore on the identifier may be sufficient to avoid confusion.
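Shadowing is not only confusing; it can break code later in the same scope. In this hypothetical sketch, an argument named filter hides the built-in filter(), so the call inside brightShadowed raises TypeError, while the version using a more specific name works. The magnitude cut in isBright is an arbitrary illustrative value.

```python
def isBright(mag):
    # Hypothetical selection: brighter than magnitude 20.
    return mag < 20.0


def brightShadowed(mags, filter):
    # Bad: the band-name argument hides the built-in filter() here, so the
    # call below tries to call a string and raises TypeError.
    return list(filter(isBright, mags))


def brightSources(mags, filterName):
    # Good: a specific name keeps the built-in filter() available.
    return filterName, list(filter(isBright, mags))
```

For example, brightSources([19.5, 21.0, 18.2], 'g') returns ('g', [19.5, 18.2]).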
3.4.8.3. Names l (lowercase: el), O (uppercase: oh), I (uppercase: eye) MUST be avoided
Never use these characters as single-character variable names: l (lowercase letter el), O (uppercase letter oh), or I (uppercase letter eye).
In some fonts, these characters are indistinguishable from the numerals one and zero.
When tempted to use l, use L instead.
Note
This matches the PEP 8 standard but is repeated here for emphasis.
3.4.8.4. Always use cls for the first argument to metaclass instance methods
For regular classes self is used, but for class methods, and hence also for metaclass instance methods, cls should be used instead.
Note
This is consistent with the naming conventions in PEP 8 as indicated explicitly by upstream.
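A hypothetical plugin registry showing where cls appears: in a metaclass, the first argument of __init__ and of any other instance method is a class (an instance of the metaclass), so it is named cls; an ordinary classmethod also takes cls. PluginMeta, Plugin and the reader classes are illustrative names, and the metaclass= syntax is Python 3 only.

```python
class PluginMeta(type):
    """Hypothetical metaclass that keeps a registry of plugin classes."""

    def __init__(cls, name, bases, namespace):
        # "cls" is the newly created class, i.e. an instance of PluginMeta.
        super().__init__(name, bases, namespace)
        if not hasattr(cls, 'registry'):
            cls.registry = {}  # created once, on the root class
        else:
            cls.registry[name] = cls  # subclasses register themselves

    def listPlugins(cls):
        # Metaclass instance method: it is called on a class, so "cls".
        return sorted(cls.registry)


class Plugin(metaclass=PluginMeta):
    @classmethod
    def describe(cls):
        # Ordinary class method: the first argument is also "cls".
        return 'plugin %s' % cls.__name__


class CsvReader(Plugin):
    pass


class FitsReader(Plugin):
    pass
```

Here Plugin.listPlugins() returns ['CsvReader', 'FitsReader'], and CsvReader.describe() returns 'plugin CsvReader'.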
3.4.9. 7. Source Files & Modules
3.4.9.1. A Python source file name SHOULD be camelCase-with-leading-lowercase and ending in ‘.py’
A module containing a single class should be named with a camelCase-with-leading-lowercase transliteration of the class’s name.

Test files MUST have the form test_{description}.py for compatibility with pytest. The name of a test case should be descriptive without the need for a trailing numeral to distinguish one test case from another.
3.4.9.2. File Header Comments
Each Python source file MUST include comments or a docstring at the top of the file summarizing the contents or purpose of the file. If the file contains a service or programmatic interface of some kind, a one-line description of each interface or service method contained in the file SHOULD be included as a summary of the interface or service. Docstrings for each service or interface method are REQUIRED to contain a description of the method sufficient to auto-generate user documentation: they should describe the purpose of the method and each of its arguments, and should provide an example of usage.
3.4.9.3. ASCII Encoding MUST be used for new code
Always use ASCII for new Python code.
Do not include a coding comment (as described in PEP 263) for ASCII files.
Existing code using Latin-1 encoding (a.k.a. ISO-8859-1) is acceptable so long as it has a proper coding comment. All other code must be converted to ASCII or Latin-1 except for 3rd party packages used “as is.”
3.4.9.4. Standard code order SHOULD be followed
Within a module, follow this order:
1. Shebang line, #! /usr/bin/env python (only for executable scripts)
2. Module-level comments (such as the license statement)
3. Module-level docstring
4. from __future__ import ... statement, if present
5. __authors__ = ['First Last <email>', ...] statement, if present
6. __version__ = [yyyymmdd] statement, if present
7. __all__ = [...] statement, if present (usually just in the __init__.py file)
8. Imports
9. Private module variables (names start with underscore)
10. Private module functions and classes (names start with underscore)
11. Public module variables
12. Public functions and classes
3.4.10. 8. Classes
See also
Designing for Inheritance in PEP 8 describes naming conventions related to public and private class APIs.
3.4.10.1. super MAY be used to call parent class methods
If you are overriding a method from a parent class, use super() to call the parent class’s method. For example:
class B(object):
    def method(self, arg):
        self.foo = arg


class C(B):
    def method(self, arg):
        super().method(arg)  # runs B.method, which sets self.foo
        self.bar = 2*arg  # then do the C-specific work


C().method(42)
Using super() ensures a consistent Method Resolution Order and prevents inherited methods from being called multiple times. In Python 3, super() does not require naming the class that it is part of, making its use simpler and removing a maintenance issue.
See also
PEP 3135 discusses the use of super() and should be consulted when writing code supporting both Python 2 and 3.
3.4.10.1.1. super() and Multiple Inheritance
In the presence of multiple inheritance (two or more parents, e.g. class C(A, B)), the trickiest issue with the use of super() is that the class author generally doesn’t know a priori which overridden method will be called in what order. In particular, this means that the calling signature (arguments) for all versions of a method must be compatible. As a result, there are a few argument-related caveats about the use of super() in multiple inheritance hierarchies:
- Only pass super() the exact arguments you received.
- When you use it on methods whose acceptable arguments can be altered on a subclass via addition of more optional arguments, always accept *args, **kwargs, and call super() like super().currentmethod(arg1, arg2, ..., *args, **kwargs). If you don’t do this, document that addition of optional arguments in subclasses is forbidden.
- Do not use positional arguments in __init__ or __new__. Instead, use keyword args in the declarations, always call them using keywords, and always pass all keywords on, e.g. super().__init__(**kwargs).
To use super() with multiple inheritance, all base classes in Python’s Method Resolution Order need to use super(); otherwise the calling chain gets interrupted. If your class may be used in multiple inheritance, ensure that all relevant classes use super(), and document any requirements for subclasses.

For more details, see the super() documentation, the astropy coding guide, and an article by Raymond Hettinger.
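A minimal sketch of the keyword-only convention in a diamond hierarchy. Each __init__ consumes its own keyword and passes the rest on with super().__init__(**kwargs), so every class in the MRO runs exactly once. The class names and keywords (logName, cacheSize) are hypothetical.

```python
class Base(object):
    def __init__(self, **kwargs):
        # Last cooperative class before object: any keyword still left
        # here is unexpected, and object.__init__ will reject it.
        super().__init__(**kwargs)


class Logged(Base):
    def __init__(self, logName='default', **kwargs):
        self.logName = logName
        super().__init__(**kwargs)  # pass every remaining keyword on


class Cached(Base):
    def __init__(self, cacheSize=16, **kwargs):
        self.cacheSize = cacheSize
        super().__init__(**kwargs)


class Service(Logged, Cached):
    def __init__(self, name='svc', **kwargs):
        self.name = name
        super().__init__(**kwargs)


svc = Service(name='query', logName='qlog', cacheSize=64)
print([k.__name__ for k in Service.__mro__])
# ['Service', 'Logged', 'Cached', 'Base', 'object']
```

Inside Logged, super() resolves to Cached rather than Base. This is why Logged cannot know its successor's signature and must forward keywords rather than positional arguments. A misspelled keyword travels to object.__init__ and raises TypeError instead of being silently ignored.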
3.4.11. 9. Comparisons
3.4.11.1. is and is not SHOULD only be used for determining if two variables point to the same object
Use is or is not only when you need to know that two variables point to the exact same object. To test for equality in value, use == or != instead.
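A small illustration of the difference between value equality and identity, using plain lists:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # a second name for the same list object

print(a == b)  # True: equal values
print(a is b)  # False: two distinct list objects
print(c is a)  # True: the same object

c.append(4)  # a change made through c is visible through a, not b
print(a, b)  # [1, 2, 3, 4] [1, 2, 3]
```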
3.4.11.2. is and is not SHOULD be used when comparing to None
There are two reasons:
- is None works with NumPy arrays, whereas == None does not;
- is None is idiomatic.
This is also consistent with PEP 8, which states: “Comparisons to singletons like None should always be done with is or is not, never the equality operators.”
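== None fails for NumPy arrays because the result depends on the object's own __eq__ method, which NumPy overrides to compare element-wise. This pure-Python sketch shows the same mechanism without requiring NumPy. AlwaysEqual is a hypothetical, deliberately pathological class, and the == None line carries a # noqa comment as required above because flake8 reports E711 for it.

```python
class AlwaysEqual(object):
    """Hypothetical type whose __eq__ reports equality with anything."""

    def __eq__(self, other):
        return True

    __hash__ = object.__hash__  # keep instances hashable in Python 3


obj = AlwaysEqual()
# Prints True, even though obj is clearly not None:
print(obj == None)  # noqa: E711
# Prints False: an identity test cannot be overridden.
print(obj is None)
```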
For sequences (str, list, tuple), use the fact that empty sequences are False.
Yes:
if not seq:
    pass
if seq:
    pass
No:
if len(seq):
    pass
if not len(seq):
    pass
3.4.12. 10. Idiomatic Python
Strive to write idiomatic Python. Writing Python with accepted patterns makes your code easier for others to understand and often prevents bugs.
Fluent Python by Luciano Ramalho is an excellent guide to writing idiomatic Python.
Idiomatic Python also reduces technical debt. For more information see the online book Supporting Python 3 by Lennart Regebro.
3.4.12.1. A mutable object MUST NOT be used as a keyword argument default
Never use a mutable object as default value for a keyword argument in a function or method.
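When a mutable object is used as a default keyword argument, the default is created only once, when the function is defined, and is shared by every call. Changes made during one call therefore persist into the next, leading to unexpected behavior. This issue can be avoided by only using immutable types (typically None) as defaults.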
For example, rather than provide an empty list as a default:
def proclist(alist=[]):
    pass
this function should create a new list in its internal scope:
def proclist(alist=None):
    if alist is None:
        alist = []
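The shared-default problem made visible. appendBad and appendGood are hypothetical functions that differ only in their default:

```python
def appendBad(item, alist=[]):
    # The default list is created once, when "def" runs, and then shared.
    alist.append(item)
    return alist


def appendGood(item, alist=None):
    # A fresh list is created on every call that omits alist.
    if alist is None:
        alist = []
    alist.append(item)
    return alist


print(appendBad(1), appendBad(2))  # [1, 2] [1, 2]
print(appendGood(1), appendGood(2))  # [1] [2]
```

Both appendBad calls return the same list object, which is why the first result also shows the item added by the second call.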
3.4.12.2. Context managers (with) SHOULD be used for resource allocation
Use the with statement to simplify resource allocation.
For example, to be sure a file will be closed when you are done with it:
with open('/data/foo.dat', 'r') as f:
    for line in f:
        pass
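A sketch showing that the file is closed once the block exits. It also shows how a project-specific context manager can be written with the standard-library contextlib.contextmanager decorator. The timer helper and its log list are hypothetical; the temporary path stands in for a real data file.

```python
import contextlib
import os
import tempfile
import time


@contextlib.contextmanager
def timer(label, log):
    # Hypothetical helper: record elapsed time even if the body raises.
    start = time.time()
    try:
        yield
    finally:
        log.append((label, time.time() - start))


log = []
path = os.path.join(tempfile.mkdtemp(), 'foo.dat')
with timer('write', log):
    with open(path, 'w') as f:
        f.write('line 1\nline 2\n')

print(f.closed)  # True: the file was closed on leaving the block
```

The try/finally inside timer plays the same role as the file's own cleanup: the release step runs however the body exits.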
3.4.12.3. Avoid dict.keys() when iterating over keys or checking membership
For iterating over keys, iterate over the dictionary itself, e.g.:
for x in mydict:
    pass
To test for inclusion, use in:
if key in mydict:
    pass
This is preferred over keys(). Use keys() when storing the keys for later access:
k = list(mydict.keys())
where list ensures that a view or iterator is not being retained.
Note
When writing code that supports both Python 2 and Python 3, note that under Python 2 keys() returns a list, whereas under Python 3 it returns a dict_keys view. Both allow iteration over the keys; however, a Python 3 dict_keys view cannot be indexed.
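A Python 3 sketch of the difference between a list snapshot of the keys and a live dict_keys view. The dictionary contents are illustrative, and the printed key order assumes Python 3.7+ insertion ordering.

```python
mydict = {'ra': 10.5, 'dec': -3.2}

saved = list(mydict.keys())  # snapshot: later changes are not reflected
view = mydict.keys()  # Python 3 dict_keys view: stays live

mydict['mag'] = 21.0

print(saved)  # ['ra', 'dec']
print(sorted(view))  # ['dec', 'mag', 'ra']
print('mag' in mydict)  # True: membership test without keys()

try:
    view[0]
except TypeError:
    print('dict_keys views cannot be indexed')
```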
3.4.12.4. The subprocess module SHOULD be used to spawn processes
Use the subprocess module to spawn processes.
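A minimal sketch using subprocess.run (Python 3.5+). It passes stdout=subprocess.PIPE and universal_newlines=True rather than the newer capture_output and text arguments, which only appeared in Python 3.7 and would therefore miss the 3.6 baseline. Python 2 code would use subprocess.check_output instead. runCommand is a hypothetical helper.

```python
import subprocess
import sys


def runCommand(args):
    """Run a command (given as a list) and return its standard output.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    # An argument list, not a shell string, avoids shell-quoting and
    # injection problems.
    result = subprocess.run(args, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


print(runCommand([sys.executable, '-c', 'print("hello")']))
```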
3.4.12.5. lambda SHOULD NOT be used
Avoid the use of lambda.
You can almost always write clearer code by using a named function or by using the functools module to wrap a function.
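Two common lambda uses and their replacements: functools.partial, as mentioned above, and operator.itemgetter from the standard-library operator module (an additional alternative, not one named in this guide). The data and the scale function are illustrative.

```python
import functools
import operator

rows = [('b', 8.4), ('a', 3.4), ('c', 4.0)]  # (name, magnitude) pairs

# Instead of: sorted(rows, key=lambda row: row[1])
byMag = sorted(rows, key=operator.itemgetter(1))


def scale(value, factor):
    """Return value multiplied by factor."""
    return value*factor


# Instead of: double = lambda v: scale(v, 2)
double = functools.partial(scale, factor=2)

print(byMag)  # [('a', 3.4), ('c', 4.0), ('b', 8.4)]
print(double(5))  # 10
```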
3.4.12.6. The set type SHOULD be used for unordered collections
Use the set type for unordered collections of objects.
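A short illustration of what a set provides for unordered collections: duplicate removal, set algebra and fast membership tests. The band names are illustrative.

```python
observed = ['g', 'r', 'g', 'i', 'r']
required = {'g', 'r', 'z'}

bands = set(observed)  # duplicates removed; order is not kept
missing = required - bands  # set difference

print(sorted(bands))  # ['g', 'i', 'r']
print(missing)  # {'z'}
print('i' in bands)  # True (average O(1) membership test)
```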
3.4.12.7. Iterators and generators SHOULD be used to iterate over large data sets efficiently
Use iterators, generators (functions that use yield to produce values one at a time) and generator expressions (expressions that produce values lazily, like iterators) to iterate over large data sets efficiently, without holding the whole data set in memory.
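A sketch of a generator reading a large text file lazily, with a generator expression feeding sum(). The two-column file format and the function names are hypothetical.

```python
def readRecords(path):
    """Yield (name, value) pairs from a two-column text file.

    The file object is itself an iterator, so only one line is held in
    memory at a time; lines without exactly two fields are skipped.
    """
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2:
                yield fields[0], float(fields[1])


def totalValue(path):
    # A generator expression feeds sum() without building a list first.
    return sum(value for _, value in readRecords(path))
```

Memory use stays constant however large the file is, because neither function ever materializes the full list of records.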
3.4.12.8. if False: and if True: SHOULD NOT be used
Code must not be placed inside if False: or if True: blocks, nor left commented out. Instead, debugging code and alternative implementations must be placed inside a “named” if statement.
Such blocks should have a comment describing why they are disabled. They may have a comment describing the conditions under which said code can be removed (like the completion of a ticket or a particular date).
For example, for code that will likely be removed in the future, once testing is completed:
# Delete old_thing() and the below "if" statement once all unittests
# are finished (Jira-1234).
use_old_method = False
if use_old_method:
    old_thing()
else:
    new_thing()
It is often beneficial to lift such debugging flags into the method’s keyword arguments to allow users to decide which branch to run. For example:
def foo(x, debug_plots=False):
    do_thing()
    if debug_plots:
        plot_thing()
3.4.12.9. Python exceptions SHOULD be raised and checked in Python code
When raising an exception in Python code, consideration should be given to defining a module-specific exception for increased precision. Such an exception SHOULD inherit from an appropriate standard Python exception. If a module-specific exception is not used, then the appropriate standard Python exception SHOULD be raised.
When writing an except clause, the exception type caught SHOULD be, in order of preference, a module-specific or a standard Python exception.
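A minimal sketch of a module-specific exception that inherits from an appropriate standard exception (here ValueError), together with except clauses that name specific types. QueryError and parseLimit are hypothetical names. The raise inside the except clause omits Python 3's "raise ... from" so the code stays Py2-compatible.

```python
class QueryError(ValueError):
    """Hypothetical module-specific error for malformed query options.

    Subclassing ValueError lets callers that catch the standard
    exception handle it too.
    """


def parseLimit(text):
    """Return a positive integer row limit parsed from text."""
    try:
        limit = int(text)
    except ValueError:  # catch a specific type rather than a bare except
        raise QueryError('limit is not an integer: %r' % (text,))
    if limit <= 0:
        raise QueryError('limit must be positive: %d' % limit)
    return limit


try:
    parseLimit('ten')
except QueryError as err:  # the module-specific type is preferred
    print('bad input:', err)
```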
3.4.6. 4. Comments
Source code comments should follow PEP 8’s recommendations with the following additional requirements.
3.4.6.1. Comments MUST always remain up-to-date with code changes
Comment blocks should be updated when code is modified. Code reviews should include consistency checks of the comments.
3.4.6.2. Block comments SHOULD reference the code following them and SHOULD be indented to the same level
Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment). Paragraphs inside a block comment are separated by a line containing a single #.
3.4.6.3. To-do / FIXME comments SHOULD include a Jira issue key
If the commented code is a workaround for a known issue, this rule makes it easier to find and remove the workaround once the issue has been resolved. If the commented code itself is the problem, this rule ensures the issue will be reported on Jira, making it more likely to be fixed in a timely manner. For example,
# TODO: workaround for DL-6789
# TODO: DL-12345 is triggered by this line