This guide documents coding and style conventions for contributing to FragileTech Python projects.
- When contributing to existing external projects, do not change their style.
- Follow the Python Style guide.
- Use PEP8 with a 99-character line length limit.
- Class methods order:
  - `__init__`
  - Attributes
  - Static methods
  - Class methods
  - Public methods
  - Protected (`_`) methods
  - Private (`__`) methods
- Use double quotes `"`. When a string itself contains double quote characters, however, use single quotes to avoid backslashes in the string.
- Favor f-strings when printing variables inside a string.
- Do not use single-letter argument names; use `X` and `Y` only in a scikit-learn context.
- Use Google style for docstrings.
- Format of TODO and FIXME: `# TODO(mygithubuser): blah-blah-blah`.
- Add type hinting when possible.
- Use standard argparse for CLI interactions. Click is also allowed when it improves the readability and maintainability of the code.
- Each function should do only one thing. If you find yourself writing a long function that does many things, consider splitting it into several functions.
- Give variables meaningful names. If names become too long, use abbreviations. These abbreviations should be explained in a comment when the variable is first defined.
- Keep in mind that coding is creating abstractions that hide complexity. This means that you should be able to get an idea of what a function does just by reading its documentation.
- Avoid meaningless comments. Assume the person reading your code already knows how to code in Python, and take advantage of the syntax of the language to avoid using comments. A comment is welcome, for example, when it saves the reader from going through several lines of code that are difficult to understand.
- Document the functions, and make sure that it is easy to understand what all the parameters are. When working with tensors and vectors, specify their dimensions when they are not obvious.
- Follow the Zen of Python; it is your best friend.
- A well-documented function lets you know what it does and how to use it without having to look at its code. Document all the functions! It is a pain in the ass but it pays off.
- We use `black` for formatting the code. Run `black .` before committing to automatically format the code in a consistent way.
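
Several of the conventions listed above (method order, double quotes, f-strings, Google-style docstrings, type hints and the TODO format) can be seen together in a minimal sketch. The `RunningMean` class and all of its members are hypothetical names invented for illustration, and reading "Attributes" in the method order as properties is an assumption about what the guide intends.

```python
import math
from typing import List


class RunningMean:
    """Track the mean of a stream of numbers (hypothetical example class).

    Attributes:
        name: Label used when describing the tracker.
    """

    def __init__(self, name: str = "default"):
        """Initialize an empty tracker.

        Args:
            name: Label used when describing the tracker.
        """
        self.name = name
        self._values: List[float] = []

    # Attributes, read here as properties (an assumption about the guide's intent).
    @property
    def count(self) -> int:
        """Return the number of values stored so far."""
        return len(self._values)

    @staticmethod
    def is_valid(value: float) -> bool:
        """Return True when the value is not NaN."""
        return not math.isnan(value)

    @classmethod
    def from_values(cls, values: List[float], name: str = "default") -> "RunningMean":
        """Build a tracker that already contains the given values.

        Args:
            values: Initial values to store, in order.
            name: Label used when describing the tracker.

        Returns:
            A new tracker holding every valid value in ``values``.
        """
        tracker = cls(name=name)
        for value in values:
            tracker.add(value)
        return tracker

    def add(self, value: float) -> None:
        """Store one value, silently skipping NaN."""
        # TODO(mygithubuser): support weighted values.
        if self.is_valid(value):
            self._values.append(value)

    def mean(self) -> float:
        """Return the mean of the stored values, or 0.0 when there are none."""
        return self._total() / self.count if self.count else 0.0

    def describe(self) -> str:
        """Return a one-line summary of the tracker."""
        return f"{self.name}: mean={self.mean():.2f} over {self.count} values"

    def reset(self) -> None:
        """Discard every stored value."""
        self.__clear()

    def _total(self) -> float:
        """Return the sum of the stored values."""
        return sum(self._values)

    def __clear(self) -> None:
        """Empty the internal storage."""
        self._values.clear()
```

A caller only needs the docstrings to use the class, for example `RunningMean.from_values([1.0, 2.0, 3.0], name="demo").describe()`.
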
Although using blank lines to separate code blocks may seem like a good idea, it has the following drawbacks:
- It does not offer any information regarding how and why you are defining different blocks of code.
- It makes code reviews more difficult:
- It forces the reviewer to make assumptions about why you decided to create the different blocks.
- It removes context when showing possible suggestions about changes in the code.
- Sparse code adds unnecessary scrolling time when reading the code.
- Sparse code makes the code diffs less reliable.
If you want to separate different code blocks inside the same function there are better alternatives:
- Write a comment explaining why the code block you are separating with a blank line matters:
- It helps the reviewer understand why you are separating different code blocks.
- If the comment is meaningless you'll realize that it was an unnecessary line break.
For example, imagine you are defining a fancy neural network as a `torch.nn.Module`:

    super().__init__()

    self.device = device

    self.layer_1 = torch.nn.Linear(in_dim, out_dim)
    self.layer_2 = torch.nn.Linear(out_dim, out_dim)

    self.layer_3 = torch.nn.Linear(in_dim, out_dim)
    self.layer_4 = torch.nn.Linear(out_dim, in_dim)

To check whether the blank lines you wrote to separate the code blocks are useful, you could write a comment for each block:

    super().__init__()

    # Device definition
    self.device = device

    # Encoder layers of my fancy DNN model
    self.layer_1 = torch.nn.Linear(in_dim, out_dim)
    self.layer_2 = torch.nn.Linear(out_dim, out_dim)

    # Decoder layers of my fancy DNN model
    self.layer_3 = torch.nn.Linear(in_dim, out_dim)
    self.layer_4 = torch.nn.Linear(out_dim, out_dim)

When you do that, you will realize that you spent almost more time reading the comments than the code blocks they separate, and that the "device" comment is extremely obvious to anyone remotely familiar with `pytorch`. In that case, deleting that blank line improves the readability of your code.

The comments about the blocks of your fancy neural network do add some useful information, but there may be a better alternative. Those comments are useful because they explain what each layer is doing, and the same information can be conveyed by giving meaningful names to the layers:

    super().__init__()
    self.device = device
    self.encoder_in = torch.nn.Linear(in_dim, out_dim)
    self.encoder_out = torch.nn.Linear(out_dim, out_dim)
    self.decoder_in = torch.nn.Linear(in_dim, out_dim)
    self.decoder_out = torch.nn.Linear(out_dim, out_dim)

Now that the names are meaningful and there are no comments, we find that the difference in line length between `self.device` and `self.encoder_in = torch.nn.Linear(in_dim, out_dim)` makes it easier to differentiate those two blocks. On the other hand, the `encoder` and `decoder` lines have the same length, and that makes it difficult to quickly spot where the definition of the decoder starts. Adding a comment between the two of them will improve the readability of the code:

    super().__init__()
    self.device = device
    self.encoder_in = torch.nn.Linear(in_dim, out_dim)
    self.encoder_out = torch.nn.Linear(out_dim, out_dim)
    # Decoder layers of my fancy DNN model
    self.decoder_in = torch.nn.Linear(in_dim, out_dim)
    self.decoder_out = torch.nn.Linear(out_dim, out_dim)

Keeping the comment about the encoder is not worth it in this case because it would be commenting a block of only two lines of code, but if the encoder had more layers it could make the code more readable. Inside the `__init__` function, try not to add more than 1 comment per 5 lines of code.
- Write a new function to encapsulate the new code block:
- It will reduce the cognitive load of the reviewer by abstracting the complexity of the code.
- The function call will remove the need of additional comments.
- It will make testing easier.
- Use reserved words in variable definitions and function calls to separate code blocks: take advantage of the different color highlighting that reserved words have to make an implicit separation of blocks. For example, instead of:

    my_var = do_one_thing(param_1=val_1, param_2=val_2, param_3=value_3)
    another_var = do_another_thing(my_var)
    return another_var

You could do:

    my_var = do_one_thing(param_1=val_1, param_2=val_2, param_3=value_3)
    return do_another_thing(my_var)

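
The alternative of writing a new function to encapsulate a code block, described above, can be sketched roughly as follows. The functions `normalize`, `clip` and `preprocess` are hypothetical and exist only for illustration; the point is that each former "block" gets a name, a docstring and a place where it can be tested on its own, so the caller needs neither blank lines nor comments to show where one step ends and the next begins.

```python
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Scale values linearly to the [0, 1] range.

    Args:
        values: Non-empty list of numbers.

    Returns:
        The scaled values; all zeros when every input is equal.
    """
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        return [0.0 for _ in values]
    return [(value - lowest) / span for value in values]


def clip(values: List[float], lower: float, upper: float) -> List[float]:
    """Limit every value to the [lower, upper] interval."""
    return [min(max(value, lower), upper) for value in values]


def preprocess(values: List[float], lower: float = 0.0, upper: float = 10.0) -> List[float]:
    """Clip outliers and then normalize the result."""
    clipped = clip(values, lower=lower, upper=upper)
    return normalize(clipped)
```

Here `preprocess` reads as two named steps, and `clip` and `normalize` can each be covered by a small unit test.
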
- Using PyCharm as an IDE will help you highlight the most common mistakes and help you enforce PEP8.
Reading well-written Python code is also a way to improve your skills. Please avoid copying anything that has been written by a researcher; it will likely be a compendium of bad practices. Instead, take a look at any of the following projects:
When it comes to Reinforcement Learning, please avoid using OpenAI baselines as an example at any cost.
- The Zen of Python in examples:
- Idiomatic Python:
- Blog post about Idiomatic Python
- More Python Idioms