Added Documentation (#115)
* added logo

* added initial citation.cff

* updated readme

* set upper boundaries on requirements and added docs req

* added icons

* added faq

* bumped version

* added changelog

* added api documentation

* added installation docs

* added documentation setup

* added index page

* updated citation with authors

* fixed about

* added tqdm to requirements

* fixed setup

* removed missing imports

* debugged documentation

* added documentation workflow

* added edit button

* updated precommit hooks

* ran pre-commit hooks

* added flake8

Co-authored-by: KennethEnevoldsen <[email protected]>
Co-authored-by: Martin Bernstorff <[email protected]>
3 people authored Jun 22, 2022
1 parent 7bf532a commit 5bf2fad
Showing 46 changed files with 1,458 additions and 371 deletions.
19 changes: 0 additions & 19 deletions .github/workflows/automatic_semantic_pr.yml

This file was deleted.

29 changes: 29 additions & 0 deletions .github/workflows/documentation.yml
@@ -0,0 +1,29 @@

name: Documentation
on:
  push:
    branches:
      - master
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0 # otherwise, you will fail to push refs to dest repo
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install -e .
      - name: Build and Commit
        uses: sphinx-notes/pages@v2
        with:
          documentation_path: docs
          install_requirements: "true"
      - name: Push changes
        uses: ad-m/github-push-action@v2
        with:
          github_token: ${{ secrets.SPHINX_DOCUMENTATION }}
          branch: gh-pages
37 changes: 34 additions & 3 deletions .pre-commit-config.yaml
@@ -1,6 +1,37 @@
default_stages: [commit, push]

repos:
- repo: https://github.com/psf/black
- repo: https://github.com/pycqa/isort
rev: 5.10.1
hooks:
- id: isort
name: isort (python)
args: ["--profile", "black", "--filter-files"]

- repo: https://github.com/asottile/add-trailing-comma
rev: v2.2.3
hooks:
- id: add-trailing-comma

- repo: https://github.com/asottile/pyupgrade
rev: v2.34.0
hooks:
- id: pyupgrade

- repo: https://github.com/myint/docformatter
rev: v1.3.1
hooks:
- id: docformatter
args: [--in-place]

- repo: https://github.com/psf/black
rev: 22.3.0
hooks:
- id: black
language_version: python3.8
- id: black
language_version: python3.8

- repo: https://github.com/PyCQA/flake8
rev: 4.0.1
hooks:
- id: flake8
args: [--config, .flake8]
126 changes: 35 additions & 91 deletions README.md
@@ -1,5 +1,12 @@
<a href="https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils"><img src="https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/blob/main/docs/icon.png?raw=true" width="200" align="right" /></a>
# PSYCOP Machine Learning Utilities

![python versions](https://img.shields.io/badge/Python-%3E=3.7-blue)
[![Code style: black](https://img.shields.io/badge/Code%20Style-Black-black)](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html)
[![github actions pytest](https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/actions/workflows/pytest.yml/badge.svg)](https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/actions)
[![github actions docs](https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/actions/workflows/documentation.yml/badge.svg)](https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/)
![coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/martbern/d6c40a5b5a3169c079e8b8f778b8e517/raw/badge-psycop-ml-utils-pytest-coverage.json)
=======
![badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/martbern/d6c40a5b5a3169c079e8b8f778b8e517/raw/badge-psycop-ml-utils-pytest-coverage.json)

# Installation
@@ -32,106 +39,43 @@ or
# Usage
- [ ] Update examples as API matures

## Loading data from SQL

Currently only contains one function to load a view from SQL, `sql_load`

```py
from psycopmlutils.loaders.sql_load import sql_load
## 🔧 Installation
To get started using psycop-ml-utils simply install it using pip by running the following line in your terminal:

view = "[FOR_SFI_fritekst_resultat_udfoert_i_psykiatrien_aendret_2011]"
sql = "SELECT * FROM [fct]." + view
df = sql_load(sql, chunksize = None)
```

## Flattening time series
To train baseline models (logistic regression, elastic net, SVM, XGBoost/random forest etc.), we need to represent the longitudinal data in a tabular, flattened way.

In essence, we need to generate a training example for each prediction time, where that example contains "latest_blood_pressure" (float), "X_diagnosis_within_n_hours" (boolean) etc.

To generate this, I propose the time-series flattener class (`TimeSeriesFlattener`). It builds a dataset like the one described above.
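
The flattening idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: the helper `flatten_predictor`, its parameters, and the toy data are hypothetical, and it uses tuples instead of DataFrames to stay dependency-free. For each prediction time it collects the patient's values inside the lookback window, resolves multiple values with an aggregation function, and uses a fallback when the window is empty.

```python
from datetime import datetime, timedelta


# Hypothetical helper for illustration; not part of psycop-ml-utils.
def flatten_predictor(prediction_times, values, lookback_days, resolve_multiple, fallback):
    """Return one aggregated value per (patient_id, prediction_time) pair."""
    window = timedelta(days=lookback_days)
    flattened = []
    for patient_id, pred_time in prediction_times:
        # Keep only this patient's values that fall inside the lookback window.
        in_window = [
            value
            for value_id, value_time, value in values
            if value_id == patient_id
            and timedelta(0) <= pred_time - value_time <= window
        ]
        # Resolve multiple values, or use the fallback if none were found.
        flattened.append(resolve_multiple(in_window) if in_window else fallback)
    return flattened


# Toy data (made up): rows are (patient_id, timestamp[, value]).
prediction_times = [(1, datetime(2022, 1, 10)), (2, datetime(2022, 1, 10))]
hba1c = [
    (1, datetime(2020, 1, 1), 60),  # older than the lookback window, ignored
    (1, datetime(2022, 1, 1), 44),
    (1, datetime(2022, 1, 5), 48),
]

pred_max_hba1c = flatten_predictor(
    prediction_times, hba1c, lookback_days=365, resolve_multiple=max, fallback=40,
)
print(pred_max_hba1c)  # [48, 40]
```

Patient 2 has no measurements, so the fallback of 40 is used; patient 1 gets the maximum of the two values inside the 365-day window.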

### TimeSeriesFlattener
```python
class FlattenedDataset:
    def __init__(self):
        """Class containing a flattened time series.

        Args:
            prediction_times_df (DataFrame): Dataframe with prediction times.
            prediction_timestamp_colname (str, optional): Colname for timestamps. Defaults to "timestamp".
            id_colname (str, optional): Colname for patients ids. Defaults to "dw_ek_borger".
        """

    def add_outcome(self):
        """Adds an outcome-column to the dataset.

        Args:
            outcome_df (DataFrame): Cols: dw_ek_borger, datotid, value if relevant.
            lookahead_days (float): How far ahead to look for an outcome in days. If none found, use fallback.
            resolve_multiple (str): What to do with more than one value within the lookahead.
                Suggestions: earliest, latest, mean, max, min.
            fallback (List[str]): What to do if no value within the lookahead.
                Suggestions: latest, mean_of_patient, mean_of_population, hardcode (qualified guess)
            timestamp_colname (str): Column name for timestamps
            values_colname (str): Colname for outcome values in outcome_df
            id_colname (str): Column name for citizen id
            new_col_name (str): Name to use for new col. Automatically generated as '{new_col_name}_within_{lookahead_days}_days'.
                Defaults to using values_colname.
        """

    def add_predictor(self):
        """Adds a predictor-column to the dataset.

        Args:
            predictor_df (DataFrame): Cols: dw_ek_borger, datotid, value if relevant.
            lookahead_days (float): How far ahead to look for an outcome in days. If none found, use fallback.
            resolve_multiple (str): What to do with more than one value within the lookahead.
                Suggestions: earliest, latest, mean, max, min.
            fallback (List[str]): What to do if no value within the lookahead.
                Suggestions: latest, mean_of_patient, mean_of_population, hardcode (qualified guess)
            outcome_colname (str): What to name the column
            id_colname (str): Column name for citizen id
            timestamp_colname (str): Column name for timestamps
        """
pip install git+https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils.git
```

Inspiration-code can be found in previous commits.
For more detailed instructions on installation, see the [installation instructions](https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/installation).

#### Example
- [ ] Update examples as API matures

```python
from psycopmlutils.timeseriesflattener.flattened_dataset import FlattenedDataset

dataset = FlattenedDataset(
    prediction_times_df=prediction_times,
    prediction_timestamp_colname="timestamp",
    id_colname="dw_ek_borger",
)

dataset.add_outcome(
    outcome_df=type_2_diabetes_df,
    lookahead_days=730,
    resolve_multiple="max",
    fallback=[0],
    name="t2d",
)

dataset.add_predictor(
    predictor=hba1c,
    lookback_window=365,
    resolve_multiple="max",
    fallback=["latest", 40],
    name="hba1c",
)
```
## 📖 Documentation

| Documentation | |
| -------------------------- | --------------------------------------------------------------------------- |
| 📚 **[Usage Guides]** | Guides and instructions on how to use the package and its features. |
| 📰 **[News and changelog]** | New additions, changes and version history. |
| 🎛 **[API References]** | The detailed reference for psycop-ml-utils's API, including function documentation. |
| 🙋 **[FAQ]** | Frequently asked questions. |

[usage guides]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/introduction.html
[api references]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/
[Augmenters]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/augmenters.html
[Demo]: https://share.streamlit.io/Aarhus-Psychiatry-Research/psycop-ml-utils/dev/streamlit.py
[News and changelog]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/news.html
[FAQ]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/faq.html

Dataset now looks like this:
## 💬 Where to ask questions

| dw_ek_borger | datetime_prediction | outc_t2d_within_next_730_days | pred_max_hba1c_within_prev_365_days |
|--------------|---------------------|-------------------------------|-------------------------------------|
| 1 | yyyy-mm-dd hh:mm:ss | 0 | 48 |
| 2 | yyyy-mm-dd hh:mm:ss | 0 | 40 |
| 3 | yyyy-mm-dd hh:mm:ss | 1 | 44 |
| Type | |
| ------------------------------ | ---------------------- |
| 🚨 **Bug Reports** | [GitHub Issue Tracker] |
| 🎁 **Feature Requests & Ideas** | [GitHub Issue Tracker] |
| 👩‍💻 **Usage Questions** | [GitHub Discussions] |
| 🗯 **General Discussion** | [GitHub Discussions] |

[github issue tracker]: https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/issues
[github discussions]: https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/discussions

For binary outcomes, `add_predictor` with `fallback = [0]` would take a df with only the times where the event occurred, and then generate 0's for the rest.
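
The binary-outcome case can be sketched as follows. This is a hedged illustration, not the package's code: the function `binary_outcome`, its parameters, and the toy data are hypothetical. It takes only the times at which the event occurred and emits the fallback (0) for every prediction time without an event inside the lookahead window.

```python
from datetime import datetime, timedelta


# Hypothetical helper for illustration; not part of psycop-ml-utils.
def binary_outcome(prediction_times, event_times, lookahead_days, fallback=0):
    """Return 1 if the patient has an event within the lookahead, else the fallback."""
    window = timedelta(days=lookahead_days)
    labels = []
    for patient_id, pred_time in prediction_times:
        # An event counts if it happens after the prediction time, within the window.
        has_event = any(
            event_id == patient_id
            and timedelta(0) < event_time - pred_time <= window
            for event_id, event_time in event_times
        )
        labels.append(1 if has_event else fallback)
    return labels


# Toy data (made up): only patient 3 ever has the event.
prediction_times = [
    (1, datetime(2022, 1, 1)),
    (2, datetime(2022, 1, 1)),
    (3, datetime(2022, 1, 1)),
]
t2d_events = [(3, datetime(2023, 6, 1))]

outc_t2d = binary_outcome(prediction_times, t2d_events, lookahead_days=730)
print(outc_t2d)  # [0, 0, 1]
```

The result mirrors the `outc_t2d_within_next_730_days` column in the example table: rows without an event in the window get 0.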

I propose we create the above functionality on a just-in-time basis, building the features as we need them.
15 changes: 15 additions & 0 deletions citation.cff
@@ -0,0 +1,15 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Bernstorff"
    given-names: "Martin"
  - family-names: "Hansen"
    given-names: "Lasse"
  - family-names: "Enevoldsen"
    given-names: "Kenneth"
    orcid: "https://orcid.org/0000-0001-8733-0966"
title: "PSYCOP machine learning utilities"
version: 0.1.1
# doi: 10.5281/zenodo.6675315
date-released: 2022-06-21
url: "https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils"
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/_static/favicon.ico
Binary file not shown.
Binary file added docs/_static/icon.png
Binary file added docs/_static/icon_with_title.png
21 changes: 21 additions & 0 deletions docs/api.model_performance.rst
@@ -0,0 +1,21 @@
Model Performance
--------------------------------------------------


model_performance.model_performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: psycopmlutils.model_performance.model_performance
   :members:
   :undoc-members:
   :show-inheritance:
   :exclude-members:

model_performance.utils
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: psycopmlutils.model_performance.utils
   :members:
   :undoc-members:
   :show-inheritance:
   :exclude-members:
31 changes: 31 additions & 0 deletions docs/api.timeseriesflattener.rst
@@ -0,0 +1,31 @@
Time Series Flattener
--------------------------------------------------


timeseriesflattener.create_feature_combinations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: psycopmlutils.timeseriesflattener.create_feature_combinations
   :members:
   :undoc-members:
   :show-inheritance:
   :exclude-members:

timeseriesflattener.flattened_dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: psycopmlutils.timeseriesflattener.flattened_dataset
   :members:
   :undoc-members:
   :show-inheritance:
   :exclude-members:


timeseriesflattener.resolve_multiple_functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: psycopmlutils.timeseriesflattener.resolve_multiple_functions
   :members:
   :undoc-members:
   :show-inheritance:
   :exclude-members:
12 changes: 12 additions & 0 deletions docs/api.writers.rst
@@ -0,0 +1,12 @@
Writers
--------------------------------------------------


writers.sql_writer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: psycopmlutils.writers.sql_writer
   :members:
   :undoc-members:
   :show-inheritance:
   :exclude-members:
4 changes: 4 additions & 0 deletions docs/changelog.md
@@ -0,0 +1,4 @@
# News and Changelog

- v. 0.1.1 (21 June 2022)
  - Documentation was added

1 comment on commit 5bf2fad

@github-actions

Coverage

Coverage Report
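| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| **src** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| **src/psycopmlutils** | | | | |
| `__init__.py` | 1 | 0 | 100% | |
| `utils.py` | 2 | 0 | 100% | |
| **src/psycopmlutils/loaders** | | | | |
| `__init__.py` | 8 | 8 | 0% | 1–8 |
| `load_demographics.py` | 21 | 21 | 0% | 1–39 |
| `load_diagnoses.py` | 47 | 47 | 0% | 1–206 |
| `load_ids.py` | 8 | 8 | 0% | 1–21 |
| `load_lab_results.py` | 67 | 67 | 0% | 1–182 |
| `load_medications.py` | 33 | 33 | 0% | 1–149 |
| `load_outcomes.py` | 21 | 21 | 0% | 1–40 |
| `load_visits.py` | 10 | 10 | 0% | 1–18 |
| `sql_load.py` | 17 | 17 | 0% | 1–67 |
| **src/psycopmlutils/model_performance** | | | | |
| `__init__.py` | 1 | 0 | 100% | |
| `model_performance.py` | 85 | 6 | 93% | 128, 351–394 |
| `utils.py` | 52 | 2 | 96% | 134, 139 |
| **src/psycopmlutils/timeseriesflattener** | | | | |
| `__init__.py` | 2 | 0 | 100% | |
| `create_feature_combinations.py` | 29 | 0 | 100% | |
| `flattened_dataset.py` | 178 | 19 | 89% | 72–77, 144–149, 246–247, 250–251, 268, 536, 539, 543, 602, 681, 725 |
| `resolve_multiple_functions.py` | 24 | 0 | 100% | |
| **src/psycopmlutils/writers** | | | | |
| `__init__.py` | 0 | 0 | 100% | |
| `sql_writer.py` | 22 | 22 | 0% | 1–83 |
| **TOTAL** | 628 | 281 | 55% | |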
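
As a quick sanity check on the report, the Cover column appears to follow from the Stmts and Miss columns. The sketch below assumes the percentage is simply covered statements divided by total statements, rounded to a whole number, with empty files counted as 100%; the helper name `coverage_percent` is hypothetical and the coverage tool's exact rounding rules may differ.

```python
# Hypothetical helper for illustration; the coverage tool's own rounding may differ.
def coverage_percent(statements, missed):
    """Share of statements executed by the tests, in percent."""
    if statements == 0:
        return 100.0  # assumption: an empty file counts as fully covered
    return 100.0 * (statements - missed) / statements


# Values taken from the report: TOTAL, model_performance.py, flattened_dataset.py.
print(round(coverage_percent(628, 281)))  # 55
print(round(coverage_percent(85, 6)))     # 93
print(round(coverage_percent(178, 19)))   # 89
```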
