Added Documentation (#115)

* added logo
* added initial citation.cff
* updated readme
* set upper boundaries on requirements and added docs req
* added icons
* added faq
* bumped version
* added changelog
* added api documentation
* added installation docs
* added documentation setup
* added index page
* updated citation with authors
* fixed about
* added tqdm to requirements
* fixed setup
* removed missing imports
* debugged documentation
* added documentation workflow
* added edit button
* updated precommit hooks
* ran pre-commit hooks
* added flake8

Co-authored-by: KennethEnevoldsen <[email protected]>
Co-authored-by: Martin Bernstorff <[email protected]>
Aarhus-Psychiatry-Research · Jun 22, 2022 · 5bf2fad · 5bf2fad · github-actions · Jun 22, 2022
1 parent 7bf532a
commit 5bf2fad
Showing 46 changed files with 1,458 additions and 371 deletions.
diff --git a/.github/workflows/automatic_semantic_pr.yml b/.github/workflows/automatic_semantic_pr.yml
diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml
@@ -0,0 +1,29 @@
+
+name: Documentation
+on:
+  push:
+    branches:
+    - master
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+    - name: Checkout
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 0 # otherwise, you will fail to push refs to the dest repo
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -r requirements.txt
+        pip install -e .
+    - name: Build and Commit
+      uses: sphinx-notes/pages@v2
+      with:
+        documentation_path: docs
+        install_requirements: "true"
+    - name: Push changes
+      uses: ad-m/github-push-action@v2
+      with:
+        github_token: ${{ secrets.SPHINX_DOCUMENTATION }}
+        branch: gh-pages
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,6 +1,37 @@
+default_stages: [commit, push]
+
 repos:
--   repo: https://github.com/psf/black
+  - repo: https://github.com/pycqa/isort
+    rev: 5.10.1
+    hooks:
+      - id: isort
+        name: isort (python)
+        args: ["--profile", "black", "--filter-files"]
+
+  - repo: https://github.com/asottile/add-trailing-comma
+    rev: v2.2.3
+    hooks:
+      - id: add-trailing-comma
+
+  - repo: https://github.com/asottile/pyupgrade
+    rev: v2.34.0
+    hooks:
+      - id: pyupgrade
+
+  - repo: https://github.com/myint/docformatter
+    rev: v1.3.1
+    hooks:
+      - id: docformatter
+        args: [--in-place]
+
+  - repo: https://github.com/psf/black
     rev: 22.3.0
     hooks:
-    - id: black
-      language_version: python3.8
+      - id: black
+        language_version: python3.8
+
+  - repo: https://github.com/PyCQA/flake8
+    rev: 4.0.1
+    hooks:
+      - id: flake8
+        args: [--config, .flake8]
diff --git a/README.md b/README.md
@@ -1,5 +1,12 @@
+<a href="https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils"><img src="https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/blob/main/docs/icon.png?raw=true" width="200" align="right" /></a>
+# PSYCOP Machine Learning Utilities
+
 ![python versions](https://img.shields.io/badge/Python-%3E=3.7-blue)
 [![Code style: black](https://img.shields.io/badge/Code%20Style-Black-black)](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html)
+[![github actions pytest](https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/actions/workflows/pytest.yml/badge.svg)](https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/actions)
+[![github actions docs](https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/actions/workflows/documentation.yml/badge.svg)](https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/)
+![coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/martbern/d6c40a5b5a3169c079e8b8f778b8e517/raw/badge-psycop-ml-utils-pytest-coverage.json)
+=======
 ![badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/martbern/d6c40a5b5a3169c079e8b8f778b8e517/raw/badge-psycop-ml-utils-pytest-coverage.json)
 
 # Installation
@@ -32,106 +39,43 @@ or
 # Usage
 - [ ] Update examples as API matures
 
-## Loading data from SQL
-
-Currently only contains one function to load a view from SQL, `sql_load`
 
-```py 
-from psycopmlutils.loaders.sql_load import sql_load
+## 🔧 Installation
+To get started with psycop-ml-utils, install it with pip by running the following line in your terminal:
 
-view = "[FOR_SFI_fritekst_resultat_udfoert_i_psykiatrien_aendret_2011]"
-sql = "SELECT * FROM [fct]." + view
-df = sql_load(sql, chunksize = None)
 ```
-
-## Flattening time series
-To train baseline models (logistic regression, elastic net, SVM, XGBoost/random forest etc.), we need to represent the longitudinal data in a tabular, flattened way. 
-
-In essence, we need to generate a training example for each prediction time, where that example contains "latest_blood_pressure" (float), "X_diagnosis_within_n_hours" (boolean) etc.
-
-To generate this, I propose the time-series flattener class (`TimeSeriesFlattener`). It builds a dataset like described above.
-
-### TimeSeriesFlattener
-```python
-class FlattenedDataset:
-    def __init__():
-        """Class containing a time-series flattened.
-
-        Args:
-            prediction_times_df (DataFrame): Dataframe with prediction times.
-            prediction_timestamp_colname (str, optional): Colname for timestamps. Defaults to "timestamp".
-            id_colname (str, optional): Colname for patients ids. Defaults to "dw_ek_borger".
-        """
-
-    def add_outcome():
-        """Adds an outcome-column to the dataset
-
-        Args:
-            outcome_df (DataFrame): Cols: dw_ek_borger, datotid, value if relevant.
-            lookahead_days (float): How far ahead to look for an outcome in days. If none found, use fallback.
-            resolve_multiple (str): What to do with more than one value within the lookahead.
-                Suggestions: earliest, latest, mean, max, min.
-            fallback (List[str]): What to do if no value within the lookahead.
-                Suggestions: latest, mean_of_patient, mean_of_population, hardcode (qualified guess)
-            timestamp_colname (str): Column name for timestamps
-            values_colname (str): Colname for outcome values in outcome_df
-            id_colname (str): Column name for citizen id
-            new_col_name (str): Name to use for new col. Automatically generated as '{new_col_name}_within_{lookahead_days}_days'.
-                Defaults to using values_colname.
-        """
-
-    def add_predictor():
-        """Adds a predictor-column to the dataset
-
-        Args:
-            predictor_df (DataFrame): Cols: dw_ek_borger, datotid, value if relevant.
-            lookahead_days (float): How far ahead to look for an outcome in days. If none found, use fallback.
-            resolve_multiple (str): What to do with more than one value within the lookahead.
-                Suggestions: earliest, latest, mean, max, min.
-            fallback (List[str]): What to do if no value within the lookahead.
-                Suggestions: latest, mean_of_patient, mean_of_population, hardcode (qualified guess)
-            outcome_colname (str): What to name the column
-            id_colname (str): Column name for citizen id
-            timestamp_colname (str): Column name for timestamps
-        """
+pip install git+https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils.git
 ```
 
-Inspiration-code can be found in previous commits.
+For more detailed instructions on installation, see the [installation instructions](https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/installation).
 
-#### Example
-- [ ] Update examples as API matures
 
-```python
-import FlattenedDataset
-
-dataset = FlattenedDataset(prediction_times_df = prediction_times, prediction_timestamp_colname = "timestamp", id_colname = "dw_ek_borger")
-
-dataset.add_outcome(
-    outcome_df=type_2_diabetes_df,
-    lookahead_days=730,
-    resolve_multiple="max",
-    fallback=[0],
-    name="t2d",
-)
-
-dataset.add_predictor(
-    predictor=hba1c,
-    lookback_window=365,
-    resolve_multiple="max",
-    fallback=["latest", 40],
-    name="hba1c",
-)
-```
+## 📖 Documentation
+
+| Documentation              |                                                                             |
+| -------------------------- | --------------------------------------------------------------------------- |
+| 📚 **[Usage Guides]**       | Guides and instructions on how to use the package and its features.         |
+| 📰 **[News and changelog]** | New additions, changes and version history.                                 |
+| 🎛 **[API References]**     | The detailed reference for psycop-ml-utils's API, including function documentation. |
+| 🙋 **[FAQ]**                | Frequently asked questions.                                                 |
+
+[usage guides]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/introduction.html
+[api references]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/
+[Augmenters]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/augmenters.html
+[Demo]: https://share.streamlit.io/Aarhus-Psychiatry-Research/psycop-ml-utils/dev/streamlit.py
+[News and changelog]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/news.html
+[FAQ]: https://Aarhus-Psychiatry-Research.github.io/psycop-ml-utils/faq.html
 
-Dataset now looks like this:
+## 💬 Where to ask questions
 
-| dw_ek_borger | datetime_prediction | outc_t2d_within_next_730_days | pred_max_hba1c_within_prev_365_days |
-|--------------|---------------------|-------------------------------|-------------------------------------|
-| 1            | yyyy-mm-dd hh:mm:ss | 0                             | 48                                  |
-| 2            | yyyy-mm-dd hh:mm:ss | 0                             | 40                                  |
-| 3            | yyyy-mm-dd hh:mm:ss | 1                             | 44                                  |
+| Type                           |                        |
+| ------------------------------ | ---------------------- |
+| 🚨 **Bug Reports**              | [GitHub Issue Tracker] |
+| 🎁 **Feature Requests & Ideas** | [GitHub Issue Tracker] |
+| 👩‍💻 **Usage Questions**          | [GitHub Discussions]   |
+| 🗯 **General Discussion**       | [GitHub Discussions]   |
 
+[github issue tracker]: https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/issues
+[github discussions]: https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils/discussions
 
-For binary outcomes, `add_predictor` with `fallback = [0]` would take a df with only the times where the event occurred, and then generate 0's for the rest. 
 
-I propose we create the above functionality on a just-in-time basis, building the features as we need them.
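The README section removed in the hunk above describes flattening as producing one row per prediction time, where each predictor is resolved from the values inside a lookback window and a fallback is used when the window is empty. A minimal sketch of that idea follows. It uses plain Python tuples rather than DataFrames, and the function name `flatten_predictor` and its signature are hypothetical, not the package's API. Whether the window includes the prediction time itself is also an assumption here.

```python
from datetime import datetime, timedelta


def flatten_predictor(prediction_times, events, lookback_days, resolve_multiple, fallback):
    """Return one resolved value per prediction time (hypothetical helper).

    prediction_times: list of (patient_id, timestamp) tuples.
    events: list of (patient_id, timestamp, value) tuples.
    resolve_multiple: callable that reduces a non-empty list of values, e.g. max.
    fallback: value to use when no event falls inside the window.
    """
    window = timedelta(days=lookback_days)
    resolved = []
    for patient_id, pred_time in prediction_times:
        # Keep this patient's values inside [pred_time - window, pred_time].
        in_window = [
            value
            for event_id, event_time, value in events
            if event_id == patient_id and timedelta(0) <= pred_time - event_time <= window
        ]
        resolved.append(resolve_multiple(in_window) if in_window else fallback)
    return resolved


# Illustrative data only; the values mirror the example table in the removed README.
prediction_times = [
    (1, datetime(2022, 1, 1)),
    (2, datetime(2022, 1, 1)),
    (3, datetime(2022, 1, 1)),
]
hba1c = [
    (1, datetime(2021, 6, 1), 48),
    (1, datetime(2021, 3, 1), 44),
    (1, datetime(2020, 1, 1), 60),  # older than 365 days, so ignored
    (3, datetime(2021, 12, 1), 44),
]

pred_max_hba1c = flatten_predictor(prediction_times, hba1c, 365, max, 40)
print(pred_max_hba1c)  # [48, 40, 44]
```

Patient 2 has no measurement in the window and therefore receives the fallback of 40, which is the behaviour the removed text describes for `fallback`.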
diff --git a/citation.cff b/citation.cff
@@ -0,0 +1,15 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+authors:
+- family-names: "Bernstorff"
+  given-names: "Martin"
+- family-names: "Hansen"
+  given-names: "Lasse"
+- family-names: "Enevoldsen"
+  given-names: "Kenneth"
+  orcid: "https://orcid.org/0000-0001-8733-0966"
+title: "PSYCOP machine learning utilities"
+version: 0.1.1
+# doi: 10.5281/zenodo.6675315
+date-released: 2022-06-21
+url: "https://github.com/Aarhus-Psychiatry-Research/psycop-ml-utils"
diff --git a/docs/Makefile b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/_static/favicon.ico b/docs/_static/favicon.ico
diff --git a/docs/_static/icon.png b/docs/_static/icon.png
diff --git a/docs/_static/icon_with_title.png b/docs/_static/icon_with_title.png
diff --git a/docs/api.model_performance.rst b/docs/api.model_performance.rst
@@ -0,0 +1,21 @@
+Model Performance
+--------------------------------------------------
+
+
+model_performance.model_performance
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: psycopmlutils.model_performance.model_performance
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :exclude-members:
+
+model_performance.utils
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: psycopmlutils.model_performance.utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :exclude-members:
diff --git a/docs/api.timeseriesflattener.rst b/docs/api.timeseriesflattener.rst
@@ -0,0 +1,31 @@
+Time Series Flattener
+--------------------------------------------------
+
+
+timeseriesflattener.create_feature_combinations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: psycopmlutils.timeseriesflattener.create_feature_combinations
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :exclude-members:
+
+timeseriesflattener.flattened_dataset
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: psycopmlutils.timeseriesflattener.flattened_dataset
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :exclude-members:
+
+
+timeseriesflattener.resolve_multiple_functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: psycopmlutils.timeseriesflattener.resolve_multiple_functions
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :exclude-members:
diff --git a/docs/api.writers.rst b/docs/api.writers.rst
@@ -0,0 +1,12 @@
+Writers
+--------------------------------------------------
+
+
+writers.sql_writer
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. automodule:: psycopmlutils.writers.sql_writer
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :exclude-members:
diff --git a/docs/changelog.md b/docs/changelog.md
@@ -0,0 +1,4 @@
+# News and Changelog
+
+- v. 0.1.1 (21 June 2022)
+  - Documentation was added
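| File | Stmts | Miss | Cover | Missing |
| --- | ---: | ---: | ---: | --- |
| `src/__init__.py` | 0 | 0 | 100% | |
| `src/psycopmlutils/__init__.py` | 1 | 0 | 100% | |
| `src/psycopmlutils/utils.py` | 2 | 0 | 100% | |
| `src/psycopmlutils/loaders/__init__.py` | 8 | 8 | 0% | 1–8 |
| `src/psycopmlutils/loaders/load_demographics.py` | 21 | 21 | 0% | 1–39 |
| `src/psycopmlutils/loaders/load_diagnoses.py` | 47 | 47 | 0% | 1–206 |
| `src/psycopmlutils/loaders/load_ids.py` | 8 | 8 | 0% | 1–21 |
| `src/psycopmlutils/loaders/load_lab_results.py` | 67 | 67 | 0% | 1–182 |
| `src/psycopmlutils/loaders/load_medications.py` | 33 | 33 | 0% | 1–149 |
| `src/psycopmlutils/loaders/load_outcomes.py` | 21 | 21 | 0% | 1–40 |
| `src/psycopmlutils/loaders/load_visits.py` | 10 | 10 | 0% | 1–18 |
| `src/psycopmlutils/loaders/sql_load.py` | 17 | 17 | 0% | 1–67 |
| `src/psycopmlutils/model_performance/__init__.py` | 1 | 0 | 100% | |
| `src/psycopmlutils/model_performance/model_performance.py` | 85 | 6 | 93% | 128, 351–394 |
| `src/psycopmlutils/model_performance/utils.py` | 52 | 2 | 96% | 134, 139 |
| `src/psycopmlutils/timeseriesflattener/__init__.py` | 2 | 0 | 100% | |
| `src/psycopmlutils/timeseriesflattener/create_feature_combinations.py` | 29 | 0 | 100% | |
| `src/psycopmlutils/timeseriesflattener/flattened_dataset.py` | 178 | 19 | 89% | 72–77, 144–149, 246–247, 250–251, 268, 536, 539, 543, 602, 681, 725 |
| `src/psycopmlutils/timeseriesflattener/resolve_multiple_functions.py` | 24 | 0 | 100% | |
| `src/psycopmlutils/writers/__init__.py` | 0 | 0 | 100% | |
| `src/psycopmlutils/writers/sql_writer.py` | 22 | 22 | 0% | 1–83 |
| **TOTAL** | 628 | 281 | 55% | |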
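
The Cover column in the report above is consistent with the share of statements that were executed, rounded to a whole percent. The snippet below is a sketch of that arithmetic only, not the coverage tool's own implementation, and the helper name `coverage_percent` is hypothetical.

```python
def coverage_percent(statements, missed):
    """Share of executed statements as a whole-number percentage."""
    if statements == 0:
        # Files without statements are shown as 100% in the report.
        return 100
    return round(100 * (statements - missed) / statements)


print(coverage_percent(628, 281))  # TOTAL row: 55
print(coverage_percent(85, 6))     # model_performance.py: 93
```

Most of the missed statements come from the `loaders` and `writers` modules, which are reported at 0%.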