
Merge pull request #440 from yzhao062/development
V1.0.5
yzhao062 authored Sep 15, 2022
2 parents 0027221 + ab60b97 commit f6029d5
Showing 95 changed files with 1,529 additions and 524 deletions.
4 changes: 3 additions & 1 deletion CHANGES.txt
@@ -167,4 +167,6 @@ v<1.0.3>, <07/04/2022> -- Add AnoGAN (#412).
v<1.0.4>, <07/29/2022> -- General improvement of code quality and test coverage.
v<1.0.4>, <07/29/2022> -- Add LUNAR (#413).
v<1.0.4>, <07/29/2022> -- Add LUNAR (#415).

v<1.0.5>, <07/29/2022> -- Import optimization.
v<1.0.5>, <08/27/2022> -- Code optimization.
v<1.0.5>, <09/14/2022> -- Add ALAD.
77 changes: 26 additions & 51 deletions README.rst
@@ -68,17 +68,17 @@ or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD has been successfully used in numerous academic research projects and
commercial products [#Zhao2019LSCP]_ [#Zhao2021SUOD]_ with more than 7 million downloads.
commercial products with more than `8 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
`KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and
`Towards Data Science <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_.


PyOD is featured for:
**PyOD is featured for**:

* **Unified APIs, detailed documentation, and interactive examples** across various algorithms.
* **Advanced models**\ , including **classical ones by distance and density estimation**, **latest deep learning methods**, and **emerging algorithms like ECOD**.
* **Advanced models**\, including **classical distance and density estimation**, **latest deep learning methods**, and **emerging algorithms like ECOD**.
* **Optimized performance with JIT and parallelization** using `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_.
* **Fast training & prediction with SUOD** [#Zhao2021SUOD]_.

@@ -99,6 +99,13 @@ PyOD is featured for:
y_test_scores = clf.decision_function(X_test) # predict raw outlier scores on test
**Personal suggestion on selecting an OD algorithm**. If you do not know which algorithm to try, go with:

- `ECOD <https://github.com/yzhao062/pyod/blob/master/examples/ecod_example.py>`_: Example of using ECOD for outlier detection
- `Isolation Forest <https://github.com/yzhao062/pyod/blob/master/examples/iforest_example.py>`_: Example of using Isolation Forest for outlier detection

They are both fast and interpretable. Or, you could try a more data-driven approach, `MetaOD <https://github.com/yzhao062/MetaOD>`_.
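
A minimal sketch of this suggestion on synthetic data (the dataset and parameter values below are illustrative only, not tuned recommendations):

.. code-block:: python

    # Sketch: fit ECOD and Isolation Forest on synthetic data and compare
    # their test-set performance with the same unified PyOD API.
    from pyod.models.ecod import ECOD
    from pyod.models.iforest import IForest
    from pyod.utils.data import generate_data, evaluate_print

    X_train, X_test, y_train, y_test = generate_data(
        n_train=200, n_test=100, contamination=0.1, random_state=42)

    for name, clf in [('ECOD', ECOD()), ('IForest', IForest())]:
        clf.fit(X_train)                             # unsupervised fit
        test_scores = clf.decision_function(X_test)  # raw outlier scores
        evaluate_print(name, y_test, test_scores)    # ROC and Precision @ n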

**Citing PyOD**\ :

`PyOD paper <http://www.jmlr.org/papers/volume20/19-011/19-011.pdf>`_ is published in
@@ -126,7 +133,6 @@ or::


* `View the latest codes on Github <https://github.com/yzhao062/pyod>`_
* `Execute Interactive Jupyter Notebooks <https://mybinder.org/v2/gh/yzhao062/pyod/master>`_
* `Anomaly Detection Resources <https://github.com/yzhao062/anomaly-detection-resources>`_


Expand All @@ -139,7 +145,6 @@ or::
* `Model Save & Load <#model-save--load>`_
* `Fast Train with SUOD <#fast-train-with-suod>`_
* `Implemented Algorithms <#implemented-algorithms>`_
* `Old Algorithm Benchmark <#old-algorithm-benchmark>`_
* `Quick Start for Outlier Detection <#quick-start-for-outlier-detection>`_
* `How to Contribute <#how-to-contribute>`_
* `Inclusion Criteria <#inclusion-criteria>`_
@@ -243,6 +248,19 @@ The organization of **ADBench** is provided below:
:alt: benchmark-fig


**The comparison of selected models** is made available below
(\ `Figure <https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png>`_\ ,
`compare_all_models.py <https://github.com/yzhao062/pyod/blob/master/examples/compare_all_models.py>`_\ ,
`Interactive Jupyter Notebooks <https://mybinder.org/v2/gh/yzhao062/pyod/master>`_\ ).
For Jupyter Notebooks, please navigate to **"/notebooks/Compare All Models.ipynb"**.


.. image:: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png
:target: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png
   :alt: Comparison_of_All
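
A condensed sketch in the spirit of ``compare_all_models.py`` (two detectors only; the grid range and models chosen here are illustrative, not the full script):

.. code-block:: python

    # Sketch: visualize decision surfaces of two detectors on 2-D synthetic
    # data, similar in spirit to examples/compare_all_models.py.
    import numpy as np
    import matplotlib.pyplot as plt

    from pyod.models.knn import KNN
    from pyod.models.ecod import ECOD
    from pyod.utils.data import generate_data

    X_train, X_test, y_train, y_test = generate_data(
        n_train=200, n_test=100, n_features=2, contamination=0.1,
        random_state=42)

    # evaluate each detector's outlier score over a regular grid
    xx, yy = np.meshgrid(np.linspace(-10, 10, 200), np.linspace(-10, 10, 200))
    grid = np.c_[xx.ravel(), yy.ravel()]

    for i, (name, clf) in enumerate([('KNN', KNN()), ('ECOD', ECOD())], 1):
        clf.fit(X_train)
        scores = clf.decision_function(grid).reshape(xx.shape)
        plt.subplot(1, 2, i)
        plt.contourf(xx, yy, scores, cmap='Blues')
        plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, s=10,
                    cmap='coolwarm')
        plt.title(name)

    plt.tight_layout()
    plt.show()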



----

Model Save & Load
@@ -353,6 +371,7 @@ Neural Networks SO_GAAL Single-Objective Generative Adversarial
Neural Networks MO_GAAL Multiple-Objective Generative Adversarial Active Learning 2019 [#Liu2019Generative]_
Neural Networks DeepSVDD Deep One-Class Classification 2018 [#Ruff2018Deep]_
Neural Networks AnoGAN Anomaly Detection with Generative Adversarial Networks 2017 [#Schlegl2017Unsupervised]_
Neural Networks ALAD Adversarially Learned Anomaly Detection 2018 [#Zenati2018Adversarially]_
Graph-based R-Graph Outlier detection by R-graph 2017 [#You2017Provable]_
Graph-based LUNAR LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks 2022 [#Goodge2022Lunar]_
=================== ================== ====================================================================================================== ===== ========================================
@@ -393,52 +412,6 @@ Utility precision_n_scores calculate precision @ rank n

----


Old Algorithm Benchmark
^^^^^^^^^^^^^^^^^^^^^^^

In June 2022, we released a 36-page `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_, the most comprehensive benchmark of its kind.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.

The organization of **ADBench** is provided below:

.. image:: https://github.com/Minqi824/ADBench/blob/main/figs/ADBench.png?raw=true
:target: https://github.com/Minqi824/ADBench/blob/main/figs/ADBench.png?raw=true
:alt: benchmark-old

**The content below is obsolete**.

**The comparison of implemented models** is made available below
(\ `Figure <https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png>`_\ ,
`compare_all_models.py <https://github.com/yzhao062/pyod/blob/master/examples/compare_all_models.py>`_\ ,
`Interactive Jupyter Notebooks <https://mybinder.org/v2/gh/yzhao062/pyod/master>`_\ ).
For Jupyter Notebooks, please navigate to **"/notebooks/Compare All Models.ipynb"**.


.. image:: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png
:target: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png
   :alt: Comparison_of_All

A benchmark is supplied for select algorithms to provide an overview of the implemented models.
In total, 17 benchmark datasets are used for comparison, which
can be downloaded at `ODDS <http://odds.cs.stonybrook.edu/#table1>`_.

Each dataset is first split into 60% for training and 40% for testing.
All experiments are repeated 10 times independently with random splits.
The mean of 10 trials is regarded as the final result. Three evaluation metrics
are provided:

- The area under the receiver operating characteristic (ROC) curve
- Precision @ rank n (P@N)
- Execution time

Check the latest `benchmark <https://pyod.readthedocs.io/en/latest/benchmark.html>`_. You could replicate this process by running
`benchmark.py <https://github.com/yzhao062/pyod/blob/master/notebooks/benchmark.py>`_.
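
A stripped-down sketch of this protocol (a synthetic dataset stands in for an ODDS dataset, and KNN stands in for the full model list; see ``benchmark.py`` for the real loop over datasets and detectors):

.. code-block:: python

    # Sketch of the protocol above: 10 independent 60/40 splits, reporting the
    # mean ROC, mean precision @ rank n, and mean execution time.
    import time
    import numpy as np
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    from pyod.models.knn import KNN
    from pyod.utils.data import generate_data
    from pyod.utils.utility import precision_n_scores

    # stand-in for one ODDS dataset (the real benchmark loads .mat files)
    X, y = generate_data(n_train=1000, train_only=True, contamination=0.1,
                         random_state=0)

    rocs, prns, durations = [], [], []
    for trial in range(10):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=0.6, random_state=trial)
        clf = KNN()
        start = time.time()
        clf.fit(X_train)
        scores = clf.decision_function(X_test)
        durations.append(time.time() - start)
        rocs.append(roc_auc_score(y_test, scores))
        prns.append(precision_n_scores(y_test, scores))

    print('ROC: {:.4f}, P@N: {:.4f}, time: {:.3f}s'.format(
        np.mean(rocs), np.mean(prns), np.mean(durations)))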


----


Quick Start for Outlier Detection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -641,6 +614,8 @@ Reference
.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.
.. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.
.. [#Zhao2018XGBOD] Zhao, Y. and Hryniewicki, M.K. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning. *IEEE International Joint Conference on Neural Networks*\ , 2018.
.. [#Zhao2019LSCP] Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In *Proceedings of the 2019 SIAM International Conference on Data Mining (SDM)*, pp. 585-593. Society for Industrial and Applied Mathematics.
2 changes: 1 addition & 1 deletion docs/example.rst
@@ -46,7 +46,7 @@ Full example: `knn_example.py <https://github.com/yzhao062/Pyod/blob/master/exam
n_train = 200 # number of training points
n_test = 100 # number of testing points
X_train, y_train, X_test, y_test = generate_data(
X_train, X_test, y_train, y_test = generate_data(
n_train=n_train, n_test=n_test, contamination=contamination)
3. Initialize a :class:`pyod.models.knn.KNN` detector, fit the model, and make
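
Put together, a condensed sketch of the corrected example (mirroring ``knn_example.py`` with the fixed unpacking order; the sizes and contamination are the example's illustrative values):

.. code-block:: python

    # Condensed sketch of knn_example.py after the unpacking-order fix.
    from pyod.models.knn import KNN
    from pyod.utils.data import generate_data, evaluate_print

    contamination = 0.1  # percentage of outliers
    X_train, X_test, y_train, y_test = generate_data(
        n_train=200, n_test=100, contamination=contamination)

    clf = KNN()
    clf.fit(X_train)

    y_train_scores = clf.decision_scores_          # raw scores on training data
    y_test_scores = clf.decision_function(X_test)  # raw scores on the test data
    evaluate_print('KNN', y_test, y_test_scores)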
14 changes: 11 additions & 3 deletions docs/index.rst
@@ -74,17 +74,17 @@ or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD :cite:`a-zhao2019pyod` has been successfully used in numerous
academic researches and commercial products :cite:`a-zhao2019lscp,a-zhao2021suod` with more than 7 million downloads.
academic research projects and commercial products with more than `8 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
`KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and
`Towards Data Science <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_.


PyOD is featured for:
**PyOD is featured for**:

* **Unified APIs, detailed documentation, and interactive examples** across various algorithms.
* **Advanced models**\ , including **classical ones by distance and density estimation**, **latest deep learning methods**, and **emerging algorithms like ECOD**.
* **Advanced models**\, including **classical distance and density estimation**, **latest deep learning methods**, and **emerging algorithms like ECOD**.
* **Optimized performance with JIT and parallelization** using `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_.
* **Fast training & prediction with SUOD** :cite:`a-zhao2021suod`.

@@ -105,6 +105,13 @@ PyOD is featured for:
y_test_scores = clf.decision_function(X_test) # predict raw outlier scores on test
**Personal suggestion on selecting an OD algorithm**. If you do not know which algorithm to try, go with:

- `ECOD <https://github.com/yzhao062/pyod/blob/master/examples/ecod_example.py>`_: Example of using ECOD for outlier detection
- `Isolation Forest <https://github.com/yzhao062/pyod/blob/master/examples/iforest_example.py>`_: Example of using Isolation Forest for outlier detection

They are both fast and interpretable. Or, you could try a more data-driven approach, `MetaOD <https://github.com/yzhao062/MetaOD>`_.


**Citing PyOD**\ :

@@ -200,6 +207,7 @@ Neural Networks SO_GAAL Single-Objective Generative Adversarial A
Neural Networks MO_GAAL Multiple-Objective Generative Adversarial Active Learning 2019 :class:`pyod.models.mo_gaal.MO_GAAL` :cite:`a-liu2019generative`
Neural Networks DeepSVDD Deep One-Class Classification 2018 :class:`pyod.models.deep_svdd.DeepSVDD` :cite:`a-ruff2018deepsvdd`
Neural Networks AnoGAN Anomaly Detection with Generative Adversarial Networks 2017 :class:`pyod.models.anogan.AnoGAN` :cite:`a-schlegl2017unsupervised`
Neural Networks ALAD Adversarially Learned Anomaly Detection 2018 :class:`pyod.models.alad.ALAD` :cite:`a-zenati2018adversarially`
Graph-based R-Graph Outlier detection by R-graph 2017 :class:`pyod.models.rgraph.RGraph` :cite:`you2017provable`
Graph-based LUNAR LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks 2022 :class:`pyod.models.lunar.LUNAR` :cite:`a-goodge2022lunar`
=================== ================ ====================================================================================================== ===== =================================================== ======================================================
9 changes: 9 additions & 0 deletions docs/pyod.models.rst
@@ -11,6 +11,15 @@ pyod.models.abod module
:show-inheritance:
:inherited-members:

pyod.models.alad module
-----------------------

.. automodule:: pyod.models.alad
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

pyod.models.anogan module
-------------------------

9 changes: 9 additions & 0 deletions docs/zreferences.bib
@@ -458,4 +458,13 @@ @inproceedings{you2017provable
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={3395--3404},
year={2017}
}

@inproceedings{zenati2018adversarially,
title={Adversarially learned anomaly detection},
author={Zenati, Houssam and Romain, Manon and Foo, Chuan-Sheng and Lecouat, Bruno and Chandrasekhar, Vijay},
booktitle={2018 IEEE International conference on data mining (ICDM)},
pages={727--736},
year={2018},
organization={IEEE}
}
71 changes: 71 additions & 0 deletions examples/alad_example.py
@@ -0,0 +1,71 @@
# -*- coding: utf-8 -*-
"""Example of using Adversarially Learned Anomaly Detection (ALAD) for outlier
detection
"""
from __future__ import division
from __future__ import print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from pyod.models.alad import ALAD
from pyod.utils.data import generate_data

from pyod.utils.data import evaluate_print
from pyod.utils.example import visualize

if __name__ == "__main__":
contamination = 0.1 # percentage of outliers
n_train = 500 # number of training points
n_test = 200 # number of testing points

# Generate sample data
X_train, X_test, y_train, y_test = \
generate_data(n_train=n_train,
n_test=n_test,
n_features=2,
contamination=contamination,
random_state=42)

# train ALAD detector
clf_name = 'ALAD'
clf = ALAD(epochs=100, latent_dim=2,
learning_rate_disc=0.0001,
learning_rate_gen=0.0001,
dropout_rate=0.2,
add_recon_loss=False,
lambda_recon_loss=0.05,
add_disc_zz_loss=True,
dec_layers=[75, 100],
enc_layers=[100, 75],
disc_xx_layers=[100, 75],
disc_zz_layers=[25, 25],
disc_xz_layers=[100, 75],
spectral_normalization=False,
activation_hidden_disc='tanh', activation_hidden_gen='tanh',
preprocessing=True, batch_size=200, contamination=contamination)

clf.fit(X_train)

# get the prediction labels and outlier scores of the training data
y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_ # raw outlier scores

# get the prediction on the test data
y_test_pred = clf.predict(X_test) # outlier labels (0 or 1)
y_test_scores = clf.decision_function(X_test) # outlier scores

# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, y_train, y_train_scores)
print("\nOn Test Data:")
evaluate_print(clf_name, y_test, y_test_scores)

# visualize the results
visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
y_test_pred, show_figure=True, save_figure=False)
1 change: 1 addition & 0 deletions pyod/models/abod.py
@@ -16,6 +16,7 @@
from sklearn.neighbors import NearestNeighbors
from sklearn.utils import check_array
from sklearn.utils.validation import check_is_fitted

from .base import BaseDetector
from ..utils.utility import check_parameter

