
Merge pull request #440 from yzhao062/development
V1.0.5
yzhao062 authored Sep 15, 2022
2 parents 0027221 + ab60b97 commit f6029d5
Showing 95 changed files with 1,529 additions and 524 deletions.
4 changes: 3 additions & 1 deletion CHANGES.txt
@@ -167,4 +167,6 @@ v<1.0.3>, <07/04/2022> -- Add AnoGAN (#412).
v<1.0.4>, <07/29/2022> -- General improvement of code quality and test coverage.
v<1.0.4>, <07/29/2022> -- Add LUNAR (#413).
v<1.0.4>, <07/29/2022> -- Add LUNAR (#415).

v<1.0.5>, <07/29/2022> -- Import optimization.
v<1.0.5>, <08/27/2022> -- Code optimization.
v<1.0.5>, <09/14/2022> -- Add ALAD.
77 changes: 26 additions & 51 deletions README.rst
@@ -68,17 +68,17 @@ or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD has been successfully used in numerous academic research projects and
commercial products [#Zhao2019LSCP]_ [#Zhao2021SUOD]_ with more than 7 million downloads.
commercial products with more than `8 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
`KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and
`Towards Data Science <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_.


PyOD is featured for:
**PyOD is featured for**:

* **Unified APIs, detailed documentation, and interactive examples** across various algorithms.
* **Advanced models**\ , including **classical ones by distance and density estimation**, **latest deep learning methods**, and **emerging algorithms like ECOD**.
* **Advanced models**\, including **classical distance and density estimation**, **latest deep learning methods**, and **emerging algorithms like ECOD**.
* **Optimized performance with JIT and parallelization** using `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_.
* **Fast training & prediction with SUOD** [#Zhao2021SUOD]_.

@@ -99,6 +99,13 @@ PyOD is featured for:
y_test_scores = clf.decision_function(X_test) # predict raw outlier scores on test
**Personal suggestion on selecting an OD algorithm**. If you do not know which algorithm to try, go with:

- `ECOD <https://github.com/yzhao062/pyod/blob/master/examples/ecod_example.py>`_: Example of using ECOD for outlier detection
- `Isolation Forest <https://github.com/yzhao062/pyod/blob/master/examples/iforest_example.py>`_: Example of using Isolation Forest for outlier detection

They are both fast and interpretable. Or, you could try a more data-driven approach, `MetaOD <https://github.com/yzhao062/MetaOD>`_.
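
A minimal sketch of this suggestion on synthetic data (the dataset and parameter values below are illustrative only, not tuned recommendations):

.. code-block:: python

    # Sketch: fit ECOD and Isolation Forest on synthetic data and compare
    # their test-set performance with the same unified PyOD API.
    from pyod.models.ecod import ECOD
    from pyod.models.iforest import IForest
    from pyod.utils.data import generate_data, evaluate_print

    X_train, X_test, y_train, y_test = generate_data(
        n_train=200, n_test=100, contamination=0.1, random_state=42)

    for name, clf in [('ECOD', ECOD()), ('IForest', IForest())]:
        clf.fit(X_train)                             # unsupervised fit
        test_scores = clf.decision_function(X_test)  # raw outlier scores
        evaluate_print(name, y_test, test_scores)    # ROC and Precision @ n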

**Citing PyOD**\ :

`PyOD paper <http://www.jmlr.org/papers/volume20/19-011/19-011.pdf>`_ is published in
@@ -126,7 +133,6 @@ or::


* `View the latest codes on Github <https://github.com/yzhao062/pyod>`_
* `Execute Interactive Jupyter Notebooks <https://mybinder.org/v2/gh/yzhao062/pyod/master>`_
* `Anomaly Detection Resources <https://github.com/yzhao062/anomaly-detection-resources>`_


Expand All @@ -139,7 +145,6 @@ or::
* `Model Save & Load <#model-save--load>`_
* `Fast Train with SUOD <#fast-train-with-suod>`_
* `Implemented Algorithms <#implemented-algorithms>`_
* `Old Algorithm Benchmark <#old-algorithm-benchmark>`_
* `Quick Start for Outlier Detection <#quick-start-for-outlier-detection>`_
* `How to Contribute <#how-to-contribute>`_
* `Inclusion Criteria <#inclusion-criteria>`_
@@ -243,6 +248,19 @@ The organization of **ADBench** is provided below:
:alt: benchmark-fig


**The comparison of selected models** is made available below
(\ `Figure <https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png>`_\ ,
`compare_all_models.py <https://github.com/yzhao062/pyod/blob/master/examples/compare_all_models.py>`_\ ,
`Interactive Jupyter Notebooks <https://mybinder.org/v2/gh/yzhao062/pyod/master>`_\ ).
For Jupyter Notebooks, please navigate to **"/notebooks/Compare All Models.ipynb"**.


.. image:: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png
:target: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png
   :alt: Comparison_of_All
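
A condensed sketch in the spirit of ``compare_all_models.py`` (two detectors only; the grid range and models chosen here are illustrative, not the full script):

.. code-block:: python

    # Sketch: visualize decision surfaces of two detectors on 2-D synthetic
    # data, similar in spirit to examples/compare_all_models.py.
    import numpy as np
    import matplotlib.pyplot as plt

    from pyod.models.knn import KNN
    from pyod.models.ecod import ECOD
    from pyod.utils.data import generate_data

    X_train, X_test, y_train, y_test = generate_data(
        n_train=200, n_test=100, n_features=2, contamination=0.1,
        random_state=42)

    # evaluate each detector's outlier score over a regular grid
    xx, yy = np.meshgrid(np.linspace(-10, 10, 200), np.linspace(-10, 10, 200))
    grid = np.c_[xx.ravel(), yy.ravel()]

    for i, (name, clf) in enumerate([('KNN', KNN()), ('ECOD', ECOD())], 1):
        clf.fit(X_train)
        scores = clf.decision_function(grid).reshape(xx.shape)
        plt.subplot(1, 2, i)
        plt.contourf(xx, yy, scores, cmap='Blues')
        plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, s=10,
                    cmap='coolwarm')
        plt.title(name)

    plt.tight_layout()
    plt.show()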



----

Model Save & Load
@@ -353,6 +371,7 @@ Neural Networks SO_GAAL Single-Objective Generative Adversarial
Neural Networks MO_GAAL Multiple-Objective Generative Adversarial Active Learning 2019 [#Liu2019Generative]_
Neural Networks DeepSVDD Deep One-Class Classification 2018 [#Ruff2018Deep]_
Neural Networks AnoGAN Anomaly Detection with Generative Adversarial Networks 2017 [#Schlegl2017Unsupervised]_
Neural Networks ALAD Adversarially Learned Anomaly Detection 2018 [#Zenati2018Adversarially]_
Graph-based R-Graph Outlier detection by R-graph 2017 [#You2017Provable]_
Graph-based LUNAR LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks 2022 [#Goodge2022Lunar]_
=================== ================== ====================================================================================================== ===== ========================================
@@ -393,52 +412,6 @@ Utility precision_n_scores calculate precision @ rank n

----


Old Algorithm Benchmark
^^^^^^^^^^^^^^^^^^^^^^^

In June 2022, we released a 36-page `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_, the most comprehensive benchmark of its kind.
The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.

The organization of **ADBench** is provided below:

.. image:: https://github.com/Minqi824/ADBench/blob/main/figs/ADBench.png?raw=true
:target: https://github.com/Minqi824/ADBench/blob/main/figs/ADBench.png?raw=true
:alt: benchmark-old

**The content below is obsolete**.

**The comparison of implemented models** is made available below
(\ `Figure <https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png>`_\ ,
`compare_all_models.py <https://github.com/yzhao062/pyod/blob/master/examples/compare_all_models.py>`_\ ,
`Interactive Jupyter Notebooks <https://mybinder.org/v2/gh/yzhao062/pyod/master>`_\ ).
For Jupyter Notebooks, please navigate to **"/notebooks/Compare All Models.ipynb"**.


.. image:: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png
:target: https://raw.githubusercontent.com/yzhao062/pyod/master/examples/ALL.png
   :alt: Comparison_of_All

A benchmark is supplied for select algorithms to provide an overview of the implemented models.
In total, 17 benchmark datasets are used for comparison, which
can be downloaded at `ODDS <http://odds.cs.stonybrook.edu/#table1>`_.

Each dataset is first split into 60% for training and 40% for testing.
All experiments are repeated 10 times independently with random splits.
The mean of 10 trials is regarded as the final result. Three evaluation metrics
are provided:

- The area under the receiver operating characteristic (ROC) curve
- Precision @ rank n (P@N)
- Execution time

Check the latest `benchmark <https://pyod.readthedocs.io/en/latest/benchmark.html>`_. You could replicate this process by running
`benchmark.py <https://github.com/yzhao062/pyod/blob/master/notebooks/benchmark.py>`_.
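
A stripped-down sketch of this protocol (a synthetic dataset stands in for an ODDS dataset, and KNN stands in for the full model list; see ``benchmark.py`` for the real loop over datasets and detectors):

.. code-block:: python

    # Sketch of the protocol above: 10 independent 60/40 splits, reporting the
    # mean ROC, mean precision @ rank n, and mean execution time.
    import time
    import numpy as np
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    from pyod.models.knn import KNN
    from pyod.utils.data import generate_data
    from pyod.utils.utility import precision_n_scores

    # stand-in for one ODDS dataset (the real benchmark loads .mat files)
    X, y = generate_data(n_train=1000, train_only=True, contamination=0.1,
                         random_state=0)

    rocs, prns, durations = [], [], []
    for trial in range(10):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=0.6, random_state=trial)
        clf = KNN()
        start = time.time()
        clf.fit(X_train)
        scores = clf.decision_function(X_test)
        durations.append(time.time() - start)
        rocs.append(roc_auc_score(y_test, scores))
        prns.append(precision_n_scores(y_test, scores))

    print('ROC: {:.4f}, P@N: {:.4f}, time: {:.3f}s'.format(
        np.mean(rocs), np.mean(prns), np.mean(durations)))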


----


Quick Start for Outlier Detection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -641,6 +614,8 @@ Reference
.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.
.. [#Zenati2018Adversarially] Zenati, H., Romain, M., Foo, C.S., Lecouat, B. and Chandrasekhar, V., 2018, November. Adversarially learned anomaly detection. In 2018 IEEE International conference on data mining (ICDM) (pp. 727-736). IEEE.
.. [#Zhao2018XGBOD] Zhao, Y. and Hryniewicki, M.K. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning. *IEEE International Joint Conference on Neural Networks*\ , 2018.
.. [#Zhao2019LSCP] Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In *Proceedings of the 2019 SIAM International Conference on Data Mining (SDM)*, pp. 585-593. Society for Industrial and Applied Mathematics.
2 changes: 1 addition & 1 deletion docs/example.rst
@@ -46,7 +46,7 @@ Full example: `knn_example.py <https://github.com/yzhao062/Pyod/blob/master/exam
n_train = 200 # number of training points
n_test = 100 # number of testing points
X_train, y_train, X_test, y_test = generate_data(
X_train, X_test, y_train, y_test = generate_data(
n_train=n_train, n_test=n_test, contamination=contamination)
3. Initialize a :class:`pyod.models.knn.KNN` detector, fit the model, and make
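
Put together, a condensed sketch of the corrected example (mirroring ``knn_example.py`` with the fixed unpacking order; the sizes and contamination are the example's illustrative values):

.. code-block:: python

    # Condensed sketch of knn_example.py after the unpacking-order fix.
    from pyod.models.knn import KNN
    from pyod.utils.data import generate_data, evaluate_print

    contamination = 0.1  # percentage of outliers
    X_train, X_test, y_train, y_test = generate_data(
        n_train=200, n_test=100, contamination=contamination)

    clf = KNN()
    clf.fit(X_train)

    y_train_scores = clf.decision_scores_          # raw scores on training data
    y_test_scores = clf.decision_function(X_test)  # raw scores on the test data
    evaluate_print('KNN', y_test, y_test_scores)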
14 changes: 11 additions & 3 deletions docs/index.rst
@@ -74,17 +74,17 @@ or `Anomaly Detection <https://en.wikipedia.org/wiki/Anomaly_detection>`_.

PyOD includes more than 40 detection algorithms, from classical LOF (SIGMOD 2000) to
the latest ECOD (TKDE 2022). Since 2017, PyOD :cite:`a-zhao2019pyod` has been successfully used in numerous
academic researches and commercial products :cite:`a-zhao2019lscp,a-zhao2021suod` with more than 7 million downloads.
academic research projects and commercial products with more than `8 million downloads <https://pepy.tech/project/pyod>`_.
It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including
`Analytics Vidhya <https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/>`_,
`KDnuggets <https://www.kdnuggets.com/2019/02/outlier-detection-methods-cheat-sheet.html>`_, and
`Towards Data Science <https://towardsdatascience.com/anomaly-detection-for-dummies-15f148e559c1>`_.


PyOD is featured for:
**PyOD is featured for**:

* **Unified APIs, detailed documentation, and interactive examples** across various algorithms.
* **Advanced models**\ , including **classical ones by distance and density estimation**, **latest deep learning methods**, and **emerging algorithms like ECOD**.
* **Advanced models**\, including **classical distance and density estimation**, **latest deep learning methods**, and **emerging algorithms like ECOD**.
* **Optimized performance with JIT and parallelization** using `numba <https://github.com/numba/numba>`_ and `joblib <https://github.com/joblib/joblib>`_.
* **Fast training & prediction with SUOD** :cite:`a-zhao2021suod`.

@@ -105,6 +105,13 @@ PyOD is featured for:
y_test_scores = clf.decision_function(X_test) # predict raw outlier scores on test
**Personal suggestion on selecting an OD algorithm**. If you do not know which algorithm to try, go with:

- `ECOD <https://github.com/yzhao062/pyod/blob/master/examples/ecod_example.py>`_: Example of using ECOD for outlier detection
- `Isolation Forest <https://github.com/yzhao062/pyod/blob/master/examples/iforest_example.py>`_: Example of using Isolation Forest for outlier detection

They are both fast and interpretable. Or, you could try a more data-driven approach, `MetaOD <https://github.com/yzhao062/MetaOD>`_.


**Citing PyOD**\ :

@@ -200,6 +207,7 @@ Neural Networks SO_GAAL Single-Objective Generative Adversarial A
Neural Networks MO_GAAL Multiple-Objective Generative Adversarial Active Learning 2019 :class:`pyod.models.mo_gaal.MO_GAAL` :cite:`a-liu2019generative`
Neural Networks DeepSVDD Deep One-Class Classification 2018 :class:`pyod.models.deep_svdd.DeepSVDD` :cite:`a-ruff2018deepsvdd`
Neural Networks AnoGAN Anomaly Detection with Generative Adversarial Networks 2017 :class:`pyod.models.anogan.AnoGAN` :cite:`a-schlegl2017unsupervised`
Neural Networks ALAD Adversarially Learned Anomaly Detection 2018 :class:`pyod.models.alad.ALAD` :cite:`a-zenati2018adversarially`
Graph-based R-Graph Outlier detection by R-graph 2017 :class:`pyod.models.rgraph.RGraph` :cite:`you2017provable`
Graph-based LUNAR LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks 2022 :class:`pyod.models.lunar.LUNAR` :cite:`a-goodge2022lunar`
=================== ================ ====================================================================================================== ===== =================================================== ======================================================
9 changes: 9 additions & 0 deletions docs/pyod.models.rst
@@ -11,6 +11,15 @@ pyod.models.abod module
:show-inheritance:
:inherited-members:

pyod.models.alad module
-----------------------

.. automodule:: pyod.models.alad
:members:
:undoc-members:
:show-inheritance:
:inherited-members:

pyod.models.anogan module
-------------------------

9 changes: 9 additions & 0 deletions docs/zreferences.bib
@@ -458,4 +458,13 @@ @inproceedings{you2017provable
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={3395--3404},
year={2017}
}

@inproceedings{zenati2018adversarially,
title={Adversarially learned anomaly detection},
author={Zenati, Houssam and Romain, Manon and Foo, Chuan-Sheng and Lecouat, Bruno and Chandrasekhar, Vijay},
booktitle={2018 IEEE International conference on data mining (ICDM)},
pages={727--736},
year={2018},
organization={IEEE}
}
71 changes: 71 additions & 0 deletions examples/alad_example.py
@@ -0,0 +1,71 @@
# -*- coding: utf-8 -*-
"""Example of using Adversarially Learned Anomaly Detection (ALAD) for outlier
detection
"""
from __future__ import division
from __future__ import print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from pyod.models.alad import ALAD
from pyod.utils.data import generate_data

from pyod.utils.data import evaluate_print
from pyod.utils.example import visualize

if __name__ == "__main__":
contamination = 0.1 # percentage of outliers
n_train = 500 # number of training points
n_test = 200 # number of testing points

# Generate sample data
X_train, X_test, y_train, y_test = \
generate_data(n_train=n_train,
n_test=n_test,
n_features=2,
contamination=contamination,
random_state=42)

# train ALAD detector
clf_name = 'ALAD'
clf = ALAD(epochs=100, latent_dim=2,
learning_rate_disc=0.0001,
learning_rate_gen=0.0001,
dropout_rate=0.2,
add_recon_loss=False,
lambda_recon_loss=0.05,
add_disc_zz_loss=True,
dec_layers=[75, 100],
enc_layers=[100, 75],
disc_xx_layers=[100, 75],
disc_zz_layers=[25, 25],
disc_xz_layers=[100, 75],
spectral_normalization=False,
activation_hidden_disc='tanh', activation_hidden_gen='tanh',
preprocessing=True, batch_size=200, contamination=contamination)

clf.fit(X_train)

# get the prediction labels and outlier scores of the training data
y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_ # raw outlier scores

# get the prediction on the test data
y_test_pred = clf.predict(X_test) # outlier labels (0 or 1)
y_test_scores = clf.decision_function(X_test) # outlier scores

# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, y_train, y_train_scores)
print("\nOn Test Data:")
evaluate_print(clf_name, y_test, y_test_scores)

# visualize the results
visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
y_test_pred, show_figure=True, save_figure=False)
1 change: 1 addition & 0 deletions pyod/models/abod.py
@@ -16,6 +16,7 @@
from sklearn.neighbors import NearestNeighbors
from sklearn.utils import check_array
from sklearn.utils.validation import check_is_fitted

from .base import BaseDetector
from ..utils.utility import check_parameter

