This project is a host-based intrusion detection system (HIDS) that identifies anomalies in system call traces by combining statistical and machine learning techniques to distinguish normal (clean) from potentially malicious (infected) behavior.
The pipeline currently runs offline (post hoc), so its results serve as a practical bound on accuracy and as a guide for future research.
Technique/Feature | Description |
---|---|
Feature Engineering | Conversion of syscall info into high-dimensional feature vectors. |
Probabilistic Syscall Subclustering | Gaussian mixture models for granular syscall behavior understanding. |
Temporal Dependency Modeling | Markov chains capture transitions between syscall states as a function of time. |
Buffer Overflow Detection | Gaussian interval of string argument lengths to catch overflow attempts. |
Pathname Similarity Analysis | Self-organizing maps to visualize and detect anomalies in syscall pathnames. |
DoS Attack Detection | Markov chain edge frequency analysis per-trace for DoS detection. |
Segmentation | The longest repeated substring, found with a suffix tree, is used as the segmentation sequence. |
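
Two of the techniques in the table, temporal dependency modeling and buffer overflow detection, can be illustrated with a minimal sketch. This is not the project's implementation: the function names, the first-order Markov assumption, the `floor` probability for unseen transitions, and the `k = 3` interval width are all hypothetical choices made for illustration, and the traces are toy data.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, stdev


def fit_transitions(traces):
    """Estimate first-order transition probabilities P(next | current) from clean traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
        for cur, nxts in counts.items()
    }


def trace_score(trace, probs, floor=1e-6):
    """Mean negative log-likelihood per transition; unseen transitions use `floor` (assumed)."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 0.0
    nll = -sum(math.log(probs.get(cur, {}).get(nxt, floor)) for cur, nxt in pairs)
    return nll / len(pairs)


def fit_length_interval(lengths, k=3.0):
    """Gaussian interval [mu - k*sigma, mu + k*sigma] over string-argument lengths."""
    mu, sigma = mean(lengths), stdev(lengths)
    return mu - k * sigma, mu + k * sigma


def is_length_anomalous(length, interval):
    """Flag a string argument whose length falls outside the fitted interval."""
    low, high = interval
    return not (low <= length <= high)


if __name__ == "__main__":
    # Toy clean traces (hypothetical data, not from the dataset).
    clean = [["open", "read", "read", "close"], ["open", "read", "close"]]
    probs = fit_transitions(clean)
    print(trace_score(["open", "read", "close"], probs))    # familiar transitions
    print(trace_score(["open", "execve", "close"], probs))  # unseen transitions score higher

    interval = fit_length_interval([12, 15, 14, 13, 16, 12])
    print(is_length_anomalous(4096, interval))
```

A higher `trace_score` means the trace follows transitions that were rare or absent in the clean training traces. Any threshold on that score, like the interval width `k`, would need to be tuned on held-out data.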
Below are the confusion matrices showing the performance of the HIDS pipeline on the Twindroid dataset:
a) Average-Case Confusion Matrix:
b) Best-Case Confusion Matrix:
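
As a reading aid for both matrices, the usual summary metrics can be derived from the four cells. The sketch below assumes that infected is the positive class. The function name `summarize` and the counts in the usage lines are hypothetical placeholders, not results from this project.

```python
def summarize(tp, fp, fn, tn):
    """Derive summary metrics from confusion-matrix cells (positive class = infected)."""

    def ratio(num, den):
        # Guard against empty rows or columns in the matrix.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "false_positive_rate": ratio(fp, fp + tn),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    for name, value in summarize(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.3f}")
```

For an intrusion detector the false positive rate is often as informative as accuracy, because clean traces usually far outnumber infected ones in deployment.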
- Liao et al. "Anomaly Detection of System Call Sequence Based on Dynamic Features and Relaxed-SVM"
- Shamim et al. "Efficient Approach for Anomaly Detection in IoT Using System Calls"
- Frossi et al. "Selecting and Improving System Call Models for Anomaly Detection"
- Android Dataset
- Cosma Shalizi's Notes on Markov Chains and Prediction Processes
- Columbia CS Dept's Intrusion Detection Pipeline
This project is licensed under the MIT License.