Upon encountering the term multicollinearity, I decided to look up its definition and understand its significance. Let's explore the concept in depth.
Multicollinearity refers to a phenomenon in statistical analysis where two or more explanatory variables in a multiple regression model are highly correlated. In essence, it describes a situation where there exists a strong linear relationship between predictor variables.
To put it simply:
- Multicollinearity occurs when independent variables in a model are not truly independent of each other.
- The absence of multicollinearity implies that no substantial linear relationship exists between the explanatory variables.
We can express the assumption of no multicollinearity mathematically as follows: for any combination of coefficients $a_0, \dots, a_k$ (not all zero) and $k > 0$,

$$E\left[(a_0 + a_1 X_1 + \dots + a_k X_k)^2\right] > 0$$

where $E$ denotes the expected value and $X_1, \dots, X_k$ are the explanatory variables.
To better understand multicollinearity, consider a scenario where the same variable appears twice in a regression model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon, \qquad \text{with } X_1 = X_2$$

In this model, since $X_1 = X_2$ always holds, choosing $a_1 = -a_2 \neq 0$ (with $a_0 = 0$) gives $a_0 + a_1 X_1 + a_2 X_2 = 0$, which violates the assumption of no multicollinearity stated above.
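To see this concretely, here is a minimal NumPy sketch (the data and variable names are synthetic, purely for illustration): duplicating a predictor leaves the design matrix rank deficient, so ordinary least squares has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1.copy()                      # X2 is an exact copy of X1: perfect multicollinearity
y = 3.0 * x1 + rng.normal(size=n)   # the response only really depends on X1

# Design matrix with an intercept column plus the two (identical) predictors
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.matrix_rank(X))     # 2, not 3: the columns are linearly dependent
print(np.linalg.cond(X.T @ X))      # huge condition number: X'X cannot be inverted reliably
```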
In practice, multicollinearity typically leads to:
- Unstable coefficient estimates
- Inflated standard errors
- Difficulty in determining individual variable importance
- Potential overfitting of the model
To detect multicollinearity, researchers often use methods such as the following (a short sketch of the first two appears after the list):
- Variance Inflation Factor (VIF)
- Correlation matrices
- Eigenvalue analysis
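Here is a small NumPy sketch of the first two checks (synthetic data; the `vif` helper is illustrative, not from the original text). The variance inflation factor for predictor $j$ is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the remaining predictors.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (predictors only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])          # regress X_j on the other predictors
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        r2 = 1.0 - ((target - A @ beta) ** 2).sum() / ((target - target.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)                           # VIF_j = 1 / (1 - R_j^2)
    return out

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)          # x2 is almost a copy of x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

print(np.corrcoef(X, rowvar=False).round(2))   # correlation matrix: corr(x1, x2) is close to 1
print(vif(X).round(1))                         # VIFs far above 10 flag x1 and x2 as collinear
```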
When faced with multicollinearity, analysts might take one of the following steps (a ridge regression sketch follows the list):
- Remove one of the correlated variables
- Combine correlated variables
- Use regularisation techniques (e.g., ridge regression)
- Collect more data, if possible
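As a sketch of the regularisation option (closed-form ridge regression on synthetic data; the penalty value is arbitrary), the L2 penalty keeps the coefficient estimates stable even when two predictors are nearly identical.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression on centred data: beta = (X'X + lam*I)^-1 X'y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)            # nearly collinear with x1
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=200)
X = np.column_stack([x1, x2])

print(ridge(X, y, lam=0.0)[1])    # plain OLS: the two estimates can swing to large opposite values
print(ridge(X, y, lam=10.0)[1])   # ridge: shrunken, stable estimates whose sum stays near 3
```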
Understanding and addressing multicollinearity is essential for building reliable statistical models and drawing accurate conclusions from data analysis.
The same concern carries over to neural networks. The core idea centres on weights (or neurons) that exhibit high correlation with one another and therefore likely contain redundant information. Removing these correlated weights reduces multicollinearity in the model, resulting in a more effective and potentially more generalisable model.
Let's define the compression algorithm mathematically (a code sketch follows the definitions).

For a given layer with weight vectors $w_1, \dots, w_m$, each of dimension $n$:

- Define a correlation function $C(w_i, w_j)$ between two weight vectors $w_i$ and $w_j$:

  $$C(w_i, w_j) = \frac{\sum_{k=1}^n (w_{ik} - \bar{w_i})(w_{jk} - \bar{w_j})}{\sqrt{\sum_{k=1}^n (w_{ik} - \bar{w_i})^2} \sqrt{\sum_{k=1}^n (w_{jk} - \bar{w_j})^2}}$$

  where $\bar{w_i}$ and $\bar{w_j}$ are the means of $w_i$ and $w_j$ respectively.

- Define a pruning indicator function $P(w_i)$ for each weight vector $w_i$:

  $$P(w_i) = \begin{cases} 1 & \text{if } \max_{j < i} |C(w_i, w_j)| \leq \tau \\ 0 & \text{otherwise} \end{cases}$$

  where $\tau$ is the correlation threshold.

- The compressed weight matrix $W'$ is then defined as:

  $$W' = \{w_i : P(w_i) = 1,\ i = 1, \dots, m\}$$

- The compression ratio $R$ for the layer is given by:

  $$R = 1 - \frac{|W'|}{|W|}$$

  where $|W|$ and $|W'|$ are the number of weight vectors in the original and compressed matrices respectively.

- The overall model compression is achieved by applying this process to all layers:

  $$M' = \{L'_1, L'_2, \dots, L'_k\}$$

  where $L'_i$ is the compressed version of the $i$-th layer, and $k$ is the total number of layers.
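The definitions above translate almost directly into code. The following NumPy sketch (not the library's C implementation; it treats the rows of a layer's weight matrix as the weight vectors, and the threshold value is just an example) applies $C$, $P$, and $R$ to one layer.

```python
import numpy as np

def compress_layer(W, tau=0.95):
    """Prune rows of W whose absolute correlation with any earlier row exceeds tau.

    Implements P(w_i) = 1 iff max_{j<i} |C(w_i, w_j)| <= tau, and returns
    the compressed matrix W' together with the compression ratio R = 1 - |W'|/|W|.
    """
    m = W.shape[0]
    kept = []
    for i in range(m):
        if all(abs(np.corrcoef(W[i], W[j])[0, 1]) <= tau for j in range(i)):
            kept.append(i)            # P(w_i) = 1: keep this weight vector
    W_compressed = W[kept]
    ratio = 1.0 - len(kept) / m       # R = 1 - |W'| / |W|
    return W_compressed, ratio

rng = np.random.default_rng(3)
base = rng.normal(size=(4, 16))
# The last four rows nearly duplicate the first four, so they should be pruned
W = np.vstack([base, base + 0.01 * rng.normal(size=(4, 16))])
W_compressed, ratio = compress_layer(W, tau=0.95)
print(W_compressed.shape, ratio)      # expect (4, 16) and a ratio of 0.5
```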
This approach to addressing multicollinearity is based on model pruning, one of several techniques used in model compression. Other methods in the code, like magnitude-based or variance-based pruning, also address multicollinearity by removing less important weights, which may be correlated with more important ones.
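For comparison, a magnitude-based variant is even simpler (again a sketch, not the library's implementation; the fraction kept is an arbitrary choice): it keeps the weight vectors with the largest norms and drops the rest.

```python
import numpy as np

def prune_by_magnitude(W, keep_fraction=0.5):
    """Keep the rows of W with the largest L2 norms and drop the rest."""
    norms = np.linalg.norm(W, axis=1)
    n_keep = max(1, int(round(keep_fraction * W.shape[0])))
    kept = np.sort(np.argsort(norms)[-n_keep:])   # indices of the largest-norm rows, original order
    return W[kept]
```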
To fully utilise this multicollinearity-based compression, the initial weight initialisation should provide sufficient variability, and the correlation threshold $\tau$ should be tuned to balance the compression ratio against the model's accuracy.
To use it in your project, compile the library using the Makefile, link against the resulting `lib/libmodelcompressor.a`, and include `include/model_compressor.h` in your source files. Finally, after training your model, apply the compression functions to the trained weights.
This implementation provides a foundation for model compression. You may extend it with more advanced pruning or distillation techniques as needed. For large language models, consider fine-tuning the compressed model to adapt it to specific tasks or domains. Doing so could improve its performance.
This project is licensed under the GNU General Public License v3.0.
@misc{mcllm2024,
author = {Oketunji, A.F.},
title = {Understanding Multicollinearity},
year = 2024,
version = {0.0.1},
publisher = {Zenodo},
doi = {10.5281/zenodo.13308667},
url = {https://doi.org/10.5281/zenodo.13308667}
}
(c) 2024 Finbarrs Oketunji. All Rights Reserved.