This repository provides methods for visualizing loss landscapes to understand the optimization paths and behaviors of neural networks.
- Description: Visualizes how loss changes as we move from initial parameters (θ) to optimized parameters (θ*).
- Method: Interpolate the parameters between these two sets and plot the loss along this path.
- Insight: A smooth change in loss indicates stable optimization and potentially good generalization ability.
- Resources:
- Description: An extension of linear interpolation that samples the loss function in a 2D space spanned by two directions.
- Method: Select two directions (normalized directions) d1 and d2, then evaluate the loss functions at points θ + αd1 + βd2 where α and β are scalars.
- Applications: Widely used for loss visualization and provides insights into the optimization path in a 2D plane.
- Resources:
- Description: Uses the geometric properties of the loss landscapes, particularly curvatures.
- Insight: Useful for understanding the local geometry around minima, providing detailed information on the curvature and stability of the loss landscape.
- Problem: Not feasible for large networks due to the high complexity of computing the full Hessian.
- Resources:
- Description: PCA can be used to understand the optimization structure and the overall structure.
- Method I (Optimization Path): Calculate PCA on the weight matrix to get information on the most optimized directions.
- Method II (Overall Structure): Find the top 2 or 3 principal components capturing the most variation in the data.
- Limitations: PCA captures only linear variations, while neural networks are almost always non-linear.
- Resources:
- Arxiv: 1712.09913 (pages 9-10)
- UMAP: Captures both local and global structures of the data and handles larger datasets.
- Autoencoders: Use an encoder and decoder to significantly reduce dimensionality.
- Trustworthiness: Measures how well the low-dimensional representation preserves the neighborhood structure of the high-dimensional data.
- Reconstruction Error: Measures how accurately the original data can be reconstructed from the reduced dimensions, especially useful if using autoencoders.