- #paper/read ~ [[2021 CE]] ~ [[Embedding]], [[Modularity]]
- Evaluating Word Embeddings with Categorical Modularity
- https://arxiv.org/abs/2106.00877
- https://github.com/enscma2/categorical-modularity
- Mentioned papers:
- [[Distributed Representations of Words and Phrases]]
- [[GloVe]]
- [[Enriching Word Vectors with Subword Information]]
- [[Cross-lingual Word Embedding Models Survey]]
- [[Word Translation Without Parallel Data]]
- [[Human Brain Activity for Machine Attention]]
-
Summary
- Evaluation of embeddings may be extrinsic (measuring downstream task performance) or intrinsic (directly testing how well embeddings capture [[Semantic Vector Space|Semantic]] or syntactic properties).
- The categorical modularity metric employs 500 words drawn from brain-based semantic categories. All words are translated into 29 [[Language|Languages]].
-
The technique
- Calculate some distance [[Function]] for all embedding pairs (a code sketch of the full pipeline follows this list).
- [[Cosine Similarity]] is used in the paper.
- The resulting [[Matrix]] $M_D$ is symmetrical.
- For a given $k \in \mathbb{Z}_+$, build an adjacency [[Matrix]] $M_N$ for the resulting [[k-Nearest Neighbors, kNN|kNN]] [[Graph]].
- This one is asymmetrical though!
- Let $m$ be the total number of edges in the kNN graph.
- To calculate it from $M_N$, let's count all the edges in the symmetrized version of the matrix and divide that by two: `m = np.sum(np.fmax(M_N, M_N.T)) // 2`.
- The fraction of edge endpoints attached to words of category $c$ (so $a_c^2$ is the expected fraction of within-category edges under random wiring):
$$a_c = \frac{1}{2m} \sum_{i, j} M_{N_{i,j}} \mathbb{1}(c_i = c)$$
- The fraction of edges that connect words of the same semantic category $c$:
$$e_c = \frac{1}{2m} \sum_{i, j} M_{N_{i,j}} \mathbb{1}(c_i = c) \mathbb{1}(c_j = c)$$
- The overall modularity $Q$ is calculated as follows:
$$Q = \sum_{c} (e_c - a_c^2)$$
- Finally, it should be normalized:
$$Q_{max} = 1 - \sum_{c} a_c^2$$
$$Q_{norm} = \frac{Q}{Q_{max}}$$
- A higher value of $Q_{norm}$ indicates that more words belonging to the same category are connected to each other in the graph.
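- A minimal NumPy sketch of the pipeline above, under some assumptions: `vectors` is a hypothetical `(n, d)` array of word embeddings, `labels` a hypothetical `(n,)` array of category ids, and modularity is computed on the symmetrized kNN graph (consistent with the edge count $m$ above). The authors' actual implementation lives in the linked repository and may differ in details.
```python
import numpy as np

def categorical_modularity(vectors, labels, k=3):
    """Normalized categorical modularity of a kNN graph over embeddings.

    vectors: (n, d) array of word embeddings (hypothetical input name).
    labels:  (n,) array of integer category ids, one per word.
    k:       number of nearest neighbors per word.
    """
    # Pairwise cosine similarities; M_D is symmetric.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    M_D = unit @ unit.T

    # Directed kNN adjacency matrix M_N (asymmetric in general):
    # each word points to its k most similar words, excluding itself.
    n = len(vectors)
    M_N = np.zeros((n, n), dtype=int)
    np.fill_diagonal(M_D, -np.inf)  # exclude self-neighbors
    neighbors = np.argsort(-M_D, axis=1)[:, :k]
    M_N[np.arange(n)[:, None], neighbors] = 1

    # Symmetrize and count undirected edges, as in the one-liner above.
    M_sym = np.fmax(M_N, M_N.T)
    m = np.sum(M_sym) // 2

    # e_c: fraction of edges with both endpoints in category c.
    # a_c: fraction of edge endpoints attached to category c.
    Q = 0.0
    a_sq_sum = 0.0
    for c in np.unique(labels):
        in_c = (labels == c)
        e_c = M_sym[np.ix_(in_c, in_c)].sum() / (2 * m)
        a_c = M_sym[in_c, :].sum() / (2 * m)
        Q += e_c - a_c ** 2
        a_sq_sum += a_c ** 2

    # Normalize: Q_norm = Q / Q_max with Q_max = 1 - sum_c a_c^2.
    return Q / (1.0 - a_sq_sum)
```
- Note that the sketch symmetrizes $M_N$ with `np.fmax` exactly as in the edge-count one-liner, so each undirected edge contributes two endpoint terms to the $a_c$ and $e_c$ sums.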
-
Notes
- Categorical modularity seems to reveal how well models map to the human [[Brain]].
- The effect is especially pronounced for [[Regression]] tasks such as [[Word Similarity]].
- It may hint at how linguistic [[Information]] is encoded in the brain.
![[categorical-modularity-for-embeddings.pdf]]