
Commit

add rethinking llm interp link
csinva committed Feb 6, 2024
1 parent da2b92e commit 3b20dba
Showing 4 changed files with 20 additions and 2 deletions.
2 changes: 1 addition & 1 deletion _blog/misc/24_tensor_product_repr.md
@@ -54,7 +54,7 @@ Each tensor product results in a matrix for each pair, representing a 2D plane i
The composite tensor for the sentence "Cat chases mouse" is the sum of these individual tensor products.
Since the roles are orthogonal, it's easy to see that the unique contribution of each role-filler pair is preserved without interference (in different rows).

This example simplifies many aspects for clarity. In practice, the dimensions for roles and fillers might be much larger to capture more nuanced semantic features, and the mathematical operations might involve more sophisticated mechanisms to encode, manipulate, and decode the structured representations effectively.
This example simplifies many aspects for clarity. In practice, the dimensions for roles and fillers might be much larger to capture more nuanced semantic features, and the mathematical operations might involve more sophisticated mechanisms to encode, manipulate, and decode the structured representations effectively. See another [example here](https://rtmccoy.com/tpdn/tpr_demo.html) (it's focused on applying TPRs to RNN representations).
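
A minimal numpy sketch of the construction above, using one-hot (hence orthogonal) role vectors for agent/verb/patient; the 4-d filler embeddings are hypothetical values chosen for illustration:

```python
import numpy as np

# one-hot role vectors (agent, verb, patient) -- orthonormal by construction
roles = np.eye(3)

# hypothetical filler embeddings for "cat", "chases", "mouse"
fillers = np.array([
    [1.0, 0.0, 1.0, 0.0],  # cat
    [0.0, 1.0, 0.0, 1.0],  # chases
    [1.0, 1.0, 0.0, 0.0],  # mouse
])

# bind each role to its filler with an outer product, then sum the pairs
tpr = sum(np.outer(r, f) for r, f in zip(roles, fillers))  # shape (3, 4)

# unbinding: because the roles are orthonormal, projecting the composite
# tensor onto a role vector recovers that role's filler exactly
assert np.allclose(roles[0] @ tpr, fillers[0])  # recovers "cat"
```

Because each role here is one-hot, each filler occupies its own row of the composite matrix, which is exactly the "in different rows" property noted above.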

**Notes**
- Learning in TPRs involves optimizing the filler and role vectors so that input structures can be accurately reconstructed from their TPRs, achievable through gradient descent or other techniques
11 changes: 11 additions & 0 deletions _includes/01_research.html
@@ -184,6 +184,17 @@ <h2 style="text-align: center; margin-top: -150px;"> Research
</tr>
</thead>
<tbody>
<tr>
<td class="center">'24</td>
<td>Rethinking Interpretability in the Era of Large Language Models
</td>
<td>singh et al.</td>
<td class="med">🔎🌀</td>
<td class="center"><a href="https://arxiv.org/abs/2402.01761">arxiv</a></td>
<td class="big"></td>
<td class="med">
</td>
</tr>
<tr>
<td class="center">'24</td>
<td>Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
8 changes: 7 additions & 1 deletion _notes/neuro/comp_neuro.md
@@ -637,8 +637,14 @@ subtitle: Diverse notes on various topics in computational neuro, data-driven ne

- TPR of a structure is the sum of the TPR of its constituents
- tensor product operation allows constituents to be uniquely identified, even after the sum (if roles are linearly independent; see the numpy sketch after this list)

- [TPR intro blog post](https://csinva.io/blog/misc/24_tensor_product_repr)
- [TPR slides](https://www.mit.edu/~jda/teaching/6.884/slides/oct_02.pdf)
- RNNs Implicitly Implement Tensor Product Representations ([mccoy...smolensky, 2019](https://arxiv.org/pdf/1812.08718.pdf))
- introduce TP Decomposition Networks (TPDNs), which use TPRs to approximate existing vector representations
- assumes a particular hypothesis for the relevant set of roles (e.g., sequence indexes or structural positions in a parse tree)

- TPDNs can successfully approximate linear and tree-based RNN autoencoder representations
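
A minimal numpy sketch of the unbinding claim above, assuming roles that are linearly independent but not orthogonal; the role and filler matrices are hypothetical:

```python
import numpy as np

# hypothetical roles (rows): linearly independent but not orthogonal
R = np.array([[1.0, 0.0],
              [1.0, 1.0]])
# hypothetical fillers (rows), one bound to each role
F = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0]])

# TPR of the structure = sum_i outer(R[i], F[i]), written as a matrix product
T = R.T @ F

# unbind with the dual roles (pseudo-inverse); exact when R has full row rank
recovered = np.linalg.pinv(R).T @ T
assert np.allclose(recovered, F)  # each filler is recovered without interference
```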


## synaptic plasticity, hebb's rule, and statistical learning

1 change: 1 addition & 0 deletions _notes/research_ovws/ovw_transformers.md
@@ -728,6 +728,7 @@ See related papers in the [📌 interpretability](https://csinva.io/notes/resear
- $C = C(x)$
- Tree Transformer: Integrating Tree Structures into Self-Attention ([wang...chen, 2019](https://arxiv.org/pdf/1909.06639.pdf))
- Waveformer: Linear-Time Attention with Forward and Backward Wavelet Transform ([zhuang...shang, 2022](https://arxiv.org/abs/2210.01989))
- White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? ([yaodong yu...yi ma, 2023](https://arxiv.org/abs/2311.13110))


## model merging / mixture of experts (MoE) / routing
