🍈 Zettelkasten

❯

Machine Learning

❯

Probe

Oct 03, 20251 min read

machine_learning
ai_safety

These are tools used to check if a model is hitting a specific benchmark. Involves training a Linear Classifier on model activations to test whether a feature is linearly decodable at a given layer.

Commonly used in Natural Language Processing and Mechanistic Interpretability

Graph View

Backlinks

Project Idea Masterlist
Mechanistic Interpretability

Created with Quartz v4.4.0 © 2025

GitHub
Discord Community