This is a method for securing generative AI models against anomalous or adversarial inputs.
It involves creating a second model that summarizes the activations at important layers of the neural network, compares the shape of each layer's inputs against what was seen on trusted data, and removes outliers.
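A minimal sketch of the detector idea, assuming a PyTorch model: hook an important layer, fit Gaussian statistics (mean and covariance) to its activations on trusted data, then score new inputs by Mahalanobis distance. The names here (`ActivationSummary`, `fit`, `score`) are illustrative, and Mahalanobis scoring is a standard stand-in for whatever comparison the actual method uses.

```python
import torch
import torch.nn as nn

class ActivationSummary:
    """Fits a Gaussian to one layer's activations; scores outliers."""
    def __init__(self, layer: nn.Module):
        self.acts = []
        layer.register_forward_hook(lambda m, i, o: self.acts.append(o.detach()))
        self.mean = None
        self.prec = None  # precision matrix (inverse covariance)

    def fit(self):
        x = torch.cat(self.acts)  # (N, D) activations from trusted data
        self.mean = x.mean(dim=0)
        cov = torch.cov(x.T) + 1e-3 * torch.eye(x.shape[1])  # regularize
        self.prec = torch.linalg.inv(cov)
        self.acts.clear()

    def score(self) -> torch.Tensor:
        """Mahalanobis distance of the most recent forward pass."""
        d = self.acts.pop() - self.mean
        return torch.einsum('nd,de,ne->n', d, self.prec, d)

# Example: summarize the hidden layer of a toy MLP.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
summary = ActivationSummary(model[1])        # hook the ReLU output

with torch.no_grad():
    model(torch.randn(512, 8))               # in-distribution calibration data
summary.fit()

with torch.no_grad():
    model(5 * torch.randn(4, 8))             # unusually scaled inputs
print(summary.score())                       # large scores flag outliers
```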
Process
- Train your initial model
- Find inputs that severely degrade the model's outputs
- Apply the DNR wrapper to the model so that it can detect outlier inputs and refuse to evaluate them, giving it the option to answer "I don't know" (a sketch of such a wrapper follows this list)
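A hedged sketch of the wrapper step, again in PyTorch. For brevity it scores raw inputs with per-feature z-scores rather than layer activations (the activation-based score above would slot into the same gate). `GuardedModel`, its `calibrate` method, and the threshold value are all assumptions for illustration, not a published DNR API.

```python
import torch
import torch.nn as nn

class GuardedModel(nn.Module):
    """Wraps a trained model; refuses to evaluate inputs that look like outliers."""
    def __init__(self, model: nn.Module, threshold: float):
        super().__init__()
        self.model = model
        self.threshold = threshold
        self.mean = None
        self.std = None

    def calibrate(self, x: torch.Tensor):
        # Fit simple per-feature statistics on trusted in-distribution inputs.
        self.mean = x.mean(dim=0)
        self.std = x.std(dim=0) + 1e-6

    def forward(self, x: torch.Tensor):
        # Mean squared z-score per sample; roughly 1.0 for in-distribution inputs.
        score = (((x - self.mean) / self.std) ** 2).mean(dim=1)
        ok = score <= self.threshold
        outputs = [None] * len(x)
        if ok.any():
            preds = self.model(x[ok])         # evaluate only trusted inputs
            for i, p in zip(ok.nonzero().flatten().tolist(), preds):
                outputs[i] = p
        return ["I don't know" if o is None else o for o in outputs]

# Usage: calibrate on training-like data, then mix in out-of-range inputs.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
guarded = GuardedModel(model, threshold=4.0)  # threshold is an assumed value
guarded.calibrate(torch.randn(512, 8))
batch = torch.cat([torch.randn(2, 8), 10 * torch.randn(2, 8)])
print(guarded(batch))                         # the scaled inputs abstain
```

The design point is that outlier inputs are never passed to the underlying model: the wrapper gates inference on the calibrated score, so refusal happens before evaluation rather than as a post-hoc filter on outputs.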