🍈 Zettelkasten

❯

❯

Asenion Security for AI

Asenion Security for AI

Jun 30, 20252 min read

ai_safety
security

This is a talk by David Van Bruwaene.

Notes

Isaac Asimov’s I Robot believed that AI will have intelligence, meaning it will do things, but we dont know exactly what it will do
We dont have much control over these autonamous systems
In I Robot, there is a central control over the robot, that governs Three Laws of Robotics
They create a AI that evaluates compliance of other agents (Agent as a judge)
AI Regulations have been implemented within
Google loses huge valuation when Google Gemini came out, and started generating people to be black or chinese
Chatbots recommend suicide
Classifiers that give racist recommendations to prisoners sentences. Trained off of racist data, so it has a data bias.
In the story I Robot, the robots lock humans up
Data inference to determine the private data present within training datasets by interrogating model outputs
Model extraction (Learn or steal a model by training a new model on sample inputs)
They use adversarial testing to check for certain detections
Legal skills are very good for coding surprisingly
Humans need to be able to decide, and have reasoning. This is where humans will remain on top
You will have standardized test results from these benchmarks
There are certain benchmarks for certain biases within models (how many black, how many chinese for image generation)
Testing aligning is to just tell the model, that it is not in a sandbox.
Probes are used to test specific categories
Ascenion has a python SDK to work within a CICD pipeline
If you know its not a secure system, dont trust to secure it
Use cheap regex guards first
Probes are categorized for specific attacks, these are usually categories gotten from OWASP Top 10 for LLM, other security vulns
Probes also include evaluation criteria, they look for a target goal, that the model has to aim for
How do we test probing models? We use human-generated test sets that will generalize and standardize evaluations
Evaluation attack, you can use Mechanistic Interpretability, to see what circuits in the model are learning specific contexts.
The field is 3yrs old, it is growing fast, super fast

Graph View

Backlinks

AI Security

Created with Quartz v4.4.0 © 2025

GitHub
Discord Community