Apr 19, 2025 · 1 min read
A reinforcement learning process in which a model's outputs are evaluated by humans, and that feedback becomes the reward signal used to tailor the model's reasoning.
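To make the loop concrete, here is a toy sketch with several simplifying assumptions. It uses a fixed set of three candidate responses for a single prompt. Human feedback is simulated as pairwise "A is better than B" labels. A Bradley-Terry reward model is fitted to those labels, and the exact policy gradient of expected reward is then applied to a softmax policy over the candidates; this is the quantity that REINFORCE estimates by sampling. All names (`RESPONSES`, `HUMAN_PREFERENCES`, `train_reward_model`, `policy_gradient`) are hypothetical. Production systems use neural networks, sampled rollouts, and usually a KL penalty toward a reference model, and this sketch omits all of them.

```python
import math

# Hypothetical candidate outputs a model might produce for one prompt.
RESPONSES = ["terse answer", "step-by-step answer", "off-topic answer"]

# Simulated human feedback as (preferred_index, rejected_index) pairs.
# In a real setup these labels come from annotators comparing outputs.
HUMAN_PREFERENCES = [(1, 0), (1, 2), (0, 2)] * 20


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_reward_model(preferences, n, lr=0.1, epochs=50):
    """Fit one scalar reward per response with a Bradley-Terry model:
    P(a preferred over b) = sigmoid(r[a] - r[b])."""
    r = [0.0] * n
    for _ in range(epochs):
        for a, b in preferences:
            # Gradient of log sigmoid(r[a] - r[b]) w.r.t. r[a] is 1 - p.
            grad = 1.0 - sigmoid(r[a] - r[b])
            r[a] += lr * grad
            r[b] -= lr * grad
    return r


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient(rewards, lr=0.5, steps=200):
    """Gradient ascent on J = sum_j p_j * r_j for a softmax policy.
    For softmax, dJ/dlogit_j = p_j * (r_j - E[r])."""
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        grads = [p * (r - expected) for p, r in zip(probs, rewards)]
        logits = [z + lr * g for z, g in zip(logits, grads)]
    return softmax(logits)


if __name__ == "__main__":
    rewards = train_reward_model(HUMAN_PREFERENCES, len(RESPONSES))
    policy = policy_gradient(rewards)
    for text, r, p in zip(RESPONSES, rewards, policy):
        print(f"{text:22s} reward={r:+.2f} prob={p:.2f}")
```

The two stages mirror the definition above: human judgments are first turned into a reward signal, and the policy is then updated to favor the outputs that signal ranks highest. In this toy run, the policy should concentrate its probability on the response the simulated annotators preferred.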