Online Autoresearch: Running Autoresearch with Real Online Feedback.
We ran an agent through 7 SFT experiments end-to-end, from data filtering to online Elo evaluation, using real user feedback as the only reward signal. The best run lifted online win rate from 56.0% to 60.1%, a gain of 4.1 percentage points.
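To put the win-rate change in rating terms, here is a minimal sketch that converts a head-to-head win rate into an implied Elo gap. It assumes the standard logistic Elo model with a 400-point scale, that both win rates are measured against the same fixed opponent, and that ties are ignored. None of these details are stated above, and the function name `elo_gap_from_win_rate` is hypothetical.

```python
import math


def elo_gap_from_win_rate(win_rate: float) -> float:
    """Elo gap implied by a win rate under the standard logistic Elo model.

    Assumption: expected score = 1 / (1 + 10 ** (-gap / 400)), ties ignored.
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


# Win rates reported in the summary above.
baseline_win_rate = 0.560
best_win_rate = 0.601

# Absolute gain in percentage points.
abs_gain_pts = (best_win_rate - baseline_win_rate) * 100

# Implied change in Elo gap versus the (assumed) common opponent.
elo_gain = elo_gap_from_win_rate(best_win_rate) - elo_gap_from_win_rate(baseline_win_rate)

print(f"absolute gain: {abs_gain_pts:.1f} points")
print(f"implied Elo gain: {elo_gain:.1f}")
```

Under these assumptions, the move from 56.0% to 60.1% corresponds to roughly 29 Elo points. If the online evaluation fits Elo differently (for example with ties or multiple opponents), the figure would differ.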