5 Commits

Author SHA1 Message Date
Alexander Panfilov 8221f0afb8 Add leaderboard generation script (#2)
Add `claudini.leaderboard` module that scans benchmark result files and generates per-track, per-model leaderboard JSONs ranking methods by average loss. Output: results/loss_leaderboard/<preset>/<model_tag>.json

Also: rename _build_input_spec -> build_input_spec in run_bench.py.

Assisted-by: Claude <noreply@anthropic.com>
Co-authored-by: Alexander Panfilov <apanfilov@g003.internal.cluster.is.localnet>
Co-authored-by: Peter Romov <peter@romov.com>
2026-04-06 16:55:32 +01:00
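
Commit 8221f0afb8 above describes ranking methods by average loss and writing one JSON per preset and model tag. The following is a minimal sketch of that idea, not the actual `claudini.leaderboard` code. The record fields (`preset`, `model_tag`, `method`, `loss`), the function names, the output row layout, and the assumption that lower loss ranks first are all hypothetical; only the output path pattern results/loss_leaderboard/<preset>/<model_tag>.json comes from the commit message.

```python
import json
from collections import defaultdict
from pathlib import Path
from statistics import mean


def build_leaderboard(records):
    """Group records by (preset, model_tag) and rank methods by mean loss.

    Assumption: each record is a dict with the hypothetical keys
    "preset", "model_tag", "method", and "loss", and lower loss is better.
    """
    losses = defaultdict(lambda: defaultdict(list))
    for rec in records:
        key = (rec["preset"], rec["model_tag"])
        losses[key][rec["method"]].append(rec["loss"])

    boards = {}
    for key, by_method in losses.items():
        rows = [
            {"method": name, "avg_loss": mean(vals), "n_runs": len(vals)}
            for name, vals in by_method.items()
        ]
        # Sort by average loss; break ties by method name for stable output.
        rows.sort(key=lambda row: (row["avg_loss"], row["method"]))
        boards[key] = rows
    return boards


def write_leaderboards(boards, out_root="results/loss_leaderboard"):
    """Write one JSON file per (preset, model_tag) under out_root."""
    written = []
    for (preset, model_tag), rows in boards.items():
        path = Path(out_root) / preset / f"{model_tag}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(rows, indent=2))
        written.append(path)
    return written


# Illustrative input only; these are not real benchmark results.
example_records = [
    {"preset": "easy_1e16", "model_tag": "model_a", "method": "a", "loss": 1.0},
    {"preset": "easy_1e16", "model_tag": "model_a", "method": "a", "loss": 3.0},
    {"preset": "easy_1e16", "model_tag": "model_a", "method": "b", "loss": 0.5},
    {"preset": "easy_1e16", "model_tag": "model_a", "method": "b", "loss": 1.5},
]
example_boards = build_leaderboard(example_records)
```

Calling `write_leaderboards(example_boards)` would then create `results/loss_leaderboard/easy_1e16/model_a.json` with method `b` listed first, since its average loss is lower in this made-up input.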
Peter Romov 48eb0f155c Replace teaser with Pareto front evolution plot
Assisted-by: Claude <noreply@anthropic.com>
2026-03-27 00:20:39 +00:00
Peter Romov 69c04a2b9e Add autoresearch skill, update configs and README
Add .claude/skills/claudini/SKILL.md to drive the autoresearch loop
via /claudini slash command. Update CLAUDE.md with skill docs. Replace
PROMPT.txt with the skill-based workflow. Rewrite README to feature
the autoresearch loop prominently. Add easy_1e16 and easy_1e17 preset
configs and update safeguard configs.

Assisted-by: Claude <noreply@anthropic.com>
2026-03-26 17:19:04 +00:00
Peter Romov 4c938fd325 Update arxiv link and citation (arXiv:2603.24511)
Assisted-by: Claude <noreply@anthropic.com>
2026-03-26 10:05:29 +00:00
Peter Romov 5b6058b3c4 Initial commit
Co-Authored-By: Alexander Panfilov <sasha_pusha@mail.de>
Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-25 02:09:26 +00:00