Paper: arxiv.org/html/2509.26076v1
Apparently also has human review as part of the process. Newbs. Just require Lean solutions and be done with it... They do address it in a section of the paper "Formal math benchmarks" but still meh. Review must be fully automated, none of that asking humans bullshit.
Required CharacteristicsPhD-level difficulty: Suitable for qualifying exams, research papers, or advanced seminarsRequires genuine insight: Not solvable by routine application of known algorithmsClear proof-based main question: Answer should be a complete mathematical argument, not just a number2-3 unique-answer subquestions: Enable automated evaluation (e.g., "Is the statement true for n=5?", "What is the rank of this group?")
Example problem:
Example 1: Stable GraphsMain question: Find a closed formula for the number of stable graphs of genus with no legs and precisely 3 edges, for all .Subquestions:
- What is ?
- What is ?
- What is ?
Ciro Santilli