ID photo of Ciro Santilli taken in 2013 right eyeCiro Santilli OurBigBook logoOurBigBook.com  Sponsor 中国独裁统治 China Dictatorship 新疆改造中心、六四事件、法轮功、郝海东、709大抓捕、2015巴拿马文件 邓家贵、低端人口、西藏骚乱
improofbench.math.ethz.ch/
Paper: arxiv.org/html/2509.26076v1
Apparently also has human review as part of the process. Newbs. Just require Lean solutions and be done with it... They do address it in a section of the paper "Formal math benchmarks" but still meh. Review must be fully automated, none of that asking humans bullshit.
From: improofbench.math.ethz.ch/guidelines/
Required Characteristics
PhD-level difficulty: Suitable for qualifying exams, research papers, or advanced seminars
Requires genuine insight: Not solvable by routine application of known algorithms
Clear proof-based main question: Answer should be a complete mathematical argument, not just a number
2-3 unique-answer subquestions: Enable automated evaluation (e.g., "Is the statement true for n=5?", "What is the rank of this group?")
Example problem:
Example 1: Stable Graphs
Main question: Find a closed formula for the number of stable graphs of genus with no legs and precisely 3 edges, for all .
Subquestions:
  • What is ?
  • What is ?
  • What is ?

Ancestors (11)

  1. List of math AI benchmarks
  2. Math AI benchmark
  3. Automated theorem proving
  4. AI by capability
  5. Artificial intelligence
  6. Machine learning
  7. Computer
  8. Information technology
  9. Area of technology
  10. Technology
  11. Home