Benchmarking LLMs is an extremely difficult issue.
LLMs are the type of GenAI that most obviously comes close to AGI, depending on the question asked.
Therefore, there is a gap that is difficult to measure between what is easy, what a human can always do, and what AGI will do one day.
Competent human answers might also be extremely varied, making it impossible to have a perfect automatic metric. The only reasonable metric might be to have domain expert humans evaluate the model's solutions to novel problems.
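As a rough illustration of why varied but competent answers defeat simple automatic metrics, the sketch below scores three answers against a single reference using exact match and token-overlap F1. The function names (`normalize`, `exact_match`, `token_f1`) and the sample question are hypothetical, chosen only for this example, and are not taken from any particular benchmark.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def exact_match(prediction: str, reference: str) -> bool:
    """True only if the normalized token sequences are identical."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference answer and three candidate answers.
reference = "Water boils at 100 degrees Celsius at sea level."
answers = [
    "Water boils at 100 degrees Celsius at sea level.",  # correct, same wording
    "At standard pressure the boiling point of water is 212 Fahrenheit.",  # correct, different wording
    "Water boils at 50 degrees Celsius at sea level.",  # wrong, similar wording
]

for answer in answers:
    print(exact_match(answer, reference), round(token_f1(answer, reference), 2), answer)
```

In this sketch the wrong answer gets a much higher overlap score than the correct paraphrase, because the metric rewards shared wording rather than correctness. That is the kind of failure that pushes evaluation back towards human domain experts.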
