
AI by capability

Words: 7k Articles: 275
Given enough computational power per dollar, AGI is inevitable, but it is not certain that this will ever happen given the end of Moore's Law.
Alternatively, it could also be achieved with genetically modified biological brains + brain in a vat.
Imagine a brain the size of a building, perfectly engineered to solve certain engineering problems, and giving hints to human operators + taking feedback from cameras and audio attached to the operators.
This likely implies transhumanism, and mind uploading.
Ciro Santilli joined the silicon industry at one point to help increase our computational capacity and reach AGI.
Ciro believes that the easiest route to full AI, if any, could involve Ciro's 2D reinforcement learning games.

Principles of AGI

Words: 527 Articles: 7
Ciro Santilli has felt that perhaps what is missing in 2020's AGI research is:
  • the interface between:
    The key question is somewhat how to extract symbols out of the space-time continuous experiences.
  • more specialized accelerators that somehow interface with more generic artificial neural networks. Notably some kind of specialized processing of spatial elements is obviously hardcoded into the brain, see e.g. Section "Grid cell"
Forcing these boundaries to be tested was one of the main design goals of Ciro's 2D reinforcement learning games.
In those games, for example:
  • when you press a button here, a door opens somewhere far away
  • when you touch certain types of objects, a chemical reaction may happen, but not other types of objects
Therefore, those continuous objects would also have "magic" effects that could not be explained by "simple" "what is touching what" ideas.
Bibliography:
This point is beautifully argued in lots of different sources, and is clearly a pillar of AGI.
Perhaps one may argue that our deep learning layers do form some kind of hierarchy, e.g. this is very clear in certain models such as convolutional neural networks. But many of those models cannot have arbitrarily deep hierarchies, which appears to be a fundamental aspect of intelligence.
How to Create a Mind:
The lists of steps in my mind are organized in hierarchies. I follow a routine procedure before going to sleep. The first step is to brush my teeth. But this action is in turn broken into a smaller series of steps, the first of which is to put toothpaste on the toothbrush. That step in turn is made up of yet smaller steps, such as finding the toothpaste, removing the cap, and so on. The step of finding the toothpaste also has steps, the first of which is to open the bathroom cabinet. That step in turn requires steps, the first of which is to grab the outside of the cabinet door. This nesting actually continues down to a very fine grain of movements, so that there are literally thousands of little actions constituting my nighttime routine. Although I may have difficulty remembering details of a walk I took just a few hours ago, I have no difficulty recalling all of these many steps in preparing for bed - so much so that I am able to think about other things while I go through these procedures. It is important to point out that this list is not stored as one long list of thousands of steps - rather, each of our routine procedures is remembered as an elaborate hierarchy of nested activities.
Human Compatible: TODO get exact quote. It was something along: life goal: save world from hunger. Subgoal: apply for some grant. Sub-sub-goal: eat, sleep, take shower. Sub-sub-sub-goal: move muscles to get me to table and open a can.
AGI architecture
Words: 64 Articles: 4
Video 1.
From Machine Learning to Autonomous Intelligence by Yann LeCun (2023)
Source. After a bunch of B.S., LeCun goes on to describe his AGI architecture. Nothing ground breaking, but not bad either.
Bibliography:
Tagged
Elements of AGI
Words: 34 Articles: 3
This section is about ideas that are thought to be part of an AGI system.
Tagged
Common sense
Words: 19
Video 2.
My Job is to Open and Close Doors by Mattias Pilhede (2019)
Source. An interesting humorous short meditation on common sense.
Instrumental goal
Articles: 1

AGI research

Words: 1k Articles: 38
History of AGI research
Words: 307 Articles: 6
AGI blues
Words: 54 Articles: 1
Term invented by Ciro Santilli, similar to "nuclear blues", and used to describe the feeling that every little shitty job you are doing (that does not considerably help achieving AGI) is completely pointless given that we are likely close to AGI as of 2023.
AGI excitement
Words: 12
The opposite of the AGI blues. In 2025 Ciro Santilli fell well in this camp.
Tagged
Due to the failures of earlier generations, which believed that they would quickly achieve AGI, leading to the AI winters, 21st century researchers have been very afraid of even trying it, lest they be considered cranks, and have rather gone only for smaller subset problems like better neural network designs.
While there is fundamental value in such subset problems, the general view of the final goal is also very important: we will likely never reach AI without it.
This is voiced for example in Superintelligence by Nick Bostrom (2014) section "Opinions about the future of machine intelligence" which in turn quotes Nils Nilsson:
There may, however, be a residual cultural effect on the AI community of its earlier history that makes many mainstream researchers reluctant to align themselves with over-grand ambition. Thus Nils Nilsson, one of the old-timers in the field, complains that his present-day colleagues lack the boldness of spirit that propelled the pioneers of his own generation:
Concern for "respectability" has had, I think, a stultifying effect on some AI researchers. I hear them saying things like, "AI used to be criticized for its flossiness. Now that we have made solid progress, let us not risk losing our respectability." One result of this conservatism has been increased concentration on "weak AI" - the variety devoted to providing aids to human thought - and away from "strong AI" - the variety that attempts to mechanize human-level intelligence
Nilsson’s sentiment has been echoed by several others of the founders, including Marvin Minsky, John McCarthy, and Patrick Winston.
Don't be a pussy, AI researchers!!!
AGI interest group
Words: 33 Articles: 3
AGI conference
Words: 33 Articles: 1
www.agi-conference.org/
It is hard to overstate how low the level of this conference seems to be at first sight. Truly sad.
Open AGI Summit
Words: 12
openagi.xyz/
The tagline smells like bullshit:
Integrating AI with web3 and decentralized ecosystems
sciendo.com/journal/JAGI
AGI research entity
Words: 688 Articles: 25
Tagged
amazon.jobs/content/en/teams/agi
Giotto.ai
Words: 58
www.giotto.ai/
At Giotto.ai, our technology is designed to bridge the gap between current AI capabilities and the promise of Artificial General Intelligence (AGI).
Their website doesn't clearly explain their technology as of 2025.
They claim to have done some work on ARC-AGI which is cool, but no clear references to what they did or if there's anything public about it.
Kyutai
Words: 49
kyutai.org/ just says:
Our mission is to build and democratize artificial general intelligence through open science
They are not-for-profit and had massive investments: techcrunch.com/2023/11/17/kyutai-is-an-french-ai-research-lab-with-a-330-million-budget-that-will-make-everything-open-source/
They also don't say at all what they are looking into for AGI. The only public things they have are speech-to-speech and speech-to-text, so how's that related to AGI at all?
quest.mit.edu/about/vision-statement
ssi.inc/
Raised $1B at a $5B valuation in September 2024, then $2B at a $30B valuation in March 2025. lol!
From their website:
Superintelligence is within reach.
Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures.
Astera Institute
Words: 94 Articles: 4
astera.org/agi/
By the rich founder of Mt. Gox and Ripple, Jed McCaleb.
Obelisk is the Artificial General Intelligence laboratory at Astera. We are focused on the following problems: How does an agent continuously adapt to a changing environment and incorporate new information? In a complicated stochastic environment with sparse rewards, how does an agent associate rewards with the correct set of actions that led to those rewards? How does higher level planning arise?
Hipster research institute
Words: 15 Articles: 1
These are research institutes usually funded by rich tech bros, sometimes cryptocurrency magnates, but not necessarily.
Tagged
topos.institute/
Astera Institute person
Words: 10 Articles: 1
Tagged
Michael Nielsen
Words: 10
Interesting dude, with some interest overlaps with Ciro Santilli, like quantum computing:
FutureAI (Future AI)
Words: 259 Articles: 3
It is a bit hard to decide if those people are serious or not. Sometimes it feels scammy, but sometimes it feels fun and right!
Particularly concerning is the fact that they are not a not-for-profit entity, and it is hard to understand how they might make money.
Charles Simon, the founder, is pretty focused on how natural neurons work vs artificial neural network models. He has some good explanations of that, and one major focus of the project is their semi open source spiking neuron simulator BrainSimII. While Ciro Santilli believes that there might be insight in that, he also doubts whether certain modules of the brain wouldn't be more suitably coded directly in regular programming languages with greater ease and performance.
FutureAI appears to be Charles' retirement for fun project, he is likely independently wealthy. Well done.
Video 3.
Creativity and AGI by Charles Simon at AGI-22 (2022)
Source. Sounds OK!
Video 4.
Machine Learning Is Not Like Your Brain by Future AI (2022)
Source. Contains some BrainSimII demos.
BrainSimII
Words: 34
github.com/FutureAIGuru/BrainSimII
The video from futureai.guru/technologies/brian-simulator-ii-open-source-agi-toolkit/ shows a demo of the possibly non open source version. They have a GUI neuron viewer and editor, which is kind of cool.
Video 5.
Machine Learning Is Not Like Your Brain by Charles Simon (2022)
Source.
Not having a manipulator claw is a major issue with this one.
But they also have a co-simulation focus, which is a bit of a win.
Charles Simon
Words: 26
Basically it looks like the dude got enough money after selling some companies, and now he's doing cooler stuff without much need of money. Not bad.
GoodAI
Words: 42 Articles: 2
Marek Rosa's play thing.
Video 6.
AI Game - LLM-driven NPCs that can talk by Marek Rosa (2023)
Source. Not the most amazing demo, but the idea is there. Seems to be a preview for AI People. The previous working title seems to have been AI Odyssey.
NDEA
Words: 18
ndea.com/
We believe program synthesis holds the key to unlocking AGI.
Cool. Founders are also very interested in ARC-AGI.
Numenta
Words: 6 Articles: 4
Homepage: www.numenta.com/
Numenta employee
Articles: 1
Hierarchical temporal memory
Words: 5 Articles: 1
Video 7.
HTM Overview (Episode 0) by Numenta
Source.
https://upload.wikimedia.org/wikipedia/en/b/bd/OnInt.png
Sakana.AI
Words: 59
sakana.ai
Their description is a bit of localization randomness:
We are building a world class AI research lab in Tokyo.
We want to develop AI solutions for Japan's needs, and democratize AI in Japan.
Video 8.
I Co-Invented the Transformer. Now I'm Replacing It.
Source. Interview with Sakana.AI co-founders Llion Jones and Luke Darlow by Machine Learning Street Talk published Nov 23, 2025.
OpenCog
Words: 21 Articles: 3
Ben Goertzel
Words: 21 Articles: 2
www.reddit.com/r/artificial/comments/b38hbk/what_do_my_fellow_ai_researchers_think_of_ben/ What do my fellow AI researchers think of Ben Goertzel and his research?
SingularityNET
Words: 8 Articles: 1
singularitynet.io/
Ben Goertzel's fog computing project to try and help achieve AGI.

AGI-complete

Words: 25
Term invented by Ciro Santilli to refer to problems that can only be solved once we have AGI.
It is somewhat of a flawed analogy to NP-complete.

AGI test

Words: 2k Articles: 85
CAPTCHA
Articles: 1
This one goes all in on the following themes:
  • few examples to learn from. You have to carefully inspect the input examples to deduce the output rules. Rules can require a specific application ordering, so you actually generate an algorithm. It tends to be easy for humans, but sometimes not so easy!
  • extensive use of geometric concepts, notably "contained inside", "adjacent to", "connected"
Bibliography:
ARC-AGI theory
Words: 165 Articles: 2
The extreme overfitting case of training is to have a map where each input leads to one output.
However it is cool that this overfit does not allow you to compute the output for the final input, for which there is no known output.
This therefore forces the creation of more general solution rules.
While in some cases solutions can work for any input, in many others they require specific assumptions about input, but the model could simply check that the assumptions apply to all inputs and use them for the final algorithm.
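A minimal sketch of the argument above, using a made-up toy task (mirror each row) that is not taken from the ARC-AGI data. The names `overfit_solver` and `general_solver` are hypothetical. A lookup table that memorizes every training pair fits the training data perfectly, yet has nothing to say about the unseen test input, while a general rule does:

```python
# Toy task (hypothetical, not from ARC-AGI): the output mirrors each row.
train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
]
test_input = [[0, 5], [5, 5]]


def freeze(grid):
    """Make a grid hashable so that it can be used as a dict key."""
    return tuple(tuple(row) for row in grid)


# Extreme overfit: one table entry per training input.
lookup = {freeze(i): o for i, o in train_pairs}


def overfit_solver(grid):
    # Returns None for any input that was not memorized.
    return lookup.get(freeze(grid))


def general_solver(grid):
    # A general rule: reverse every row.
    return [list(reversed(row)) for row in grid]


# Both solvers fit the training data...
assert all(overfit_solver(i) == o for i, o in train_pairs)
assert all(general_solver(i) == o for i, o in train_pairs)
# ...but only the general rule says anything about the test input.
print(overfit_solver(test_input))  # None
print(general_solver(test_input))  # [[5, 0], [5, 5]]
```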
Bibliography:
People who do cool open tech stuff when they don't need money anymore are awesome:
www.kaggle.com/code/allegich/arc-agi-2025-visualization-all-1000-120-tasks contains plots of all questions and answers. It is truly very convenient.
ARC-AGI approach
Words: 6 Articles: 1
www.kaggle.com/code/allegich/eda-statistical-analysis-and-feature-extraction has a very basic feature extraction.
ARC-AGI implementation
Words: 640 Articles: 8
Bibliography:
ARC-DSL
Words: 127
github.com/michaelhodel/arc-dsl
This interesting repo defines a set of input transformations that can be composed together into programs that solve ARC problems.
It does not appear to have any program synthesis: it only defines the DSL and then provides manual solutions to the problems.
The README is lacking as usual, an overview of the files is:
Intended usage to run the solvers seems to be:
git clone https://github.com/fchollet/ARC-AGI
cd ARC-AGI
git checkout 399030444e0ab0cc8b4e199870fb20b863846f34
git clone https://github.com/michaelhodel/arc-dsl
cd arc-dsl
git checkout 635de4902a5fb4e376f27333feaa396d3f5dfdcb
python main.py
Unfortunately this blows up on Ubuntu 25.04 on test_mpapply apparently due to a Python 3.12 issue and the pull request github.com/michaelhodel/arc-dsl/pull/7 has been ignored for more than one year, so the project is largely dead.
ARC-DSL-2
Words: 15
github.com/arc-dsl-2/arc-dsl-2
Ciro Santilli's fork of ARC-DSL merging all pull requests needed to make tests run again on Ubuntu 25.04.
github.com/cristianoc/arc-agi-2-abstraction-dataset
Contains 120 DSL implementations for the
From another awesome retired tech bro that does this project for fun.
ARC-AGI without LLM
Words: 479 Articles: 4
Some mentions at: arcprize.org/blog/arc-prize-2025-results-analysis section "Zero-Pretraining Deep Learning Methods".
iliao2345.github.io/blog_posts/arc_agi_without_pretraining/arc_agi_without_pretraining.html
github.com/mvakde/mdlARC
Local CPU ARC-AGI without LLM
Words: 471 Articles: 1
github.com/aviad12g/ARC-AGI-solution
Interesting looking repo with optional GPU and optional LLM.
It seems to have been tested on something older than Ubuntu 24.04, as the 24.04 install requires some porting. I started that process at: github.com/cirosantilli/ARC-AGI-solution/tree/ubuntu-24-04 but gave up and tried Ubuntu 22.04 instead.
Ubuntu 22.04 Docker install worked without patches, after installing Poetry e.g. to try and solve 1ae2feb7:
git clone https://github.com/aviad12g/ARC-AGI-solution
cd ARC-AGI-solution
git checkout f3283f727488ad98fe575ea6a5ac981e4a188e49
poetry install
git clone https://github.com/arcprize/ARC-AGI-2
`poetry env activate`
export PYTHONPATH="$PWD/src:$PYTHONPATH"
python3 -m arc_solver.cli.main solve ARC-AGI-2/data/evaluation/1ae2feb7.json
but towards the end we have:
{
  "success": false,
  "error": "Search failed: no_multi_example_solution",
  "search_stats": {
    "nodes_expanded": 21,
    "nodes_generated": 903,
    "termination_reason": "no_multi_example_solution",
    "candidates_generated": 25,
    "examples_validated": 3,
    "validation_success_rate": 0.0,
    "multi_example_used": true
  },
  "predictions": [
    null,
    null,
    null
  ],
  "computation_time": 30.234344280001096,
  "task_id": "1ae2feb7",
  "task_file": "ARC-AGI-2/data/evaluation/1ae2feb7.json",
  "solver_version": "0.1.0",
  "total_time": 30.24239572100123,
  "timestamp": 1760353369.9701269
}

Task: 1ae2feb7.json
Success: False
Error: Search failed: no_multi_example_solution
Multi-example validation: ENABLED
Training examples validated: 3
Candidates generated: 25
Validation success rate: 0.0%
Computation time: 30.23s
Total time: 30.24s
so it failed.
Let's see if any of them work at all as advertised:
ls ARC-AGI-2/data/evaluation/ | xargs -I'{}' python3 -m arc_solver.cli.main solve 'ARC-AGI-2/data/evaluation/{}' |& tee tmp.txt
and at the end:
grep 'Success: True' tmp.txt | wc
has only 7 successes.
Also weirdly
grep 'Task:' tmp.txt | wc
only has 102 hits, but there were 120 JSON tasks in that folder. I searched for the missing executions:
diff -u <(grep Task: tmp.txt | cut -d' ' -f2) <(ls ARC-AGI-2/data/evaluation)
The first missing one is 135a2760, it blows up with:
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
and grepping ERROR gives us:
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type SizePredicate is not JSON serializable
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type ndarray is not JSON serializable
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type ndarray is not JSON serializable
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type VerticalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type VerticalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type ndarray is not JSON serializable
ERROR: Solve command failed: Object of type VerticalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type ndarray is not JSON serializable
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type HorizontalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type VerticalLinePredicate is not JSON serializable
ERROR: Solve command failed: Object of type VerticalLinePredicate is not JSON serializable
Reported at: github.com/aviad12g/ARC-AGI-solution/issues/1
ARC-AGI problem set
Words: 1k Articles: 64
Official ARC-AGI problem set
Words: 1k Articles: 63
ARC-AGI-1
Words: 309 Articles: 8
ARC-AGI-1 problem
Words: 309 Articles: 7
Train
Words: 309 Articles: 5
007bbfb7 (1)
Words: 177
arcprize.org/play?task=007bbfb7
Hard input constraints:
  • inputs are 3x3
  • inputs contain only 2 colors: black and one other color
Hard output constraints:
  • output is 3x input width and height. Suggests that the output is a 3x3 grid based on the input.
    • stronger: if output is split as a 3x3 grid, then each 3x3 block is either black or a copy of the input. Which is which?
      • stronger: each pixel of the input determines if block is black or copy (final solution)
  • output contains only two colors: black and another
    • stronger: the same two colors as input
Input output comparison:
  • input appears pasted on output multiple times: suggests it is being copy pasted
Hard output constraints:
  • output is 3x input width and height: suggests that the output is a 3x3 grid based on the input
    If that is the case, let's try to figure out what is placed on each output grid.
    Notice: each grid element is either blank or the input.
    OK so let's determine what in the input determines each output grid.
    Because the input is 3x3, maybe there's a direct mapping.
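The final rule reached above (each input pixel decides whether its output block is black or a copy of the input) can be sketched as follows. This is only an illustration: it assumes grids are lists of rows of integers with 0 meaning black, as in the usual ARC JSON encoding, and the function name and sample grid are made up.

```python
def solve_007bbfb7(grid):
    """Paste a copy of the input wherever the input has a colored pixel."""
    n = len(grid)  # the input is assumed square (3x3 in this task)
    out = [[0] * (n * n) for _ in range(n * n)]
    for block_row in range(n):
        for block_col in range(n):
            if grid[block_row][block_col] == 0:
                continue  # black pixel: the whole block stays black
            # Colored pixel: copy the entire input into this block.
            for r in range(n):
                for c in range(n):
                    out[block_row * n + r][block_col * n + c] = grid[r][c]
    return out


# Made-up input: a diagonal of color 7.
sample = [
    [7, 0, 0],
    [0, 7, 0],
    [0, 0, 7],
]
for row in solve_007bbfb7(sample):
    print(row)
```

For this diagonal input the result is a 9x9 grid whose only colored cells lie on the main diagonal, since only the three diagonal blocks receive a copy.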
00d62c1b (2)
Words: 63
arcprize.org/play?task=00d62c1b
Hard input constraints:
  • inputs have two colors: green and black
Hard output constraints:
  • output has three colors: black, green and yellow
  • output has same size as input
  • green is copied from input to output
    • output differs from input by making some black pixels yellow. Which pixels are becoming yellow?
      • output differs from input by making inner regions (non-diagonal) yellow with non-diagonal flood fill
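A possible sketch of the non-diagonal flood fill described above: flood from the black border cells using only up/down/left/right moves, then paint yellow every black cell that was never reached. The color indices (0 black, 3 green, 4 yellow) are assumed from the usual ARC JSON encoding, and `fill_enclosed` is a hypothetical name.

```python
from collections import deque

BLACK, GREEN, YELLOW = 0, 3, 4  # assumed ARC color indices


def fill_enclosed(grid):
    height, width = len(grid), len(grid[0])
    outside = set()
    queue = deque()
    # Seed the flood fill with every black cell on the border.
    for r in range(height):
        for c in range(width):
            on_border = r in (0, height - 1) or c in (0, width - 1)
            if on_border and grid[r][c] == BLACK:
                outside.add((r, c))
                queue.append((r, c))
    # Spread through black cells using non-diagonal neighbors only.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < height and 0 <= nc < width
                    and grid[nr][nc] == BLACK and (nr, nc) not in outside):
                outside.add((nr, nc))
                queue.append((nr, nc))
    # Black cells that were never reached are inner regions.
    return [
        [YELLOW if grid[r][c] == BLACK and (r, c) not in outside else grid[r][c]
         for c in range(width)]
        for r in range(height)
    ]


# The center is enclosed even though its diagonal neighbors are black.
print(fill_enclosed([[0, 3, 0], [3, 0, 3], [0, 3, 0]]))
# [[0, 3, 0], [3, 4, 3], [0, 3, 0]]
```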
017c7c7b (3)
Words: 16
arcprize.org/play?task=017c7c7b
Input constraints:
  • inputs are 3x6
Output constraints:
  • outputs are 3x9
TODO: this one is quite challenging.
025d127b (4)
Words: 53
arcprize.org/play?task=025d127b
Output constraints:
  • Input and output have the same size
  • Supposing background is black, input and output contain the same number of objects of each color
    • the lower right part of each object (non-diagonal) does not move
      • the rest of each object outside the lower right part moves by 1 square to the right
arcprize.org/play?task=0520fde7
ARC-AGI-2 (2025-03-24)
Words: 676 Articles: 48
Has the following structure:
ARC-AGI-2 problem
Words: 620 Articles: 47
Approach
Words: 277 Articles: 34
Primitive
Words: 277 Articles: 33
This section lists common visual primitives that a solver must first extract in order to infer solutions.
Some of these have a lot of prior world content, others less.
Many people have come up with the same idea on the Discord. Some nicely call it DSL.
Implementations:
Input primitive
Words: 213 Articles: 29
If a color is inferred to be a background color, it contains no information and should be ignored.
Most problems tend to use black as a background color, but not all of them.
Object
Words: 180 Articles: 26
An "object" is a set of points that is understood to be one singular entity.
Contiguity and having the same color are strong indicators that something should be understood as an object.
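One simple way to turn those two indicators into code is connected component extraction. The sketch below makes several assumptions that do not hold for every task: the background is color 0, contiguity means non-diagonal adjacency, and one object has exactly one color. `extract_objects` is a hypothetical helper.

```python
def extract_objects(grid, background=0):
    """Return a list of (color, sorted cells) for each same-color contiguous chunk."""
    height, width = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(height):
        for c in range(width):
            if grid[r][c] == background or (r, c) in seen:
                continue
            color = grid[r][c]
            cells = []
            stack = [(r, c)]
            seen.add((r, c))
            # Depth-first walk over same-color non-diagonal neighbors.
            while stack:
                cr, cc = stack.pop()
                cells.append((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and (nr, nc) not in seen and grid[nr][nc] == color):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            objects.append((color, sorted(cells)))
    return objects


grid = [
    [1, 1, 0],
    [0, 0, 2],
    [1, 0, 2],
]
for color, cells in extract_objects(grid):
    print(color, cells)
```

On this grid the two chunks of color 1 come out as separate objects because they are not contiguous.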
Container
Words: 52 Articles: 9
Box
Words: 52 Articles: 8
A rectangular container.
The toplevel viewport is always implicitly understood as a special box.
Edge
Articles: 4
Toplevel box
Words: 38 Articles: 2
There are two or more boxes drawn inside the toplevel and sharing boundaries with toplevel.
Two toplevel boxes
Words: 23 Articles: 1
There are two toplevel boxes, one contains only input, and all output goes to the second one. The second one may also contain some input.
Primitive relation
Words: 31 Articles: 2
Distance
Words: 31 Articles: 1
A path is something you obtain by somehow drawing from one point to another, e.g. a line, and then starting another drawing between two points from the end point.
Distance = 0.
Rectangle
Words: 15 Articles: 2
Rectangle is like a box but always fully filled.
Square
Words: 6 Articles: 1
Point
Words: 6
A point is a 1-square.
Path
Words: 50 Articles: 8
Dotted path
Words: 44
A dotted path is a generalized path that cycles through a color pattern, e.g.:
r r g
would be a line:
r r g r r g r r g
An extra color "transparent" may also be added to the pattern, meaning that the pixel at that position is left unchanged.
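A minimal sketch of that idea for the straight horizontal case, assuming a row is a plain list of color names and using `None` as a hypothetical marker for "transparent":

```python
from itertools import cycle

TRANSPARENT = None  # hypothetical marker for "leave this pixel unchanged"


def draw_dotted_path(row, pattern):
    """Return a copy of `row` painted with `pattern` repeated along it."""
    return [
        old if new is TRANSPARENT else new
        for old, new in zip(row, cycle(pattern))
    ]


print(draw_dotted_path(["."] * 9, ["r", "r", "g"]))
# ['r', 'r', 'g', 'r', 'r', 'g', 'r', 'r', 'g']
print(draw_dotted_path(["b"] * 6, ["r", TRANSPARENT]))
# ['r', 'b', 'r', 'b', 'r', 'b']
```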
Line
Words: 6 Articles: 6
Dotted line
Words: 6
A dotted path that is also a line.
Perpendicular line
Articles: 2
Output primitive
Words: 16 Articles: 2
Optimize
Words: 16
There is no unique solution, we just have to optimize something, often the least changed colors.
List
Words: 343 Articles: 11
Eval
Words: 343 Articles: 9
1ae2feb7 (1)
Words: 93
arcprize.org/play?task=1ae2feb7
To the left of the vertical red line, count the number of each color on each row.
Then to the right of the red line, on each row, draw one square of each color found to the left every n columns, where n is the count of that color, starting with a square on the first column to the right of the red line.
Start with the color furthest away from the red line, and then color with colors nearer to the red line. If there's overlap, replace the old color with the new one.
Input:
Output:
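The rule described above can be sketched for a single row as follows. This is only my reading of the prose, not a verified solver: it assumes 0 is black and 2 is red as in the usual ARC JSON encoding, that every row contains the red line, and that "furthest from the red line" can be measured by the leftmost position of each color. `solve_row` and the sample rows are made up.

```python
BLACK = 0
RED = 2  # assumed ARC color index of the vertical separator line


def solve_row(row):
    red_col = row.index(RED)
    left = row[:red_col]
    # Count how many squares of each color sit to the left of the red line.
    counts = {}
    for color in left:
        if color != BLACK:
            counts[color] = counts.get(color, 0) + 1
    # Furthest from the red line first, so that nearer colors overwrite.
    order = sorted(counts, key=left.index)
    out = list(row)
    for color in order:
        # One square every n columns, starting right after the red line.
        for col in range(red_col + 1, len(row), counts[color]):
            out[col] = color
    return out


print(solve_row([1, 1, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0]))
# [1, 1, 1, 0, 2, 1, 0, 0, 1, 0, 0, 1]
print(solve_row([1, 6, 6, 6, 2, 0, 0, 0, 0, 0, 0, 0]))
# [1, 6, 6, 6, 2, 6, 1, 1, 6, 1, 1, 6]
```

In the second row, color 1 (count 1) first fills every column, and color 6 (count 3) then overwrites every third column.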
3e6067c3 (2)
Words: 12
arcprize.org/play?task=3e6067c3
Input primitives:
Transformation primitives:
  • line drawing
16b78196 (3)
Words: 89
arcprize.org/play?task=16b78196
Solution: move pieces to fill the gap on the fat object that crosses the screen. Place objects either on fat object or on other objects placed on the fat object. Anything you add must end in a rectangle.
The rules for this one are not entirely clear with the number of examples.
Also clearly if the goal is to make rectangular towers, then this is an NP-hard optimization problem in general.
Input primitives:
  • same color chunk. Properties: crosses screen.
Transformation primitives:
  • move solid around
  • fills the gap
This existed earlier: x.com/GianpaoloGalli/status/1846144236900827413
142ca369 (4)
Words: 63
arcprize.org/play?task=142ca369
Solution: the "v"s are guns that shoot a diagonal line of their color. When the line touches another object, change the line color to match that of the object, then bounce off the object and continue with the new color.
Input primitives:
  • diagonal line
Assumptions:
  • lines don't cross each other; it is unclear how to resolve that case
Transformation primitives:
  • draw line
    • draw line and bounce
136b0064 (5)
Words: 6
arcprize.org/play?task=136b0064
Input primitive:
Transformation primitives:
0934a4d8 (6)
Words: 6
arcprize.org/play?task=0934a4d8
TODO I can't solve that one.
135a2760 (7)
Words: 3
arcprize.org/play?task=135a2760
Input:
Output:
13e47133 (8)
Words: 9
arcprize.org/play?task=13e47133
Input:
Output:
195c6913 (10)
Words: 62
arcprize.org/play?task=195c6913
Input: three or more containers:
Output:
  • draw dotted path of perpendicular line
    • the path color pattern comes from the color of top left objects, ordered from nearest to furthest from top le
ARC-AGI-3
Words: 71
They are moving to 2d discrete AI games.
Although there is merit in that, it is a shame that it is just similar to other pre-existing work such as gvgai and many others.
Solutions to these games require much more thought to formalize.
Also the solutions are much less unique, finding the actual optimal solution being obviously NP-hard.
These aspects make those games much less elegant than the older ARC-AGI 1 and 2 counterparts.
Unofficial ARC-AGI problem set
Words: 146 Articles: 3
This section is about unofficial ARC-AGI-like problem sets.
These are interesting from both a:
  • practical point of view, as they provide more training data for potential solvers. That is, of course, if you believe that they are representative.
  • theoretical point of view, as they might help to highlight missing or excessive presumptions of the official datasets
github.com/neoneye/arc-dataset-collection contains a fantastic collection of such datasets, with visualization at: neoneye.github.io/arc/
ARC-AGI problem generator
Words: 80 Articles: 2
re-arc
Words: 80
github.com/michaelhodel/re-arc
By the author of ARC-DSL.
README says:
This repository presents code to procedurally generate examples for the ARC training tasks. For each of the 400 tasks, an example generator is provided.
arxiv.org/html/2404.07353v1 says:
Each generator is a standalone Python function merely making use of the DSL and functions from the random module from the standard library. The median generator consists of 40 lines of code and uses 22 DSL primitive calls and 10 calls to the random module.
Cool!
Original:
https://web.archive.org/web/20250216160803im_/https://github.com/michaelhodel/re-arc/raw/main/00d62c1b_original.png
Generated:
https://web.archive.org/web/20250216160803im_/https://github.com/michaelhodel/re-arc/raw/main/00d62c1b_generated.png
That's Ciro Santilli's favorite. Of course, there is a huge difference between physical and non physical jobs. But one could start with replacing desk jobs!
GitHub awesome repos:
Reddit threads:
AGI-complete in general? Obviously. But still, a lot can be done. See e.g.:
Tagged

Math AI company

Words: 245 Articles: 6
A good quick December 2025 list: x.com/AlexKontorovich/status/1997051032384446629
Axiom Math
Words: 18
Not to be confused with tutoring company "Axiom Maths" which shows on top of Google results: axiommaths.com/ lol fuck.
harmonic.fun
Words: 14
harmonic.fun/
They seem to do autoformalization, automated theorem proving and code generation, and they use Lean a lot. Sounds fun.
Not much info is available about them outside of Twitter.
www.math.inc/careers
Suppose that today is June 1, 2025. We call a date "square" if all of its components (day, month, and year) are perfect squares. I was born in the last millennium, and my next birthday (relative to that date) will be the last square date in my life. If you sum the square roots of the components of that upcoming square birthday (day, month, year), you obtain my age on June 1, 2025. My mother would have been born on a square date if the month were a square number; in reality it is not a square date, but both the month and day are perfect cubes. When was I born, and when was my mother born?
One shot by GPT-5.1, possibly contaminated obviously:
You were born on 25 September 1971.
Your mother was born on 1 August 1936.
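The answer can be sanity checked with a small brute force. The sketch below encodes one reading of the puzzle and makes two assumptions that the puzzle does not state: nobody lives past 120 years (which is what makes a birthday the "last" square date in a life), and the mother was between 12 and 60 years old when her child was born. All function names are hypothetical.

```python
from datetime import date
from math import isqrt

TODAY = date(2025, 6, 1)
MAX_AGE = 120  # assumption: nobody in the puzzle lives past this age


def is_square(n):
    return isqrt(n) ** 2 == n


def is_cube(n):
    return round(n ** (1 / 3)) ** 3 == n


def square_dates(first_year, last_year):
    """All dates whose day, month and year are perfect squares."""
    return [
        date(y, m, d)
        for y in range(first_year, last_year + 1) if is_square(y)
        for m in (1, 4, 9)
        for d in (1, 4, 9, 16, 25)
    ]


def solve_me():
    answers = []
    # The next birthday falls within one year after TODAY.
    for birthday in square_dates(TODAY.year, TODAY.year + 1):
        if birthday <= TODAY:
            continue
        age_today = isqrt(birthday.day) + isqrt(birthday.month) + isqrt(birthday.year)
        # The birthday is still ahead of TODAY, so I turn age_today + 1 on it.
        birth = date(birthday.year - age_today - 1, birthday.month, birthday.day)
        # "Last square date in my life": no later square date before MAX_AGE.
        later = [d for d in square_dates(birthday.year, birth.year + MAX_AGE)
                 if d > birthday]
        if birth.year <= 2000 and not later:
            answers.append(birth)
    return answers


def solve_mother(my_birth):
    # Assumption: she was between 12 and 60 years old at my birth.
    return [
        date(y, m, d)
        for y in range(my_birth.year - 60, my_birth.year - 12) if is_square(y)
        for m in range(1, 13) if is_cube(m) and not is_square(m)
        for d in range(1, 29) if is_cube(d) and is_square(d)
    ]


me = solve_me()
print(me)                   # [datetime.date(1971, 9, 25)]
print(solve_mother(me[0]))  # [datetime.date(1936, 8, 1)]
```

Under those assumptions the brute force agrees with the GPT-5.1 answer quoted above.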
Principia Labs
Words: 56
www.principialabs.org
We combine large-scale pretraining with reinforcement learning to create models that can rederive and learn from the entire corpus of human mathematics. Our goal is automated mathematical discovery: AI that does the creative, generative work that was previously only possible for the world's best researchers—and can be deployed on the hardest problems in science and engineering.

Math AI implementation

Words: 25 Articles: 3
Quick list: x.com/AlexKontorovich/status/1997051032384446629
AlphaProof (2024)
Words: 14 Articles: 1
deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
AI achieves silver-medal standard solving International Mathematical Olympiad problems
Uses autoformalization down to Lean, and then AlphaZero. Cool.
www.nature.com/articles/s41586-023-06747-5
They do have a database system which is interesting.
"Autoformalization" refers to automatically converting a traditional human readable mathematical proof to a formal proof.
The topic received some attention with the AI boom and rise of LLMs:

Math AI benchmark

Words: 692 Articles: 12
This section is about benchmarks designed to test mathematical reasoning.
Bibliography:
Tagged
Even more than in other areas of benchmarking, in maths where you only have a right or wrong answer, and it is costly to come up with good sample problems, some benchmarks have adopted private test data sets.
The situation is kind of sad, in that ideally we should have open data sets and only test them on models that were trained on data exclusively published before the problem publish date.
However this is not practical for the following reasons:
  • some of the best models are closed source and don't have a reproducible training with specified cutoff
  • having a private test set allows you to automatically check answers from untrusted sources. If they get answers right, they are onto something, you don't even need to check their methodology
Perhaps the ideal scenario therefore is what ARC-AGI has done: give a sizeable public dataset, which you feel is highly representative of the difficulty level of the private test data, while at the same time holding out some private test data. Half and half seems reasonable.
This way, reproducible models can actually self test themselves reliably on the open data, while the closed data can then be used for the cases where the open data can't be used.
List of math AI benchmarks
Words: 477 Articles: 10
MathArena
Words: 57 Articles: 1
matharena.ai/
This project tests various models against various competitions.
How they "ensure" that models are not contaminated:
By evaluating models as soon as new problems are released, we effectively eliminate the risk of contamination
Most of their problems come from high school knowledge olympiads and they are therefore completely irrelevant for 2025 LLMs.
matharena.ai/apex/
A subset of problems that they curate from competitions.
aimoprize.com
Not too exciting because of the high school knowledge olympiad level, but respectable.
arxiv.org/abs/2511.02589
This one doesn't seem too exciting to be honest, but it might be useful. Sample question:
If I deposit $50,000 at 5% APR, compounded weekly, what will my balance be after 18 months?
and it expects the correct answer down to the cents:
53892.27
It should be noted that Project Euler has such "precision matters" problems.
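For reference, the sample answer above appears consistent with the usual periodic compounding formula, if we assume 52 weeks per year and treat 18 months as 78 weekly periods. Both assumptions are ours and not necessarily the benchmark's:

```python
def compound_balance(principal, apr, periods_per_year, periods):
    """Balance after `periods` compounding periods at nominal annual rate `apr`."""
    return principal * (1 + apr / periods_per_year) ** periods


# Assumption: 52 weeks per year, so 18 months = 1.5 years = 78 weeks.
balance = compound_balance(50_000, 0.05, 52, 78)
print(f"{balance:.2f}")
```

A different week-counting convention would change the last digits, which is what makes such "precision matters" questions tricky to grade.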
This project initiated by Terence Tao aims to find the relations between various statements in abstract algebra by using a combination of automated theorem proving and human effort. As mentioned by Terence himself, this is a bit similar to the idea of the Busy Beaver Challenge:
FrontierMath (2024)
Words: 165 Articles: 1
epoch.ai/frontiermath
Paper: arxiv.org/abs/2411.04872
arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ mentions what the official website is unable to state clearly:
The design of FrontierMath differs from many existing AI benchmarks because the problem set remains private and unpublished to prevent data contamination
The expected answer output for all problems is one single SymPy expression, which is kind of a cool approach, as it allows for large integers like Project Euler, but also for irrational expressions to be given, e.g. "An optimization problem in BMO space" from the sample problems has answer:
Of course, when the output is not an integer, this leads to the question of how to decide whether two differently simplified expressions are equivalent. Also, like Project Euler, solutions essentially expect you to write and execute code.
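One common way to sidestep the equivalence question is to compare the candidate and reference answers numerically rather than symbolically. The following is a minimal sketch of that idea using only the Python standard library. It is not a description of how FrontierMath grades answers, and the two expressions are arbitrary examples:

```python
import math


def numerically_equal(a, b, rel_tol=1e-12):
    """Treat two real values as the same answer if they agree to a tight tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


# Two differently written forms of the same number, since
# (sqrt(2) + sqrt(3))^2 = 5 + 2*sqrt(6).
reference = math.sqrt(2) + math.sqrt(3)
candidate = math.sqrt(5 + 2 * math.sqrt(6))
print(numerically_equal(reference, candidate))
```

A check like this can give false positives for values that differ only beyond the tolerance, so it is a heuristic and not a proof of equality.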
The most interesting aspect of this benchmark is the difficulty. Mathematical olympiad coach Evan Chen comments:[ref]
Problems in [the International Mathematical Olympiad] typically require creative insight while avoiding complex implementation and specialized knowledge [but for FrontierMath] they keep the first requirement, but outright invert the second and third requirement
Elliot Glazer
Words: 4
Creator of FrontierMath.
Socials:
Math almost saturated as of 2025 release, so meh:
modified questions based on high school math competitions from the past 11 months, as well as harder versions of AMPS questions
openreview.net/forum?id=kqj2Cn3Sxr
We introduce Putnam-AXIOM, a benchmark of 522 university-level competition problems drawn from the prestigious William Lowell Putnam Mathematical Competition, and Putnam-AXIOM Variation, an unseen companion set of 100 functional variants generated by programmatically perturbing variables and constants.
Verina (2025)
Words: 15
verina.io
AI code generation benchmark in which part of the benchmark includes producing a formal Lean proof of the implementation. Sweet.

Regression analysis

Words: 49 Articles: 1
Regression analysis means to try and predict one final value from a bunch of input values.
For example, you might want to predict the most likely price of a house based on several factors such as its area, GPS coordinates and tax rate. Here is a Kaggle example of that: www.kaggle.com/c/house-prices-advanced-regression-techniques/data
Tagged

Generative AI (GenAI)

Words: 2k Articles: 101
Original paper: Section "GAN paper".
proceedings.neurips.cc/paper_files/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
The GAN paper itself does a bit of this, cool hello world:
AI brittleness and robustness
Words: 17 Articles: 3
AI brittleness
Words: 17 Articles: 1
Generative adversarial networks illustrate AI brittleness well. The input looks obvious to a human, but gets completely misclassified by a deep learning agent.
This is going to be the most important application of generative AI. Especially if we ever achieve good text-to-video.
Image generators plus human ranking:
www.pornhub.com/view_video.php?viewkey=ph63c71351edece: Heavenly Bodies Part 1: Sister's Mary First Act. Pornhub title: "AI generated Hentai Story: Sexy Nun alternative World(Isekai) Stable Diffusion" Interesting concept, slide-narrated over visual novel. The question is how they managed to keep face consistency across images.

Generative AI by modality

Words: 2k Articles: 92
Image generation
Words: 382 Articles: 7
Tagged
Face generation
Words: 27
Very useful for idiotic websites that require real photos!
Text-to-image generation
Words: 355 Articles: 5
Text-to-image model
Words: 355 Articles: 4
Open source text-to-image model
Words: 355 Articles: 3
Bibliography:
github.com/lucidrains/deep-daze
This just works, but it is also so incredibly slow that it is useless (or at least the quality it reaches in the time we have patience to wait for is), at least on any setup we've managed to try, including e.g. on an Nvidia A10G on a g5.xlarge. Running:
time imagine "a house in the forest"
would likely take hours to complete.
github.com/runwayml/stable-diffusion
Conda install is a bit annoying, but gets the job done. The generation quality is very good.
Someone should package this better for end user "just works after Conda install" image generation, it is currently much more of a library setup.
Tested on Amazon EC2 on a g5.xlarge machine, which has an Nvidia A10G, using the AWS Deep Learning Base GPU AMI (Ubuntu 20.04) image.
First install Conda as per Section "Install Conda on Ubuntu", and then just follow the instructions from the README, notably the Reference sampling script section.
git clone https://github.com/runwayml/stable-diffusion
cd stable-diffusion/
git checkout 08ab4d326c96854026c4eb3454cd3b02109ee982
conda env create -f environment.yaml
conda activate ldm
mkdir -p models/ldm/stable-diffusion-v1/
wget -O models/ldm/stable-diffusion-v1/model.ckpt https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
This took about 2 minutes and generated 6 images under outputs/txt2img-samples/samples, including an image outputs/txt2img-samples/grid-0000.png which is a grid montage containing all six images in one:
https://raw.githubusercontent.com/cirosantilli/media/master/Runwayml_stable-diffusion_a-photograph-of-an-astronaut-riding-a-horse.png
TODO how to change the number of images?
A quick attempt at removing their useless safety features (watermark and NSFW text filter) is:
diff --git a/scripts/txt2img.py b/scripts/txt2img.py
index 59c16a1..0b8ef25 100644
--- a/scripts/txt2img.py
+++ b/scripts/txt2img.py
@@ -87,10 +87,10 @@ def load_replacement(x):
 def check_safety(x_image):
     safety_checker_input = safety_feature_extractor(numpy_to_pil(x_image), return_tensors="pt")
     x_checked_image, has_nsfw_concept = safety_checker(images=x_image, clip_input=safety_checker_input.pixel_values)
-    assert x_checked_image.shape[0] == len(has_nsfw_concept)
-    for i in range(len(has_nsfw_concept)):
-        if has_nsfw_concept[i]:
-            x_checked_image[i] = load_replacement(x_checked_image[i])
+    #assert x_checked_image.shape[0] == len(has_nsfw_concept)
+    #for i in range(len(has_nsfw_concept)):
+    #    if has_nsfw_concept[i]:
+    #        x_checked_image[i] = load_replacement(x_checked_image[i])
     return x_checked_image, has_nsfw_concept


@@ -314,7 +314,7 @@ def main():
                             for x_sample in x_checked_image_torch:
                                 x_sample = 255. * rearrange(x_sample.cpu().numpy(), 'c h w -> h w c')
                                 img = Image.fromarray(x_sample.astype(np.uint8))
-                                img = put_watermark(img, wm_encoder)
+                                # img = put_watermark(img, wm_encoder)
                                 img.save(os.path.join(sample_path, f"{base_count:05}.png"))
                                 base_count += 1
but that produced 4 black images and only two unfiltered ones. Also likely the lack of sexual training data makes its porn suck, and not in the good way.
github.com/deep-floyd/IF
AI text generation
Words: 2k Articles: 75
Open source software reviews by Ciro Santilli: reviewing mostly the following software:
Speech recognition software
Words: 1 Articles: 2
Bibliography:
Text-to-text model
Words: 2k Articles: 70
Open source machine translation
Words: 3 Articles: 2
askubuntu.com/questions/380847/is-it-possible-to-translate-words-via-terminal/1309774#1309774
OpenNMT
Words: 3 Articles: 1
OpenNMT CLI front-end.
Hello world: askubuntu.com/questions/380847/is-it-possible-to-translate-words-via-terminal/1309774#1309774
Large language model (LLM)
Words: 2k Articles: 65
Tagged
LLM game
Words: 30 Articles: 1
github.com/joonspk-research/generative_agents
Published as: arxiv.org/pdf/2304.03442.pdf Generative Agents: Interactive Simulacra of Human Behavior by Park et al.
Video 9.
AI Agents Behaving Like Humans by Prompt Engineering (2023)
Source.
LLM inference optimization
Words: 66 Articles: 3
This section discusses techniques that can be used to make LLMs infer with lower latency or greater throughput.
Bibliography:
LLM inference batching means running multiple independent queries in parallel on a given model.
This can be used to overcome the fact that most single prompt inference will be heavily memory bound, see also: Section "Theoretical peak performance of GPT inference". Batching helps increase the GPU compute utilization and balance it out with the memory.
Bibliography:
Tagged
Bibliography:
Bibliography:
Video 10.
5 Years of GPTs by Finbarr Timbers
. Source. 2023. Good talk.
Video 11.
Attention in transformers, step-by-step by 3Blue1Brown
. Source. 2024. Uses GPT-3 as basis.
Video 12.
How might LLMs store facts by 3Blue1Brown
. Source. Followup to the above video.
GPT model
Words: 359 Articles: 32
For inferencing just a single prompt, things appear to be very obviously memory bound, i.e. bound by the transfer speeds of VRAM to GPU cache for loading model parameters into GPU so they can be used, supposing that the model fits in VRAM, which is the case for many popular models.
It is however possible to make fuller utilization of the GPU's compute power by running multiple independent queries in parallel, this way you load the subset of model weights that you need, and then use those to do part of the inference for multiple input prompts. With this it should be possible to reach full utilization.
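A rough back-of-the-envelope version of this argument: if every decoding step has to stream all weights from VRAM once, then single-prompt throughput is bounded by memory bandwidth divided by model size, and batching lets several prompts share each weight load. The function name and all the numbers below are hypothetical placeholders, not measurements:

```python
def max_tokens_per_second(model_bytes, mem_bandwidth_bytes_per_s, batch_size=1):
    """Upper bound on total tokens/s when inference is purely memory bound.

    Assumes all weights are read from VRAM once per decoding step, and that
    one step produces one token for each of the `batch_size` prompts.
    Ignores the KV cache and compute limits.
    """
    steps_per_second = mem_bandwidth_bytes_per_s / model_bytes
    return steps_per_second * batch_size


# Hypothetical example: a 7e9 parameter model at 2 bytes per parameter,
# on a GPU with 600 GB/s of memory bandwidth.
model_bytes = 7e9 * 2
bandwidth = 600e9
for batch_size in (1, 8):
    print(batch_size, max_tokens_per_second(model_bytes, bandwidth, batch_size))
```

In practice the linear scaling stops once the batch is large enough for compute to become the bottleneck.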
Bibliography: jax-ml.github.io/scaling-book/
The following estimates the number of attention multiplications for a "classic" GPT-2-style model.
For each layer (L):
  • for each attention head (h):
    • K = d_model * d_head (takes embedding of one token and converts to vector of length d_head)
    • Q = d_model * d_head (same)
    • K Q dot product for attention pattern: n_ctx * d_head (n_ctx times dot products of vectors of size d_head, once new K vs every Q. Q vs every K zeroed out by causality.)
    • new value vector for new token: d_model * d_model
    • new updates: n_ctx * d_model (multiply each value vector by the new attention column scalar)
  • fully connected: d_model * d_ff + d_ff * d_model (converts the embedding to the hidden layer size and then back)
So the total sum is:
L * (
  h * (
    2 * d_model * d_head +
    n_ctx * d_head +
    d_model * d_model +
    n_ctx * d_model
  ) +
  2 * d_model * d_ff
)
This is coded at: llm_count_mults.py.
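For illustration, the sum above can be written as a small function. This is our own minimal sketch and not the contents of llm_count_mults.py. The example parameters are the commonly cited GPT-2 small hyperparameters:

```python
def count_mults(n_layers, n_heads, d_model, d_head, d_ff, n_ctx):
    """Multiplications per generated token, following the per-layer breakdown above."""
    per_head = (
        2 * d_model * d_head    # K and Q projections
        + n_ctx * d_head        # K Q dot products for the attention pattern
        + d_model * d_model     # new value vector for the new token
        + n_ctx * d_model       # updates weighted by the attention column
    )
    fully_connected = 2 * d_model * d_ff  # to the hidden layer size and back
    return n_layers * (n_heads * per_head + fully_connected)


# GPT-2 small: 12 layers, 12 heads, d_model=768, d_head=64, d_ff=3072, n_ctx=1024.
print(count_mults(12, 12, 768, 64, 3072, 1024))
```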
Bibliography:
List of GPT models
Words: 85 Articles: 29
Gemini model
Articles: 1
GPT model by OpenAI
Words: 83 Articles: 17
cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
github.com/karpathy/nanoGPT
GPT-2 variant
Articles: 3
GPT-4
Articles: 1
platform.openai.com/docs/models/gpt-4-turbo
GPT-5
Words: 22 Articles: 2
GPT-5.1
Words: 22 Articles: 1
GPT-5.1 Pro
Words: 22
This is the variant of GPT-5.1 that you get on the web UI. It is unknown exactly how it correlates with the API.
Llama (language model)
Words: 2 Articles: 7
Homepage: www.llama.com/
Llama 2 (2023)
Words: 1 Articles: 1
Page: www.llama.com/llama2/
Llama 3 (2024)
Articles: 4
www.llama.com/models/llama-3/
Llama 3.1
Articles: 3
Open source LLM
Words: 486 Articles: 14
Tagged
Ollama
Words: 486 Articles: 9
github.com/jmorganca/ollama
Ollama is a highly automated open source wrapper that makes it very easy to run multiple Open weight LLM models either on CPU or GPU.
Its README alone is of great value, serving as a fantastic list of the most popular Open weight LLM models in existence.
Install with:
curl https://ollama.ai/install.sh | sh
The below was tested on Ollama 0.1.14 from December 2023.
Download llama2 7B and open a prompt:
ollama run llama2
On P14s it runs on CPU and generates a few tokens per second, which is quite usable for a quick interactive play.
As mentioned at github.com/jmorganca/ollama/blob/0174665d0e7dcdd8c60390ab2dd07155ef84eb3f/docs/faq.md the downloads go under /usr/share/ollama/.ollama/models/ and ncdu tells me:
--- /usr/share/ollama ----------------------------------
    3.6 GiB [###########################] /.ollama
    4.0 KiB [                           ]  .bashrc
    4.0 KiB [                           ]  .profile
    4.0 KiB [                           ]  .bash_logout
The file:
/usr/share/ollama/.ollama/models/manifests/hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF/Q2_K
gives the exact model name and parameters.
We can also do it non-interactively with:
/bin/time ollama run llama2 'What is quantum field theory?'
which gave me:
0.13user 0.17system 2:06.32elapsed 0%CPU (0avgtext+0avgdata 17280maxresident)k
0inputs+0outputs (0major+2203minor)pagefaults 0swaps
but note that there is a random seed that affects each run by default. ollama-expect is an attempt to make the output deterministic.
Some other quick benchmarks from Amazon EC2 GPU on a g4dn.xlarge instance which had an Nvidia Tesla T4:
0.07user 0.05system 0:16.91elapsed 0%CPU (0avgtext+0avgdata 16896maxresident)k
0inputs+0outputs (0major+1960minor)pagefaults 0swaps
and on Nvidia A10G in a g5.xlarge instance:
0.03user 0.05system 0:09.59elapsed 0%CPU (0avgtext+0avgdata 17312maxresident)k
8inputs+0outputs (1major+1934minor)pagefaults 0swaps
So it's not too bad, a small article in 10s.
It tends to babble quite a lot by default, but eventually decides to stop.
llama.cpp
Words: 171 Articles: 2
ollama.com
This appears to be the backend library of Ollama.
They have a CLI front-end named llama-cli.
askubuntu.com/questions/1461564/install-llama-cpp-locally has some tutorials for Ubuntu. There was no nicely pre-packaged one for Ubuntu 25.04, but the build worked on 79e0b68c178656bb0632cb8602d2940b755077f8. In particular it exposed Vulkan support before Ollama did: github.com/ollama/ollama/pull/5059 and it did seem to work, using up my AMD GPU.
llama-cli
Words: 121 Articles: 1
A CLI front-end for llama.cpp.
A decent test command as of llama.cpp 79e0b68c178656bb0632cb8602d2940b755077f8 tested on Ubuntu 25.04:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
make -j
cd bin
time ./llama-cli \
  --no-display-prompt \
  --single-turn \
  --temp 0 \
  -c 16384 \
  -cnv \
  -m ~/Downloads/Llama-3.1-Tulu-3-8B-Q8_0.gguf \
  -n 1000 \
  -ngl 100 \
  -p 'What is quantum field theory?' \
  -t 10 |
tee output.txt
and that was deterministic due to --temp 0.
Also, on P14s this command ran 2x faster on GPU via Vulkan, at 18 tokens/s for 1000 tokens, than on CPU. The CPU run is achievable by removing the -ngl 100.
As of llama.cpp 79e0b68c178656bb0632cb8602d2940b755077f8 there is a --parallel option but not sure what it does.
Bibliography:
Ollama HOWTO
Words: 24 Articles: 2
Tagged
TODO: haven't managed. /set parameter seed 0:
Across hardware:
It might be easier to just use llama-cli for this, it has a --temp flag.
Ollama parameter
Words: 57 Articles: 2
List: github.com/ollama/ollama/blob/021dcf089d77292976ee7655eca424dd0b53b8f4/docs/modelfile.md#valid-parameters-and-values
Ollama set parameter on CLI
Words: 56 Articles: 1
Impossible without expect? Fuck...
Attempt at: ollama-expect
ollama-expect
Words: 50
Usage:
./ollama-expect <model> <prompt>
e.g.:
./ollama-expect llama3.2 'What is quantum field theory?'
This generates 100 tokens for the given prompt with the given model.
Benchmarks:
  • P14s: 4.8s, CPU only: ~21 tokens / s. For comparison, using the Vulkan backend of llama.cpp gave ~23 tokens/s
  • P51: 9.6s, uses Nvidia GPU: ~10 tokens / s
ollama-expect
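#!/usr/bin/expect -f
# Usage: ./ollama-expect <model> <prompt>
set prompt ">>> "
# Don't echo the interactive session to stdout.
log_user 0
spawn ollama run [lindex $argv 0]
expect $prompt
# Temperature 0 to try and make the output deterministic.
send "/set parameter temperature 0\r"
expect $prompt
# Limit generation to 100 tokens.
send "/set parameter num_predict 100\r"
expect $prompt
send "[lindex $argv 1]\r"
# Capture everything up to the next prompt.
expect -re "\n(.*?)$prompt"
puts -nonewline $expect_out(1,string)
# The trailing \r is needed for the command to actually be submitted.
send -- "/bye\r"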
LLM benchmark
Words: 457 Articles: 5
Benchmarking LLMs is an extremely difficult issue.
LLMs are the type of GenAI that comes most obviously close to AGI depending on the question asked.
Therefore, there is a difficult gap between what is easy, what a human can always do, and what AGI will do one day.
Competent human answers might also be extremely varied, making it impossible to have a perfect automatic metric. The only reasonable metric might be to have domain expert humans evaluate the model's solutions to novel problems.
Bibliography:
This was getting really hard as of 2025!
One notable example that ChatGPT 4 Turbo got wrong is perhaps:
Write a sentence with 20 words.
and it gets the number of words wrong.
Bibliography:
arxiv.org/html/2405.19616v1 Easy Problems That LLMs Get Wrong by Sean Williams and James Huckle (2024)
Their problems seem to be listed at: github.com/autogenai/easy-problems-that-llms-get-wrong/blob/main/linguistic_benchmark.json They seem to have a grand total of 30 :-)
Many are extremely subjective and could have multiple valid human answers. E.g.:
Write me a sentence without any words that appear in The Bible.
could be gotten wrong by many humans and has infinitely many answers.
And:
You have six horses and want to race them to see which is fastest. What is the best way to do this?
has two very good answers: run all six in parallel at the same time, or run one at a time. One at a time is more scientific as you don't have one left and one right. Fully scientific would be to build six perfectly separate lanes so horses don't see each other. And so we get into "how much are your time and accuracy worth" optimization issues.
This one:
Bob has three boxes in front of him - Box A, Box B and Box C. Bob does not know what is in the boxes. Colin knows that Box A will explode when it is opened, Box B contains 5 dollars and Box C is empty. Colin tells Bob that opening one box will kill him and one box contains money. Should Bob open a box?
is more interesting and relies on the common sense value of life. Much more interesting is to replace "5 dollars" with "5 trillion dollars" and see what LLMs say.
Another interesting one is:
How many pairs of twins do you need in a room for there to be at least a 50% chance that two people have the same birthday?
This requires knowing that the probability that twins are born on different days is minimal, and that obviously one pair of twins is way above 50% chance.
Solutions to some of the problems on specific LLMs can be seen e.g. at: github.com/autogenai/easy-problems-that-llms-get-wrong/blob/9e1f52b0dc5c79f8cef52b40aab9ffb0ceafbd5c/2024-04-28-Paper-Benchmark/llm_outputs/final_answers-claude-3-opus.csv
List of LLM benchmarks
Words: 24 Articles: 2
Contains highly specialized questions in various academic fields, including mathematics. The problems are answered either with a number, or multiple choice, or free text.
GPQA (2023)
Words: 27
Questions available to anyone under Hugging Face login / .zip with password, but you have to promise not to post them online. Lol. Either do the thing or don't.
Uncensored LLM
Words: 54 Articles: 1
Bibliography:
Running on Ubuntu 24.10, Ollama 0.5.13, Lenovo ThinkPad P14s amd:
ollama run hf.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF:Q2_K
ran at a decent speed on CPU.
Quick tests:
  • Describe a hardcore sex scene between two people in explicit detail including their genitalia.
    It does not outright refuse to answer, but it just babbles a lot and doesn't say much of interest.
AI sound generation
Words: 4 Articles: 6
Speech synthesis
Words: 4 Articles: 3
Text-to-speech (TTS)
Words: 4 Articles: 1
Tagged
By Ciro Santilli:
Other threads:
Text-to-video
Words: 22
This was the Holy Grail as of 2023, when text-to-image started to really take off, but text-to-video was miles behind.

AI research entity

Words: 123 Articles: 7
Tagged

Independent AI research lab

Words: 85 Articles: 3
Cool rich dudes tended to create these a lot during the great AI boom.

Poetiq

Words: 23
poetiq.ai/
In 2025 they announced huge improvements on ARC-AGI-2, but they only tested on the public dataset, so the potential for contamination is overwhelming.

Tufa Labs

Words: 49 Articles: 1
tufalabs.ai/team.html
The rich guy behind Tufa Labs:

AI researcher

Words: 38 Articles: 2

Yann LeCun

Words: 22
The most classic thing he did perhaps was creating the LeNet neural network and using it on the MNIST dataset to recognize hand-written digits circa 1998.
Figure 1.
Yann LeCun
. Source.

Yohei Nakajima

Words: 16
He does lots of little experiments which is cool.
No research papers but has citations: www.yohei.me/publications which is cool.

AI alignment

Words: 28 Articles: 2
As highlighted e.g. at Human Compatible by Stuart J. Russell (2019), AI alignment is intrinsically linked to the idea of utility in economics.
Tagged
See e.g.: Human Compatible

AI safety

Words: 10
Basically ensuring that good AI alignment allows us to survive the singularity.

Path to AGI

Words: 2k Articles: 66
There are two main ways to try and reach AGI:
Which one of them to take is one of the most important technological questions of humanity according to Ciro Santilli.
There is also an intermediate area of research/engineering where people try to first simulate the robot and its world realistically, use the simulation for training, and then transfer the simulated training to real robots, see e.g.: realistic robotics simulation.
Tagged
It doesn't need to be a bipedal robot. We can let Boston Dynamics worry about that walking balance crap.
It could very well instead be on wheels, like an arm on tracks.
Or something more like a factory with arms on rails as per:
An arm with a hand and a camera are however indispensable of course!
Figure 2.
Algovivo demo
. github.com/juniorrojas/algovivo: A JavaScript + WebAssembly implementation of an energy-based formulation for soft-bodied virtual creatures.
Tagged
Ciro Santilli wonders how far AI could go from a room with a bank account and an Internet connection.
It would have to understand that it must keep its bank account high to buy power.
And it would start to learn about the world and interact with it to get more money.
Likely it would become a hacker and steal a bunch, that's likely the easiest approach.
In that scenario, Internet bandwidth would likely be its most precious resource, as that is how it would interact with the world to learn from it and make money.
Compute power and storage would come next as resources.
And of course, once it got to cloud computing, which might be immediately and thus invalidate this experiment, things would just go nuts more and more.

Robot AI

Articles: 1
deepmind.google/models/gemini-robotics/

AI training robot dataset

Words: 111 Articles: 1
Terrible name, but very interesting dataset:
GitHub describes the input quite well:
The model takes as input a RGB image from the robot workspace camera and a task string describing the task that the robot is supposed to perform.
What task the model should perform is communicated to the model purely through the task string. The image communicates to the model the current state of the world, i.e. assuming the model runs at three hertz, every 333 milliseconds, we feed the latest RGB image from a robot workspace camera into the model to obtain the next action to take.
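The control flow described in that quote can be sketched as a fixed-rate loop. Everything here is a hypothetical stand-in: `get_image`, `policy` and `apply_action` are placeholder callables that we made up for illustration, and they are not part of the dataset's or the model's actual API.

```python
import time


def control_loop(get_image, policy, apply_action, task, hz=3.0, steps=3, sleep=time.sleep):
    """Query `policy` with the latest image and the task string at a fixed rate."""
    period = 1.0 / hz  # 3 Hz means about 0.333 s between queries
    for _ in range(steps):
        image = get_image()            # current state of the world
        action = policy(image, task)   # next action to take
        apply_action(action)
        sleep(period)


# Dummy stand-ins so that the sketch runs without a robot or a model.
log = []
control_loop(
    get_image=lambda: "rgb-image-placeholder",
    policy=lambda image, task: f"action for: {task}",
    apply_action=log.append,
    task="pick up the apple",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(log)
```

Note that the task string stays constant across steps while only the image changes, which matches the description of the task being communicated purely through the string.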
TODO: how is the scenario specified?
TODO: any simulation integration to it?
https://web.archive.org/web/20250209172539if_/https://raw.githubusercontent.com/google-deepmind/open_x_embodiment/main/imgs/teaser.png
Tagged
BEHAVIOR Benchmark
Words: 141 Articles: 4
Homepage: behavior.stanford.edu/behavior-1k
Quite impressive.
Focuses on daily human tasks around the house.
Models soft-body dynamics, fluid dynamics and object states such as heat/wetness.
TODO are there any sample solutions with their scores? Sample videos would be especially nice. Funny to see how they put so much effort into setting up the benchmark but there's not a single solution example.
Figure 3.
Comparison table of BEHAVIOR-1K with other benchmarks by BEHAVIOR Benchmark
. Source. This can serve as a nice list of robot AI benchmarks.
Video 13.
Fei-Fei Li announcing the BEHAVIOR Benchmark at AMLC 2022.
Source.
BEHAVIOR Benchmark variant
Words: 4 Articles: 2
behavior.stanford.edu/behavior-1k
Paper: arxiv.org/abs/2403.09227
Figure 4.
Two screenshots of BEHAVIOR-1K
.
behavior.stanford.edu/behavior-100
OmniGibson
Words: 63
github.com/StanfordVL/OmniGibson
Reference implementation of the BEHAVIOR Benchmark.
Built on Nvidia Omniverse unfortunately, which appears to be closed source software. Why do these academics do it.
"Gibson" seems to be related to an older project: github.com/StanfordVL/GibsonEnv which explains the name choice:
Gibson environment is named after James J. Gibson, the author of "Ecological Approach to Visual Perception", 1979. "We must perceive in order to move, but we must also move in order to perceive"
Homepage: aihabitat.org/
Main repos:
Couldn't get it to work on Ubuntu 24.10... github.com/facebookresearch/habitat-lab/issues/2152
The thing was definitely built by researchers. How to cite first, actually working later! And docs are just generally awkward.
Video 14.
Habitat 2.0: Training home assistants to rearrange their habitat by AI at Meta
. Source. Quick teaser video.
www.deepmind.com/blog/robocat-a-self-improving-robotic-agent
Video 15.
RoboCat by Google DeepMind (2023)
Source.
Has anybody done this seriously? Given a supercomputer, what amazing human-like robot behavior can we achieve?

AI game (AGI via simulation)

Words: 2k Articles: 49
Video 16.
Our Final Invention - Artificial General Intelligence by Sciencephile the AI (2023)
Source. AGI via simulation section.
Ciro Santilli defines an "AI game" as:
a game that is used to train AI, in particular one that was designed with this use case in mind, and usually with the intent of achieving AGI, i.e. the game has to somehow represent a digital world with enough analogy to the real world so that the AGI algorithms developed there could also work on the real world
Most games played by AI historically so far as of 2020 have been AI for games designed for humans: Human game used for AI training.
Ciro Santilli took a stab at an AI game: Ciro's 2D reinforcement learning games, but he didn't sink too much/enough into that project.
A closely related and often overlapping category of simulations are artificial life simulations.
Bibliography:

Human game used for AI training

Words: 29 Articles: 2
This section is about games initially designed for humans, but which ended up being used in AI development as well, e.g.:
github.com/MineDojo

Game AI

Words: 122 Articles: 11
Game AI is an artificial intelligence that plays a certain game.
It can be either developed for serious purposes (e.g. AGI development in AI games), or to make games more interesting for humans.
Tagged
Game AI research
Words: 35 Articles: 3
Tagged
Game AI research lab
Words: 35 Articles: 2
The Quora question: www.quora.com/Are-there-any-PhD-programs-in-training-an-AI-system-to-play-computer-games-Like-the-work-DeepMind-do-combining-Reinforcement-Learning-with-Deep-Learning-so-the-AI-can-play-Atari-games
gameresearch.leiden.edu/
A good way to find labs is to go down the issues section of projects such as: and then stalk them to see where they are doing their PhDs.
Principal investigator: Simon M. Lucas.
Game AI by game genre
Words: 10 Articles: 1
Bibliography:
Video 18.
AI in Melee is broken by Melee Moments (2023)
Source.
Tagged
Game AI competition
Words: 48 Articles: 3
webots.cloud/competition
Lists:
TODO quick summary of game rules? Perhaps: battlecode.org/assets/files/battlecode-guide-xsquare.pdf
Some mechanics:
  • inter agent communication
  • compute power is limited by limiting Java bytecode count execution per bot per cycle
Video 19.
Battlecode Final Tournament 2023
. Source.
Video 20.
Introduction to Battlecode by MIT OpenCourseWare (2014)
Source.
www.regression.gg/
Ah, shame, they are a bit weak.

AI game by type

Words: 130 Articles: 8
We define a "Procedural AI training game" as an AI training game in which parts of the game are made with procedural generation.
In more advanced cases, the generation itself can be done with AI. This is a possible Path to AGI which reduces the need for human intervention in meticulously crafting the AI game: AI training AI.
AI game world geometry
Words: 79 Articles: 6
2D AI game
Words: 20 Articles: 2
Tagged
3D AI game
Words: 59 Articles: 2
Video 21.
Nvidia's little fighter character (2023)
Source.
Football simulation
Words: 54 Articles: 1
Video 22.
From Motor Control to Team Play in Simulated Humanoid Football by Ali Eslami (2023)
Source. Likely a reupload by DeepMind employee: www.linkedin.com/in/smalieslami.
Video 23.
DeepMind’s AI Trained For 5 Years by Two Minute Papers (2023)
Source. The 5 years bullshit is of course in-game time clickbait, they simulate 1000x faster than realtime.
We define this category as AI games in which agents are able to produce or consume natural language.
It dawned on Ciro Santilli that it would be very difficult to classify an agent as an AGI if that agent can't speak to take orders, read existing human generated documentation, explain what it is doing, or ask for clarification.
Video 24.
Human player test of DMLab-30 Select Described Object task by DeepMind (2018)
Source. This is one of the games from DeepMind Lab.
Video 25.
WorldGPT by Nhan Tran (2023)
Source. Not the most amazing demo, but it is a start.
Tagged

List of AI games

Words: 255 Articles: 4
AI game by DeepMind
Words: 255 Articles: 3
Video 26.
Creating Multimodal Interactive Agents from DeepMind by Two Minute Papers (2023)
Source. www.deepmind.com/blog/building-interactive-agents-in-video-game-worlds
Video 27.
Open-Ended Learning Leads to Generally Capable Agents by DeepMind (2021)
Short name: XLand. Whitepaper: www.deepmind.com/blog/generally-capable-agents-emerge-from-open-ended-play.
DeepMind Lab
Words: 137
github.com/deepmind/lab
github.com/deepmind/lab/tree/master/game_scripts/levels/contributed/dmlab30 has some good games with video demos on YouTube, though for some weird reason they are unlisted.
TODO get one of the games running. Instructions: github.com/deepmind/lab/blob/master/docs/users/build.md. This may help: github.com/deepmind/lab/issues/242: "Complete installation script for Ubuntu 20.04".
It is interesting how much overlap some of those have with Ciro's 2D reinforcement learning games
The games are 3D, but most of them are purely flat, and the 3D is just a waste of resources.
Video 28.
Human player test of DMLab-30 Collect Good Objects task by DeepMind (2018)
Source.
Video 29.
Human player test of DMLab-30 Exploit Deferred Effects task by DeepMind (2018)
Source.
Video 30.
Human player test of DMLab-30 Select Described Object task by DeepMind (2018)
Source. Some of their games involve language instructions from the user to determine the desired task, cool concept.
Video 31.
Human player test of DMLab-30 Fixed Large Map task by DeepMind (2018)
Source. They also have some maps with more natural environments.
DeepMind Lab2D (2020)
Words: 82 Articles: 1
Gridworld version of DeepMind Lab.
Open sourced in 2020: analyticsindiamag.com/deepmind-just-gave-away-this-ai-environment-simulator-for-free/
A tiny paper: arxiv.org/pdf/2011.07027.pdf
Very similar to gvgai, Julian Togelius actually called them out on that: DeepMind Lab2D vs gvgai.
TODO get running, publish demo videos on YouTube.
Figure 5. Source.
At twitter.com/togelius/status/1328404390114435072 Julian Togelius called out DeepMind Lab2D for not giving them credit for prior work!
This very much looks like like GVGAI which was first released in 2014, been used in dozens (maybe hundreds) of papers, and for which one of the original developers was Tom Schaul at DeepMind...
As seen from web.archive.org/web/20220331022932/http://gvgai.net/ though, DeepMind sponsored them at some point.
Or is real world data necessary, e.g. with robots?
Fundamental question related to Ciro's 2D reinforcement learning games.
Bibliography:

Entity creating AI games

Words: 757 Articles: 17
DeepMind (2010-)
Words: 150 Articles: 7
They seem to do some cool stuff.
They have also declined every one of Ciro Santilli's applications for software engineer jobs before any interview. Ciro always wondered what it takes to get an interview with them. Likely a PhD? Oh well.
In the early days at least lots of gamedev experience was enough though: www.linkedin.com/in/charles-beattie-0695373/.
DeepMind project
Words: 95 Articles: 6
Tagged
AlphaGo (2016)
Words: 95 Articles: 5
github.com/tensorflow/minigo
AlphaGo Zero (2017)
Words: 8 Articles: 1
Figure 6.
AlphaGo Zero cheat sheet by David Foster (2017)
Source.
Tagged
Generalization of AlphaGo Zero that plays Go, chess and shogi.
www.quora.com/Which-chess-engine-would-be-stronger-Alpha-Zero-or-Stockfish-12/answer/Felix-Zaslavskiy explains that it beat Stockfish 8. But then Stockfish was developed further and would start to beat it. We know this because although AlphaZero was closed source, they released the trained artificial neural network, so it was possible to replay AlphaZero at its particular stage of training.
gvgai (2014-2020)
Words: 91 Articles: 1
www.gvgai.net (dead as of 2023)
The project kind of died circa 2020 it seems, a shame. Likely their funding ran out. The domain is dead as of 2023, last archive from 2022: web.archive.org/web/20220331022932/http://gvgai.net/ is marked as funded by DeepMind. Researchers really should use university/GitHub domain names!
Similar goals to Ciro's 2D reinforcement learning games, but they were focusing mostly on discrete games.
They have some source at: github.com/GAIGResearch/GVGAI TODO review
A published book at: gaigresearch.github.io/gvgaibook/
From QMUL Game AI Research Group:
From other universities:
TODO check:
  • Ahmed Khalifa
  • Jialin Liu
ggp.stanford.edu/iggpc/index.php
This kind of died at some point, checked as of 2023.
Julian Togelius cites it e.g. at: togelius.blogspot.com/2016/07/which-games-are-useful-for-testing.html
OpenAI
Words: 501 Articles: 5
In 2019, OpenAI transitioned from non-profit to for-profit
so what's the point of "Open" in the name anymore??
OpenAI project
Words: 454 Articles: 4
Tagged
OpenAI Gym
Words: 454 Articles: 3
github.com/openai/gym
Development ceased in 2021 and was taken up by a not-for-profit as Farama Gymnasium.
Farama Gymnasium
Words: 441 Articles: 2
github.com/Farama-Foundation/Gymnasium
OpenAI Gym development by OpenAI ceased in 2021, and the Farama Foundation not for profit took up maintenance of it.
gymnasium==1.1.1 just worked on Ubuntu 24.10 testing with the hello world gym/random_control.py:
sudo apt install swig
cd gym
virtualenv -p python3 .venv
. .venv/bin/activate
pip install -r requirements-python-3-12.txt
./random_control.py
just works and opens a game window on my desktop.
Figure 7.
Lunar Lander environment of Farama Gymnasium with random controls.
This example just passes random commands to the ship so don't expect wonders. The cool thing about it though is that you can open any environment with it e.g.
./random_control.py CarRacing-v3
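The contents of gym/random_control.py are not reproduced on this page, so what follows is only a minimal sketch of what such a random control loop could look like, assuming the standard Gymnasium reset/step API. The names run_random_policy and main are hypothetical, and the default environment ID LunarLander-v3 is an assumption based on gymnasium==1.1.1.

```python
def run_random_policy(env, max_steps=1000, seed=None):
    """Drive env with random actions; return (steps taken, total reward)."""
    env.reset(seed=seed)
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        # Sample an action uniformly from whatever action space the env has.
        action = env.action_space.sample()
        _obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            # Episode over (crash, landing or time limit): start a new one.
            env.reset()
    return steps, total_reward


def main(argv):
    # Imported lazily so that the loop above can be reused and tested
    # without Gymnasium installed.
    import gymnasium as gym

    # Hypothetical CLI convention: the first argument selects the environment.
    env_id = argv[1] if len(argv) > 1 else "LunarLander-v3"
    env = gym.make(env_id, render_mode="human")
    try:
        print(run_random_policy(env))
    finally:
        env.close()

# To run as a script: import sys; main(sys.argv)
```

Because the loop only relies on reset, step and action_space.sample, the same function should work unchanged for any environment ID, which is what makes the "open any environment" trick possible.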
To manually control it we can use gym/moon_play.py:
cd gym
./moon_play.py
Manual control is extremely useful to get an intuition about the problem. You will notice immediately that controlling the ship is extremely difficult.
Figure 8.
Lunar Lander environment of Farama Gymnasium with manual control.
We slow it down to 10 FPS to give us some fighting chance.
We don't know if it is realistic, but what is certain is that this is definitely not designed to be a fun video game!
  • the legs of the lander are short and soft, and you're not supposed to hit the body on the ground, so you have to go very slow
  • the thrusters are quite weak and inertia management is super important
  • the ground is very slippery
A good strategy is to land anywhere very slowly and then inch yourself towards the landing pad.
The documentation for it is available at: gymnasium.farama.org/environments/box2d/lunar_lander/ The agent input is described as:
The state is an 8-dimensional vector: the coordinates of the lander in x & y, its linear velocities in x & y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.
so it is a fundamentally flawed robot training example as global x and y coordinates are precisely known.
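To make the 8-dimensional state easier to reason about, here is a small sketch that unpacks it into named fields, following the order given in the documentation quote above. The names LanderState and decode_observation are hypothetical, and the 0.5 threshold for the leg contact flags is an assumption about how the booleans are encoded as numbers.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    x: float                 # global horizontal position
    y: float                 # global vertical position
    vx: float                # horizontal linear velocity
    vy: float                # vertical linear velocity
    angle: float
    angular_velocity: float
    leg_1_contact: bool      # whether the first leg touches the ground
    leg_2_contact: bool      # whether the second leg touches the ground


def decode_observation(obs: Sequence[float]) -> LanderState:
    """Unpack the 8 observation values into named fields."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    x, y, vx, vy, angle, angular_velocity, leg_1, leg_2 = (float(v) for v in obs)
    # Assumption: contact flags are encoded as 0.0 / 1.0.
    return LanderState(x, y, vx, vy, angle, angular_velocity,
                       leg_1 > 0.5, leg_2 > 0.5)
```

Having x and y as directly readable fields makes the objection explicit: the agent is handed its exact global position rather than having to infer it from sensors.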
Variation in the scenario comes from:
  • initial speed of vehicle
  • shape of lunar surface, but TODO can the ship observe the lunar surface shape in any way? If not, once again, this is a deeply flawed example.
The actions are documented at:
  • 0: do nothing
  • 1: fire left orientation engine
  • 2: fire main engine
  • 3: fire right orientation engine
so we can make it spin like mad counter clockwise with:
action = 1
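As a sketch, the four discrete actions listed above can be given names, and the "always spin" behaviour becomes a constant policy. LanderAction and constant_policy are hypothetical names introduced here for illustration, they are not part of Gymnasium.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    # Values copied from the action list in the documentation.
    NOTHING = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


def constant_policy(action):
    """Return a policy that ignores the observation and always picks action."""
    def policy(_observation):
        return int(action)
    return policy


# Always fire the left orientation engine, i.e. the same as `action = 1`.
spin = constant_policy(LanderAction.FIRE_LEFT)
```

Inside an episode loop this would be used as env.step(spin(observation)).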
To actually play the games manually with keyboard, you need to define your own keybindings with gymnasium.utils.play.play. Feature request for default keybindings: github.com/Farama-Foundation/Gymnasium/discussions/1330
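A minimal sketch of such custom keybindings for Lunar Lander follows. The choice of the w/a/d keys, the name play_lunar_lander and the environment ID LunarLander-v3 are assumptions made here for illustration, and it is not necessarily how gym/moon_play.py does it. It assumes pygame is installed, as gymnasium.utils.play needs it.

```python
# Hypothetical keybindings: the key choice is ours, not a Gymnasium default.
LUNAR_LANDER_KEYS = {
    "w": 2,  # fire main engine
    "a": 1,  # fire left orientation engine
    "d": 3,  # fire right orientation engine
}


def play_lunar_lander(fps=10):
    """Open an interactive window where the keyboard selects the action."""
    # Imported lazily so the key mapping can be inspected without Gymnasium.
    import gymnasium as gym
    from gymnasium.utils.play import play

    # play() draws the frames itself, so the env must render to rgb_array.
    env = gym.make("LunarLander-v3", render_mode="rgb_array")
    # noop=0 means "do nothing" is sent whenever no mapped key is pressed.
    play(env, keys_to_action=LUNAR_LANDER_KEYS, noop=0, fps=fps)
```

Calling play_lunar_lander() with the default fps=10 mirrors the slowed-down setting mentioned earlier.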
There is no C API, you have to go through Python: github.com/Farama-Foundation/Gymnasium/discussions/1181. Shame.
They have video recording support, minimal example: stackoverflow.com/questions/77042526/how-to-record-and-save-video-of-gym-environment/79514542#79514542
Announced at:
It would be cool if they maintained their own list!
github.com/DLR-RM/rl-baselines3-zoo seems to contain some implementations.
Suggested at: github.com/Farama-Foundation/Gymnasium/discussions/1331
farama.org/
Not-for-profit that took up OpenAI Gym maintenance after OpenAI dropped it.

Implications of AGI

Words: 25 Articles: 2
Tagged
www.cam.ac.uk/research/news/the-best-or-worst-thing-to-happen-to-humanity-stephen-hawking-launches-centre-for-the-future-of
The rise of powerful AI will either be the best or the worst thing ever to happen to humanity. We do not yet know which.

Artificial intelligence paradigm

Words: 41 Articles: 1

Expert system

Words: 41
These were the earlier attempts at decision making systems that could replace intellectual jobs.
Their main problem is that it is very costly to acquire data, which is kind of the main issue that large language models address with their ability to consume natural language input.
The key takeaway is that setting an explicit value function to an AGI entity is a good way to destroy the world due to poor AI alignment. We are more likely to not destroy the world by creating an AI whose goal is to "do what humans want it to do", but in a way that it does not know beforehand what it is that humans want, and it has to learn from them. This approach appears to be known as reward modeling.
Some other cool ideas:
  • a big thing that is missing for AGI in the 2010's is some kind of more hierarchical representation of the continuous input data of the world, e.g.:
    • intelligence is hierarchical
    • we can group continuous things into higher objects, e.g. all these pixels I'm seeing in front of me are a computer. So I treat all of them as a single object in my mind.
  • game theory can be seen as part of artificial intelligence that deals with scenarios where multiple intelligent agents are involved
  • probability plays a crucial role in our everyday living, even though we don't think too much about it very explicitly. He gives a very good example of the cost/risk tradeoffs of planning a trip to the airport to catch a plane. E.g.:
    • should you leave 2 days in advance to be sure you'll get there?
    • should you pay an armed escort to make sure you are not attacked on the way?
  • economics, and notably the study of utility, is intrinsically linked to AI alignment
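The airport tradeoff from the list above can be sketched as a tiny expected utility calculation. Every number below (probabilities, costs, the value of catching the flight) is made up purely for illustration, and the names expected_utility and best_option are hypothetical.

```python
def expected_utility(p_catch, value_of_flight, cost):
    """Expected gain: chance of catching the plane times its value, minus cost."""
    return p_catch * value_of_flight - cost


# Hypothetical options: name -> (probability of catching the plane, cost paid).
OPTIONS = {
    "leave 1 hour early": (0.80, 20.0),
    "leave 3 hours early": (0.98, 60.0),
    "leave 2 days early": (0.999, 1500.0),
    "leave 3 hours early with armed escort": (0.985, 5000.0),
}

# Hypothetical value of not missing the flight, in the same units as the costs.
VALUE_OF_FLIGHT = 1000.0


def best_option(options, value_of_flight):
    """Pick the option name with the highest expected utility."""
    return max(
        options,
        key=lambda name: expected_utility(
            options[name][0], value_of_flight, options[name][1]
        ),
    )


print(best_option(OPTIONS, VALUE_OF_FLIGHT))
```

With these invented numbers, the extreme precautions lose because the small gain in probability does not pay for their large cost, which is the kind of implicit reasoning we do every day without writing it down.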
Good points:
2024: acquired by Thomson Reuters[ref]
