The main reason Ciro Santilli never touched it is that he feels that every public dataset has already been fully mined, or has already had the most interesting algorithms developed for it, so you can't do much outside of big companies.

This is why Ciro started Ciro's 2D reinforcement learning games to generate synthetic data and thus reduce the cost of data.

The other reason is that it is ugly.

Given a bunch of points in $n$ dimensions, PCA maps those points to a new $p$-dimensional space with $p \le n$.

$p$ is a hyperparameter. $p=1$ and $p=2$ are common choices when doing dataset exploration, as they can be easily visualized on a planar plot.

The mapping is done by projecting all points onto a $p$-dimensional hyperplane. PCA is an algorithm for choosing this hyperplane and the coordinate system within this hyperplane.

The hyperplane choice is done as follows:

- the hyperplane will have origin at the mean point
- the first axis is picked along the direction of greatest variance, i.e. where points are the most spread out. Intuitively, picking an axis of small variation would be bad: all the points would be very close to one another on that axis, so it would not contain much information that helps us differentiate the points.
- then we pick a second axis, orthogonal to the first one, and on the direction of second largest variance
- and so on until $p$ orthogonal axes are taken
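The procedure above can be sketched in pure Python. This is a minimal illustration under stated assumptions: it uses the standard fact that the directions of greatest variance are the eigenvectors of the covariance matrix, and it finds them by power iteration with deflation. That numerical method was chosen only to avoid dependencies. Real code would normally call a library such as NumPy or scikit-learn. All function names here are hypothetical.

```python
# Minimal PCA sketch in pure Python, for illustration only.
# Assumption: principal axes = eigenvectors of the covariance matrix,
# found here by power iteration with deflation.


def mean(points):
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def covariance(points):
    mu = mean(points)
    centered = [[x - m for x, m in zip(point, mu)] for point in points]
    d, n = len(mu), len(points)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def top_eigenvector(matrix, iterations=500):
    # Arbitrary start vector; power iteration assumes it is not orthogonal
    # to the eigenvector with the largest eigenvalue.
    v = normalize([1.0 / (i + 1) for i in range(len(matrix))])
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        if not any(w):
            break  # no variance left along any direction reachable from v
        v = normalize(w)
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return eigenvalue, v


def pca(points, p):
    """Return (mean, axes, projected points) using the first p principal axes."""
    mu = mean(points)
    cov = covariance(points)
    axes = []
    for _ in range(p):
        eigenvalue, axis = top_eigenvector(cov)
        axes.append(axis)
        # Deflate: subtract the variance already captured along this axis, so the
        # next axis found is orthogonal to it and has the next largest variance.
        cov = [[cov[i][j] - eigenvalue * axis[i] * axis[j] for j in range(len(cov))]
               for i in range(len(cov))]
    # New coordinates: origin at the mean, one coordinate per principal axis.
    projected = [[dot([x - m for x, m in zip(point, mu)], axis) for axis in axes]
                 for point in points]
    return mu, axes, projected
```

For example, with the points `[[2, 0], [-2, 0], [0, 1], [0, -1]]`, the first axis should come out along the horizontal direction, because the points are most spread out there. The second axis should come out vertical.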

www.sartorius.com/en/knowledge/science-snippets/what-is-principal-component-analysis-pca-and-how-it-is-used-507186 provides an OK-ish example with a concrete context. In it, each point is a country, and the input data is the consumption of different kinds of food per year, e.g.:

- flour
- dry codfish
- olive oil
- sausage

so in this example, we would have input points in 4D.

Suppose that every country consumes the same amount of flour every year. Then that number doesn't tell us anything about which country each point represents (it has the least variance possible: zero), and the first PCA axes would basically never point anywhere near that direction.
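The flour argument can be made concrete with a small sketch. It uses made-up numbers for only two foods (hypothetical data, not taken from the linked article). Because flour is constant, its row and column of the covariance matrix are all zeros. As a result, the first axis found by power iteration (a numerical method assumed here for illustration) has zero weight on flour.

```python
# Hypothetical, made-up yearly consumption numbers for four countries.
# Columns: flour, dry codfish. Flour is identical for every country.
countries = [
    [100.0, 10.0],
    [100.0, 12.0],
    [100.0, 1.0],
    [100.0, 2.0],
]


def covariance(points):
    n, d = len(points), len(points[0])
    mu = [sum(column) / n for column in zip(*points)]
    centered = [[x - m for x, m in zip(point, mu)] for point in points]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_axis(matrix, iterations=500):
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize, converging to the direction of greatest variance.
    # Assumes at least one column varies, otherwise norm would be zero.
    v = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


cov = covariance(countries)
axis = first_axis(cov)  # weight on flour is 0, all weight is on codfish
```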

Another cool thing is that PCA seems to automatically account for linear dependencies in the data, so it skips selecting highly correlated axes multiple times. For example, suppose that dry codfish and olive oil consumption are very high in Portugal and Spain, but very low in Germany and Poland. Therefore, the variation is very high in those two parameters, and contains a lot of information.

However, suppose that dry codfish consumption is also directly proportional to olive oil consumption. Because of this, it would be kind of wasteful if we selected:

- dry codfish as the first axis
- olive oil as the second axis

since the information about codfish already tells us the olive oil. PCA apparently recognizes this. It instead picks the first axis at a 45 degree angle to both dry codfish and olive oil (assuming both have been scaled to comparable units, e.g. standardized), and then moves on to something else for the second axis.
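Here is a sketch of the codfish and olive oil case with made-up, perfectly proportional numbers. The data is hypothetical, and power iteration is an assumed method for finding the first axis. The sketch suggests that the 45 degree result depends on scaling. On the raw numbers, where olive oil is twice codfish, the first axis points along $(1, 2)$, about 63.4 degrees from the codfish axis. After standardizing each column, it sits at 45 degrees.

```python
import math

# Hypothetical, made-up numbers: olive oil consumption is exactly twice
# dry codfish consumption in every country (perfectly proportional).
codfish = [10.0, 8.0, 1.0, 2.0]
olive_oil = [2 * x for x in codfish]


def standardize(column):
    # Rescale to zero mean and unit (sample) standard deviation.
    n = len(column)
    mu = sum(column) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in column) / (n - 1))
    return [(x - mu) / sd for x in column]


def cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def covariance_2d(xs, ys):
    return [[cov(xs, xs), cov(xs, ys)], [cov(ys, xs), cov(ys, ys)]]


def first_axis_angle(matrix, iterations=500):
    # Power iteration for the direction of greatest variance, returned as an
    # angle in degrees measured from the codfish axis.
    v = [1.0, 0.5]
    for _ in range(iterations):
        w = [matrix[0][0] * v[0] + matrix[0][1] * v[1],
             matrix[1][0] * v[0] + matrix[1][1] * v[1]]
        norm = math.hypot(*w)
        v = [w[0] / norm, w[1] / norm]
    return math.degrees(math.atan2(v[1], v[0]))


raw_angle = first_axis_angle(covariance_2d(codfish, olive_oil))
scaled_angle = first_axis_angle(
    covariance_2d(standardize(codfish), standardize(olive_oil)))
```

In the standardized case, the covariance matrix is $\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}$, whose eigenvalues are 2 and 0. The perpendicular direction therefore carries no variance at all, so there is nothing left for a second axis to pick up from these two foods.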

A parameter that you choose, and which determines how the algorithm will perform. In particular, it is not part of the training data set, and is not learned from it.
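As an illustration of the difference between parameters and hyperparameters, here is a toy gradient descent fit of $y = wx$. The setup is an assumption for illustration: made-up data, and gradient descent as the training method. The weight `w` is a parameter learned from the data. The learning rate and the number of epochs are hyperparameters that we pick beforehand. With a bad choice of learning rate, training diverges instead of finding `w = 2`.

```python
# Hypothetical toy data: y = 2 * x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def train(learning_rate, epochs):
    """Fit y = w * x by gradient descent on the mean squared error.

    learning_rate and epochs are hyperparameters: chosen by us, not learned.
    w is a parameter: learned from the training data.
    """
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        gradient = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * gradient
    return w


w_good = train(learning_rate=0.01, epochs=200)  # converges close to 2
w_bad = train(learning_rate=0.2, epochs=200)    # overshoots further each step
```

For this data, each step multiplies the error `w - 2` by `1 - 15 * learning_rate`. That factor is 0.85 for a learning rate of 0.01, so the error shrinks. It is -2 for a learning rate of 0.2, so the error blows up.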

An impossible AI complete dream.

It is impossible to understand speech, and take meaningful actions from it, if you don't understand what is being talked about.

And without doubt, "understanding what is being talked about" comes down to understanding (efficiently representing) the geometry of the 3D world with a time component.

Not from hearing sounds alone.

- analyticsindiamag.com/5-open-source-recommender-systems-you-should-try-for-your-next-project/ 5 Open-Source Recommender Systems You Should Try For Your Next Project (2019)

One of the simplest classification algorithms one can think of: just see which kind of point your new point seems to be closest to, and say it is also of that type! Then it is just a question of defining "close".
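A minimal sketch of this idea, commonly known as k-nearest neighbors. It assumes Euclidean distance as the definition of "close" and a majority vote among the `k` closest training points. Both are just one possible choice, and the function name and data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k closest training points.

    train is a list of (point, label) pairs; "close" means Euclidean distance.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well separated clusters.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
```

With two classes, an odd `k` avoids ties in the vote. Note that `k` itself is a hyperparameter.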

This is the first thing you have to know about supervised learning:

- training is when you learn model parameters from input data. This literally means learning the best values we can for a bunch of numbers in the model. There can easily be hundreds of thousands of them.
- inference is when we take a trained model (i.e. with the parameters determined), and apply it to new inputs

Both of those already have hardware acceleration available as of the 2010s.
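To make the split between the two phases concrete, here is a tiny sketch with a model that has only two parameters: a line $y = ax + b$ fitted by ordinary least squares. The model, data and function names are hypothetical choices for illustration. Training computes the parameters from data once. Inference then only applies them to new inputs.

```python
# Hypothetical toy training data, generated from y = 2 * x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]


def train(xs, ys):
    """Training: learn the parameters (slope, intercept) of
    y = slope * x + intercept from data, by closed-form ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def infer(params, x):
    """Inference: apply already trained parameters to a new input."""
    slope, intercept = params
    return slope * x + intercept


params = train(train_x, train_y)  # (2.0, 1.0) for this data
prediction = infer(params, 10.0)
```

Real models differ mainly in scale: the same two phases apply, but with far more parameters and usually an iterative training procedure.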

An IBM-made/pushed term, but one that matches Ciro Santilli's general view of how we should move forward towards AGI.

Ciro's motivation/push for this can be seen e.g. at: Ciro's 2D reinforcement learning games.

Very useful for idiotic websites that require real photos!

- thispersondoesnotexist.com/ holy fuck, the images are so photorealistic that when there's a slight fail, it is really, really scary