It is quite cool that photosynthesis works just like cellular respiration by producing a proton potential through chemiosmosis.
It is important to note that due to horizontal gene transfer, the history of life is actually a graph and not a tree. This was especially true in the early days of life, and it still holds for bacteria to this day due to bacterial conjugation. See also: Figure "Graph of life".
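The graph-versus-tree point can be made concrete with a minimal sketch. It only checks one necessary property of a tree, namely that every node has at most one parent. The lineage names (`A`, `A1`, `B`, `B1`) are hypothetical and chosen purely for illustration.

```python
from collections import Counter

def nodes_with_multiple_parents(edges):
    """Return nodes that receive genetic material from more than one parent.

    edges is a list of (parent, child) pairs. For a tree, the result is empty.
    """
    parent_count = Counter(child for _parent, child in edges)
    return sorted(node for node, n in parent_count.items() if n > 1)

# Hypothetical lineages with vertical inheritance only.
vertical = [("LUCA", "A"), ("LUCA", "B"), ("A", "A1"), ("B", "B1")]

# The same lineages plus one horizontal gene transfer from B1 into A1.
with_transfer = vertical + [("B1", "A1")]

print(nodes_with_multiple_parents(vertical))
print(nodes_with_multiple_parents(with_transfer))
```

With only vertical inheritance no node has two parents. A single horizontal transfer is enough to break that property, which is why the structure becomes a graph.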
Definitely have a look at: coral of life representations.
TODO vs Phylogenetic tree?
Cladograms and phylogenetic trees are functionally very similar, but they show different things. Cladograms do not indicate time or the amount of difference between groups, whereas phylogenetic trees often indicate time spans between branching points.
Figure 1. Coral of life by János Podani (2019) Source. Fantastic work!!! Some cool things we can easily see:
Interesting fractal approach to a phylogenetic tree:
Mostly data driven.
Basically the same as clade.
All non-clade groups are evil. All non-clade terms must be forgotten. Some notable ones:
When a characteristic is basal, it basically means the opposite of it being polyphyletic.
E.g. monotremes laying eggs did not evolve separately after function loss, it comes directly from reptiles.
Kind of the opposite of a basal group.
Basically means that parallel evolution happened. Some cool ones:
The cool thing about parallel evolution is that it shows how complex phenotypes can evolve from very different initial genetic conditions, highlighting the great power of evolution.
We list some cool ones at: polyphyly.
Naming taxonomic ranks like genus, domain, etc. is a fucking waste of time, only useful before we developed molecular biology.
All that matters is the tree of clades with examples of species in each clade, and common characteristics shared by the clade.
And with molecular biology, we can build those trees incredibly well for extant species. When extinct species are involved however, things get more complicated.
There are six to eight, depending on which of the systems from the end of the 20th century you use:
There's about 60 of them.
Video 1. Do Bacteria Need Oxygen? by Microscope Project (2022) Source. Shows how (presumed) aerobic bacteria flock towards an air bubble in water.
Video 2. Where is Anatomy Encoded in Living Systems? by Michael Levin (2022) Source.
  • we are very far from full understanding. End game is a design system where you draw the body and it compiles the DNA for you.
  • some cool mentions of regeneration
How genes form bodies.
Video 3. Developmental Genetics 1 by Joseph Ross (2020) Source. Talks about homeobox genes.
This is hot shit, a possible worst case but sure to get there scenario to understand the brain!
It is quite mind blowing when you think about it, that the huge majority of your body's cells are essentially just there to support a tiny amount of germline, which are the only cells that can actually pass on! It is fun to imagine the cell type tree for this, with a huge branching of somatic cells, and only a few germline going forward.
One of the simplest known seems to be: "The simplest multicellular organism unveiled" from 2013 mentions Tetrabaena socialis.
Then of course: Caenorhabditis elegans is a relatively simple and widely studied model organism.
Video 4. Nicole King (UC Berkeley, HHMI) 1: The origin of animal multicellularity by iBiology (2015) Source.
It is hard to distinguish between colonies of unicellular organisms and multicellular organisms, as there is a continuum between both depending on how well integrated their cells are.
From Wikipedia:
Multicellularity has evolved independently at least 25 times in eukaryotes
Complex multicellular organisms evolved only in six eukaryotic groups: animals, symbiomycotan fungi, brown algae, red algae, green algae, and land plants.
Not a clade, and therefore a term better forgotten!
Arkarya is a proposed clade name for archaea plus eukarya.
It just has RNA that can be translated directly by the host ribosome. The Coronavirus Replication Cycle by Kevin Tokoph (2020) mentions that they get their lipid layer from the Golgi complex of the host, where they replicate.
COVID happens in two stages:
  • viral infection
  • inflammatory phase, where the body takes over, and sometimes harms itself. It seems that people are not generally contagious at this point?
This distinction is one of the reasons why separating the virus name (SARS-CoV-2) from the disease makes sense: the disease is much broader than the viral infection.
Why is there such a clear separation of phases?
Why do people with mild symptoms go on to die? It is a great mystery.
Ciro Santilli's theory is that COVID is extremely effective at avoiding immune response. Then, in people where this is effective, things reach a point where there is so much virus that the body notices and moves on to take a more drastic approach. This is compatible with the virus killing older people more, as they have weaker immune systems. This is however incompatible with the fact that people don't seem to be contagious after the viral phase is over...
There are a few possibilities:
Genes at: TODO protein list on a database?
50-200 nanometers in diameter.
SARS-CoV-2 cell entry
COVID-19 Symposium: Entry of Coronavirus into Cells | Dr. Paul Bates
Interaction points:
Video 5. Model of Membrane Fusion by SARS CoV-2 Spike Protein by clarafi (2020) Source.
Some are named after the encoded protein. Others that are not as clean are just orfXXX for open reading frame XXX.
Largest gene, polyprotein that contains SARS-CoV-2 non-structural proteins 1 to 11.
Nucleocapsid phosphoprotein, sticks to the RNA inside. mentions functions:
  • helps pack the viral RNA into the capsule
  • also has a side function in immune suppression
These are also required for test tube replication.
Protease that cuts up ORF1ab. Note that it is also present in ORF1ab.
The RdRp, since this is a Positive-strand RNA virus.
Unlike SARS-CoV-2 non-structural protein, these are not needed for test tube reproduction. They must therefore be for host modulation.
Integrates its RNA genome into the host genome.
Sounds complicated! The advantage is likely as in HIV: once inside the cell, it can remain hidden far away from the cell surface, but still infectious.
Converts RNA to DNA, i.e. the inverse of transcription. Found in viruses such as Retrovirus, which includes e.g. HIV.
Notable examples:
Figure 2. Structure of a Gram-negative bacteria. Source.
Only present in Gram-negative bacteria.
Figure 3. Structure of a Gram-negative bacteria. Source.
Space between the inner and bacterial outer membrane in Gram-negative bacteria
Size: 1-2 micrometers long and about 0.5 micrometer in diameter, so the bounding box is about 2 * 0.5 * 0.5 = 0.5 cubic micrometers, i.e. 0.5e-18 cubic meters.
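A rough check of the size arithmetic, as a minimal sketch. It assumes round dimensions of 2 micrometers by 0.5 micrometers (the factors used in the product above, not measured values), and compares the bounding box with a cylinder of the same length and diameter.

```python
import math

# Assumed round dimensions for one cell, in micrometers.
LENGTH_UM = 2.0
DIAMETER_UM = 0.5

# 1 micrometer = 1e-6 m, so 1 cubic micrometer = 1e-18 cubic meters.
UM3_TO_M3 = 1e-18

def box_volume_um3(length_um, diameter_um):
    """Volume of the bounding box around the cell, in cubic micrometers."""
    return length_um * diameter_um * diameter_um

def cylinder_volume_um3(length_um, diameter_um):
    """Volume of a cylinder with the same length and diameter, in cubic micrometers."""
    radius_um = diameter_um / 2
    return math.pi * radius_um ** 2 * length_um

box = box_volume_um3(LENGTH_UM, DIAMETER_UM)
cyl = cylinder_volume_um3(LENGTH_UM, DIAMETER_UM)
print(f"bounding box: {box:.2f} um^3 = {box * UM3_TO_M3:.1e} m^3")
print(f"cylinder:     {cyl:.2f} um^3 = {cyl * UM3_TO_M3:.1e} m^3")
```

The cylinder is the tighter estimate for a rod-shaped cell. The bounding box only gives an upper bound of the same order of magnitude.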
Reference strain: E. Coli K-12 MG1655.
  • 4k genes
  • 5 Mbp
  • wget
  • wget -O NC_000913.3.fasta ''
Omics modeling: Tools for Genomic and Transcriptomic Analysis of Microbes at Single-Cell Level Zixi Chen, Lei Chen, Weiwen Zhang.
20 minutes in optimal conditions, with a crazy multiple start sites mechanism: E. Coli starts DNA replication before the previous one finished.
Naively, it would take 60-90 minutes just to replicate and segregate the full DNA. So it starts copying multiple times.
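A back-of-envelope sketch of why overlapping rounds are needed. This is a deliberately naive model, not the established quantitative theory: it assumes a new replication round starts once per division time and that each round takes a fixed time, and it ignores segregation details.

```python
def rounds_in_flight(replication_minutes, division_minutes):
    """Number of replication rounds running at once under the naive model.

    A new round starts every division_minutes and each one lasts
    replication_minutes, so this is the ceiling of their ratio.
    """
    # Integer ceiling division without floating point.
    return -(-replication_minutes // division_minutes)

for replication in (60, 90):
    print(replication, "min replication, 20 min division:",
          rounds_in_flight(replication, 20), "rounds at once")
```

Under these assumptions, a 20 minute division time with 60 to 90 minutes of replication needs several rounds running at the same time, which matches the multiple start mechanism described above.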
Appears to have just one, other bacteria can have more. TODO position in NCBI. Sequence determined in 1979:
The conventional starting point is not at the E. Coli K-12 MG1655 origin of replication. explains:
This site is the origin of replication of the E. coli chromosome. It contains the binding sites for DnaA, which is critical for initiation of replication. Replication proceeds bidirectionally. For historical reasons, the numbering of E. coli's circular chromosome does not start at the origin of replication, but at the origin of transfer during conjugation.
It is a bit hard to understand what they mean by "origin of transfer" though, as that term is usually associated with the origin of transfer of bacterial conjugation.
By Tagkopoulos lab at University of California, Davis.
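Since the chromosome is circular, moving the numbering start point is just modular arithmetic. A minimal sketch follows, using 0-based offsets. The genome length is the 4641652 bp estimate derived from the FASTA elsewhere in these notes, and `EXAMPLE_ORIGIN` is a made-up offset for illustration only, not the real origin of replication.

```python
GENOME_LENGTH = 4_641_652  # base pairs, from the FASTA line-count estimate

# HYPOTHETICAL offset of the new starting point, for illustration only.
EXAMPLE_ORIGIN = 1_000_000

def rebase(offset, new_origin, length=GENOME_LENGTH):
    """Re-express a 0-based offset relative to a different starting point
    on a circular chromosome of the given length."""
    return (offset - new_origin) % length

# The new starting point itself becomes offset 0.
print(rebase(EXAMPLE_ORIGIN, EXAMPLE_ORIGIN))
# The base just before it wraps around to the end of the numbering.
print(rebase(EXAMPLE_ORIGIN - 1, EXAMPLE_ORIGIN))
```

The modulo is what makes "first" and "last" gene an arbitrary convention: any base can be offset 0.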
Reference strain: E. Coli K-12 MG1655.
NCBI taxonomy entry: This links to:
  • genome: From there there are links to either:
    • Download the FASTA: "Download sequences in FASTA format for genome, protein"
      For the genome, you get a compressed FASTA file with extension .fna called GCF_000005845.2_ASM584v2_genomic.fna that starts with:
      >NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
      Using wc as in wc GCF_000005845.2_ASM584v2_genomic.fna gives 58022 lines, in Vim we see that each line is 80 characters, except for the final one which is 52. So we have 58020 * 80 + 52 = 4641652 =~ 4.6 Mbp
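The same count can be done without relying on a fixed line width. This is a minimal sketch: `count_bases` is a helper written for this note, and the demo input is a tiny made-up FASTA, not real E. Coli sequence.

```python
import io

def count_bases(fasta_lines):
    """Sum sequence characters over a FASTA stream, skipping '>' header lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total

# Tiny made-up FASTA, only to show the function working.
demo = io.StringIO(">demo record\nACGTACGT\nACG\n")
print(count_bases(demo))

# The wc-based arithmetic: 58020 full lines of 80 plus a last line of 52.
print(58020 * 80 + 52)
```

To run it on the real data, open the decompressed file and pass the file object, e.g. `with open("GCF_000005845.2_ASM584v2_genomic.fna") as f: print(count_bases(f))`. If every sequence line is 80 characters wide as observed in Vim, the two numbers should agree.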
Note that this is not the conventional starting point for gene numbering: Section "E. Coli genome starting point".
The first gene in the E. Coli K-12 MG1655 genome. Remember however that bacterial chromosome is circular, so being the first doesn't mean much, how the choice was made: Section "E. Coli genome starting point".
At only 65 bp, this gene is quite small and boring. For a more interesting gene, have a look at the next gene, E. Coli K-12 MG1655 gene thrA.
Has something to do with threonine.
This is the first in the sequence thrL, thrA, thrB, thrC. This type of naming convention is quite common for related adjacent proteins, all of which must be getting transcribed into a single RNA by the same promoter. As mentioned in the analysis of the KEGG entry for E. Coli K-12 MG1655 gene thrA, those A, B and C are actually directly functionally linked in a direct metabolic pathway.
We can see that thrL, A, B, and C are in the same transcription unit by browsing the list of promoters at: By finding the first one by position we reach:
The second gene in the E. Coli K-12 MG1655 genome. Part of the E. Coli K-12 MG1655 operon thrLABC.
Part of a reaction that produces threonine.
This protein is an enzyme. The UniProt entry clearly shows the chemical reactions that it catalyses. In this case, there are actually two! It can transform either:
  • "L-homoserine" into "L-aspartate 4-semialdehyde"
  • "L-aspartate" into "4-phospho-L-aspartate"
Also interestingly, we see that both of those reactions require some extra energy to catalyse, one needing adenosine triphosphate and the other NADP+.
TODO: any mention of how much faster it makes the reaction, numerically?
Since this is an enzyme, it would also be interesting to have a quick search for it in the KEGG entry starting from the organism: We type in the search bar "thrA", it gives a long list, but the last entry is our "thrA". Selecting it highlights two pathways in the large graph, so we understand that it catalyzes two different reactions, as suggested by the protein name itself (fused blah blah). We can now hover over:
  • the edge: it shows all the enzymes that catalyze the given reaction. Both edges actually have multiple enzymes, e.g. the L-Homoserine path is also catalyzed by another enzyme called metL.
  • the node: they are the metabolites, e.g. one of the paths contains "L-homoserine" on one node and "L-aspartate 4-semialdehyde"
Note that common cofactors are omitted: we know this because the UniProt entry told us that this reaction uses ATP, which is not shown here.
If we now click on the L-Homoserine edge, it takes us to: Under "Pathway" we see an interesting looking pathway "Glycine, serine and threonine metabolism": which contains a small manually selected and extremely clearly named subset of the larger graph!
But looking at the bottom of this subgraph (the UI is not great: we can't Ctrl+F and enzyme names are not shown, but the selected enzyme is slightly highlighted in red because it is in the URL) we clearly see that thrA, thrB and thrC form a sequence that directly transforms "L-aspartate 4-semialdehyde" into "Homoserine", then "O-Phospho-L-homoserine", and finally threonine. This makes it crystal clear that they are not just located adjacently in the genome by chance: they are actually functionally related, and likely controlled by the same transcription factor: when you want one of them, you basically always want all three, because you must be lacking threonine. TODO find transcription factor!
The UniProt entry also shows an interactive browser of the tertiary structure of the protein. We note that there are currently two sources available: X-ray crystallography and AlphaFold. To be honest, the AlphaFold one looks quite off!!!
By inspecting the FASTA for the entire genome, or by using the NCBI open reading frame tool, we see that this gene lies entirely in its own open reading frame, so it is quite boring
From the FASTA we see that the very first three codons at position 337 are ATG CGA GTG, where ATG is the start codon, and CGA GTG should be the first two that actually go into the protein.
mentions that the enzyme is most active as a protein complex with four copies of the same protein:
Aspartate kinase I / homoserine dehydrogenase I comprises a dimer of ThrA dimers. Although the dimeric form is catalytically active, the binding equilibrium dramatically favors the tetrameric form. The aspartate kinase and homoserine dehydrogenase activities of each ThrA monomer are catalyzed by independent domains connected by a linker region.
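Going back to the first codons of thrA (ATG, then CGA GTG): a minimal sketch of reading them as amino acids. The codon table is deliberately partial, holding only the three standard-code entries needed here, so anything else is printed as `?`.

```python
# Partial standard codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the start codon
    "CGA": "R",  # arginine
    "GTG": "V",  # valine
}

def split_codons(dna):
    """Split a DNA string into consecutive triplets, dropping any leftover bases."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]

def translate(dna):
    """Translate with the partial table above, using '?' for unknown codons."""
    return "".join(CODON_TABLE.get(codon, "?") for codon in split_codons(dna))

gene_start = "ATGCGAGTG"
print(split_codons(gene_start))
print(translate(gene_start))
```

A full translation would need the complete 64-entry table, which a library such as Biopython provides. This sketch avoids that dependency on purpose.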
TODO image?
Immediately follows E. Coli K-12 MG1655 gene thrA.
The fifth gene, and the first E. Coli K-12 MG1655 gene of unknown function as of 2021.
Note that this is very close to the "end" of the genome.
TODO DNA assembly structure.
The "last" gene, and also an E. Coli K-12 MG1655 gene of unknown function.
UniProt for example describes YaaX as "Uncharacterized protein YaaX".
As function is discovered, they then change it to a better name, e.g. to names such as the E. Coli K-12 MG1655 transcription unit thrLABC proteins all of which have a clear name due to threonine.
There are many other y??? as of 2021! Though they do tend to be smaller molecules.
From this we see that there is a convention of naming promoters as protein name + p, e.g. the first gene in E. Coli K-12 MG1655 promoter thrLp encodes protein thrL.
It is also possible to add numbers after the p, e.g. at we see that the protein zur has two promoters:
  • zurp6
  • zurp7
TODO why 6 and 7? There don't appear to be 1, 2, etc.
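The naming convention can be captured in a small parser. This is a minimal sketch: the pattern (name, then `p`, then an optional number) is inferred only from the examples in these notes and is an assumption, not a documented BioCyc rule.

```python
import re

# Assumed pattern: protein/gene name, then 'p', then an optional number.
PROMOTER_RE = re.compile(r"^(?P<gene>\w+)p(?P<number>\d*)$")

def parse_promoter(name):
    """Split a promoter name into (gene, number), with number None if absent.

    Returns None when the name does not fit the assumed pattern.
    """
    match = PROMOTER_RE.match(name)
    if match is None:
        return None
    number = int(match.group("number")) if match.group("number") else None
    return match.group("gene"), number

for name in ("thrLp", "zurp6", "zurp7"):
    print(name, "->", parse_promoter(name))
```

The greedy `\w+` backtracks to the last `p` in the name, so names that themselves contain a `p` are still split at the final one.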
We can find it by searching for the species in the BioCyc promoter database. This leads to:
By finding the first operon by position we reach:
That page lists several components of the promoter, which we should try to understand!
After the first gene in the operon, thrL, there is a rho-independent termination. By comparing:
we understand that the presence of threonine or isoleucine variants, L-threonyl and L-isoleucyl, makes the rho-independent termination become more efficient, so the control loop is quite direct! Not sure why it cares about isoleucine as well though.
TODO which factor is actually specific to that DNA region?
Size: 300 x 600 nm
Has one of the smallest genomes known, and JCVI made a minimized strain with 473 genes: JCVI-syn3.0.
The reason why genitalium has such a small genome is that parasites tend to have smaller DNAs. So it must be highlighted that genitalium can only survive in highly enriched environments: it can't even make its own amino acids, which it normally obtains from the host cells! And because it cannot do cellular respiration, it very likely replicates slower than say E. Coli. It's easy to be small in such scenarios!
Power, Sex, Suicide by Nick Lane (2006) section "How to lose the cell wall without dying" page 184 puts it very well:
One group, the Mycoplasma, comprises mostly parasites, many of which live inside other cells. Mycoplasma cells are tiny, with very small genomes. M. genitalium, discovered in 1981, has the smallest known genome of any bacterial cell, encoding fewer than  genes. Despite its simplicity, it ranks among the most common of sexually transmitted diseases, producing symptoms similar to Chlamydia infection. It is so small (less than a third of a micron in diameter, or an order of magnitude smaller than most bacteria) that it must normally be viewed under the electron microscope; and difficulties culturing it meant its significance was not appreciated until the important advances in gene sequencing in the early 1990s. Like Rickettsia, Mycoplasma have lost virtually all the genes required for making nucleotides, amino acids, and so forth. Unlike Rickettsia, however, Mycoplasma have also lost all the genes for oxygen respiration, or indeed any other form of membrane respiration: they have no cytochromes, and so must rely on fermentation for energy.
Downsides mentioned at
  • too small to see on light microscope
  • difficult to genetically manipulate. TODO why?
  • less literature than E. Coli.
GPU accelerated, simulates Craig Venter's minimized M. genitalium, JCVI-syn3A, at a particle basis of some kind.
Lab head is the cutest-looking lady ever: Zaida (Zan) Luthey-Schulten.
Awesome visualization of simtk, paper: A Whole-Cell Computational Model Predicts Phenotype from Genotype - 2013 - Jonathan R. Karr.
A Journey to the Center of Our Cells (2022) by James Somers comments on M. genitalium in general, and in particular on the JCVI strains.
essential metabolism for a minimal cell (2019) mentions:
JCVI-syn3A, a robust minimal cell with a 543 kbp genome and 493 genes, provides a versatile platform to study the basics of life.
Based on JCVI-syn3.0, they've added a few genes back to give better phenotypes, including slightly faster duplication time. Because the development cycle time is your God is also true in biology.
As of essential metabolism for a minimal cell (2019) it had only 91 genes of unknown function! So funny.
Figure 4. JCVI-syn3A during cell division by David Goodsell (2022) Source. A description is present at: Integrative Illustration of a JCVI-syn3A Minimal Cell by David Goodsell (2022) which describes everything in the picture.
JCVI-syn3B strains differ from JCVI-syn3.0 by the presence of 19 additional non-essential genes that result in a more easily manipulated cell. JCVI-syn3B additionally includes a dual loxP landing pad that enables easy Cre recombinase mediated insertion of genes
It is also interesting to see how they are interested in co-culture with HeLa cells, presumably to enable infectious bacterial disease studies.
At (2023) they let it re-evolve to see if it would regain some fitness, and it did.
Name of the clade of archaea plus eukarya proposed at: Much better term than prokaryote as that is not a clade. Let's hope it catches on!
Archaea are much more closely related to the eukaryotes than bacteria, see e.g. Figure 1. "Coral of life by János Podani (2019)" which shows how archaea diverged from eukarya almost 2 billion years after LUCA!
It therefore appears that the mitochondrial endosymbiosis happened when a bacteria-like cell joined up with an archaeon.
Some notable points in which archaea look more like eukaryotes than bacteria:
Power, Sex, Suicide by Nick Lane (2006) page 53 suggests that one tremendous advantage of eukaryotes over bacteria is their ability to change shape due to the presence of the cytoskeleton, and the lack of a rigid bacterial cell wall.
Imagine in a world where there are only bacteria, and you can eat entire bacteria in one go, what a huge advantage that is!
This group is a mess.
But one thing you should really know, as often mentioned in Power, Sex, Suicide by Nick Lane (2006): they are all eukaryotes.
This is because prokaryotes are fundamentally unable to do phagocytosis, as they have a rigid cell wall. Changing cell shape at will requires a cytoskeleton.
A kingdom, formal name: "animalia".
A single hole that is used for shit, pee and fucking. Amazing.
It is quite mind blowing that this is polyphyletic on mammals and birds, what can't parallel evolution achieve??
Figure 5. Phylogenetic tree of the vertebrates. Source. Highlights how birds should obviously be classified as reptiles.
Now that's some basal shit! It's basically a fucking blob!!! Except that it is flat. No nervous system. Not even tissues. It is basically a multicellular
Figure 6. Source.
Just imagine this together with a Drosophila connectome on a single brain-in-the-loop simulation.
Figure 7. Source.
Chordate is a sad clade.
You read the name and think: hmm, neural cords!
But then you see that this is one of its members:
Figure 8. Source.
Yup. That's your cousin. And it's a much closer cousin than something like arthropods, which at least have heads, eyes and legs like you.
The big breakthrough of the vertebrates appears to be the ability to swim around in a straight line and eat smaller species that are floating about.
Bones appear to help that a lot!
It is likely the most efficient design to travel long distances. Be thin and wiggle your tail around.
Perhaps smaller animals can skip the bone thing. Maybe a notable example is the lancelets, which look a bit like small fish. But they only go up to 8 cm.
This paraphyletic subgroup is easy to form: the "aquatic only" (fishes) vs "things that come out of water" (tetrapods). Though mudfish make that distinction harder.
Which kind of makes sense, why would you want four limbs unless you are going to stay out of water!
Once Ciro joked in a twenty questions-like game that humans are animals.
But counting humans as fish would have been a stroke of genius.
The exact relationships between those clades are not very clear, as there's a bunch of extinct species in the middle and we are not sure exactly where they go. Some hypotheses are listed at:
But at least it seems rock solid that those three are actually clades.
Name origin: amnion, a pellicle that covers embryos of both eggs and also during pregnancy.
Does not include amphibians. If you include them, you have the tetrapods.
This being a class is bullshit because it is not a clade, notably birds are not considered reptiles, but they are clearly in the clade.
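Whether a group is a clade can be checked mechanically: take the most recent common ancestor of the group and see if the group contains all of its descendants. The sketch below uses a heavily simplified, illustrative amniote tree (it leaves out turtles and all extinct groups), so treat the tree itself as an assumption.

```python
# Simplified, illustrative tree as child -> parent. Not a complete phylogeny.
PARENT = {
    "mammals": "amniotes",
    "sauropsids": "amniotes",
    "lizards": "sauropsids",
    "archosaurs": "sauropsids",
    "crocodiles": "archosaurs",
    "birds": "archosaurs",
}
LEAVES = {"mammals", "lizards", "crocodiles", "birds"}

def ancestors(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def common_ancestor(group):
    """Most recent common ancestor of a non-empty collection of leaves."""
    paths = [ancestors(leaf) for leaf in group]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any leaf, the first shared node is the most recent one.
    return next(node for node in paths[0] if node in shared)

def is_clade(group):
    """True if the group is exactly all leaves under its common ancestor."""
    root = common_ancestor(group)
    return {leaf for leaf in LEAVES if root in ancestors(leaf)} == set(group)

print(is_clade({"lizards", "crocodiles"}))           # reptiles without birds
print(is_clade({"lizards", "crocodiles", "birds"}))  # reptiles with birds
```

In this toy tree, the traditional "reptiles" group fails the check because its common ancestor also has birds as descendants, while the same group with birds added passes.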
Mammals and a bunch of extinct animals that look more like mammals than reptiles.
TODO name: Wikipedia says "being with a fused arch" but what does that mean???
The weirdest mammal clade: they lay fucking eggs. Only 5 known species alive as of 2020.
Eggs are basal: they simply didn't evolve out of what other reptiles do. From which we conclude that milk came before eggs stopped.
So this is the most basal subclade of mammals.
Etymology: means "single hole" in Greek, because like other reptiles it has a single hole for shit, pee and fucking: the cloaca.
Every mammal except the weird monotremes, i.e. marsupials and the placentalia.
The name is completely random, "wild beast". Are platypuses not "wild beasts"? They have a freaking poison!!
They split up from the rest of the mammals after the monotremes.
Every other mammal has a placenta.
This baby in pouch thing just feels like a pre-placenta stage.
As of 2020, account for about 20% of the known mammal species!!! mentions some reasons:
  • they can fly, so they can move out further
  • their eating habits are highly specialized
Since rat and mouse are not scientifically specific names, we'll just use them interchangeably.
When one specific species is implied, we will mean Mus musculus by default.
Exciting... sometimes cruel. But too exciting not to do:
Databases and projects:
Databases and projects:
This is the level at which human and all extinct siblings lie, with no other extant species, all others were killed or fucked to death: Section "Interbreeding between archaic and modern humans".
Exactly 1033 somatic cells on male, 959 on hermaphrodite, every time, counted as of 2020. A beauty.
Exactly 131 commit apoptosis in the hermaphrodite. contains the full lineage.
Browse freely moving whole-brain calcium imaging datasets
High level simulation only, no way to get from DNA to worm! :-) Includes:
A kingdom, formal name: "fungi".
Does not appear to refer to any one specific phylogenetic level, it usually refers to either:
Size: 10 micrometers.
Division time: 100 minutes.
A kingdom, formal name: "plantae".
This looks a lot like the beans that Brazilians venerate and can be easily found in the United Kingdom as of 2020.
The more exact type seems to be pinto bean, but this is close enough.
2021-03: same but 2.5 teaspoons, seems to be the right amount.
2021-02-10: attempt 3: 500g 1 hour 30 minutes no pressure, uncontrolled water. Salt with one chorizo: put 3 teaspoons, it was a bit too much, going to do 2 next time and see.
2020-12-14: attempt 3: 250g of beans, 1.5l of water, 30 minutes pressure.
2020-11-30: attempt 2: 275ml of dry beans, about 50% of 500g bag, putting 1650 ml (6x) of water on pressure cooker Still had to throw out some water.
Density dry raw: 216 g/250 ml = 432 g / 500 ml = 500 g / 580 ml = 864 g/L
500 g dry expands to in water after 12 hours: 1200 ml
Therefore 500 g dry = 500 / 864 L = 580 ml, which expands about 2x.
Therefore, to the maximum 2.5L of the cooker with 8x dry volume water from this recipe I can use:
2500 = volume expanded bean + volume water = 2 volume dry bean + 8 volume dry bean = 10 volume dry bean
and so:
volume dry bean = 2500/10 = 250ml
which is about 250 / 580 = 43% of the 500 g bag.
After first try, I found that 8x volume of water is way, way too much. Going to try 6x next time.
This seems to be the "brown Brazilian bean" that many Brazilians eat every day.
Edit: after buying it, not 100% sure. This one felt smaller than what Ciro had in Brazil, borlotti beans might be closer. Pinto beans are smaller, and creamier, and have softer peel, possibly produced less natural gas.
2021-04: second try.
2021-03: did for first time, started with same procedure as borlotti beans 2021-03. Maybe 1h30 is too much. Outcome was still very good.

