Definitely have a look at: coral of life representations.
Mostly data driven.
When a characteristic is basal, it basically means the opposite of it being polyphyletic.
We list some cool ones at: polyphyly.
- eol.org/ Encyclopedia of Life
All that matters is the tree of clades with examples of species in each clade, and common characteristics shared by the clade.
There's about 60 of them.
How genes form bodies.
This is hot shit, a possible worst case but sure to get there scenario to understand the brain!
It is quite mind blowing when you think about it, that the huge majority of your body's cells are essentially just there to support a tiny amount of germline cells, which are the only cells that can actually pass on to the next generation! It is fun to imagine the cell type tree for this, with a huge branching of somatic cells, and only a few germline cells going forward.
One of the simplest known seems to be: en.wikipedia.org/wiki/Trichoplax
www.u-tokyo.ac.jp/focus/en/articles/a_00220.html "The simplest multicellular organism unveiled" from 2013 mentions Tetrabaena socialis.
Multicellularity has evolved independently at least 25 times in eukaryotes
and:
Complex multicellular organisms evolved only in six eukaryotic groups: animals, symbiomycotan fungi, brown algae, red algae, green algae, and land plants.
www.youtube.com/watch?v=zvuYJTL90J8&t=166s The Coronavirus Replication Cycle by Kevin Tokoph (2020)
COVID happens in two stages:
- viral infection
- inflammatory phase, where the body takes over, and sometimes harms itself. It seems that people are not generally contagious at this point?
This distinction is one of the reasons why separating the virus name (SARS-CoV-2) from the disease makes sense: the disease is much broader than the viral infection.
Why is there such a clear separation of phases?
Why do people with mild symptoms go on to die? It is a great mystery.
Ciro Santilli's theory is that COVID is extremely effective at avoiding immune response. Then, in people where this is effective, things reach a point where there is so much virus that the body notices and moves on to take a more drastic approach. This is compatible with the virus killing older people more, as they have weaker immune systems. This is however incompatible with the fact that people don't seem to be contagious after the viral phase is over...
First sequenced variant: www.ncbi.nlm.nih.gov/genome/?term=86693
Genes at: www.ncbi.nlm.nih.gov/nuccore/MN908947.3 TODO protein list on a database?
30kbp, 10 genes, 29 proteins: cen.acs.org/biological-chemistry/infectious-disease/know-novel-coronaviruss-29-proteins/98/web/2020/04
www.youtube.com/watch?v=6DxlkxA82FM COVID-19 Symposium: Entry of Coronavirus into Cells | Dr. Paul Bates
Genes list: www.ncbi.nlm.nih.gov/nuccore/MN908947.3
Some are named after the encoded protein. Others that are not as clean are just orfXXX for open reading frame XXX.
Nucleocapsid phosphoprotein, sticks to the RNA inside.
www.nature.com/articles/s41467-020-20768-y mentions functions:
- helps pack the viral RNA into the capsule
- also has a side function in immune suppression
These are also required for test tube replication.
Unlike SARS-CoV-2 non-structural proteins, these are not needed for test tube reproduction. They must therefore be for host modulation.
Sounds complicated! The advantage is likely as in HIV: once inside the cell, it can remain hidden far away from the cell surface, but still infectious.
Only present in Gram-negative bacteria.
- www.cell.com/cell/fulltext/S0092-8674(15)00568-1 2015. Using Genome-scale Models to Predict Biological Capabilities. Edward J. O'Brien, Jonathan M. Monk, Bernhard O. Palsson.
Size: 1-2 micrometers long and about 0.25 micrometer in diameter, so:
2 * 0.5 * 0.5 * 1e-18 m^3, and thus about 0.5 cubic micrometers.
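As a rough sanity check of the volume estimate above, here is a minimal Python sketch that compares the box approximation with a cylinder approximation. The 2 micrometer length and 0.5 micrometer width come from the calculation above. The cylinder shape and the function names are assumptions added here for comparison only.

```python
import math

UM3_TO_M3 = 1e-18  # 1 cubic micrometer = 1e-18 cubic meters


def box_volume_um3(length_um, width_um):
    """Volume of a length x width x width box, in cubic micrometers."""
    return length_um * width_um * width_um


def cylinder_volume_um3(length_um, diameter_um):
    """Volume of a cylinder (assumed shape), in cubic micrometers."""
    radius_um = diameter_um / 2
    return math.pi * radius_um ** 2 * length_um


box = box_volume_um3(2.0, 0.5)
cyl = cylinder_volume_um3(2.0, 0.5)
print(f"box:      {box:.2f} um^3 = {box * UM3_TO_M3:.1e} m^3")
print(f"cylinder: {cyl:.2f} um^3 = {cyl * UM3_TO_M3:.1e} m^3")
```

Both approximations land in the same order of magnitude, which is all this kind of estimate is meant to give.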
20 minutes in optimal conditions, with a crazy multiple start sites mechanism: E. Coli starts DNA replication before the previous one finished.
Otherwise, naively, it would take 60-90 minutes just to replicate and segregate the full DNA. So it starts copying multiple times.
- www.ncbi.nlm.nih.gov/pmc/articles/PMC2063475/ Organization of sister origins and replisomes during multifork DNA replication in Escherichia coli by Fossum et al (2007)
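To make the multifork arithmetic concrete, here is a small Python sketch. It assumes a deliberately simplified model that is not taken from the paper: a new round of replication starts once per division time, and each round takes a fixed time to replicate and segregate. Under that assumption, the maximum number of rounds in progress at once is the ceiling of the ratio. The function name is hypothetical.

```python
import math


def max_overlapping_rounds(replication_min, division_min):
    """Upper bound on replication rounds in progress at the same time.

    Simplified assumption: one new round starts every division_min minutes
    and each round lasts replication_min minutes.
    """
    return math.ceil(replication_min / division_min)


# 20 minute division time, 60 to 90 minutes to replicate and segregate.
for replication_min in (60, 90):
    rounds = max_overlapping_rounds(replication_min, 20)
    print(f"{replication_min} min replication: up to {rounds} rounds at once")
```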
The conventional starting point is not at the E. Coli K-12 MG1655 origin of replication.
This site is the origin of replication of the E. coli chromosome. It contains the binding sites for DnaA, which is critical for initiation of replication. Replication proceeds bidirectionally. For historical reasons, the numbering of E. coli's circular chromosome does not start at the origin of replication, but at the origin of transfer during conjugation.
It is a bit hard to understand what they mean by "origin of transfer" though, as that term is usually associated with the origin of transfer of bacterial conjugation.
By Tagkopoulos lab at University of California, Davis.
- www.nature.com/articles/ncomms13090 Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli (2016)
NCBI taxonomy entry: www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=511145 This links to:
- genome: www.ncbi.nlm.nih.gov/genome/?term=txid511145 From there there are links to either:
- Download the FASTA: "Download sequences in FASTA format for genome, protein". For the genome, you get a compressed FASTA file with the .fna extension, GCF_000005845.2_ASM584v2_genomic.fna, that starts with:
  >NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
  AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
- Interactively browse the sequence on the browser viewer: "Reference genome: Escherichia coli str. K-12 substr. MG1655" which eventually leads to: www.ncbi.nlm.nih.gov/nuccore/556503834?report=graph If we zoom into the start, we hover over the very first gene/protein: the famous (just kidding) E. Coli K-12 MG1655 gene thrL, at position 190-255. The second one is the much more interesting E. Coli K-12 MG1655 gene thrA.
- Gene list, with a total of 4,629 as of 2021: www.ncbi.nlm.nih.gov/gene/?term=txid511145
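Once the FASTA file is downloaded, a few lines of Python are enough to read it. The following is a minimal sketch, not a full FASTA implementation: `parse_fasta` is a hypothetical helper written for this note, and the sample text is just the header and first sequence line quoted above.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: remember its header.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are concatenated per record.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


sample = """>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTG
"""

records = parse_fasta(sample)
for header, seq in records.items():
    accession = header.split()[0]
    print(accession, len(seq), seq[:10])
```

For the real download, which is compressed, the text can be obtained with the standard library, e.g. `gzip.open(path, "rt").read()`, assuming the file is gzip-compressed.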
Note that this is not the conventional starting point for gene numbering: Section "E. Coli genome starting point".
At only 66 bp (positions 190 to 255 inclusive), this gene is quite small and boring. For a more interesting gene, have a look at the next gene, E. Coli K-12 MG1655 gene thrA.
Has something to do with threonine.
This is the first in the sequence thrL, thrA, thrB, thrC. This type of naming convention is quite common for related adjacent genes, all of which must be getting transcribed into a single RNA by the same promoter. As mentioned in the analysis of the KEGG entry for E. Coli K-12 MG1655 gene thrA, those A, B and C are actually directly functionally linked in a direct metabolic pathway.
Part of a reaction that produces threonine.
This protein is an enzyme. The UniProt entry clearly shows the chemical reactions that it catalyses. In this case, there are actually two! It can either transform the metabolite:
- "L-homoserine" into "L-aspartate 4-semialdehyde"
- "L-aspartate" into "4-phospho-L-aspartate"
Also interestingly, we see that both of those reactions require some extra energy to catalyse, one needing adenosine triphosphate and the other NADP+.
TODO: any mention of how much faster it makes the reaction, numerically?
Since this is an enzyme, it would also be interesting to have a quick search for it in the KEGG entry starting from the organism: www.genome.jp/pathway/eco01100+M00022 We type in the search bar "thrA", it gives a long list, but the last entry is our "thrA". Selecting it highlights two pathways in the large graph, so we understand that it catalyzes two different reactions, as suggested by the protein name itself (fused blah blah). We can now hover over:
- the edge: it shows all the enzymes that catalyze the given reaction. Both edges actually have multiple enzymes, e.g. the L-Homoserine path is also catalyzed by another enzyme called metL.
- the node: they are the metabolites, e.g. one of the paths contains "L-homoserine" on one node and "L-aspartate 4-semialdehyde" on the other
Note that common cofactors are omitted, since we've learnt from the UniProt entry that this reaction uses ATP.
If we now click on the L-Homoserine edge, it takes us to: www.genome.jp/entry/eco:b0002+eco:b3940. Under "Pathway" we see an interesting looking pathway "Glycine, serine and threonine metabolism": www.genome.jp/pathway/eco00260+b0002 which contains a small manually selected and extremely clearly named subset of the larger graph!
But looking at the bottom of this subgraph (the UI is not great, can't Ctrl+F and enzyme names not shown, but the selected enzyme is slightly highlighted in red because it is in the URL www.genome.jp/pathway/eco00260+b0002 vs www.genome.jp/pathway/eco00260) we clearly see that thrA, thrB and thrC form a sequence that directly transforms "L-aspartate 4-semialdehyde" into "Homoserine" to "O-Phospho-L-homoserine" and finally to threonine. This makes it crystal clear that they are not just located adjacently in the genome by chance: they are actually functionally related, and likely controlled by the same transcription factor: when you want one of them, you basically always want the three, because you must be lacking threonine. TODO find transcription factor!
ecocyc.org/gene?orgid=ECOLI&id=ASPKINIHOMOSERDEHYDROGI-MONOMER mentions that the enzyme is most active as a protein complex with four copies of the same protein:
Aspartate kinase I / homoserine dehydrogenase I comprises a dimer of ThrA dimers. Although the dimeric form is catalytically active, the binding equilibrium dramatically favors the tetrameric form. The aspartate kinase and homoserine dehydrogenase activities of each ThrA monomer are catalyzed by independent domains connected by a linker region.
TODO image?
Immediately follows E. Coli K-12 MG1655 gene thrA.
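The chain read off the KEGG subgraph above can be written down as a tiny data structure. This is only a sketch: assigning exactly one gene per step, in the order thrA, thrB, thrC, is my reading of the description above, and the names `THREONINE_PATHWAY`, `is_linear_chain` and `genes_in_order` are hypothetical.

```python
# Each step is (substrate, gene, product), in the order described above.
# Assumption: one gene per step, in genome order thrA, thrB, thrC.
THREONINE_PATHWAY = [
    ("L-aspartate 4-semialdehyde", "thrA", "Homoserine"),
    ("Homoserine", "thrB", "O-Phospho-L-homoserine"),
    ("O-Phospho-L-homoserine", "thrC", "Threonine"),
]


def is_linear_chain(steps):
    """True if each step's product is the next step's substrate."""
    return all(a[2] == b[0] for a, b in zip(steps, steps[1:]))


def genes_in_order(steps):
    """Genes in pathway order."""
    return [gene for _, gene, _ in steps]


print(is_linear_chain(THREONINE_PATHWAY))
print(" -> ".join(genes_in_order(THREONINE_PATHWAY)))
```

The point of the check is simply that pathway order and genome order coincide here, which is what makes the adjacency look non-accidental.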
The fifth gene, and the first E. Coli K-12 MG1655 gene of unknown function as of 2021.
Note that this is very close to the "end" of the genome.
TODO DNA assembly structure.
The "last" gene, and also an E. Coli K-12 MG1655 gene of unknown function.
UniProt for example describes YaaX as "Uncharacterized protein YaaX".
There are many other y??? as of 2021! Though they do tend to be smaller molecules.
It is also possible to add numbers after the p, e.g. at biocyc.org/ECOLI/NEW-IMAGE?type=OPERON&object=PM0-45989 we see that the protein zur has two promoters:
TODO why 6 and 7? There don't appear to be 1, 2, etc.
That page lists several components of the promoter, which we should try to understand!
After the first gene in the operon, thrL, there is a rho-independent termination. By comparing: threonine or isoleucine variants, L-threonyl and L-isoleucyl, makes the rho-independent termination become more efficient, so the control loop is quite direct! Not sure why it cares about isoleucine as well though.
TODO which factor is actually specific to that DNA region?
Subset of the longer E. Coli K-12 MG1655 transcription unit thrLABC.
Maybe the most famous one is Mycoplasma genitalium but there are others, and notably with lower biosafety levels:
Size: 300 x 600 nm
The reason why genitalium has such a small genome is that parasites tend to have smaller DNAs. So it must be highlighted that genitalium can only survive in highly enriched environments, it can't even make its own amino acids, which it normally obtains from the host cells! And because it cannot do cellular respiration, it very likely replicates slower than say E. Coli. It's easy to be small in such scenarios!
Power, Sex, Suicide by Nick Lane (2006) section "How to lose the cell wall without dying" page 184 puts it very well:
One group, the Mycoplasma, comprises mostly parasites, many of which live inside other cells. Mycoplasma cells are tiny, with very small genomes. M. genitalium, discovered in 1981, has the smallest known genome of any bacterial cell, encoding fewer than genes. Despite its simplicity, it ranks among the most common of sexually transmitted diseases, producing symptoms similar to Chlamydia infection. It is so small (less than a third of a micron in diameter, or an order of magnitude smaller than most bacteria) that it must normally be viewed under the electron microscope; and difficulties culturing it meant its significance was not appreciated until the important advances in gene sequencing in the early 1990s. Like Rickettsia, Mycoplasma have lost virtually all the genes required for making nucleotides, amino acids, and so forth. Unlike Rickettsia, however, Mycoplasma have also lost all the genes for oxygen respiration, or indeed any other form of membrane respiration: they have no cytochromes, and so must rely on fermentation for energy.
Lab head: Zaida (Zan) Luthey-Schulten, chemistry.illinois.edu/zan.
- 2022 paper: www.cell.com/cell/fulltext/S0092-8674(21)01488-4 Fundamental behaviors emerge from simulations of a living minimal cell by Thornburg et al. (2022) published in Cell
- faculty.scs.illinois.edu/schulten/lm/ actual source code. No version control, only code drop releases: openness and best practices haven't reached such far, obscure reaches of academia yet. One day.
- blogs.nvidia.com/blog/2022/01/20/living-cell-simulation/ Nvidia announcement. That's how they do business, it is quite interesting how they highlight this kind of research.
- catalog.ngc.nvidia.com/orgs/hpc/containers/lattice-microbes has a container
Followed up by the E. Coli Whole Cell Model by Covert Lab.
www.newyorker.com/magazine/2022/03/07/a-journey-to-the-center-of-our-cells A Journey to the Center of Our Cells (2022) by James Somers comments on M. genitalium in general, and in particular on the JCVI strains.
essential metabolism for a minimal cell (2019) mentions:
JCVI-syn3A, a robust minimal cell with a 543 kbp genome and 493 genes, provides a versatile platform to study the basics of life.
As of essential metabolism for a minimal cell (2019) it had only 91 genes of unknown function! So funny.
Imagine in a world where there are only bacteria, and you can eat entire bacteria in one go, what a huge advantage that is!
A kingdom, formal name: "animalia".
The exact relationships between those clades are not very clear, as there are a bunch of extinct species in the middle and we are not sure exactly where they go. Some hypotheses are listed at: en.wikipedia.org/w/index.php?title=Tetrapod&oldid=1053601110#Temnospondyl_hypothesis_(TH)
But at least it seems rock solid that those three are actually clades.
TODO name: Wikipedia says "being with a fused arch" but what does that mean???
So this is the most basal subclade of mammals.
The name is completely random, "wild beast". Are platypuses not "wild beasts"? They have a freaking poison!!
Every other mammal has a placenta.
This baby in pouch thing just feels like a pre-placenta stage.
As of 2020, they account for about 20% of the known mammal species!!! www.sciencefocus.com/nature/why-are-there-so-many-species-of-bat/ mentions some reasons:
- they can fly, so they can move out further
- their eating habits are highly specialized
When one specific species is implied, we will mean Mus musculus by default.
Exciting... sometimes cruel. But too exciting not to do:
Databases and projects:
- www.jax.org/research-and-faculty/resources/mouse-mutant-resource The Jackson Laboratory
Databases and projects:
- www.ncbi.nlm.nih.gov/pmc/articles/PMC2716027/ The Knockout Mouse Project (2004)
- www.cell.com/cell-systems/fulltext/S2405-4712(16)30151-X A Genome-Scale Database and Reconstruction of Caenorhabditis elegans Metabolism by Gebauer et al. Cell Systems, Volume 2, Issue 5, 312-322
Exactly 1033 somatic cells in the male, 959 in the hermaphrodite, every time, counted as of 2020. A beauty.
Exactly 131 cells commit apoptosis in the hermaphrodite.
www.wormatlas.org/celllineages.html contains the full lineage.
A kingdom, formal name: "fungi".
Size: 10 micrometers.
- 12 Mbp
- 6k genes
- databases: en.wikipedia.org/wiki/Saccharomyces_Genome_Database | www.yeastgenome.org/ Includes:
- known pathways: pathway.yeastgenome.org/overviewsWeb/celOv.shtml
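To put the genome size and gene count in perspective, the following Python sketch computes base pairs per gene for the three genomes whose size and gene count are both quoted in these notes. The numbers are the rounded ones from the text (30 kbp and 10 genes for SARS-CoV-2, 543 kbp and 493 genes for JCVI-syn3A, 12 Mbp and 6k genes for yeast), so the results are only order-of-magnitude estimates. The names `GENOMES` and `bp_per_gene` are hypothetical.

```python
# (genome size in bp, gene count), rounded values as quoted in these notes.
GENOMES = {
    "SARS-CoV-2": (30_000, 10),
    "JCVI-syn3A": (543_000, 493),
    "Saccharomyces cerevisiae": (12_000_000, 6_000),
}


def bp_per_gene(genome_bp, gene_count):
    """Average number of base pairs of genome per gene."""
    return genome_bp / gene_count


for name, (genome_bp, gene_count) in GENOMES.items():
    print(f"{name}: {bp_per_gene(genome_bp, gene_count):.0f} bp per gene")
```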
Division time: 100 minutes.
Minimization project: en.wikipedia.org/wiki/Saccharomyces_cerevisiae#Synthetic_yeast_genome_project | syntheticyeast.org/
A kingdom, formal name: "plantae".
The more exact type seems to be pinto bean, but this is close enough.
2021-03: same but 2.5 teaspoons, seems to be the right amount.
2021-02-10: attempt 3: 500g 1 hour 30 minutes no pressure, uncontrolled water. Salt with one chorizo: put 3 teaspoons, it was a bit too much, going to do 2 next time and see.
2020-12-14: attempt 3: 250g of beans, 1.5l of water, 30 minutes pressure.
2020-11-30: attempt 2: 275ml of dry beans, about 50% of 500g bag, putting 1650 ml (6x) of water on pressure cooker. Still had to throw out some water.
Density dry raw: 216 g/250 ml = 432 g / 500 ml = 500 g / 580 ml = 864 g/L
500 g dry expands to in water after 12 hours: 1200 ml
Therefore 500 g dry = 500 g / (864 g/L) = about 580 ml, which expands about 2x (1200 ml / 580 ml).
Therefore, to the maximum 2.5L of the cooker with 8x dry volume water from this recipe I can use:
2500 = volume expanded bean + volume water = 2 volume dry bean + 8 volume dry bean = 10 volume dry bean
and so:
volume dry bean = 2500/10 = 250ml
which is about 250 / 580 = 43% of the 500 g bag.
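The cooker capacity calculation can be scripted so that other expansion factors and water ratios are easy to try. This is a minimal sketch: the function name is hypothetical, and the expansion factor is recomputed directly from the two measurements above (216 g per 250 ml dry, and 500 g soaking up to 1200 ml) rather than typed in.

```python
def dry_bean_volume_ml(cooker_ml, expansion, water_ratio):
    """Largest dry bean volume such that soaked beans plus water fit the cooker.

    Solves: cooker_ml = expansion * dry + water_ratio * dry
    """
    return cooker_ml / (expansion + water_ratio)


DENSITY_G_PER_ML = 216 / 250       # measured dry density
BAG_G = 500
bag_ml = BAG_G / DENSITY_G_PER_ML  # dry volume of one 500 g bag
expansion = 1200 / bag_ml          # 500 g dry soaked up to 1200 ml

dry_ml = dry_bean_volume_ml(2500, expansion, 8)
print(f"bag volume: {bag_ml:.0f} ml, expansion: {expansion:.2f}x")
print(f"dry beans: {dry_ml:.0f} ml = {100 * dry_ml / bag_ml:.0f}% of the bag")
```

Lowering `water_ratio` from 8 to 6, as the later attempts do, leaves room for more beans in the same cooker.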
After first try, I found that 8x volume of water is way, way too much. Going to try 6x next time.
This seems to be the "brown Brazilian bean" that many Brazilians eat every day.
2021-04: second try.
2021-03: did for first time, started with same procedure as borlotti beans 2021-03. Maybe 1h30 is too much. Outcome was still very good.