= E. Coli Whole Cell Model by Covert Lab
{c}
{numbered}
{scope}
{title2=CovertLab/WholeCellEcoliRelease}

https://github.com/CovertLab/WholeCellEcoliRelease is a whole cell model of E. Coli created by the Covert Lab and other collaborators.

The project is written in Python, hurray!

But according to the README, it seems to use a code drop model with on-request access to master. I asked for the https://github.com/CovertLab/WholeCellEcoliRelease/discussions/23[rationale on GitHub discussion], and they confirmed, as expected, that it is:
* to prevent their publication ideas from being stolen. Who would steal publication ideas with public proof in an issue tracker without crediting original authors?
* to prevent noise from non collaborators. They only get like 2 issues a year though, and people forget that it is legal to ignore other people :-)

Oh well.

The project is a followup to the earlier whole cell model which modelled Mycoplasma genitalium. E. Coli has 8x more genes (500 vs 4k), but it is the undisputed bacterial model organism, and as such has been studied much more thoroughly. It also reproduces faster than Mycoplasma (20 minutes vs a few hours), which is a huge advantage for validation/exploratory experiments.

The project has a partial dependency on a proprietary package, which is available for students. I'm not sure what it is used for exactly; judging from the comment in `requirements.txt`, the dependency is only partial.

This project makes me think of the cell as an optimization problem. Given some external nutrient/temperature condition, which DNA sequence makes the cell grow the fastest? Balancing all of that feels like designing a speedrun.

Everything in this section refers to version https://github.com/CovertLab/WholeCellEcoliRelease/tree/7e4cc9e57de76752df0f4e32eca95fb653ea64e4[7e4cc9e57de76752df0f4e32eca95fb653ea64e4], the code drop from November 2020, and was tested on Ubuntu 21.04 with a Docker install of `docker.pkg.github.com/covertlab/wholecellecolirelease/wcm-full` with image id 502c3e604265, unless otherwise noted.

= Install and first run
{parent=E. Coli Whole Cell Model by Covert Lab}

At https://github.com/CovertLab/WholeCellEcoliRelease/tree/7e4cc9e57de76752df0f4e32eca95fb653ea64e4[7e4cc9e57de76752df0f4e32eca95fb653ea64e4] you basically need to use the Docker image on Ubuntu 21.04 due to breaking changes... (not their fault). Perhaps there is a way to make a native install work, but who has the patience for that?!?!

The Docker setup from the README does just work. The image download is a bit tedious, as it requires you to create a GitHub API key as described in the README, but there must be reasons for that.

Once the image is downloaded, what you really want to run, from the root of the source tree, is:
``
sudo docker run --name=wcm -it -v "$(pwd):/wcEcoli" docker.pkg.github.com/covertlab/wholecellecolirelease/wcm-full
``
This mounts the host source under `/wcEcoli`, so you can easily edit the code and view output images from your host.

Once inside Docker we can compile, run the simulation, and analyze results with:
``
make clean compile &&
python runscripts/manual/runFitter.py &&
python runscripts/manual/runSim.py &&
python runscripts/manual/analysisVariant.py &&
python runscripts/manual/analysisCohort.py &&
python runscripts/manual/analysisMultigen.py &&
python runscripts/manual/analysisSingle.py
``
The meaning of each of the analysis commands is described in the "Output overview" section.

As a refresher, after you stop the container, e.g. by restarting your computer or running `sudo docker stop wcm`, you can get back into it with:
``
sudo docker start wcm
sudo docker exec -it wcm bash
``

`runscripts/manual/runFitter.py` takes about 15 minutes, and it generates files such as `reconstruction/ecoli/dataclasses/process/two_component_system.py` (https://github.com/CovertLab/WholeCellEcoliRelease/issues/20[related]), which is required to run the simulation. It is basically a part of the build.
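
Since the fitter alone takes about 15 minutes, it can be handy to see how long each step of the chain takes. The following is only a sketch, assuming Python 3 inside the container and `/wcEcoli` as the working directory. The command list is copied from the one-liner above, while `STEPS` and `run_steps` are hypothetical helper names that are not part of the project:

```python
import subprocess
import time

# Commands copied from the shell one-liner; run from /wcEcoli inside the container.
STEPS = [
    ["make", "clean", "compile"],
    ["python", "runscripts/manual/runFitter.py"],
    ["python", "runscripts/manual/runSim.py"],
    ["python", "runscripts/manual/analysisVariant.py"],
    ["python", "runscripts/manual/analysisCohort.py"],
    ["python", "runscripts/manual/analysisMultigen.py"],
    ["python", "runscripts/manual/analysisSingle.py"],
]


def run_steps(steps, runner=subprocess.call):
    """Run each command in order, stopping at the first non-zero exit code.

    Returns a list of (command, exit_code, seconds) for the steps that ran.
    `runner` is injectable so the logic can be tried without the model.
    """
    report = []
    for cmd in steps:
        start = time.monotonic()
        code = runner(cmd)
        report.append((cmd, code, time.monotonic() - start))
        if code != 0:
            break  # same short-circuit behaviour as `&&` in the shell
    return report
```

Calling `run_steps(STEPS)` and printing each tuple of the returned list then gives a per-step timing summary, with the last entry being the step that failed, if any.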

`runSim.py` does the main simulation. Its progress output contains lines of type:
``
Time (s)  Dry mass     Dry mass      Protein          RNA    Small mol     Expected
              (fg)  fold change  fold change  fold change  fold change  fold change
========  ========  ===========  ===========  ===========  ===========  ===========
    0.00    403.09        1.000        1.000        1.000        1.000        1.000
    0.20    403.18        1.000        1.000        1.000        1.000        1.000
``
and then it ended at:
``
 2569.18    783.09        1.943        1.910        2.005        1.950        1.963
Simulation finished:
- Length: 0:42:49
- Runtime: 0:09:13
``
when the cell had almost doubled in dry mass (1.943x), and presumably divided, after about 42 minutes of simulated time. This could make sense compared to the 20 minutes under optimal conditions. The run itself took about 9 minutes.

= Output overview
{parent=E. Coli Whole Cell Model by Covert Lab}

Run output is placed under `out/`.

Some of the output data is stored as `.cpickle` files. To inspect those files you need the original Python classes, and therefore you have to be inside Docker: from the host it won't work.

We can list all the plots that have been produced under `out/` with:
``
find . -name '*.png'
``
Plots are also available in SVG and PDF formats, e.g.:
* PNG: `./out/manual/plotOut/low_res_plots/massFractionSummary.png`
* SVG: `./out/manual/plotOut/svg_plots/massFractionSummary.svg` The SVGs write text as polygons.
* PDF: `./out/manual/plotOut/massFractionSummary.pdf`

The output directory has a hierarchical structure of type:
``
./out/manual/wildtype_000000/000000/generation_000000/000000/
``
where:
* `wildtype_000000`: variant conditions. `wildtype` is a human readable label, and `000000` is an index amongst the possible `wildtype` conditions. For example, we can have different simulations with different nutrients, or different DNA sequences. An example of this is shown in another section.
* `000000`: initial random seed for the initial cell, likely fed to NumPy's `np.random.seed`
* `generation_000000`: this will increase with generations if we simulate multiple cells, which is supported by the model
* `000000`: this will presumably contain the cell index within a generation

We also understand that some of the top level directories contain summaries over all cells, e.g. the `massFractionSummary.pdf` plot exists at several levels of the hierarchy:
``
./out/manual/plotOut/massFractionSummary.pdf
./out/manual/wildtype_000000/plotOut/massFractionSummary.pdf
./out/manual/wildtype_000000/000000/plotOut/massFractionSummary.pdf
./out/manual/wildtype_000000/000000/generation_000000/000000/plotOut/massFractionSummary.pdf
``
Each of those four levels of `plotOut` is generated by a different one of the analysis scripts:
* `./out/manual/plotOut`: generated by `python runscripts/manual/analysisVariant.py`. Contains comparisons of different variant conditions. We confirm this by looking at the results of a run with several variant conditions.
* `./out/manual/wildtype_000000/plotOut`: generated by `python runscripts/manual/analysisCohort.py --variant_index 0`. TODO not sure how to differentiate between two different labels e.g. `wildtype_000000` and `somethingElse_000000`. If `-v` is not given, it just picks the first one alphabetically. TODO not sure how to automatically generate all of those plots without inspecting the directories.
* `./out/manual/wildtype_000000/000000/plotOut`: generated by `python runscripts/manual/analysisMultigen.py --variant_index 0 --seed 0`
* `./out/manual/wildtype_000000/000000/generation_000000/000000/plotOut`: generated by `python runscripts/manual/analysisSingle.py --variant_index 0 --seed 0 --generation 0 --daughter 0`. Contains information about a single specific cell.
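
As a possible workaround for the "without inspecting the directories" TODO, the hierarchy can be walked with a small script that prints one analysis command per `plotOut` level. This is only a sketch: it assumes that the directory naming is exactly as in the example path above, `find_cells` and `analysis_commands` are hypothetical helpers that are not part of the project, and it does not solve the problem of telling apart two labels that share the same index.

```python
import re
from pathlib import Path

# Assumed layout, taken from the example path:
# out/manual/<label>_<variant>/<seed>/generation_<generation>/<daughter>/
VARIANT_RE = re.compile(r"^(?P<label>.+)_(?P<index>\d{6})$")
GENERATION_RE = re.compile(r"^generation_(?P<index>\d{6})$")
INDEX_RE = re.compile(r"^\d{6}$")


def _subdirs(path):
    """Return the subdirectories of `path`, sorted by name."""
    return sorted(p for p in path.iterdir() if p.is_dir())


def find_cells(out_dir):
    """Yield (label, variant, seed, generation, daughter) for each cell directory."""
    for variant_dir in _subdirs(Path(out_dir)):
        variant = VARIANT_RE.match(variant_dir.name)
        if not variant:
            continue  # skips e.g. plotOut
        for seed_dir in _subdirs(variant_dir):
            if not INDEX_RE.match(seed_dir.name):
                continue
            for generation_dir in _subdirs(seed_dir):
                generation = GENERATION_RE.match(generation_dir.name)
                if not generation:
                    continue
                for cell_dir in _subdirs(generation_dir):
                    if INDEX_RE.match(cell_dir.name):
                        yield (
                            variant.group("label"),
                            int(variant.group("index")),
                            int(seed_dir.name),
                            int(generation.group("index")),
                            int(cell_dir.name),
                        )


def analysis_commands(out_dir):
    """Return one analysis command line per plotOut level found under `out_dir`."""
    cells = list(find_cells(out_dir))
    script = "python runscripts/manual/analysis{}.py"
    commands = [script.format("Variant")]
    for variant in sorted({c[1] for c in cells}):
        commands.append(f"{script.format('Cohort')} --variant_index {variant}")
    for variant, seed in sorted({(c[1], c[2]) for c in cells}):
        commands.append(
            f"{script.format('Multigen')} --variant_index {variant} --seed {seed}"
        )
    for _label, variant, seed, generation, daughter in cells:
        commands.append(
            f"{script.format('Single')} --variant_index {variant} --seed {seed}"
            f" --generation {generation} --daughter {daughter}"
        )
    return commands
```

For example, `print("\n".join(analysis_commands("out/manual")))` lists the commands, which can then be reviewed before running them inside the container.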

= Mass fraction summary plot analysis
{parent=Output overview}

Let's look into a sample plot, `out/manual/plotOut/svg_plots/massFractionSummary.svg`, and try to understand as much as we can about what it means and how it was generated.

This plot contains how much of each type of mass is present in all cells. Since we simulated just one cell, it will be the same as the results for that cell.

We can see that all of them grow more or less linearly, perhaps as the start of an exponential.

* total dry mass (mass excluding water)
* mass
* mass
* mass
* mass. The last label is not very visible on the plots, but we can deduce it from the source code.

By grepping the title "Cell mass fractions" in the source code, we see the files:
``
models/ecoli/analysis/cohort/massFractionSummary.py
models/ecoli/analysis/multigen/massFractionSummary.py
models/ecoli/analysis/variant/massFractionSummary.py
``
which must correspond to the different `massFractionSummary` plots throughout different levels of the hierarchy.

By reading `models/ecoli/analysis/variant/massFractionSummary.py` a little bit, we see that:
* the plotting is done with Matplotlib, hurray
* it is reading its data from files under `./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/`, more precisely `./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/columns//data`. They are binary files however. Looking at the source for `wholecell/io/tablereader.py` shows that those are just a standard serialization mechanism. Maybe they should have used some established format instead.

We can also take this opportunity to try and find where the data is coming from. `Mass` from the `./out/manual/wildtype_000000/000000/generation_000000/000000/simOut/Mass/` looks like an ID, so we grep that and we reach `models/ecoli/listeners/mass.py`.
From this we understand that all data that is to be saved from a simulation must be coming from listeners: likely nothing, or not much, is dumped by default, because otherwise it would take up too much disk space. You have to explicitly say what it is that you want to save via a listener that acts on each time step.

\Image[https://upload.wikimedia.org/wikipedia/commons/9/94/E._Coli_Whole_Cell_model_by_Covert_Lab_minimal_nutrients_mass_fraction_summary.svg]
{height=600}
{title=Minimal condition mass fraction plot}
{description=File name: `out/manual/plotOut/svg_plots/massFractionSummary.svg`}

More plot types will be explored at