= Website
{wiki}

= The best websites of all time
{parent=Website}

Multi-user:
* <TV Tropes>{child}
* <WikiWikiWeb>{child}

Personal: <the best personal webpages of all time>{full}.

= Content moderation
{parent=Website}
{wiki}

= Deletionism and inclusionism
{parent=Content moderation}

* https://en.wikipedia.org/wiki/Deletionism_and_inclusionism_in_Wikipedia

= Deletionism
{parent=Deletionism and inclusionism}
{tag=Evil}

= Deletionist
{synonym}

https://meta.wikimedia.org/wiki/Deletionism

The problem with deletionism is that it destroys users' confidence that their precious data will be safe. It's almost like having a database that constantly resets itself. Who is going to be willing to post to a website that, half of the time, deletes the content they created for free, thus wasting their precious time?

= Inclusionism
{parent=Deletionism and inclusionism}

= Closurism
{parent=Deletionism and inclusionism}

Term invented by <Ciro Santilli> to refer to <content moderation> policies that lock threads.

This is similar to <deletionism>, but a bit less bad, as the pre-existing content is kept. However, new relevant content that comes up later cannot be added, so it is still bad.

= Online forums that lock threads after some time
{parent=Closurism}
{tag=Deletionism}
{tag=Evil}

= Online forums that lock threads after some time are evil
{synonym}

Like <Reddit>{child} (https://www.reddit.com/r/blog/comments/pze6d2/commenting_on_archived_posts_images_in_chat_and/[option to allow it per community added in late 2021]) and https://support.google.com/[].

And of course, <4chan> just takes that to a whole new level, with threads usually closing on the same day, and then getting deleted within a week. Why would anyone contribute non-<illegal> content to that kind of system?!

Ridiculous, so when new information comes out, we just duplicate all the old comments on a new thread again?

Remember, <Ciro Santilli> is the <Ciro Santilli's Stack Overflow contributions>[Necromancer God].

<Dan Dascalescu> agrees for <Reddit> specifically: https://www.reddit.com/r/TheoryOfReddit/comments/9oujwf/why_archiving_old_threads_is_a_bigger_problem/

= Reputation system
{parent=Content moderation}
{wiki}

= Activity tracker website
{parent=Website}

= Strava
{c}
{parent=Activity tracker website}
{wiki}

= Static website
{parent=Website}
{wiki=Static_web_page}

= Static site generator
{parent=Static website}

The best one is <OurBigBook Markup>{child} of course! :-)

= List of static site generators
{parent=Static site generator}

= Bookdown
{c}
{parent=List of static site generators}
{tag=R (programming language)}

https://github.com/rstudio/bookdown

Written in <R (programming language)>, but it also relies on <pandoc>, so quite bad dependency-wise.

Cross-file references to IDs: yes. But there is no check by default for duplicates when deriving automatic IDs from titles: it just disambiguates them with `-1`, `-2` suffixes, and links take the last one available.

Source page splitting: splits at h2 by default. If configurable, likely always at a fixed level?

Has some nice image generation from inline code from standard R plotting functions.

<Hello world> documented at: https://bookdown.org/yihui/bookdown/get-started.html

<Hello world> on <Ubuntu 23.04> after installing <R (programming language)>:
``
sudo R -e 'install.packages("bookdown")'
git clone https://github.com/rstudio/bookdown-demo
cd bookdown-demo
Rscript -e 'bookdown::render_book("index.Rmd")'
xdg-open _book/index.html
``
The build CLI comes from: https://stackoverflow.com/questions/50888871/how-to-use-rscript-command-line-tool-to-build-a-book-in-bookdown

The installation and the `Rscript -e 'bookdown::render_book("index.Rmd")'` step take several minutes, as they apparently compile a bunch of stuff from source, but it did work.

= Hugo
{c}
{disambiguate=static site generator}
{parent=List of static site generators}

= Jekyll
{c}
{disambiguate=software}
{parent=List of static site generators}
{wiki}

= Jekyll
{c}
{synonym}

= Pelican
{c}
{disambiguate=static site generator}
{parent=List of static site generators}

A <Python> one:
* https://github.com/getpelican/pelican
* https://getpelican.com/

= Blog
{parent=Website}
{wiki}

= Blog comment hosting service
{parent=Blog}
{wiki}

= Disqus
{c}
{parent=Blog comment hosting service}
{wiki}

= Giscus
{c}
{parent=Blog comment hosting service}
{tag=Good}

https://github.com/giscus/giscus

= Medium
{disambiguate=website}
{c}
{parent=Blog}
{wiki}

While this has some of the metrics features that <Ciro Santilli> wants to implement for <OurBigBook.com>, it limits the number of articles your readers can read.

How the <fuck> can you publish on a website that limits the number of views for your articles?!?! When all it has is static pages + some metrics?!?!

<Evil>. Just learn to use <GitHub Pages> for God's sake.

= WordPress
{c}
{parent=Blog}
{wiki}

= Collaborative writing platform
{parent=Website}

<Ciro Santilli> wants to rule this with <OurBigBook.com>.

= Wiki
{parent=Collaborative writing platform}
{wiki}

= Edit war
{c}
{parent=Wiki}

= HyperCard
{c}
{parent=Wiki}
{wiki}

This was the pre-<Internet> precursor of <wikis>. This program was likely venerable, shame it predates <Ciro Santilli>'s era.

But it seems to have been much more bloated, and it also included visual programming elements and WYSIWYG UI creation.

\Video[https://www.youtube.com/watch?v=FquNpWdf9vg]
{title=Hypercard by The Computer Chronicles (1987)}

= Wiki-binge
{c}
{parent=Wiki}

https://www.urbandictionary.com/define.php?term=wiki-binge

= Wiki by subject
{parent=Wiki}

= Mathematics wiki
{parent=Wiki by subject}

= BookofProofs
{c}
{parent=Mathematics wiki}

https://www.bookofproofs.org/

No open signup it seems. TODO CV of owner.

They are making a <proof assistant> to integrate into the website: https://github.com/bookofproofs/fpl/[], reminds <Ciro Santilli> of <website front-end for a mathematical formal proof system>.

= Encyclopedia of Math
{c}
{parent=Mathematics wiki}
{wiki}

https://encyclopediaofmath.org/wiki/Main_Page

Originally by <Springer>, but later moved to the European Mathematical Society.

= MathWorld
{c}
{parent=Mathematics wiki}
{wiki}

https://mathworld.wolfram.com/

Written mostly by <Eric W. Weisstein>.

Ciro once saw a printed version of the CRC "concise" encyclopedia of mathematics. It is about 12 cm thick. Imagine if it wasn't concise!!!

<Infinite Napkin> is the one-person <open source> replacement we needed for it! And <OurBigBook.com> will be the final multi-person replacement.

= Eric W. Weisstein
{c}
{parent=MathWorld}
{wiki}

Ahh, this dude is just like <Ciro Santilli>, trying to create the ultimate natural sciences encyclopedia!

____
In 1995, Weisstein converted a Microsoft Word document of over 200 pages to hypertext format and uploaded it to his webspace at Caltech under the title Eric's Treasure Trove of Sciences.
____

= NLab
{c}
{parent=Mathematics wiki}
{wiki}

https://ncatlab.org

Decent encyclopedia of mathematics. Not much motivation, mostly statements though.

Created by:
* <John Baez>
* David Corfield
* Urs Schreiber

Unlike <Wikipedia>, they have a more sane forum commenting system, e.g. a page/forum pair:
* https://ncatlab.org/nlab/show/derivator
* https://nforum.ncatlab.org/discussion/887/derivator/

= PlanetMath
{c}
{parent=Mathematics wiki}

https://planetmath.org/

Based on <GitHub> pull requests: https://github.com/planetmath

Joe Corneli, one of the contributors, mentions this in a <OurBigBook.com>[cool-sounding "Peeragogy" context] at http://metameso.org/~joe/[]:
\Q[I earned my doctorate at The Open University in Milton Keynes, with a thesis focused on peer produced support for peer learning in the mathematics domain. The main case study was planetmath.org; the ideas also informed the development of “Peeragogy”.]

= ProofWiki
{c}
{parent=Mathematics wiki}

A <wiki> that gathers mathematical proofs.

URL: https://proofwiki.org/wiki/Main_Page

<MediaWiki>-based.

This appears to be the creator: https://github.com/externl "Joe George".

= Type of wiki
{parent=Wiki}

= Enterprise wiki
{parent=Type of wiki}

= LLM generated wiki
{c}
{parent=Type of wiki}
{tag=Large language model}

= Cosmopedia
{parent=LLM generated wiki}

* https://github.com/huggingface/cosmopedia
* https://huggingface.co/datasets/HuggingFaceTB/cosmopedia

\Q[Cosmopedia is a dataset of synthetic textbooks, blogposts, stories, posts and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. The dataset contains over 30 million files and 25 billion tokens, making it the largest open synthetic dataset to date.]

= Kinnu
{c}
{parent=LLM generated wiki}
{title2=2021-}

https://kinnu.xyz/[].

App-only as of 2023, i.e. <University entry exam>[for children].

Humans make the table of contents, and then AI fills it in. Ciro was thinking about doing the exact same thing at some point, maybe starting from Wikipedia categories.

Funding:
* 2023: \$6.5m https://www.uktech.news/education/kinnu-ai-funding-20230705

= Blockchain wiki
{c}
{parent=Type of wiki}

This section is about <wikis> that are hosted on a <blockchain> of some sort.

= Everipedia
{c}
{parent=Blockchain wiki}
{wiki}

= Wiki without notability requirements
{parent=Type of wiki}

= EverybodyWiki
{c}
{parent=Wiki without notability requirements}

* https://en.everybodywiki.com/Everybodywiki:Welcome <English (language)> homepage
* https://everybodywiki.com/

Appears to be a <Wikipedia> clone, but with much lower (or no) notability requirements, which overcomes one of Wikipedia's main issues: <deletionism>.

They do have the interesting idea of importing deleted Wikipedia pages as a source of content, which leads to some epic "most viewed pages" such as https://en.everybodywiki.com/List_of_erotic_and_sex_workers_with_unnatural_death[] which currently reads:
\Q[Stop Being Pervs, Go Watch Lichfaop/Faoplich Instead and you can also visit MR Info 24 for more details.]

We can for example see Ciro Santilli's deleted entry <PsiQuantum> at: https://en.everybodywiki.com/PsiQuantum[], <Wikipedia> deletion page: https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/PsiQuantum[]. Their attribution is atrocious however, e.g. it does not seem possible to find any mention of "Ciro Santilli" in the edit history, which just points to the deleted article, which is not visible anymore. They could really get into trouble for this one day.

Their main use case, as suggested by the website itself, is for people/brands to create pages about themselves.

This combined with the lack of "one version of each page per person" seems like an explosive invitation for unsolvable edit wars.

The website is backed by a French startup: https://jobs.stationf.co/companies/wiki-valley[].

= Golden
{c}
{disambiguate=wiki}
{parent=Wiki without notability requirements}
{title2=2019}

= golden.com
{synonym}
{title2}

Website: https://golden.com

April 2024: merged with some fraud protection thing. Is it still a wiki? Unclear, it seems to have lost that aspect: https://twitter.com/judegomila/status/1783028847983956430

Social media:
* https://twitter.com/golden

https://techcrunch.com/2019/04/30/golden-launch/
\Q[To state the obvious: <Wikipedia> is an incredibly useful website, but Gomila pointed out that notable companies and technologies like SV Angel, Benchling, Lisk and Urbit don’t currently have entries. Part of the problem is what he called Wikipedia’s “arbitrary notability threshold,” where pages are deleted for not being notable enough. (This is also what happened years ago to the Wikipedia page about yours truly — which I swear I didn’t write myself.)]
Exactly! <Deletionism on Wikipedia> is so sad, especially for companies. For example, <Ciro Santilli> tried to create a page for <PsiQuantum>, and it got reverted... and now Golden has one of the top <Google> hits for it: https://golden.com/wiki/PsiQuantum-PBDGXRA

TODO how do they do moderation?

As of April 2024
\Q[Login is currently disabled.]
Asked at: https://twitter.com/cirosantilli/status/1777250258235302233 Their last tweets were from August 2023, so maybe they just silently shut down? Their name is too generic and hard to search for efficiently...

They do have <knowledge graph> built-in which is cool.

= WikiAlpha
{c}
{parent=Wiki without notability requirements}
{tag=MediaWiki instance}

https://en.wikialpha.org/wiki/Main_Page

\Q[WikiAlpha is an alternative to Wikipedia, where the main difference is that our deletion policy is far more lenient with regard to notability requirements. Basically, WikiAlpha is a near-indiscriminate collection of information in the form of articles on any topic: you can create an article about the band you just started, your pet dog, yourself, your house - as long as your content does not fall under our speedy deletion policy, it will likely remain on the site forever!]

= List of Wikis
{parent=Wiki}

= BookStack
{c}
{parent=List of Wikis}
{tag=Enterprise wiki}
{wiki}

Source: https://github.com/BookStackApp/BookStack

\Video[https://www.youtube.com/watch?v=WUvtzJfCAKE]
{title=10k <GitHub> Stars by BookStack (2022)}
{description=Answering to an AMA unfortunately :-) But some OK small bits of information trickled through.}

= Confluence
{c}
{disambiguate=software}
{parent=List of Wikis}
{tag=Enterprise wiki}
{wiki}

= DokuWiki
{c}
{parent=List of Wikis}
{wiki}

= Fandom
{c}
{disambiguate=Website}
{parent=List of Wikis}
{wiki}

= Know Your Meme
{c}
{parent=List of Wikis}
{wiki}

https://knowyourmeme.com/

The dominating <meme> <database> as of 2020.

= Nature Scitable
{parent=List of Wikis}
{tag=Nature (journal)}
{wiki}

As of 2022 visible at: https://www.nature.com/scitable

Apparently it used to have a separate URL at just https://scitable.com[], so they were somewhat serious about it before shutting it down.

As of 2022 marked:
\Q[This page has been archived and is no longer updated]
RIP.

https://www.nature.com/scitable/blog/student-voices/ has last entry 2015, so presumably that's the shutdown year.

Self description:
\Q[Using our platform, you can customize your own eBooks for your students. Create an online classroom. Contribute and share content and connect with networks of colleagues.]
so quite related to <OurBigBook.com>.

= Trillium Notes
{parent=List of Wikis}
{wiki}

https://github.com/zadam/trilium[].

Tree based organization at last.

Amazing <WYSIWYG>, including maths and tables, plus insane plugins like canvas mode, and specific file formats like code/mermaid diagrams/drawing mode.

Version history.

No multiuser features. Except for that, could have been a good starting point of an online multiuser thing such as <OurBigBook.com>!

Only possible to see one page at a time on output? Output chunking is a major feature of <OurBigBook>, I'm so proud.

Their tree based approach does have a problem however for the <OurBigBook.com> use case of sharing topics across users: every level forces a scope, which makes it basically impossible to reliably match topics across users.

HTML export keeps all data, as HTML is their native format. The exported files are mostly viewable, but some CSS is missing, so it is not 100% like the editor; notably math is broken. There is also a hosted way of exposing notes: https://github.com/zadam/trilium/wiki/Sharing[].

Markdown export warns:
\Q[this preserves most of the formatting.]

Architecture: runs on a local SQLite database via better-sqlite3. Data is apparently stored in an SQLite database at `~/.local/share/trilium-data`, with no raw files.

Markup is stored as HTML as seen from: `sqlite3 document.db 'SELECT * from note_contents'`. HTML is their native storage format, quite interesting.
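
For example, a minimal sketch of poking at the local data store from the command line, assuming the `~/.local/share/trilium-data` location and the `note_contents` table mentioned above:
``
cd ~/.local/share/trilium-data
# list all tables in the Trilium database
sqlite3 document.db '.tables'
# dump one raw note body: the content is HTML
sqlite3 document.db 'SELECT * FROM note_contents LIMIT 1'
``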

WYSIWYG based on https://ckeditor.com/ which is a dependency. It is kind of cool that the view in which you view the output is exactly the same as the one you edit in, and there is no intermediate format, just the HTML.

Math is <KaTeX> based.

= Wikipedia
{c}
{parent=List of Wikis}
{title2=2001}
{wiki}

Why Wikipedia sucks: <ourbigbook com/Wikipedia>{full}.

Best languages:
* https://en.wikipedia.org/wiki/Latin_Wikipedia[latin]
* https://en.wikipedia.org/wiki/Esperanto_Wikipedia[esperanto]. Other constructed languages: https://en.wikipedia.org/wiki/Wikipedia:List_of_constructed_languages_with_Wikipedias

The most important page of Wikipedia is undoubtedly: https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources which lists the accepted and non accepted sources. Basically, the decision of what is true in this world.

Wikipedia is incredibly picky about copyright. E.g.: https://en.wikipedia.org/wiki/Wikipedia:Deletion_of_all_fair_use_images_of_living_people because "such portrait could be created". Yes, with a time machine, no problem! This does more harm than good... excessive!

Citing in Wikipedia is painful. Partly because they have a billion different templates that you have to navigate. They should really have a system where you can easily reuse existing sources across articles! <How to use a single source multiple times in a Wikipedia article?>{full}

\Video[https://www.youtube.com/watch?v=_Rt0eAPLDkM]
{title=What Happened To Wikipedia's Founders?}
{description=
* https://youtu.be/_Rt0eAPLDkM?t=113 encyclopedia correction stickers. OMG!
* https://youtu.be/_Rt0eAPLDkM?t=201 Jimmy was a moderator on <MUD game genre> games
}

\Video[https://www.youtube.com/watch?v=j9-CovbP-7U]
{title=Inside the Wikimedia Foundation offices by <Wikimedia Foundation> (2008)}

= Wikipedia lore
{c}
{parent=Wikipedia}

\Video[https://www.youtube.com/watch?v=imPzvlwRnTg]
{title=What Mental Breakdown Of a Wikipedia Moderator looks like by Vince Vintage}

= Deletionism on Wikipedia
{parent=Wikipedia}
{tag=Deletionism}

https://en.wikipedia.org/wiki/Deletionism_and_inclusionism_in_Wikipedia

Some examples by <Ciro Santilli> follow.

Of the tutorial-subjectivity type:
* https://en.wikipedia.org/w/index.php?title=Isomorphism_theorems&oldid=976843241[This edit] perfectly summarizes how Ciro feels about Wikipedia (no particular hate towards that user, he was a teacher at the prestigious <Pierre and Marie Curie University> and https://en.wikipedia.org/wiki/Daniel_Lazard[actually has a wiki page about him]):
  \Q[rm a cryptic diagram (not understandable by a professional mathematician, without further explanations]
  which removed the only diagram that was actually understandable to non-mathematicians, which <Ciro Santilli> had created, and which received many upvotes at: https://math.stackexchange.com/questions/776039/intuition-behind-normal-subgroups/3732426#3732426[]. The removal does not generate any notifications to you unless you follow the page, which would lead to infinite noise, and it is extremely difficult to find out how to contact the other person. The removal justification is even somewhat <ad hominem>: how does he know <Ciro Santilli> is also not a professional mathematician? :-) Maybe it is obvious because <there is value in tutorials written by beginners>[Ciro explains in a way that is understandable]. Also, the removal makes no effort to contact the original author. Of course, this is caused by the fact that there must also have been a bunch of useless edits not done by Ciro, and there is no <reputation system> to see immediately whether you should ignore a person or not, so the removal author has no patience anymore. This is what makes it impossible to contribute to Wikipedia: your stuff gets deleted at any time, and you don't know how to appeal it. Ciro is going to regret having written this rant after Daniel replies and shows the diagram is crap. But that would be better than not getting a reply and not learning that the diagram is crap.
* https://en.wikipedia.org/w/index.php?title=Finite_field&type=revision&diff=1044934168&oldid=1044905041 on <finite fields> with edit comment "Obviously: X ≡ α". Discussion at https://en.wikipedia.org/wiki/Talk:Finite_field#Concrete_simple_worked_out_example Some people simply don't know how to explain things to beginners, or don't think Wikipedia is where it should be done. One simply can't waste time fighting off those people, writing good tutorials is hard enough in itself without that fight.

Notability constraints, which are way too strict:
* even information about important companies can be disputed. E.g. once <Ciro Santilli> tried to create a page for <PsiQuantum>, a startup with \$650m in funding, and there was a deletion proposal because it did not contain verifiable sources not linked directly to information provided by the company itself: https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/PsiQuantum Although this argument is correct, it is also true about 90% of everything that is on Wikipedia about any company. Where else can you get any information about a <B2B> company? Their clients are not going to say anything. Lawsuits and scandals are kind of the only possible source... In that case, the page was deleted with 2 votes against vs 3 votes for deletion.
  \Q[should we delete this extremely likely useful/correct content or not according to this extremely complex system of guidelines]
  is very similar to <Stack Exchange>'s own <Stack Overflow content deletion> issues. <Ain't Nobody Got Time for That>. "Ain't Nobody Got Time for That" actually has a Wiki page: https://en.wikipedia.org/wiki/Ain%27t_Nobody_Got_Time_for_That[]. That's notable. Unlike a \$600M+ company of course.
There are even wikis that were created to remove notability constraints: <Wiki without notability requirements>.

These are the reasons why Ciro basically only contributes images to Wikipedia: they are either all in or all out, and you can determine which one it is. This also allows images to be more attributable, so people can actually see that it was Ciro who created a given amazing image, thus overcoming Wikipedia's lack of a <reputation system> a little bit as well.

Wikipedia is perfect for things like biographies, geography, or history, which have a much more defined and less subjective expository order. But when it comes to "tutorials of how to actually do stuff", which is what <mathematics> and <physics> are basically about, Wikipedia has a very hard time going beyond dry definitions, which are only useful for people who already half know the stuff. But to learn from zero, newbies need tutorials with intuition and examples.

Bibliography:
* https://gwern.net/inclusionism from <gwern.net>:
  \Q[Iron Law of Bureaucracy: the downwards deletionism spiral discourages contribution and is how Wikipedia will die.]

= Wikipedia dumps
{parent=Wikipedia}

Per-table dumps created with <mysqldump> and listed at: https://dumps.wikimedia.org/[]. Most notably, for the English Wikipedia: https://dumps.wikimedia.org/enwiki/latest/

A few of the files are not actual tables but derived data, notably http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz from <Download titles of all Wikipedia articles>.

The tables are "documented" under: https://www.mediawiki.org/wiki/Manual:Database_layout[], e.g. the central "page" table: https://www.mediawiki.org/wiki/Manual:Page_table[]. But in many cases it is impossible to deduce what fields are from those docs.

= enwiki-latest-category.sql
{parent=Wikipedia dumps}

https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-category.sql.gz contains a list of categories. It only contains the categories and some counts, but it doesn't contain the subcategories and pages under each category, so it is a bit pointless.

The schema is listed at: https://www.mediawiki.org/wiki/Manual:Category_table

The SQL first defines the table:
``
CREATE TABLE `category` (
  `cat_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `cat_title` varbinary(255) NOT NULL DEFAULT '',
  `cat_pages` int(11) NOT NULL DEFAULT 0,
  `cat_subcats` int(11) NOT NULL DEFAULT 0,
  `cat_files` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`cat_id`),
  UNIQUE KEY `cat_title` (`cat_title`),
  KEY `cat_pages` (`cat_pages`)
) ENGINE=InnoDB AUTO_INCREMENT=249228235 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;
``
followed by a few humongous inserts:
``
INSERT INTO `category` VALUES (2,'Unprintworthy_redirects',1597224,20,0),(3,'Computer_storage_devices',88,11,0)
``
which we can see at: https://en.wikipedia.org/wiki/Category:Computer_storage_devices

We see that, for https://en.wikipedia.org/wiki/Category:Computer_storage_devices[]:
* https://en.wikipedia.org/wiki/Category:Computer_storage_devices_by_company is a subcategory of that category, and it appears in the file
* https://en.wikipedia.org/wiki/Acronis_Secure_Zone is a page of the category, and it does not appear
so the file contains only categories.

We can check this with:
``
sed -s 's/),/\n/g' enwiki-latest-category.sql | grep Computer_storage_devices
``
and it shows:
``
(3,'Computer_storage_devices',88,11,0
(521773,'Computer_storage_devices_by_company',6,6,0
``
So there doesn't seem to be any interlinking between the categories in this table, only page and subcategory counts.
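
Assuming the table has been imported into a local `enwiki` database as described at <Wikipedia dumps>, a quick sketch to list the largest categories by page count:
``
mariadb enwiki -e "SELECT cat_title, cat_pages, cat_subcats
  FROM category ORDER BY cat_pages DESC LIMIT 10"
``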

= enwiki-latest-categorylinks.sql
{parent=Wikipedia dumps}

https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-categorylinks.sql.gz

The schema is listed at: https://www.mediawiki.org/wiki/Manual:Categorylinks_table

On the SQL:
``
CREATE TABLE `categorylinks` (
  `cl_from` int(8) unsigned NOT NULL DEFAULT 0,
  `cl_to` varbinary(255) NOT NULL DEFAULT '',
  `cl_sortkey` varbinary(230) NOT NULL DEFAULT '',
  `cl_timestamp` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `cl_sortkey_prefix` varbinary(255) NOT NULL DEFAULT '',
  `cl_collation` varbinary(32) NOT NULL DEFAULT '',
  `cl_type` enum('page','subcat','file') NOT NULL DEFAULT 'page',
  PRIMARY KEY (`cl_from`,`cl_to`),
  KEY `cl_timestamp` (`cl_to`,`cl_timestamp`),
  KEY `cl_sortkey` (`cl_to`,`cl_type`,`cl_sortkey`,`cl_from`),
  KEY `cl_collation_ext` (`cl_collation`,`cl_to`,`cl_type`,`cl_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED;
``

TODO what is `cl_from`? We've tried:
* `page_id`: nope, there is no `page_id` of 3

`cl_to` appears to always be a category string name.

The format appears to be described at: https://www.mediawiki.org/wiki/Manual:Categorylinks_table

A sample INSERT entry is:
``
(3,'Computer_storage_devices',88,11,0)
``
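
Assuming the dump has been imported into a local `enwiki` database as described at <Wikipedia dumps>, a quick sketch to eyeball a few rows and their `cl_type` values:
``
mariadb enwiki -e "SELECT cl_from, cl_to, cl_type FROM categorylinks
  WHERE cl_to = 'Computer_storage_devices' LIMIT 10"
``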

= Wikipedia HOWTO
{parent=Wikipedia}

= Download titles of all Wikipedia articles
{parent=Wikipedia HOWTO}

https://stackoverflow.com/questions/24474288/how-to-obtain-a-list-of-titles-of-all-wikipedia-articles

http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz Characterization:
* contains redirects, e.g. https://en.wikipedia.org/wiki/"Ampere_North" redirects to https://en.wikipedia.org/wiki/Ampere_North,_New_Jersey and both are present. Noted in this comment: https://stackoverflow.com/questions/24474288/how-to-obtain-a-list-of-titles-of-all-wikipedia-articles#comment136016773_24474476
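
A minimal sketch of downloading the list and doing some sanity checks on it:
``
wget http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz
# peek at the first few titles
zcat enwiki-latest-all-titles-in-ns0.gz | head
# count how many titles there are, redirects included
zcat enwiki-latest-all-titles-in-ns0.gz | wc -l
``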

= Download titles of all Wikipedia articles without redirects
{parent=Wikipedia HOWTO}

* https://stackoverflow.com/questions/24474288/how-to-obtain-a-list-of-titles-of-all-wikipedia-articles/77248954#77248954
* https://stackoverflow.com/questions/70777208/titles-of-all-wikipedia-articles-without-redirect
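
One possible approach, sketched here under the assumption that the `page` table dump has been imported into a local `enwiki` database as described at <Wikipedia dumps>: filter out the rows that have `page_is_redirect` set.
``
mariadb enwiki -N -B -e "SELECT page_title FROM page
  WHERE page_namespace = 0 AND page_is_redirect = 0" > titles-no-redirects.txt
``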

= Download all Wikipedia categories
{parent=Wikipedia HOWTO}

Our WIP script: \a[wikipedia/import-categories.sh].

Related:
* https://opendata.stackexchange.com/questions/1533/download-wikipedia-articles-from-a-specific-category
* https://webapps.stackexchange.com/questions/16359/is-there-a-way-to-download-a-list-of-all-wikipedia-categories/172480#172480
* https://stackoverflow.com/questions/40119322/how-to-download-all-pages-inside-a-category-in-wikipedia
* category tree on Stack Overflow
  * https://stackoverflow.com/questions/17432254/wikipedia-category-hierarchy-from-dumps/77313490#77313490 Canon but no good answers.
  * https://stackoverflow.com/questions/12227134/how-to-fetch-category-tree-of-wiki
  * https://stackoverflow.com/questions/21782410/finding-subcategories-of-a-wikipedia-category-using-category-and-categorylinks-t[]. Actually explains it: https://stackoverflow.com/questions/21782410/finding-subcategories-of-a-wikipedia-category-using-category-and-categorylinks-t/21798259#21798259
  * https://stackoverflow.com/questions/27279649/how-to-build-wikipedia-category-hierarchy
* https://mdkzaman.com/knowledge-graph-from-wikipedia-category-hierarchy/

Consider:
* https://en.wikipedia.org/wiki/Category:Computer_storage_devices
* https://en.wikipedia.org/wiki/Category:Computer_data_storage
* https://en.wikipedia.org/wiki/Computer_storage_devices which redirects to: https://en.wikipedia.org/wiki/Computer_data_storage

Another category one might want to consider: https://en.wikipedia.org/wiki/Category:Jewish_physicists

Let's observe them in <MySQL>:
``
mysql enwiki -e "select page_id, page_namespace, page_title, page_is_redirect from page where page_namespace in (0, 14) and page_title in ('Computer_storage_devices', 'Computer_data_storage')"
``
outputs:
``
+----------+----------------+--------------------------+------------------+
| page_id  | page_namespace | page_title               | page_is_redirect |
+----------+----------------+--------------------------+------------------+
|     5300 |              0 | Computer_data_storage    |                0 |
| 42371130 |              0 | Computer_storage_devices |                1 |
|   711721 |             14 | Computer_data_storage    |                0 |
|   895945 |             14 | Computer_storage_devices |                0 |
+----------+----------------+--------------------------+------------------+
``

``
mysql enwiki -e "select cl_from, cl_to from categorylinks where cl_from in (5300, 711721, 895945, 42371130)"
``
gives:
``
+----------+-----------------------------------------------------------------------+
| cl_from  | cl_to                                                                 |
+----------+-----------------------------------------------------------------------+
|     5300 | All_articles_containing_potentially_dated_statements                  |
|     5300 | Articles_containing_potentially_dated_statements_from_2009            |
|     5300 | Articles_containing_potentially_dated_statements_from_2011            |
|     5300 | Articles_with_GND_identifiers                                         |
|     5300 | Articles_with_NKC_identifiers                                         |
|     5300 | Articles_with_short_description                                       |
|     5300 | Computer_architecture                                                 |
|     5300 | Computer_data_storage                                                 |
|     5300 | Short_description_matches_Wikidata                                    |
|     5300 | Use_dmy_dates_from_June_2020                                          |
|     5300 | Wikipedia_articles_incorporating_text_from_the_Federal_Standard_1037C |
|   711721 | Computer_architecture                                                 |
|   711721 | Computer_data                                                         |
|   711721 | Computer_hardware_by_type                                             |
|   711721 | Data_storage                                                          |
|   895945 | Computer_data_storage                                                 |
|   895945 | Computer_peripherals                                                  |
|   895945 | Recording_devices                                                     |
| 42371130 | Redirects_from_alternative_names                                      |
+----------+-----------------------------------------------------------------------+
``

So we see that `cl_from` encodes the parent categories:
* parent categories of categories:
  * https://en.wikipedia.org/wiki/Category:Computer_data_storage[], which has ID `711721`, has parent categories: "Computer hardware by type", "Computer data", "Data storage", "Computer architecture". This matches exactly on the database. These are all encoded on the source code of the page:
    ``
    {{DEFAULTSORT:Storage}}
    [[Category:Computer hardware by type]]
    [[Category:Computer data|Storage]]
    [[Category:Data storage|Computer]]
    [[Category:Computer architecture]]
    ``
  * https://en.wikipedia.org/wiki/Category:Computer_storage_devices[] has parent categories: "Computer data storage", "Recording devices", "Computer peripherals". This matches exactly on the database.
* parent categories of pages:
  * https://en.wikipedia.org/wiki/Computer_storage_devices which is a redirect gets the magic category "Redirects_from_alternative_names", a humongous placeholder with many thousands of pages: https://en.wikipedia.org/wiki/Category:Redirects_from_alternative_names
  * https://en.wikipedia.org/wiki/Computer_data_storage shows only two categories on the web UI: "Computer data storage" and "Computer architecture". Both of these are present in the database and at the end of the source code:
    ``
    {{DEFAULTSORT:Computer Data Storage}}
    [[Category:Computer data storage| ]]
    [[Category:Computer architecture]]
    ``
    The others appear to be more magic. Two of them we can guess from the templates:
    ``
    {{short description|Storage of digital data readable by computers}}
    {{Use dmy dates|date=June 2020}}
    ``
    are likely `Use_dmy_dates_from_June_2020` and `Articles_with_short_description` but the rest is more magic and not necessarily present in-source.

So to find all articles and categories under a given category title, say https://en.wikipedia.org/wiki/Category:Mathematics[], we can run:
``
mariadb enwiki -e "select cl_from, cl_to, page_namespace, page_title from categorylinks inner join page on page_namespace in (0, 14) and cl_from = page_id and cl_to = 'Mathematics'"
``
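
To get the entire tree under a category, that query has to be repeated recursively for every subcategory found. A hypothetical helper sketch, with the `walk_category` name made up for illustration:
``
# print "namespace<TAB>title" for every page (namespace 0) and
# subcategory (namespace 14) directly under the category given as $1
walk_category() (
  mariadb enwiki -N -B -e "SELECT page_namespace, page_title
    FROM categorylinks
    INNER JOIN page ON page_namespace IN (0, 14)
      AND cl_from = page_id
      AND cl_to = '$1'"
)
walk_category Mathematics
# then call walk_category again on every namespace 14 result, and so on
``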

= How to use a single source multiple times in a Wikipedia article?
{parent=Wikipedia HOWTO}

https://www.quora.com/On-Wikipedia-how-can-you-cite-the-same-source-more-than-once-without-them-becoming-separate-references

https://en.wikipedia.org/wiki/Help:Footnotes#Footnotes:_using_a_source_more_than_once gives the following method:

Definition, anywhere in the article, ideally at the first usage:
``
<ref name="myname">{{cite web ...}}</ref>
``

And then you can use it later on as:
``
<ref name="myname" />
``
which automatically expands to the exact same thing, or using the shortcut:
``
{{r|myname}}
``

To cite multiple pages of a book: https://en.wikipedia.org/wiki/Wikipedia:Citing_sources#Citing_multiple_pages_of_the_same_source[], the best method is to define and use the reference without adding the `p` or `location` in `cite` as:
``
<ref name="googleStory">{{cite book |title=The Google Story}}</ref>{{rp|p=123}}
``
Do not set the page in `cite`, otherwise it shows up on the references. Instead we use the https://en.wikipedia.org/wiki/Template:Rp[`{{rp}}` template]. And then use the reference with the https://en.wikipedia.org/wiki/Template:R[`{{r}}`] template as:
``
{{r|googleStory|p=456}}
``
or for multiple pages:
``
{{r|googleStory|pp=123, 156-158}}
``

= How to cite a book on Wikipedia
{parent=Wikipedia HOWTO}

To avoid duplication when citing multiple pages: <How to use a single source multiple times in a Wikipedia article?>{full}

A good big sample definition:
``
<ref name="googleStory">{{cite book |last1=Vise |first1=David |author-link1=David A. Vise |last2=Malseed |first2=Mark |author-link2=Mark Malseed |title=The Google Story |date=2008 |publisher=Delacorte Press |url=https://archive.org/details/isbn_9780385342728}}</ref>
``
There is also `title-link` to link to a wiki page. But it is incompatible with `url=` for <Internet Archive Open Library> links, which is a shame.

= Wikipedia edit request
{c}
{parent=Wikipedia HOWTO}

https://en.wikipedia.org/wiki/Wikipedia:Edit_requests

So, it turns out that Wikipedia does have an (ultra obscure, as usual) mechanism for <pull requests>. You learn a new one every day.

= Wikipedia subpages
{c}
{parent=Wikipedia HOWTO}

https://en.wikipedia.org/wiki/Wikipedia:User_pages

OMG they have that. It slightly overlaps with <OurBigBook.com>.

= History of Wikipedia
{c}
{parent=Wikipedia}

A 2022 clone of https://phabricator.wikimedia.org/source/mediawiki.git gives first commits from 2003 by:
* Lee Daniel Crocker: https://en.wikipedia.org/wiki/Lee_Daniel_Crocker
  \Q[He is best known for rewriting the software upon which Wikipedia runs, to address scalability problems.]
  so that gives a good notion of the last major rewrite.
* Brion Vibber

TODO when did Wikipedia split off from <Nupedia>? The early days of Wikipedia are quite obscure due to its transition from Nupedia.

= Nupedia
{c}
{parent=History of Wikipedia}
{wiki}

= Wikipedia analytics
{parent=Wikipedia}

= How to view how many visits a Wikipedia page has?
{synonym}
{title2}

= Pageviews Analysis
{c}
{parent=Wikipedia analytics}

Cool tool that allows you to graphically visualize the page view counts of specific pages. It offers somewhat similar insights to <Google Trends>.

Homepage: https://pageviews.wmcloud.org/

Documentation: https://meta.wikimedia.org/wiki/Pageviews_Analysis#Massviews

The homepage shows views of selected pages, e.g. when <Google> had their 25th birthday: https://pageviews.wmcloud.org/?project=en.wikipedia.org&platform=all-access&agent=user&redirects=0&start=2023-09-11&end=2023-10-01&pages=Cat|Dog|Larry_Page <Larry Page> briefly beat "Cat" and "Dog".

`/topviews` shows the most viewed pages for a given month: https://pageviews.wmcloud.org/topviews/?project=en.wikipedia.org&platform=all-access&date=2023-08&excludes= It is extremely epic that https://en.wikipedia.org/wiki/XXX:_Return_of_Xander_Cage[XXX: Return of Xander Cage], a 2017 film, is in the top ten for August 2023. The page was around 8th place on a <Google> search for "xxx": https://archive.ph/wip/giRY8 at the time. https://en.wikipedia.org/wiki/XXXX_(beer)[XXXX (beer)] was also in the top 20, followed by https://en.wikipedia.org/wiki/Sex[Sex] at 21.
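
The underlying per-article data can also be fetched programmatically from the Wikimedia REST pageviews API, separately from the Pageviews Analysis frontend itself, e.g. a sketch for one of the pages above (exact parameter values may need tweaking):
``
# daily view counts for one article over roughly the same period as above
curl 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Larry_Page/daily/2023091100/2023100100'
``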

= Wikimedia Foundation
{c}
{parent=Wikipedia}
{wiki}

= Wikimedia Foundation project
{c}
{parent=Wikimedia Foundation}
{wiki}

= Wikidata
{c}
{parent=Wikimedia Foundation project}
{tag=Ontology}
{wiki}

= It is not possible to teach natural sciences on Wikipedia
{parent=Wikipedia}

Because of edit wars and encyclopedic tone requirements. See also: <OurBigBook.com/Wikipedia>.

Thus <OurBigBook.com>.

= Wikipedia person
{c}
{parent=Wikipedia}
{wiki}

= Jimmy Wales
{c}
{parent=Wikipedia person}
{wiki}

One thing to note is that Jimmy was a finance worker before starting Wikipedia, so he had the capital to hire Larry Sanger.

Maybe that's the way to go about it, make money first, and later on change the world.

Starting just after the beginning of the <Internet> can't hurt either. Though tooling must have been insane back then.

= Steven Pruitt
{c}
{parent=Wikipedia person}
{wiki}

\Video[https://www.youtube.com/watch?v=JhNczOuhxeg]
{title=Meet the man behind a third of what's on Wikipedia}

= MediaWiki
{c}
{parent=Wikipedia}
{wiki}

<Open source software> engine created for and used by <Wikipedia>.

= MediaWiki instance
{c}
{parent=MediaWiki}

https://en.wikialpha.org/wiki/Main_Page

= MediaWiki markup
{c}
{parent=MediaWiki}

https://www.mediawiki.org/wiki/Markup_spec

= How to reference a book in Wikipedia markup?
{parent=MediaWiki markup}

Their reference markup is incredibly overengineered, convoluted, and underdocumented; it is unbelievable!

Use the reference:
``
This is a fact.{{sfn|Schweber|1994|p=487}}
``

Define the reference:
``
===Sources===
{{refbegin|2|indent=yes}}
*{{Cite book|author-link=Silvan S. Schweber |title=QED and the Men Who Made It: Dyson, Feynman, Schwinger, and Tomonaga|last=Schweber|first=Silvan S.|location=Princeton|publisher=University Press|year=1994 |isbn=978-0-691-03327-3 |url=https://archive.org/details/qedmenwhomadeitd0000schw/page/492 |url-access=registration}}
{{refend}}
``

`sfn` is magic and matches the author's last name and date from the `Cite`; it is documented at: https://en.wikipedia.org/wiki/Template:Sfn

Unfortunately, if there are multiple duplicate `Cite`s inline in the article, it will complain that there are multiple definitions, and you have to first refactor the article by replacing all the existing inline `Cite`s with `sfn` and keeping just one `Cite` at the bottom. What a pain...

You can also link to a specific page of the book, e.g. if the book is on <Internet Archive Open Library>, with:
``
{{sfn|Murray|1997|p=[https://archive.org/details/supermenstory00murr/page/86 86]}}
``

For multiple pages you should use `pp=` instead of `p=`. It does not seem to make much difference to the rendered output besides showing `p.` vs `pp.`, but so be it:
``
{{sfn|Murray|1997|pp=[https://archive.org/details/supermenstory00murr/page/86 86-87]}}
``

= Ciro Santilli's Wikipedia contributions
{c}
{parent=Wikipedia}

Let's see how long they last:
* <Julian Schwinger>: https://en.wikipedia.org/w/index.php?title=Julian_Schwinger&oldid=1039812272 greatly expanded the Early life and career with information from the book <QED and the men who made it: Dyson, Feynman, Schwinger, and Tomonaga by Silvan Schweber (1994)>

= Wikimedia Commons
{c}
{parent=Wikipedia}
{wiki}

A really good option to store educational media such as <media rationale of Ciro Santilli's website>[images and video]!

Shame that, like the rest of Wikimedia, the interface is so clunky and lacking in obvious features.

= Scholarpedia
{c}
{parent=List of Wikis}
{wiki}

http://www.scholarpedia.org/article/Main_Page

This is basically what <Jimmy Wales> had originally set out to make <Wikipedia> be: a peer reviewed thing.

But then he noticed that the entry barrier was too high when inviting an economist to review an article he wrote, and just made the more open thing instead.

= WikiWikiWeb
{c}
{parent=List of Wikis}
{title2=1995-}

= C2 wiki
{c}
{synonym}
{title2}

https://wiki.c2.com

The venerable first <wiki>.

The pre-<Eternal September> feeling is palpable.

People could freely comment their thoughts and sign below, making it much closer to what <Ciro Santilli> wants <OurBigBook.com> to be. But with upvotes ;-)

Nothing can better encapsulate the nostalgia of early day Internet. Genius at times, banal at others, you will be forever in our hearts!

= GitBook
{c}
{parent=Collaborative writing platform}

https://www.gitbook.com/

This is good, and a very close competitor to <OurBigBook.com>.

But they https://docs.gitbook.com/resources/gitbook-legacy/v2-differences[killed local build], so they are going to die.

= Overleaf
{c}
{parent=Collaborative writing platform}
{wiki}

= Crowdsourcing website
{parent=Website}

= Patreon
{c}
{parent=Crowdsourcing website}
{wiki}

= Online dating service
{parent=Website}
{wiki}

= Dating website
{synonym}
{title2}

= E-learning website
{parent=Website}
{wiki}

Generally, if something is labelled as "e-learning", it's not a good sign, as it implies that it adheres to the "teacher"/"student" separation which <Ciro Santilli> much despises: <E-learning websites must allow students to create learning content>.

= E-learning websites must keep content free, only charge for certification
{parent=E-learning website}

Charging for certification is fine. Creating exams and preventing cheating has a cost.

Another thing that it is fine to charge for is dedicated 1-to-1 tutor time. This is something <Udacity> is doing as of 2022.

https://www.investopedia.com/articles/investing/042815/how-coursera-works-makes-money.asp has a good mention:
\Q[
MOOCs were first created by people with utopian visions for the internet. This means the idea for platforms like Coursera was likely conceived without a business plan in mind. Nonetheless, Coursera has managed to monetize its platform. It is worth noting, however, that monetization has lead to the effective elimination of the original MOOC idea, which is predicated on ideals like free and open access, as well as the building of online communities.

<Coursera> users must pay to engage with the material in a meaningful way and take courses for individualistic purposes. This has been a consistent trend among all major online education platforms.
]
and it links to: https://www.freecodecamp.org/news/massive-open-online-courses-started-out-completely-free-but-where-are-they-now-1dd1020f59/[], very good article!

That is a fundamental guiding principle of <OurBigBook.com>. The educational content must be licensed <CC-BY SA>!

Perhaps the most reliable way of reaching this state is <E-learning websites must allow students to create learning content>.

Bibliography:
* https://academia.stackexchange.com/questions/86179/is-it-financially-worth-it-to-teach-a-mooc-e-g-coursera Is it financially worth it to teach a MOOC (e.g. Coursera)?
* https://www.classcentral.com/about amazing, they can make money just from ads! I wouldn't expect that they could scale like TripAdvisor, because travelling involves very local knowledge; I would expect there to be far fewer MOOCs, and for them to be more easily findable on Google. A good thing though, this website.

= E-learning websites must allow students to create learning content
{parent=E-learning website}

This is a key philosophy of <OurBigBook.com>!

Because <there is value in tutorials written by beginners>.

= Massive open online course
{parent=E-learning website}
{wiki}

= MOOC
{c}
{synonym}
{title2}

MOOCs are a bad idea. We don't want to simply map the pre-computer classroom to the Internet. The Internet allows, and requires, fundamentally new ways to do things. More like <Stack Overflow>/<Wikipedia>. More like <OurBigBook.com>.

= Learning management system
{parent=E-learning website}
{title2=LMS}
{wiki}

A more specific type of <E-learning website> generally run by a specific organization.

= Virtual learning environment
{synonym}
{title2}

A website, usually hosted by a <university>, that takes what is done in class and pastes it online. It is already much more rational and efficient, and opens up the way for potential sharing outside of the institution (or paywalling by default, as the <University of Oxford> did).

The fundamental problem with VLEs is that they tend not to have enough incentives for students to contribute to the content at all. This is basically the major motivation behind <OurBigBook.com>.

= Moodle
{c}
{parent=Learning management system}
{wiki}

= List of e-learning websites
{parent=E-learning website}

= Coursera
{c}
{parent=List of e-learning websites}
{title2=2012}
{wiki}

Some courses at least allow you to see material for free, e.g.: https://www.coursera.org/learn/quantum-optics-single-photon/lecture/UYjLu/1-1-canonical-quantization[]. Lots of video focus as usual for <MOOCs>.

Some are paywalled: https://www.coursera.org/learn/theory-of-angular-momentum?specialization=quantum-mechanics-for-engineers

It is extremely hard to find the course materials without enrolling, even if enrolling for free! By trying to make money, they make their website shit.

The comment section does have a lot of activity: https://www.coursera.org/learn/statistical-mechanics/discussions/weeks/2[]! Nice. And works like a proper issue tracker. But it is also very hidden.

November 2023 topics:
* <quantum field theory>: no
* <condensed matter>: 1 by Rahul Nandkishore from Colorado Boulder: https://www.coursera.org/specializations/the-physics-of-emergence-introduction-to-condensed-matter

= EdX
{c}
{parent=List of e-learning websites}
{wiki}

<Harvard University> + <MIT> combo.

As of 2022:
* can't see course material before start date. Once archived, you can see it but requires login...
* on free mode, limited course access
Fuck that.

Also, they have an ICP.

November 2023 course search:
* <Condensed matter>: 4 hits, so not too bad
* <quantum field theory>: no hits

= FutureLearn
{c}
{parent=List of e-learning websites}
{wiki}

By the <Open University>. "Open" I mean.

Some/all courses expire in 4 weeks: https://www.futurelearn.com/courses/intro-to-quantum-computing[]. Ludicrous.

= Jordan Peterson's university online
{c}
{parent=List of e-learning websites}
{tag=Jordan Peterson}
{tag=Vaporware}
{wiki}

https://www.reddit.com/r/JordanPeterson/comments/gc23gd/petersons_online_university_still_a_thing/[].

\Video[https://www.youtube.com/watch?v=86FJapcRq1c]
{title=My online university and why it is needed interview with Jordan Peterson (2018)}
{description=Cheaper and online. Initial focus on social sciences.}

= Khan Academy
{c}
{parent=List of e-learning websites}
{tag=Not-for-profit}
{wiki}

Kudos for being a <not-for-profit>. Also, anyone can create content: <e-learning websites must allow students to create learning content>. Oh, but TODO: is it possible for anyone to make content publicly visible? Course join links like https://www.khanacademy.org/join/MJZ6NSV7 require login. https://webapps.stackexchange.com/questions/165132/how-to-create-a-course-that-is-publicly-visible-without-the-need-to-login-on-kha If that's the case, it is a fatal flaw not shared by <OurBigBook.com>.

Another cool aspect is that they have the "physical world teacher pull student accounts in" approach built-in quite well at course creation. This is a very good feature.

As of 2021 they were a bit struggling for money it seems: https://www.youtube.com/watch?v=I8XdUy-wyyM[]?

= Sal Khan
{c}
{parent=Khan Academy}
{wiki}

Like <Jimmy Wales>, he used to work in finance and then quit. What is it with those successful e-learning people??

= OpenStax
{c}
{parent=List of e-learning websites}
{wiki}

* https://openstax.org/
* https://cnx.org/

These people have good intentions.

The problem is that they don't manage to go critical because there's no way for students to create content; everything is manually curated.

You can't even publicly comment on the textbooks. Or at least <Ciro Santilli> hasn't found a way to do so. There is just a "submit suggestion" box.

This massive lost opportunity is even shown graphically at: https://cnx.org/about (https://web.archive.org/web/20201127013553/https://cnx.org/about[archive]) where there is a clear separation between:
* "authors", who can create content
* "students", who can consume content
Maybe this wasn't the case in their legacy website, https://legacy.cnx.org/content?legacy=true[], but not sure, and they are retiring that now.

Thus, <OurBigBook.com>. License: <CC BY>! So we could re-use their stuff!

By <Rice University>.

TODO what are the books written in?
* https://github.com/openstax/openstax-cms Uses Wagtail CMS. So presumably they just use Wagtail's <WYSIWYG>.
* https://github.com/openstax/os-webview

\Video[https://www.youtube.com/watch?v=RRymi-lFHpE]
{title=Richard Baraniuk on open-source learning by <TED (conference)> (2006)}

= Udacity
{c}
{parent=List of e-learning websites}
{wiki}

It is a shame that they refocused on more applied courses. This also highlights their highly "managed" approach to content creation. Their 2022 front-page pitch says it all:
\Q[for as few as 10 hours a week, you can get the in-demand skills you need to help land a high-paying tech job]
they are focused on the highly paid character of many software engineering jobs.

But one cool point of this website is how they hire tutors to help on the courses. This is a very good thing. It is a fair way of monetizing: <e-learning websites must keep content free, only charge for certification>.

= Internet forum
{c}
{parent=Website}
{wiki}

= 4chan
{parent=Internet forum}
{wiki}

<Online forums that lock threads after some time are evil>. What else needs to be said?

= Hackster.io
{c}
{parent=Internet forum}

https://www.hackster.io/

= Hacker News
{c}
{parent=Internet forum}
{wiki}

https://news.ycombinator.com/

The most popular programming news sharing forum of the 2010's by far. If your content gets shared there, and it stays on top for a day, the traffic peak will be incredible. <Reddit> posts are sure to follow.

Basically a programming-only <Reddit>-lite.

<Ciro Santilli> has had some of his content shared there, as mentioned at <articles>.

= Get notifications from Hacker News comments
{c}
{parent=Hacker News}
{wiki}

https://news.ycombinator.com/item?id=29969399

Repeat after me. Inertia is all that matters. Features don't matter. But algorithms matter.

= Q&A website
{c}
{parent=Internet forum}
{wiki=Q&A_software}

= Quora
{c}
{parent=Q&A website}
{wiki}

Quora is crap in many, many senses, but in part due to some <bad Stack Overflow policies>, it is the best crap we've got for certain (mostly useless) subjects. Until <OurBigBook.com> dominates the world.

The worst thing about Quora is that you cannot subscribe only to certain subjects on your feed. Quora just keeps pumping <shit> you never subscribed to, no matter what. Ciro, for sport, unfollowed every single idiotic subject it was proposing, but it didn't work: sooner or later Quora just pumps more shit back. Mind you, some of that shit is fun. But it's still shit. Though on second thought, <YouTube> also randomly decides to reset Ciro's humongous "don't recommend this shitty channel" choices from time to time, which is not much different...

Other terrible things, they just seem to have an incredible ability of making the website worse and more annoying over time! Truly amazing:
* around 2022, Quora started showing "related" answers to other questions, possibly before actual answers to the question itself. This, together with an insane number of inline ads that look very similar to answers, makes it very hard to decide what is an actual answer or not!!! E.g. people complaining:
  * https://www.reddit.com/r/OutOfTheLoop/comments/uqyvfp/whats_the_deal_with_quora_answers_seemingly/
  * https://greatqinformation.quora.com/How-to-stop-Quora-showing-me-related-answers-which-really-arent-related-at-all 

  This "feature" is so bad that it is even comical. Quora looks more like a spambot than a Q&A site now. Unusable!
* around 2021, Quora started expanding any link into a huge preview box that completely takes over the answer, and it is very hard to stop it from doing so
* Quora used to show question details beyond the title by default, but stopped: https://www.reddit.com/r/OutOfTheLoop/comments/uqyvfp/comment/jd6go1b/?utm_source=share&utm_medium=web2x&context=3

See also: https://cirosantilli.com/china-dictatorship/quora for a coverage of the intense pro-CCP astroturfing present on the website.

\Include[stack-overflow]{parent=q-and-a-website}

= TeachMeAsap.com
{c}
{parent=Q&A website}
{title2=2022}

They sent one of the rare spams Ciro actually was interested in!!! Likely going down lists of top <Stack Overflow> users.

They have some kind of cryptocurrency, TCHME token, as a reward. Ciro wonders if the value of TCHME will ever be high enough to serve as a valid incentive.

Also, what is the total TCHME supply? Can the website devs issue as much as they want? They do giveaways e.g. as shown at: https://twitter.com/TeachMeAsap/status/1621353671840899072

And a centralized system with a centralized marketplace would work just as well for the initial phases. But fair play, the idea is interesting.

= LessWrong
{c}
{parent=Internet forum}
{wiki}

* https://www.lesswrong.com/
* https://www.reddit.com/r/OutOfTheLoop/comments/3ttw2e/what_is_lesswrong_and_why_do_people_say_it_is_a/

<Ciro Santilli> dislikes the fact that they take themselves too seriously. Ciro prefers the jokes and tech approach.

= Eliezer Yudkowsky
{c}
{parent=LessWrong}
{wiki}

= Reddit
{c}
{parent=Internet forum}
{wiki}

= Subreddit
{synonym}

<Ciro Santilli>'s account is https://www.reddit.com/user/cirosantilli/[], see also: <accounts>.

= View top posts of previous months or years in Reddit
{parent=Reddit}

No good per-sub way as of 2022:
* https://www.reddit.com/r/help/comments/27eziq/view_top_posts_of_a_specific_timespan/
* https://www.reddit.com/r/help/comments/9ebxl3/how_do_i_find_old_posts_on_a_subreddit/
* https://www.reddit.com/r/help/comments/aywras/how_do_i_search_reddit_for_posts_in_a_specific/
* https://www.reddit.com/r/modhelp/comments/etsomx/how_to_get_top_posts_of_past_months_of_subreddit/
* https://www.reddit.com/r/redditdev/comments/kaf1yz/finding_top_post_of_specific_month/
* https://www.reddit.com/r/changelog/comments/k663qy/introducing_rereddit_go_back_in_time_to_see_top/
* https://www.reddit.com/r/help/comments/stui9i/is_it_possible_to_look_up_the_top_posts_of_the/

= The Student Room
{c}
{parent=Internet forum}
{wiki}

https://www.forbes.com/sites/joewhitwell/2019/04/12/the-student-room-founder-charles-delingpole-talks-building-a-business-at-university/?sh=74645472643b The Student Room Founder Charles Delingpole Talks Building A Business At University (2019)

They could have been <Facebook>!

Founder: https://www.linkedin.com/in/delingpole

= Usenet
{c}
{parent=Internet forum}
{wiki}

= Usenet personality
{c}
{parent=Usenet}
{wiki}

<Ciro Santilli> does the same via <Google> searches and <Twitter>/<Reddit> searches for himself; you can't invent anything new nowadays:
\Q[Kibo was known for his high-volume but thoughtful posts, but achieved Usenet celebrity circa 1991 by writing a small script to grep his entire Usenet feed for instances of his name, and then answering personally whenever and wherever he was mentioned, giving the illusion that he was personally reading the entire feed.]

= Usenet newsgroup
{c}
{parent=Usenet}
{wiki}

= Big 8
{c}
{disambiguate=Usenet}
{parent=Usenet newsgroup}
{wiki}

= Eternal September
{c}
{parent=Usenet}
{title2=1993}
{wiki}

= Mailing list
{parent=Website}
{tag=Essays by Ciro Santilli}
{wiki}

It boggles <Ciro Santilli>'s mind that people use a <mailing list> to collaborate on projects!

The only explanation is that the dinosaurs who created the projects are unable to adapt to new superior technologies.

Yes, Ciro is talking to you, big fundamental projects from last century: <Linux kernel>, <GNU Compiler Collection> (https://gcc.gnu.org/lists.html[]), <Binutils> (https://sourceware.org/binutils/[]), etc.

Some of you are already using Bugzilla for the bugs, so kudos. But if you've seen its benefits, why do you still use the mailing list for patches?

Advantages of mailing lists:
* threaded replies, which almost no issue tracker has. <GitHub> feature request: https://github.com/isaacs/github/issues/837

Disadvantages: everything else:
* cannot subscribe to a single thread, which forces you to create an email filter for each thread you follow
* no metadata, notably the notion of closing / merging, but also upvotes

  You have to read thirty messages before you can know if the bug was solved or not.
* it is insanely hard to reply to messages from before you were subscribed: https://webapps.stackexchange.com/questions/23197/reply-to-mailman-archived-message/115088#115088

  This forces everyone to subscribe to all lists, and then set up email filters to not be flooded with emails.
* hard to apply patches locally to test them out: https://stackoverflow.com/questions/5062389/how-to-use-git-am-to-apply-patches-from-email-messages/49082916#49082916

  Unless they use Patchwork, which adds one more website on top of the mess.

  And then <Gmail> corrupts your patches, and you are forced to use `git send-email`, which does not work on some network configurations: https://stackoverflow.com/questions/28038662/how-to-solve-unable-to-initialize-smtp-properly-when-using-using-git-send-ema or to set up Thunderbird. A rough sketch of this patch-by-email round trip is shown just after this list.
* often have to subscribe to post at all, thus cluttering your inbox further
* no way to edit posts to make them clearer.

  Yes, people could vandalize their answers when they get mad, and threads might stop making sense after edits. But this can be solved with an undeletable post history like Stack Overflow has (and which no other tracker does).

  Or archive.org :-)

  In any case, what do you think will happen more often and have greater impact:
  * people vandalize their posts
  * people fix their silly typos and improve content
* not searchable by author, keyword, etc. without Google. Yes, mailing list archives could have decent search implementations to overcome that. But no, GNU Mailman, which everyone uses, does not have it. Google barely indexes it.

  And I don't think Google properly indexes many of the mailing list archives for some reason: I never get hits for my own posts a week later, while I often do on GitHub issues.
* people have to learn about top posting vs inline posting, and this requires infinite education of new users
* no line comments in code reviews, like GitHub and GitLab have.

  On mailing lists you either put a comment in the middle of a huge patch and let other people find it, or (more likely) copy-paste the part of the patch that you are talking about.
* most mail web UIs suck.

  OK, this is not an unsolvable or intrinsic problem, but still a problem.

  E.g. with `ezmlm` it is not possible to see an entire thread on a single page: https://gcc.gnu.org/ml/gcc/2015-07/threads.html[].

  Unless you like reading threads backwards and with 4 levels of `>` quotations.

  The alternative: do like LLVM and send attachments. Yes, we all love opening up attachments in our browsers.

  The real solution: everyone can create branches and pull requests. Also has the benefit of running <CI> on the pull requests.
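
As mentioned in the patch item above, here is a rough sketch of the `git format-patch` / `git send-email` / `git am` round trip that mailing-list-based projects typically expect; the list address and mbox path are placeholders:
``
# Contributor: turn the last commit into an email-formatted patch file
git format-patch -1 HEAD
# ... and send it to the list (SMTP settings must already be configured in .gitconfig)
git send-email --to=some-project@lists.example.com 0001-*.patch

# Reviewer: save the message from the mail client as an mbox file,
# then apply it locally as a proper commit to test it
git am /tmp/patch.mbox
``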

Not sure:
* you can have infinitely many trackers to replicate data in case an apocalypse happens in some part of the world.

  Although I'm not sure this is an advantage, as you don't know anymore which one is the canonical tracker.

  And all web interfaces already have an API to export messages, and someone has already scripted it to import from any web UI to any web UI for you.

  And GitHub offers infinite precise history transparently on its API.

Smart people who agree with Ciro:
* https://news.ycombinator.com/item?id=13631069
* https://softwareengineering.stackexchange.com/questions/191961/why-do-some-big-projects-like-git-and-debian-only-use-a-mailing-list-and-not-a#comment779146_256479

= Online marketplace
{parent=Website}
{wiki}

= Fiverr
{c}
{parent=Online marketplace}
{wiki}

= Review site
{parent=Website}
{wiki}

= Rate My Professors
{c}
{parent=Review site}

= Website genre
{parent=Website}
{wiki}

= Web portal
{parent=Website genre}

= DigitalDreamDoor
{c}
{parent=Web portal}

https://digitaldreamdoor.com/

Ahh, this brings back good memories of <Ciro Santilli>'s formative musical teenage years, scouring the web for the best art humanity had ever produced in certain genres. And it still is a valuable resource as of the 2020's!

= Personal web page
{parent=Website genre}
{wiki}

= Personal website
{synonym}

= The best personal webpages of all time
{parent=Personal web page}

These are basically technically minded people that <Ciro Santilli> feels have similar <Ciro Santilli's psychology and physiology>[interests/psychology] to him, and who <graphomania>[write too much for their own good]:
* <cat-v.org>
* <gwern.net>. Dude's a bit overly obsessed with the popup preview though! "new Wikipedia popups (this 7th implementation enables recursive WP popups)" XD
* <settheory.net> by <Sylvain Poirier>
* <HyperPhysics>
* <Orange Papers>

Maybe one day these will also be legendary, who knows:
* <Sandy Maguire>'s blog: https://sandymaguire.me[], e.g. https://sandymaguire.me/blog/burnout/
* https://solitaryroad.com/physics.html About https://solitaryroad.com/a790.html

Another category Ciro admires are the "<computational physics> visualization" people, these people will go to <Heaven>:
* https://rafael-fuente.github.io/

Related:
* <James Somers>

Institution led:
* http://www.biology.arizona.edu/ The Biology Project

= cat-v.org
{c}
{parent=The best personal webpages of all time}
{tag=The best personal webpages of all time}

http://cat-v.org/ by https://en.wikipedia.org/wiki/Rob_Pike[Rob Pike], co-creator of https://en.wikipedia.org/wiki/Go_(programming_language)[Go], looong time Unixer, and some kind of leader of a https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs[9p] resurrection cult. That one's spicy. E.g.: http://harmful.cat-v.org/[], Ciro's version: <good and evil>.

= HyperPhysics
{c}
{parent=The best personal webpages of all time}
{wiki}

http://hyperphysics.phy-astr.gsu.edu/hbase/hframe.html

Created by Dr. Rod Nave from Georgia State University, where he worked from 1968 after his post-doc in North Wales on molecular <spectroscopy>.

While there is value to that website, it always feels like it <ourbigbook com/Wikipedia>[falls a bit too short as too "encyclopedic" and too little "tutorial-like"]. Most notably, it has very little on the <history of physics>/experiments.

<Ciro Santilli> likes this Rod, he really practices some good <braindumping>, just look at how he documented his life in the pre-<social media> <Internet> dark ages: http://hyperphysics.phy-astr.gsu.edu/Nave-html/nave.html

The website evolved from a <HyperCard> stack, as suggested by the website name, mentioned at: http://hyperphysics.phy-astr.gsu.edu/hbase/index.html[].

Shame he was too old for <CC BY-SA>, see "Please respect the Copyright" at http://hyperphysics.phy-astr.gsu.edu/hbase/index.html[].

https://exhibits.library.gsu.edu/kell/exhibits/show/nave-kell-hall/capturing-a-career has some good photo selection focused on showing the department, and has an interview.

Kell Hall is a building at GSU that was demolished in 2019: https://atlanta.curbed.com/2020/1/31/21115980/gsu-georgia-state-atlanta-kell-hall-demolition-park-library-north

= Piracy website
{parent=Website}
{wiki}

= Shadow library
{parent=Piracy website}
{wiki}

= Z-Library
{c}
{parent=Shadow library}
{wiki}

= Web archiving
{parent=Website}
{wiki}

The remedy to cowardice, inattention, censorship and amorality.

Due to <Ciro Santilli's campaign for freedom of speech in China>, <Ciro Santilli> maintains information on this mostly at:
* https://cirosantilli.com/china-dictatorship/wayback-machine
* https://cirosantilli.com/china-dictatorship/archive-today

<Dan Dascalescu>'s "Web page archiving" comparison table: https://web.archive.org/web/20130922192354/http://wiki.dandascalescu.com/reviews/online_services/web_page_archiving

= Digital preservation
{parent=Web archiving}
{wiki}

= Archive.today
{c}
{parent=Web archiving}
{wiki}

= archive.is
{synonym}
{title2}

https://cirosantilli.com/china-dictatorship/archive-today

= Creator of Archive.today
{parent=Archive.today}

= Denis Petrov of Archive.Is
{c}
{synonym}
{title2}

* https://drive.google.com/file/d/1JTPVd09NPaGH-KzGv2jU3XXcFiJAoUjw/view some crazy deep investigative work, let's see how long until it goes down; posted at:
  * https://www.reddit.com/r/COPYRIGHT/comments/1bcqf3y/archivetoday_archiveis_copyright_victims/
  * https://webapps.stackexchange.com/questions/145817/on-which-country-are-the-creators-and-servers-of-archive-today-archive-is-base/175600#175600
  Points to:
  * https://www.linkedin.com/in/denispetrov/
  "Alex Conferno" is also brought up: https://twitter.com/conferno
* https://www.reddit.com/r/DataHoarder/comments/12trawt/has_anyone_ever_actually_spoken_to_denis_petrov/
* https://gyrovague.com/2023/08/05/archive-today-on-the-trail-of-the-mysterious-guerrilla-archivist-of-the-internet/[]. Trended on <Hacker News>: https://news.ycombinator.com/item?id=37009598
* https://gigazine.net/gsc_news/en/20240326-archive-today/

Other mentions of "Denis Petrov":
* https://webmasters.stackexchange.com/questions/88257/deny-access-to-archive-is

= Internet Archive
{c}
{parent=Web archiving}
{wiki}

= Internet Archive Open Library
{c}
{parent=Internet Archive}
{{wiki=Internet_Archive#Open_Library}}

Previously called "Lending Library" it seems: https://help.archive.org/hc/en-us/articles/360016554912-Borrowing-From-The-Lending-Library

You can borrow online books from them for a few hours/days: https://help.archive.org/hc/en-us/articles/360016554912-Borrowing-From-The-Lending-Library This is the most amazing thing ever made!!! You can even link to specific pages, e.g. https://archive.org/details/supermenstory00murr/page/80/mode/2up

They seem to have a separate URL with the same content as well for some reason: https://openlibrary.org/[], classic messy <Internet Archive> style.

Bastards are suing them, namely Hachette, Penguin Random House, Wiley, and HarperCollins: https://www.theverge.com/2020/6/1/21277036/internet-archive-publishers-lawsuit-open-library-ebook-lending[]

It is quite hard to decide if an upload is from the official legal lending library, or just some illegal upload, e.g.:
* https://archive.org/details/TheGoogleStory likely illegal
* https://archive.org/details/isbn_9780385342728 likely legal
since the URLs have basically the same style. Some legality indicators:
* `Access-restricted-item`: true
* present in the collection: https://archive.org/details/internetarchivebooks?tab=about
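
As a hedged sketch, those two indicators can also be checked from the command line via the Internet Archive metadata API (`https://archive.org/metadata/<identifier>`), assuming `curl` and `jq` are installed; the identifier is the likely-legal example above:
``
# Dump just the collection list and the access restriction flag for one item.
# The flag simply shows as null if the item is not restricted.
curl -s https://archive.org/metadata/isbn_9780385342728 |
  jq '.metadata | {collection, "access-restricted-item": .["access-restricted-item"]}'
``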

= Wayback Machine
{c}
{parent=Internet Archive}
{wiki}

https://cirosantilli.com/china-dictatorship/wayback-machine

= Wayback Machine rate limit
{c}
{parent=Wayback Machine}
{wiki}

https://archive.org/details/toomanyrequests_20191110 says 15 archives / minute, but apparently also 15 retrievals per minute according to Wikipedia, after which comes a 5 minute blacklist. After that, you start getting some 429s, and after that, the server refuses to connect at all.

CDX: no limits apparently, they might just throttle you? Made 10k requests in a bash loop and it was going fine. But note that if you get blacklisted by the create/fetch request blacklist, the server fails to connect here as well.
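
A minimal sketch of that kind of throttled CDX loop, using only documented CDX query parameters (`url`, `output`, `limit`); the URLs are placeholders and the sleep is just a guess to stay under the observed ~15 requests per minute:
``
# Fetch the first few captures for each URL, pausing between requests
# so as not to trip the blacklist described above.
for url in example.com example.org example.net; do
  curl -s "https://web.archive.org/cdx/search/cdx?url=$url&output=json&limit=5"
  sleep 5  # at most ~12 requests per minute
done
``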

= Search Wayback Machine by IP
{parent=Wayback Machine}

https://archive.org/post/1025445/is-there-a-way-to-search-by-ip-address-not-http

= Wayback Machine full text search
{parent=Wayback Machine}

* https://www.reddit.com/r/DataHoarder/comments/kv6drc/wayback_machine_will_full_text_search_ever_be/
* https://webapps.stackexchange.com/questions/169608/searching-by-keyword-in-a-website-in-the-wayback-machine

= List all domains from the Wayback Machine
{c}
{parent=Wayback Machine}

* https://archive.org/post/1055220/how-to-query-for-all-the-websites-that-end-in-combr
* https://archive.org/details/WebArchiveDomainFiles only a random list of per-<ccTLDs> domain files, produced upon request of (presumably paid) partners. As of 2023 it only contains the Netherlands: https://archive.org/details/Dotnl-2016-present-domains-in-wayback-domainyear-of-last-capture
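
So a TLD-wide listing does not seem to be available; what the CDX API does support is enumerating the hosts captured under a single domain with `matchType=domain`. A hedged sketch, where `example.com` is a placeholder and large domains would need the query to be paginated:
``
# List distinct hosts (subdomains included) captured under one domain.
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com&matchType=domain&fl=original&collapse=urlkey&limit=1000" |
  awk -F/ '{print $3}' | sort -u
``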

= Archive Team
{c}
{parent=Internet Archive}
{wiki}

= Web page
{parent=Website}
{wiki}

= Video sharing website
{c}
{parent=Website}
{wiki}

= YouTube
{c}
{parent=Video sharing website}
{tag=Google acquisition}
{wiki}

<Ciro Santilli> publishes videos of his not-so-common visual programming experiments on his YouTube channel occasionally: https://www.youtube.com/c/CiroSantilli[]. Ciro should however not be lazy and also upload each video produced to <Wikimedia Commons>, since YouTube does not offer a download option even for videos marked with a <Creative Commons license>: https://www.quora.com/Can-I-download-Creative-Commons-licensed-YouTube-videos-to-edit-them-and-use-them/answer/Tarmo-Toikkanen[]!

This is also where Ciro's downtime converged to in his early 30's, since he long lost patience for stupid <video games> and <television series>.

Ciro developed one interesting technique: while scrolling through YouTube's useless recommendations, as soon as he understands what a channel is about, he immediately either:
* subscribes if it is amazing, and then hits "Don't recommend channel"
* otherwise just hits "Don't recommend channel"
This helps to keep the feed clean of boring stuff he already knows about. There is unfortunately an \i[infinite] number of useless videos out there on the topics of:
* sports
* music, mostly idiotic top of the charts
* news and political commentary
* food
* programming tutorials. <Meh>, got <Stack Overflow>.
* stuff that is <having more than one natural language is bad for the world>[not in English], and notably languages that Ciro does not even speak!
* motorcycles
* https://en.wikipedia.org/wiki/ASMR[ASMR]
* cute animals
* gaming and movie commentary. Ciro is interested only in a very specific set of <video games>
* nature life, e.g. hiking, cycling, or living in isolation, this Ciro enjoys
* science for kids (<popular science>)
and no matter how much you say you don't want to hear about them, YouTube just keeps on sending more.

Things Ciro hates about YouTube:
* you can't follow or ignore a subject, only indirectly tell the algorithm about that. Once you click a popular cat video, you will be forced to watch cat videos for all eternity.

Likely <FFmpeg is the backend of YouTube>.

Bought by <Google> in 2006.

\Video[https://www.youtube.com/watch?v=XAJEXUNmP5M]
{title=YouTube: From Concept to Hypergrowth Jawed Karim (2006)}
{description=YouTube co-founder explains that the key enabling technology for YouTube was the addition of video capabilities to <Adobe Flash>[Macromedia Flash 7].}

= YouTube poop
{parent=YouTube}

* https://www.youtube.com/channel/UCDyR_C_QVjZR24ze0fl5S_Q Goat-on-a-Stick channel

\Video[https://www.youtube.com/watch?v=g-sgw9bPV4A]
{title=Kazoo Kid - Trap Remix by Mike Diva (2016)}

\Video[https://www.youtube.com/watch?v=3Q12xOukVAI]
{title=Ravioli Remix: Black and Yellow by Wiz Krablifa by TheDoubleAgent (2015)}

\Video[https://www.youtube.com/watch?v=Fc1P-AEaEp8&list=PLcZOZrP1P_V6uAU4QhldipGBW86qJFvLk&index=13]
{title=Afraid of Technology by adarkenedroom (2008)}
{description=TODO source show, appears "Brass Eye", TODO episode https://www.reddit.com/r/videos/comments/jpyfi/technology_scares_the_crap_out_of_me/}

= youtube-dl
{parent=YouTube}

https://github.com/ytdl-org/youtube-dl

This thing downloads <YouTube> videos. The thing downloads <Twitter> videos. The thing downloads BBC videos. It is just <Godlike>.
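
A minimal usage sketch; the example URL is the Jawed Karim talk embedded in the <YouTube> section above, and the flags are standard youtube-dl options:
``
# Download the video at the best available quality (the default):
youtube-dl 'https://www.youtube.com/watch?v=XAJEXUNmP5M'
# Or keep only the audio track (requires ffmpeg):
youtube-dl --extract-audio --audio-format mp3 'https://www.youtube.com/watch?v=XAJEXUNmP5M'
``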

= YouTube channel
{parent=YouTube}

= The best YouTube channels
{parent=YouTube channel}

* https://www.youtube.com/channel/UCM2YmsRUeIbRkqjgNm0eTGQ Journeyman Pictures. Basically a VICE-like channel, focused on <fucked> up things happening in poor countries or regions.
* <Mediocre Amateur>{child}
* <youTube comedy channel>{child}

= The best scientific YouTube channels
{parent=The best YouTube channels}

* <mathematics YouTube channels>{child}
* <particle physics YouTube channels>{child}
  * <Dietterich Labs>{child}