ciro-santilli-s-projects.bigb
= Ciro Santilli's projects

= Projects
{synonym}

Major projects can be seen at: <the most important projects done by Ciro Santilli>{full}.

A summary of minor projects is given at: <Ciro Santilli's minor projects>.

This section is a dump for anything else, to keep those sacred first sections that show on the top of the homepage clean.

= OurBigBook
{c}
{parent=Ciro Santilli's projects}

= OurBigBook Project
{synonym}
{title2}

= OurBigBook Markup
{c}
{parent=OurBigBook}
{tag=Lightweight markup language}
{tag=Personal knowledge base}

The <markup language> of <OurBigBook.com>.

Also used on <Ciro Santilli's website> as a <static website> via the <OurBigBook CLI>.

The one <markup language> to rule them all?

Documentation at: https://docs.ourbigbook.com[].

= OurBigBook CLI
{c}
{parent=OurBigBook}

Official <Command-line interface> to convert a directory of <OurBigBook Markup> files into a <static website>. See also: https://cirosantilli.com/ourbigbook/ourbigbook-cli

= OurBigBook Library
{c}
{parent=OurBigBook}

Base <JavaScript> library that implements <OurBigBook Markup>. Used by both:
* <OurBigBook CLI>
* <OurBigBook Web>

= OurBigBook Web
{c}
{parent=OurBigBook}

The website system that runs <OurBigBook.com>. For further information see:
* <OurBigBook.com>: rationale
* https://cirosantilli.com/ourbigbook/ourbigbook-web[]: project documentation
Relies on the <OurBigBook Library> to compile <OurBigBook Markup>.

\Include[ourbigbook-com]

= OurBigBook feature
{c}
{parent=OurBigBook}

= OurBigBook topic feature
{c}
{parent=OurBigBook feature}

More info at: https://docs.ourbigbook.com#ourbigbook-web-topics

= OurBigBook dynamic tree
{c}
{parent=OurBigBook feature}

More info at: https://docs.ourbigbook.com/ourbigbook-web-dynamic-article-tree

= x86 bare metal examples
{c}
{parent=Ciro Santilli's projects}
{splitSuffix}

https://github.com/cirosantilli/x86-bare-metal-examples

As mentioned at <Linux Kernel Module Cheat>{full}, this should be merged into that other project.

= Ciro Santilli's naughty projects
{c}
{parent=Ciro Santilli's projects}

If <Ciro Santilli> weren't a <Ciro Santilli's campaign for freedom of speech in China>[natural born activist], he could have made an excellent <intelligence analyst>! See also: <Being naughty and creative are correlated>{full}.
* <Stack Overflow Vote Fraud Script>
* <GitHub> makes Ciro feel especially naughty:
  * <All GitHub Commit Emails>: he extracted (almost) all Git commit emails from <GitHub> with <Google BigQuery>
  * https://github.com/cirosantilli/test-many-commits-1m/[A repository with 1 million commits]: likely the https://www.quora.com/Which-GitHub-repo-has-the-most-commits/answer/Ciro-SantilliI[live repo with the most commits as of 2017]
  * https://stackoverflow.com/questions/20099235/who-is-the-user-with-the-longest-streak-on-github/27742165#27742165[A 100 year GitHub streak], likely the longest ever while that feature existed. It was consuming too many <server> resources however, which led to GitHub admins manually https://web.archive.org/web/20151021135921/https://github.com/cirosantilli/[turning off his contribution history].
  * https://github.com/cirosantilli/test-octopus-100k[A repository with a 100k commit Git octopus merge]. Now that is a true https://softwareengineering.stackexchange.com/questions/314215/can-a-git-commit-have-more-than-2-parents/377903#377903[Cthulhu merge].
  * https://github.com/isaacs/github/issues/1718[500 on adoc infinite header xref recursion]: that was fun while it lasted

Outside this website:
* https://cirosantilli.com/china-dictatorship/zhihu-censorship-of-hao-haidong

= All GitHub Commit Emails
{c}
{parent=Ciro Santilli's naughty projects}
{tag=Open-source intelligence}
{tag=Ciro Santilli's data projects}

https://github.com/cirosantilli/all-github-commit-emails

In this project <Ciro Santilli> extracted (almost) all Git commit emails from <GitHub> with <Google BigQuery>! The repo was later taken down by <GitHub>. Newbs, censoring publicly available data!

Ciro also created a beautifully named variant with one email per commit: https://github.com/cirosantilli/imagine-all-the-people[]. True art. It also had the effect of breaking this "what's my first commit tracker": https://twitter.com/NachoSoto/status/1761873362706698469

= Facebook profile face dump
{c}
{parent=Ciro Santilli's naughty projects}
{tag=Ciro Santilli's data projects}

In 2016 Ciro made a script that downloaded <Facebook> profile pictures.

This was possible at the time without any login, since profile picture access was not authenticated, by using a 2010 profile ID dump originally announced at: https://blog.skullsecurity.org/2010/return-of-the-facebook-snatchers

The profile ID dump was downloadable through a <BitTorrent> named `fbdata.torrent` of about 2.8GB, mostly compressed. Doing:
``
find . -type f | xargs sha256sum | sha256sum
``
on Ubuntu 20.04 gives:
``
2c9a739c9c5495e38ebab81fc67411b7c6562f139dcb8619901a3f01230efdd5
``
This dump was widely reported e.g. on <Hacker News> at: https://news.ycombinator.com/item?id=1554558[].
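
The checksum command above feeds files to `sha256sum` in whatever order `find` returns them, and that order can differ between filesystems. As a rough sketch of an order-independent alternative, the hypothetical `tree_sha256` helper below hashes a listing sorted by relative path. It is only an illustration and not the method used for the digest quoted above, so its output is not expected to match that value.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_sha256(root: str) -> str:
    """Hash "<file hash>  <relative path>" lines, sorted by relative path.

    Hypothetical helper: the line format is an assumption made for this
    sketch, it does not reproduce the exact output of sha256sum.
    """
    base = Path(root)
    files = sorted(
        (p for p in base.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(base).as_posix(),
    )
    listing = hashlib.sha256()
    for path in files:
        rel = path.relative_to(base).as_posix()
        listing.update(f"{file_sha256(path)}  {rel}\n".encode())
    return listing.hexdigest()
```

Calling `tree_sha256(".")` from the extracted directory would then give a digest that only depends on file contents and relative paths.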

At some point however, Facebook finally started to require tokens to view public profile pictures, thus making such further collection impossible, e.g. as of 2021: https://developers.facebook.com/docs/graph-api/reference/v9.0/user/picture[] mentions:
\Q[Querying a User ID (UID) now requires an access token.]
This is also mentioned e.g. at: https://stackoverflow.com/questions/11442442/get-user-profile-picture-by-id[]. This major privacy flaw has therefore been addressed, and this project can no longer be reproduced.

Ciro downloaded 10 thousand of those pictures, and did facial extraction with: https://stackoverflow.com/questions/13211745/detect-face-then-autocrop-pictures/37501314#37501314

He then created a single video by joining 10 thousand of those cropped faces, which can be uploaded e.g. to <YouTube>. Ciro later decided it was better to make those videos private however, as sooner or later he'd lose his account for it.

<Companies> like <YouTube> blocking this kind of content is the type of thing that makes companies take longer to fix such gaping privacy issues, and is a bit like <security through obscurity>. A video makes it clear to everyone that there is a privacy issue very effectively. But people prefer to hide and look away, and then 99% of people who know nothing about tech get their privacy busted by actual criminals/government spies and never learn about it.

But now that Facebook finally fixed it, it's fine, no need for the video anymore.

= Ciro Santilli's data projects
{parent=Ciro Santilli's projects}

<Ciro Santilli> has enjoyed doing projects dealing with lots of data! They usually have a large overlap with <Ciro Santilli's naughty projects>, but not always!

= Wikipedia CatTree
{c}
{parent=Ciro Santilli's data projects}
{splitSuffix}
{tag=Ciro Santilli's minor projects}

This mini-project walks the category hierarchy of <Wikipedia dumps> and dumps it in various simple formats, <HTML> being the most interesting!
* <HTML> dumps: https://cirosantilli.com/wikipedia-cattree/
* methodology: https://stackoverflow.com/questions/17432254/wikipedia-category-hierarchy-from-dumps/77313490#77313490

Scripts used:
* \a[wikipedia/import-sqlite.sh]
* \a[wikipedia/sqlite_preorder.py]
* \a[wikipedia/wikipedia-cattree.sh]
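
As a rough illustration of what walking a category hierarchy stored in SQLite can look like, here is a minimal sketch. It does not reproduce the scripts listed above: the `category_link` table, its `parent` and `child` columns and the toy rows are all hypothetical. The walk keeps a set of visited categories because a category graph may contain shared children and cycles.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a made-up category graph."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE category_link (parent TEXT, child TEXT)")
    conn.executemany(
        "INSERT INTO category_link VALUES (?, ?)",
        [
            ("Mathematics", "Algebra"),
            ("Mathematics", "Geometry"),
            ("Algebra", "Linear algebra"),
            ("Algebra", "Mathematics"),  # deliberate cycle
            ("Geometry", "Linear algebra"),  # shared child
        ],
    )
    return conn


def preorder(conn, root, max_depth=10):
    """Return (depth, category) pairs in depth-first preorder, each category once."""
    seen = set()
    out = []
    stack = [(0, root)]
    while stack:
        depth, name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        out.append((depth, name))
        if depth >= max_depth:
            continue
        children = [
            row[0]
            for row in conn.execute(
                "SELECT child FROM category_link WHERE parent = ? ORDER BY child",
                (name,),
            )
        ]
        # Push in reverse so that the alphabetically first child is popped first.
        for child in reversed(children):
            if child not in seen:
                stack.append((depth + 1, child))
    return out


if __name__ == "__main__":
    for depth, name in preorder(build_toy_db(), "Mathematics"):
        print("  " * depth + name)
```

The explicit stack avoids Python's recursion limit on deep hierarchies, and `max_depth` is an arbitrary cutoff chosen for this sketch.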

\Image[https://raw.githubusercontent.com/cirosantilli/media/master//Wikipedia_CatTree.png]
{title=<Mathematics> dump of <Wikipedia CatTree>}
{source=https://cirosantilli.com/wikipedia-cattree/Mathematics}

\Include[ciro-santilli-s-open-source-contributions]{parent=Ciro Santilli's projects}