= Ciro Santilli's projects

= Projects

Major projects can be seen at: <the most important projects done by Ciro Santilli>{full}.

A summary of minor projects is given at: <Ciro Santilli's minor projects>.

This section is a dump for anything else, to keep those sacred first sections that show on the top of the homepage clean.

= OurBigBook
{parent=Ciro Santilli's projects}

= OurBigBook Project

= OurBigBook Markup
{tag=Lightweight markup language}
{tag=Personal knowledge base}

The <markup language> of <>.

Also used on <Ciro Santilli's website> as a <static website> via the <OurBigBook CLI>.

The one <markup language> to rule them all?

Documentation at:[].

= OurBigBook CLI
{tag=Static site generator}

Official <Command-line interface> to convert a directory of <OurBigBook Markup> files into a <static website>. See also:

= OurBigBook Library

Base <JavaScript> library that implements <OurBigBook Markup>. Used by both:
* <OurBigBook CLI>
* <OurBigBook Web>

= OurBigBook Web

The website system that runs <>. For further information see:
* <>: rationale
*[]: project documentation
Relies on the <OurBigBook Library> to compile <OurBigBook Markup>.


= OurBigBook feature

= OurBigBook topic feature
{parent=OurBigBook feature}

More info at:

= OurBigBook dynamic tree
{parent=OurBigBook feature}

More info at:

= x86 bare metal examples
{parent=Ciro Santilli's projects}

As mentioned at <Linux Kernel Module Cheat>{full}, this should be merged into that other project.

= Ciro Santilli's naughty projects
{parent=Ciro Santilli's projects}

If <Ciro Santilli> weren't a <Ciro Santilli's campaign for freedom of speech in China>[natural born activist], he could have made an excellent <intelligence analyst>! See also: <Being naughty and creative are correlated>{full}.
* <Stack Overflow Vote Fraud Script>
* <GitHub> makes Ciro feel especially naughty:
  * <All GitHub Commit Emails>: he extracted (almost) all Git commit emails from <GitHub> with <Google BigQuery>
  *[A repository with 1 million commits]: likely the[live repo with the most commits as of 2017]
  *[A 100 year GitHub streak], likely the longest ever when that feature existed. It was consuming too many <server> resources however, which led to GitHub admins manually[turning off his contribution history].
  *[A repository with a 100k commit Git octopus merge]. Now that is a true[Cthulhu merge].
  *[500 on adoc infinite header xref recursion]: that was fun while it lasted

Outside this website:

= All GitHub Commit Emails
{parent=Ciro Santilli's naughty projects}
{tag=Open-source intelligence}
{tag=Ciro Santilli's data projects}

In this project <Ciro Santilli> extracted (almost) all Git commit emails from <GitHub> with <Google BigQuery>! The repo was later taken down by <GitHub>. Newbs, censoring publicly available data!

Ciro also created a beautifully named variant with one email per commit:[]. True art. It also had the effect of breaking this "what's my first commit tracker":

= Facebook profile face dump
{parent=Ciro Santilli's naughty projects}
{tag=Ciro Santilli's data projects}

In 2016 Ciro made a script that downloaded <Facebook> profile pictures.

This was possible at the time without any login, since profile picture access was not authenticated, by using a 2010 profile ID dump originally announced at:

The profile ID dump was downloadable through a <BitTorrent> named `fbdata.torrent` of about 2.8GB, mostly compressed. Doing:
`find . -type f | xargs sha256sum | sha256sum`
on Ubuntu 20.04 gives:

This dump was widely reported e.g. on <Hacker News> at:[].
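
For readers who want a checksum of a downloaded tree that does not depend on the order in which `find` happens to list files, here is a minimal Python sketch of the same idea: hash every regular file, then hash the resulting listing. The function names `file_sha256` and `tree_sha256` are hypothetical, and sorting the listing is an assumption added here, so the result is not expected to match the output of the shell pipeline above.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of one file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_sha256(root):
    """Hash every regular file under root, then hash the sorted listing.

    Each listing line mimics the "<hex>  ./<path>" text layout of sha256sum.
    Sorting is an assumption added for reproducibility: `find` itself does
    not guarantee any particular order.
    """
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like `find -type f`, skip symlinks and other non-regular files.
            if os.path.isfile(path) and not os.path.islink(path):
                rel = os.path.relpath(path, root)
                lines.append(f"{file_sha256(path)}  ./{rel}\n")
    lines.sort()
    return hashlib.sha256("".join(lines).encode()).hexdigest()
```

Calling `tree_sha256(".")` from inside the extracted directory returns one 64 character hex digest that changes whenever any file content or relative path changes.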

At some point however, Facebook finally started to require tokens to view public profile pictures, thus making such further collection impossible, e.g. as of 2021:[] mentions:
\Q[Querying a User ID (UID) now requires an access token.]
This is also mentioned e.g. at:[]. This major privacy flaw was therefore finally addressed at some point, making it impossible to reproduce this project.

Ciro downloaded 10 thousand of those pictures, and did facial extraction with:

He then created a single video by joining 10 thousand of those cropped faces, which can be uploaded e.g. to <YouTube>. Ciro later decided it was better to make those videos private however, as sooner or later he'd lose his account for it.

<Companies> like <YouTube> blocking this kind of content is the type of thing that makes companies take longer to fix such gaping privacy issues, and is a bit like <security through obscurity>. A video makes it clear to everyone that there is a privacy issue very effectively. But people prefer to hide and look away, and then 99% of people who know nothing about tech get their privacy busted by actual criminals/government spies and never learn about it.

But now that Facebook finally fixed it, it's fine, no need for the video anymore.

= Ciro Santilli's data projects
{parent=Ciro Santilli's projects}

<Ciro Santilli> has enjoyed doing projects dealing with lots of data! They usually have a large overlap with <Ciro Santilli's naughty projects>, but not always!

= Wikipedia CatTree
{parent=Ciro Santilli's data projects}
{tag=Ciro Santilli's minor projects}

This mini-project walks the category hierarchy of <Wikipedia dumps> and dumps it in various simple formats, <HTML> being the most interesting!
* <HTML> dumps:
* methodology:
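
The kind of walk described above can be sketched in a few lines of Python. This is an illustration only, not the project's actual scripts: the function name `render_category_tree`, the `max_depth` cutoff, and the toy input are all hypothetical, and the sketch assumes the hierarchy has already been loaded into a dictionary mapping each category to its subcategories. Because a category graph can contain cycles, categories already on the current path are skipped.

```python
from html import escape


def render_category_tree(children, root, max_depth=3):
    """Render the category hierarchy under root as nested HTML lists.

    children maps a category name to a list of its subcategory names.
    Categories already on the current path are skipped, so a cycle in
    the input cannot cause infinite recursion.
    """
    def walk(name, depth, path):
        item = f"<li>{escape(name)}"
        subs = [c for c in children.get(name, []) if c not in path]
        if subs and depth < max_depth:
            inner = "".join(walk(c, depth + 1, path | {c}) for c in sorted(subs))
            item += f"<ul>{inner}</ul>"
        return item + "</li>"

    return f"<ul>{walk(root, 0, frozenset([root]))}</ul>"


# Toy input (hypothetical data, not taken from a real dump).
toy = {
    "Mathematics": ["Algebra", "Geometry"],
    "Algebra": ["Linear algebra", "Mathematics"],  # cycle back to the root
}
print(render_category_tree(toy, "Mathematics"))
```

Limiting the depth keeps the output small enough to open in a browser, at the cost of hiding deeper subcategories.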

Scripts used:
* \a[wikipedia/]
* \a[wikipedia/]
* \a[wikipedia/]

{title=<Mathematics> dump of <Wikipedia CatTree>}

\Include[ciro-santilli-s-open-source-contributions]{parent=Ciro Santilli's projects}