This section contains a list of cool things Ciro Santilli has been up to in chronological order, including small quick ones. Many/most of those are also posted on accounts controlled by Ciro Santilli.
For a more theme-oriented version of the best results see: Section "The best articles by Ciro Santilli".
For OurBigBook Project updates see: docs.ourbigbook.com/news
I finally took a day to edit the Cool data embedded in the Bitcoin blockchain section from Aratu Week 2024 Talk by Ciro Santilli: My Best Random Projects into a proper YouTube video. The amount of effort that goes into every minute of video editing never ceases to amaze me.
Announcements:
- mastodon.social/@cirosantilli/113764420506911687
- x.com/cirosantilli/status/1875157694270841024
- www.linkedin.com/posts/cirosantilli_my-bitcoin-inscription-museum-images-and-activity-7280924162838126592-BVLX/
- www.facebook.com/cirosantilli/posts/pfbid02kN3sVVTViekYsgyqmN1pdcTp81ca7rJSmofk7X3DkdXYL6Rb8tEd78LoLYw7dEMSl
In 2024 I was user #25 with the most reputation gained on Stack Overflow.
This is up from #38 in 2023, even though I have answered fewer questions than before.
This is likely because LLMs have killed users that just answered lots of easy new questions, and favored those like me who only answer more important questions found through Google.
I was #13 in the last quarter, so this is likely to go even higher in 2025. More details at: Section "Ciro Santilli's Stack Overflow contributions".
Announcements:
I've been thinking lightly about adding full text search to OurBigBook.
For example, at docs.ourbigbook.com/news/article-and-topic-id-prefix-search article search was added, but it only finds a match if you search for something that appears right at the start of a title, e.g. for the title:
Fundamental theorem of calculus
you'd get a hit for:
fundamental
but not for:
calculus
To do this efficiently, we need full text search, which PostgreSQL implements.
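To make the prefix vs. full text difference concrete, here is a minimal Python sketch. It is only an illustration and not how OurBigBook or PostgreSQL implement search: the function names and the toy word index are assumptions made up for this example, and real full text search also deals with things like stemming and ranking.

```python
from collections import defaultdict


def prefix_search(titles, query):
    """Return titles that start with the query, case-insensitively."""
    q = query.lower()
    return [t for t in titles if t.lower().startswith(q)]


def build_word_index(titles):
    """Map each lowercased word to the set of titles that contain it."""
    index = defaultdict(set)
    for title in titles:
        for word in title.lower().split():
            index[word].add(title)
    return index


def word_search(index, query):
    """Return the sorted titles that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Toy data, made up for the example.
titles = ["Fundamental theorem of calculus", "Derivative"]
index = build_word_index(titles)

print(prefix_search(titles, "fundamental"))  # ['Fundamental theorem of calculus']
print(prefix_search(titles, "calculus"))     # []
print(word_search(index, "calculus"))        # ['Fundamental theorem of calculus']
```

The word index is what makes the lookup cheap: finding "calculus" is a dictionary access instead of a scan over every title.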
But finding a clean way to generate test data for testing out the speedup was not so easy, and exploring this led me to publish a few new slightly improved methods where Googlers can now find them:
- unix.stackexchange.com/questions/97160/is-there-something-like-a-lorem-ipsum-generator/787733#787733 I propose a neat random "sentence" generator using common CLI tools like grep and sed and the pre-installed Ubuntu dictionary /usr/share/dict/american-english:
grep -v "'" /usr/share/dict/american-english | shuf -r | paste -d ' ' $(printf "%4s" | sed 's/ /- /g') | sed -e 's/^\(.\)/\U\1/;s/$/./' | head -n10000000 > lorem.txt
- to achieve that, I also proposed two superior "join every N lines" methods for the CLI: stackoverflow.com/questions/25973140/joining-every-group-of-n-lines-into-one-with-bash/79257780#79257780, notably this awk poem:
seq 10 | awk '{ printf("%s%s", NR == 1 ? "" : NR % 3 == 1 ? "\n" : " ", $0 ) } END { printf("\n") }'
- stackoverflow.com/questions/3371503/sql-populate-table-with-random-data/79255281#79255281 I propose:
- a clean PostgreSQL random string stored procedure that picks random characters from an allowed character list
CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$ select string_agg(substr(characters, (random() * length(characters) + 1)::integer, 1), '') as random_word from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789- ')) as symbols(characters) join generate_series(1, $1) on 1 = 1 $$ language sql;
- first generating PostgreSQL data as CSV, and then importing the CSV into PostgreSQL as a more flexible method. This can also be done in a streaming fashion from stdin which is neat.
python generate_data.py 10 | psql mydb -c '\copy "mytable" FROM STDIN'
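For readers who prefer Python over shell, the sentence generator idea from the list above (pick random dictionary words, join every N of them into a line, capitalize and add a period) can be sketched as follows. This is a rough equivalent and not the code from the linked answers: join_every_n and random_sentences are hypothetical names, and the word list is passed in rather than read from /usr/share/dict/american-english.

```python
import random


def join_every_n(items, n, sep=" "):
    """Join each consecutive group of n items into one string."""
    items = list(items)
    return [sep.join(items[i:i + n]) for i in range(0, len(items), n)]


def random_sentences(words, count, words_per_sentence=4, seed=None):
    """Build `count` pseudo-sentences out of randomly picked words."""
    rng = random.Random(seed)
    # Skip words with apostrophes, like `grep -v "'"` in the shell version.
    words = [w for w in words if "'" not in w]
    # Sample with repetition, like `shuf -r`.
    picked = rng.choices(words, k=count * words_per_sentence)
    lines = join_every_n(picked, words_per_sentence)
    # Capitalize the first letter and end with a period, like the final sed.
    return [line[0].upper() + line[1:] + "." for line in lines]


# Same grouping as the awk one-liner applied to `seq 10`.
print(join_every_n([str(i) for i in range(1, 11)], 3))
# ['1 2 3', '4 5 6', '7 8 9', '10']

# The tiny word list is made up for the example.
for sentence in random_sentences(["apple", "river", "it's", "quiet", "stone"], 3):
    print(sentence)
```

The last group is allowed to be shorter than N, which matches what the awk version does with `seq 10`.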
Finally I did a writeup summarizing PostgreSQL full text search: Section "PostgreSQL full-text search" and also dumped it at: www.reddit.com/r/PostgreSQL/comments/12yld1o/is_it_worth_using_postgres_builtin_fulltext/ for good measure.
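The streaming import mentioned above relies on a script that writes rows to stdout. The actual generate_data.py is not shown here, so the following is only a guess at what such a script could look like, assuming a hypothetical two column table (integer id, text value). It emits tab-separated lines, which is what \copy ... FROM STDIN reads in its default text format; CSV output would need the CSV option on the \copy side.

```python
import random
import string
import sys

# Same allowed characters as the random_string SQL function above.
# No tabs, newlines or backslashes, so no escaping is needed in text format.
ALPHABET = string.ascii_letters + string.digits + "- "


def random_string(length, rng):
    """Return a random string made of characters from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def generate_rows(n, rng, length=20):
    """Lazily yield n rows of (id, random text)."""
    for i in range(n):
        yield (i, random_string(length, rng))


def write_rows(rows, out):
    """Write rows as tab-separated lines, one row per line."""
    for row in rows:
        out.write("\t".join(str(field) for field in row) + "\n")


# Demo: stream three rows to stdout.
write_rows(generate_rows(3, random.Random(0)), sys.stdout)
```

Because generate_rows is a generator, rows are produced as psql consumes them, so memory use stays flat even for millions of rows. To use it like the command above, the row count would be taken from sys.argv[1] instead of the hard-coded 3.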
This one was way harder than my previous fun with "find the oldest people who won a given prize" (Nobel Prize/Oscar) mastodon.social/@cirosantilli/112689376315990248 because unlike those prizes where all the decisions are centralized, countries are much more complicated beasts, with changing currencies and international recognition.
This was a good experience to see a few ways in which Wikidata is inconsistent, with the same concept being expressed in multiple different ways, e.g. the "end time" property of the currency vs the superior "end time" qualifier.
Particularly bad is the notion of a "deprecated rank", that should really not exist.
This is exactly the type of semi interactive data munching that I like to do, a bit in the same vein as CIA 2010 covert communication websites and Cool data embedded in the Bitcoin blockchain.
As you might imagine, the secret services use exactly this type of knowledge modelling to do their dirty business, e.g. Gaffer by the GCHQ.
If only I weren't such a rebel, I'd be a perfect fit for the intelligence agencies.
This is the best monstrosity I had the patience to come up with, and it got quite close to the ISO 4217 list:
SELECT
  ?currency
  (GROUP_CONCAT(DISTINCT ?currencyIsoCode; SEPARATOR=", ") AS ?currencyIsoCodes)
  ?currencyLabel
  (GROUP_CONCAT(DISTINCT ?countryLabel; SEPARATOR=", ") AS ?countries)
WHERE {
  ?country wdt:P31/wdt:P279* wd:Q6256. # is country
  ?country p:P38 ?countryHasCurrency.
  ?countryHasCurrency ps:P38 ?currency.
  ?countryHasCurrency wikibase:rank ?countryHasCurrencyRank.
  OPTIONAL {
    ?currency p:P498 ?currencyHasIsoCode.
    ?currencyHasIsoCode ps:P498 ?currencyIsoCode.
  }
  FILTER NOT EXISTS {?country wdt:P576 ?countryAbolished}
  FILTER NOT EXISTS {?currency wdt:P576 ?currencyAbolished}
  FILTER NOT EXISTS {?currency wdt:P582 ?currencyEndTime}
  FILTER NOT EXISTS {?countryHasCurrency pq:P582 ?countryHasCurrencyEndtime}
  FILTER (?countryHasCurrencyRank != wikibase:DeprecatedRank)
  FILTER (!bound(?currencyHasIsoCode) || ?currencyHasIsoCode != wikibase:DeprecatedRank)
  # TODO makes query take timeout? Why? Needed to exclude PLZ.
  FILTER NOT EXISTS {?currencyHasIsoCode pq:P582 ?currencyHasIsoCodeEndtime}
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
    ?currency rdfs:label ?currencyLabel .
    ?country rdfs:label ?countryLabel .
  }
}
GROUP BY ?currency ?currencyLabel
ORDER BY ?currencyIsoCodes ?currencyLabel
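A query like this can also be run outside of the web UI. The sketch below assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format; the helper names and the User-Agent string are placeholders invented for the example, and the network call is wrapped in a function so nothing is fetched until run_query is called.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(sparql, user_agent="example-currency-query/0.1"):
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": user_agent,  # placeholder value
            "Accept": "application/sparql-results+json",
        },
    )


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(sparql, timeout=60):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(sparql), timeout=timeout) as response:
        return rows_from_results(json.load(response))
```

Calling run_query with the query text above would then give one dict per currency, with keys such as currencyLabel and countries, ready to be diffed against the ISO 4217 list.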
I was drawn into this waste of time after I noticed that someone had managed to create the Wikipedia page of PsiQuantum, which I had tried to create earlier but which got deleted: mastodon.social/@cirosantilli/113488891292906243, and then I made the mistake of having a look at the Wikidata page of PsiQuantum.
Announcements:
I also had some more fun with: opendata.stackexchange.com/questions/15750/structured-data-for-nobel-prizes/21847#21847 getting some basic info about Nobel Prize winners, and noticed one, John Sulston, 2002 Nobel Prize in Physiology or Medicine laureate, who likely has the wrong place of birth on his Nobel Prize profile: www.nobelprize.org/prizes/medicine/2002/sulston/facts/ which is funny. I suggested the change. Edit: they fixed it after I pointed it out.
Another highlight was 1913 Nobel Prize in Chemistry laureate Alfred Werner, who was born either in Mulhouse in Alsace, France, or in "Yo no sé qué me pasó" ("I don't know what happened to me" in Spanish), a 1986 song by Mexican singer Juan Gabriel.
Announcements:
Also at opendata.stackexchange.com/questions/21849/how-to-get-a-list-of-all-nobel-prize-winners-who-never-had-a-doctorate-from-wiki/21850#21850 I tried to get the list of Nobel Prize laureates who don't have a PhD. I think the query was correct, but Wikidata data is just too incomplete. Related:
I edited the VOD of the talk Aratu Week 2024 Talk by Ciro Santilli: My Best Random Projects about the CIA 2010 covert communication websites a bit and published it at: www.youtube.com/watch?v=TFfuzZC5Qpc.
Announcements:
GitHub forbade our China Dictatorship auto-reply bot. The reason given is that they forbid comment reply bots in general. Though it was cool to see a junior support staff person giving out what obviously triggered the action:
"We've received a large volume of complaints from other users indicating that the comments and issues are unrelated to the projects they were working on."
before a more senior one took over.
Ciro was slightly saddened but not totally surprised by the bloodbath against him on the Reddit threads he created:
- www.reddit.com/r/github/comments/1g7acv6/github_forbade_me_from_running_a_bot_that_would/ deleted by admins because:
"We don't work for GitHub and we can't help you with your GitHub support problems. You'll just need to be patient."
which is stupid, obviously we should be able to discuss GitHub policies in that sub.
Also good highlight to user whoShotMyCow:
"Has GitHub also forbidden you from, say, getting a job"
Reply:
"No, a 120,000 USD donation did that: cirosantilli.com/sponsor#1000-monero-donation"
"Can't hate on the grind but I think you should also consider psychiatric help"
"Many successful people are neurodiverse" comes to mind.
- www.reddit.com/r/China/comments/1g7aa6k/american_programming_website_github_forbade_me/: also deleted without reason
So we observe once again the stupidity of deletionism towards anything that is considered controversial. The West is discussion fatigued, and would rather delete discussion than have it.
We also see people against you having the freedom to moderate your own repositories as you like, with bots or otherwise. Giving up freedoms for nothing, because "bot is evil".
Announcements:
academia.stackexchange.com/questions/213576/do-copyright-transfer-of-papers-to-publishers-affect-when-the-paper-enters-the-p Do copyright transfer of papers to publishers affect when the paper enters the public domain since copyright belongs to a corporation and not persons?
I'm asking a law question for a change, because I enjoy skimming through important old papers and uploading parts of them where everyone can legally enjoy them.
Announcements:
I like the Falun Mine for two reasons:
- some cool chemical discoveries have been made with a relation to the mine, notably tantalum and selenium. I added a section to Wikipedia: en.wikipedia.org/w/index.php?title=Falun_Mine&oldid=1245374294#Discovery_of_new_elements and used the book Discovery of the Elements by Mary Elvira Weeks as my primary reference.
- it is the Chinese version of the Scunthorpe problem due to a naming conflict with Falun Gong, a censored new religion that was banned in China
Announcements:
Whenever a user creates an issue or comment on China Dictatorship, the bot now automatically creates a new issue with one of the latest news from Duty Machine: github.com/duty-machine/duty-machine
Sample created issue: github.com/cirosantilli/china-dictatorship/issues/1322 Script: github.com/cirosantilli/china-dictatorship/blob/ab6a46c511afaaf6c9e68ba8813c2b2cf9d9638c/action.js#L195
Duty Machine is a bot repo that automatically scrapes Chinese language news from major news outlets such as the New York Times or Radio Free Asia which ensures that China Dictatorship news will always be new.
It's the war of the anonymous bots against the little pinks, part of asymmetric information warfare: cirosantilli.com/china-dictatorship/asymmetric-information-warfare
Announcements:
superuser.com/questions/420885/is-there-a-face-recognition-command-line-tool/1852394#1852394 played with the face_recognition Python package: github.com/ageitgey/face_recognition Cute CLI API, but disappointing accuracy. Thanks Adam Geitgey for putting that repo up.
Announcements:
Under Section "Publication by Marie Curie" I did a quick overview of the papers in which Marie Curie and collaborators publish the existence of new elements polonium and radium. Both are very understandable (except the chemistry), and have some cute terminology. I also cited those papers on her Wikipedia page: en.wikipedia.org/w/index.php?title=Marie_Curie&diff=1240252528&oldid=1238097626 Another good exercise in "old paper finding" + "Wikipedia markup/rules" as I looked at the Comptes rendus de l'Académie des Sciences a bit.
This was kickstarted by YouTube recommending me the following good video:
which led me into yet another quick nuclear physics binge. I shouldn't do this to myself. I also ended up writing some tentative answers on Quora.
Announcements:
I tried to use every single free offline text-to-speech engine that would run on Ubuntu 24.04 without too much hassle to see if any of them sounded natural. pico2wave was the overall winner so far, but it is not perfect.
I've been noticing a gap between the "AI" SOTA and what is actually packaged well enough to be usable by a general audience.
Also played a bit more with OpenAI Whisper: askubuntu.com/questions/24059/automatically-generate-subtitles-close-caption-from-a-video-using-speech-to-text/1522895#1522895 Mind blowing performance and perfect packaging as well, kudos.
Announcements:
- en.wikipedia.org/wiki/Scott_Hassan I delved into a bit of Wikipedia drama on the page of Scott Hassan, initial coder of Google Search, which I created and am the main contributor to.
Originally I had added some details about his messy divorce, which saw coverage in major publications such as the New York Times: www.nytimes.com/2021/08/20/technology/Scott-Hassan-Allison-Huynh-divorce.html and Scott used puppets to remove those at several points in time over the years.
Those removals were then reverted by other editors, not myself, indicating that editors wanted the details there.
While preparing to finally decide this through moderation, I ended up finding that the divorce details should likely have been left out according to Wikipedia rules, because Scott is "relatively unknown" and a "low profile individual", and so I ended up removing them myself.
This is yet again deletionism on Wikipedia weakening the site, and making @OurBigBook stronger :-) Here is the uncensored one: Scott Hassan
I spent time on this partly because I'm mildly obsessed with founding myths of companies, but also partly to better understand the moderation process of Wikipedia.
- unix.stackexchange.com/questions/256138/is-there-any-decent-speech-recognition-software-for-linux/613392#613392 cool to see that the Vosk open source speech recognition software by twitter.com/alphacep now has a convenient command line interface called vosk-transcriber! It allows you to just:
vosk-transcriber -m ~/var/lib/vosk/vosk-model-en-us-0.22 -i in.ogg -o out.srt -t srt
to extract a subtitle file out.srt from a .ogg audio input file. Accuracy is a bit meh, but we'll take it!
- video.stackexchange.com/questions/33531/how-to-remove-background-from-video-without-green-screen-on-the-command-line/37392#37392 tested this AI video background remover github.com/nadermx/backgroundremover by @nadermx. It had a few glitches, but I had fun.
unix.stackexchange.com/questions/233832/merge-two-video-clips-into-one-placing-them-next-to-each-other/774936#774936 I then learned how to stack videos side-by-side with ffmpeg to create this side-by-side demo. It also works for GIFs! stackoverflow.com/questions/30927367/imagemagick-making-2-gifs-into-side-by-side-gifs-using-im-convert/78361093#78361093
- Just found out that my Lenovo ThinkPad P14s has an infrared camera, and recorded a quick test video on Ubuntu 23.10 with:
ffmpeg -y -f v4l2 -framerate 30 -video_size 640x360 -input_format gray -i /dev/video2 -c copy out.mkv
- mastodon.social/@cirosantilli/112261675634568209
- twitter.com/cirosantilli/status/1778981935257116767
- www.facebook.com/cirosantilli/posts/pfbid027M3n2p8snE9otAWdHtJ3ig2AhrXoDGv4h68o1z8agHceQBbFHZpEoxg7KZbiWAgWl
- www.linkedin.com/feed/update/urn:li:activity:7184755892410576897/
- www.youtube.com/watch?v=o1ZeR6pmf6o
- commons.wikimedia.org/wiki/File:Infrared_video_of_Ciro_Santilli_waving_recorded_on_Lenovo_ThinkPad_P14s_with_FFmpeg_6.0_on_Ubuntu_23.10.webm