updates.bigb
= Updates
{scope}
This section contains the a list of cool things <Ciro Santilli> has been up to in chronological order, including small quick ones. Many/most of those are also posted on <Ciro Santilli>'s <accounts>[accounts] such as:
* https://twitter.com/cirosantilli
* https://mastodon.social/@CiroSantilli
For a more theme-oriented version of the best results see: <articles>{full}.
For <OurBigBook Project> updates see: https://docs.ourbigbook.com/news
= ARC-AGI-2
{parent=Updates}
{created=2025-10-18}
https://github.com/arc-dsl-2/arc-dsl-2
I've created a quick fork of <ARC-DSL> which defines a hand crafted Domain Specific Language (DSL) approach to help solve <ARC-AGI> problems.
I basically just merged outstanding pull requests on the original repo that were needed to make things run.
It would be cool to see if those rules also solve <ARC-AGI-2> problems well, but lazy now.
<ARC-AGI-2> is a very interesting benchmark which mixes some symbolic and other visual elements, and is readily solvable by non-expert humans, but has so far resisted transformers to a large degree.
Part of me would like to focus more on less visual aspects of AI, but it is still of interest.
It is funny how many early (semi)-retired fintech/bigtech bros that are interested in the project, I saw several of them on the forums.
I'd be tempted if I were in that position too I must confess. Maybe in 15 years time for me the way things are looking.
Kudos to these people who do something cool and open when they don't need money: https://www.reddit.com/r/Fire/comments/15x4w7r/comment/jx7dn16/ It is also the case of <Jimmy Wales> from <Wikipedia> for example, who used to work in finance.
Announcements:
* https://mastodon.social/@cirosantilli/115391408509591927
* https://x.com/cirosantilli/status/1979448691036410280
* https://www.linkedin.com/feed/update/urn:li:activity:7385215044638310401/
= Getting banned from Project Euler
{parent=Updates}
{created=2025-10-02}
I have been banned from <Project Euler> for life, and cannot login to my previous account https://projecteuler.net/profile/cirosantilli.pn
The ban happened within 12 hours of me publishing a solution to <Project Euler problem 961> https://github.com/lucky-bai/projecteuler-solutions/pull/94 which was one-shot by a free <GPT-5> account as <Matharena> had alerted me to being possible: https://matharena.ai/?comp=euler--euler&task=4&model=GPT-5+%28high%29&run=1
The problem leaderboard contains several people solved the problem within minutes of it being released, so almost certainly with an <LLM>.
I'm a huge believer in <Give answers to problem sets>[giving answers to problems], and I take the ban with pride.
It is funny to see that people waste their time policing this kind of useless stuff.
Project Euler likely has many fun problems, and can be a useful machine learning benchmark.
The "secret club" mentality is their only blemish, and incompatible with open science.
They should also make sure that LLMs don't one shot their future problems BEFORE publishing them!
Announcements:
* https://mastodon.social/@cirosantilli/115305320623020419
* https://x.com/cirosantilli/status/1973783262725349881
* https://www.linkedin.com/feed/update/urn:li:activity:7379549294766292992/
= Backing up <CIA> website archives for research and posterity
{parent=Updates}
This is a quick update to the article: <CIA 2010 covert communication websites>{full}
I've downloaded and uploaded copies of the archives of the CIA websites as follows:
* all <CIA 2010 covert communication websites/cqcounter>[cqcounter screenshots] where <cqcounter> was the best source to: https://github.com/cirosantilli/media/tree/master/cia-2010-covert-communication-websites/screenshots/cqcounter[]. That commercial website does not inspire much trust, e.g. now the main pages like http://cqcounter.com/site/internationalwhiskylounge.com.html were giving an error:
``
[1114: The table 'access' is full] ( 1114 : The table 'access' is full )
``
so I'm glad to have saved their precious screenshots at a safer place.
* all <Wayback Machine> archives to: https://github.com/cirosantilli/cia-2010-websites-dump[]. The exports were done with https://github.com/StrawberryMaster/wayback-machine-downloader by Felipe https://x.com/opapeldetrouxa which is an up-to-date fork of https://github.com/hartator/wayback-machine-downloader and the tool seemed to work very well. I've also edited that better working fork at the top answer of: https://superuser.com/questions/828907/how-to-download-a-website-from-the-archive-org-wayback-machine/957298#957298
The cqcounter screenshots don't offer too much information, but having the wayback machine ones could actually reveal new fingerprints and other website information leaks.
We've had a very quick look, and while there was nothing mind blowing, there were some small finds.
Starting https://web.archive.org/web/20041215182457/http://alljohnny.com/index.html[December 2004, the "Submit your favored carlson quote" of alljohnny.com] was mind blowingly switched to point to https://web.archive.org/web/20050328224840/https://washington.serversecured.net/~alljohnn/cgi-bin/memlog.cgi[\https://washington.serversecured.net/~alljohnn/cgi-bin/memlog.cgi] thus likely leaking the control site URL. Beauty. It previously pointed to https://web.archive.org/web/20040901162621/https://secure.alljohnny.com/cgi-bin/memlog.cgi[]
https://web.archive.org/web/20110203000411/http://mynepalnews.com/[mynepalnews.com] actually has several archives for a https://web.archive.org/web/20110204095803/http://mynepalnews.com:80/stats/[/stats] path which contains HTML reports generated by Webalizer, an analytic tracker that tracks the source of incoming traffic!!! It is hard to believe that the CIA would have left that there. Particularly ridiculous is the presence of `inurl:cgi server_software` at https://web.archive.org/web/20110204095809/http://mynepalnews.com:80/stats/usage_200805.html which is almost certainly a <#Google dork> search, which we know is something that the Iranians used to find the websites. That search hits under https://web.archive.org/web/20110204095351/http://mynepalnews.com:80/cgi-bin/check.cgi[/cgi-bin/check.cgi]. That page is itself os some interest containing `SERVER_ADMIN = mmadev@mmadev.com`. https://web.archive.org/web/20110204095815/http://mynepalnews.com:80/stats/usage_200806.html also reveals several request IPs. Even if this is not a CIA website, there's a chance we could find the IP of the Iranian counter-intelligence in these IP list, it's mind blowing. There's lots of referrer spam too as well. Further HTML inspection however seems to show close relationship to that HTML and other confirmed hits.
https://web.archive.org/web/20101024140137/http://globaltourist.net/[globaltourist.net], if is actually a hit, likely has a https://web.archive.org/web/20031221163711/http://globaltourist.net/[a 2003 archive], which would be our earliest hit archive so far.
A fun fact is that looking at the source code of: https://web.archive.org/web/20130828122833/http://euronewsonline.net/euro_bus.php we noticed an interesting comment:
``
<!-- ImageReady Slices (enewsweather.psd) -->
``
which clarifies that the CIA likely used Adobe ImageReady to cut up the images for <CIA 2010 covert communication websites/Split header images>:
> Adobe ImageReady was a bitmap graphics editor that was shipped with Adobe Photoshop for six years. It was available for Windows, Classic Mac OS and Mac OS X from 1998 to 2007. ImageReady was designed for web development and closely interacted with Photoshop
We also understand that the tool likely outputs the layout to HTML directly, and leaks the adobe projects filenames (.pds files) in the process.
= Understanding the state of 3x3 <matrix multiplication>
{parent=Updates}
After yet another awesome announcement by <Deepmind> that it had improved theoretical 4x4 matrix multiplication reducing the number of scalar multiplications with its <AlphaEvolve> system, I decided to have a look at the smallest open size 3x3 to understand what was going on in there.
I've dumped what I gathered at:
* https://math.stackexchange.com/a/5065930/53203
* https://www.reddit.com/r/math/comments/1komgnr/til_you_can_multiply_two_3x3_matrices_with_only/
Announced at:
* https://mastodon.social/@cirosantilli/114523670306611862
* https://x.com/cirosantilli/status/1923749179458773429
= Translation of <Xi Jinping> saying those against raise their hands
{parent=Updates}
{tag=China Dictatorship}
I can't believe I had missed that meme until now, and I couldn't resist the temptation to translate it
\Video[https://www.youtube.com/watch?v=mLJOsj0sOR8]
{title=<Xi Jinping> saying those against raise their hands (2017)}
\Video[https://www.youtube.com/watch?v=9iMFeINLUpk]
{title=<Xi Jinping> saying those against raise their hands (2022)}
\Video[https://www.youtube.com/watch?v=SCbat53DgSc]
{title=Those against raise their hands <Xi Jinping> remix by <Ciro Santilli>}
{description=I also couldn't resist a <Kdenlive> exercise in voice sampling. It was remarkably easy to do, not bad.}
\Video[https://www.youtube.com/watch?v=-Sg50edXPdM]
{title=Al Capone hits a homerun scene from the film The Untouchables}
\Video[https://www.youtube.com/watch?v=Yg48GsMJeaE]
{title=Committee meeting scene from the film The Death of Stalin}
Announced at:
* https://mastodon.social/@cirosantilli/114511641933313877
* https://x.com/cirosantilli/status/1922977763541033384
* https://github.com/cirosantilli/china-dictatorship/issues/231
* https://github.com/cirosantilli/china-dictatorship/issues/1483
= Two Linux Kernel Module Cheat videos
{parent=Updates}
I made not one but two quick presentation videos about my project <Linux Kernel Module Cheat>, an emulation setup to study and develop the <Linux kernel> and more:
* https://www.youtube.com/watch?v=HDJFyCma32U[]: a presentation of me talking about it, edited up from <Aratu Week 2024 Talk by Ciro Santilli>[my earlier presentation at Aratu Week 2024]
* https://www.youtube.com/watch?v=fgDhe1tN50o[]: a demo of me running actually the project
I had meant to do this editing for a while and kept pushing it off because editing hurts, but finally sat down did it, partly prompted by my quick recent updates made to the projects part of <post OurBigBook job search round 2025>. At first I was thinking of making a single video, but after I recorded the demo a bit it seemed like two separate ones would make more sense.
I also created a bug report for <Kdenlive>, the video editor that I used, for a freeze that happens if you try to shift + delete the last item of the timeline: https://bugs.kde.org/show_bug.cgi?id=504103[]. Kdenlive is a good editor, but unfortunately it has new freezes and crashes relatively often.
One more useless task that I get off my head, on to the next!
\Video[https://www.youtube.com/watch?v=HDJFyCma32U]
{title=Linux Kernel Module Cheat presentation}
\Video[https://www.youtube.com/watch?v=fgDhe1tN50o]
{title=Linux Kernel Module Cheat demo}
Announced at:
* https://mastodon.social/@cirosantilli/114499747693243770
* https://x.com/cirosantilli/status/1922218407455277419
* https://www.linkedin.com/feed/update/urn:li:share:7327984418143281152/
* https://www.facebook.com/cirosantilli/posts/pfbid02UVvFPMJU3UBhSdPxJ2Ev2LrektF7TMmeU8YQw1UE58mT5jMKjuGXYj5DNeUnWzT7l
= Post OurBigBook job search round 2025
{parent=Updates}
I shouldn't be doing this on funded <OurBigBook> time which is until the end of May, but I was getting too nervous and decided to start a casual job search to test the waters.
In particular I want to see if I can get past the HR lady step without toning down my online profiles. If nothing works out for the next round I'll be hiding anything too spicy like:
* prominently seeking funding for <OurBigBook> on my <LinkedIn> profile
* <CIA 2010 covert communication websites> references. This will be my first job hunt since I have published that article. Wish me luck.
* https://cirosantilli.com/china-dictatorship/gay-putin[gay Putin] profile picture on <Stack Overflow>
Another interesting point is to see if French companies are more likely to reply given that <Ciro Santilli> studied at <École Polytechnique> which the French worship.
\Image[https://web.archive.org/web/20240707132335im_/https://cdn.i-scmp.com/sites/default/files/styles/1200x800/public/images/methode/2017/04/06/0a2ae706-1a94-11e7-b4ed-ac719e54b474_1280x720_145124.jpg?itok=1PDxSxTA]
{title=https://cirosantilli.com/china-dictatorship/gay-putin[Gay Putin], currently used in <Ciro Santilli's Stack Overflow profile>}
{description=Ciro's profiles may be a bit too much for the HR ladies who reject his job applications on the spot. To be fair, perhaps not enough years of experience for certain applications and job hopping may have something to do with it too. But since they don't ever tell you anything not to get sued, we'll never know.}
I'm looking in particular either for:
* <machine learning>-adjacent jobs in companies that seem to be doing something that could further <AGI>, e.g. <automatic code generation> or <robotics> would be ideal
* <quantum computing>
* <systems programming>, which is what I actually have work experience with
I spent the last two weeks doing that:
* one week browsing everything of interest in <London> and <Paris> and sending applications to anything that seemed both relevant and interesting. Maintaining an application list at: <Job application by Ciro Santilli>{full}.
* one week on a very laborious but somewhat interesting take home exercise for <Linux Kernel> engineer a <Canonical (company)>, makers of <Ubuntu>.
I had a week to finish 5 practical coding and packaging questions, and I tried to do everything as perfectly as possible, but I somewhat underestimated the amount of work and wait needed to do everything and didn't manage to finish question 4 and missed 5. Oops let's see how that goes.
At least this had a few good outcomes for the Internet as I tried to document things as nicely as I could where they were missing from <Google> as usual:
* I re-tested <Linux Kernel Module Cheat> and made some small improvements. Things still worked from a <Ubuntu 24.10> host (using Docker to <Ubuntu 22.04>), and I also checked that kernel 6.8 builds and GDB step debugs after adding the newly required config `CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT`, also mentioned that at: https://stackoverflow.com/questions/5416406/why-are-there-no-debug-symbols-in-my-vmlinux-when-using-gdb-with-proc-kcore/79598380#79598380[Why are there no debug symbols in my vmlinux when using gdb with /proc/kcore?]
* I contributed some simple updates to https://github.com/martinezjavier/ldd3 getting it closer to work on Linux kernel v6.8. That repository aims to keep the venerable examples from Linux kernel module book <LDD3> alive on newer kernels, and is a very good source for kernel module developers.
* https://stackoverflow.com/questions/37507320/how-to-compile-a-linux-kernel-module/79594642#79594642[How to compile a Linux kernel module?]: wrote a quick Ciro-approved tutorial
* https://stackoverflow.com/questions/6034745/dynamic-array-in-linux-kernel-module/79601790#79601790[Dynamic array in Linux kernel module]: I gave an educational example of a dynamic byte array (like std::string) using the kvmalloc family of allocators
* <quickemu>: this is a good <emulator manager> and I think I'll be using it for <Ubuntu> images when needed from now on. I wrote:
* https://askubuntu.com/questions/884534/how-to-run-ubuntu-desktop-on-qemu/1545712#1545712[How to run Ubuntu desktop on QEMU?]: an introductory tutorial to the software as their README is not that good as is often the case. It's hard for project authors to predict what new users want or not. This is my second answer to this question, the https://askubuntu.com/questions/884534/how-to-run-ubuntu-desktop-on-qemu/1046792#1046792[previous one] focusing on a more manual approach without third party helpers.
* https://www.reddit.com/r/commandline/comments/v85fmx/how_to_share_folder_between_guesthost_quickemu/[How to share folder between guest/host? (Quickemu)]: I explained how to setup a 9p mount to share a directory between guest and host
* https://askubuntu.com/questions/496549/error-you-must-put-some-source-uris-in-your-sources-list/857433#857433[Error :: You must put some 'source' URIs in your sources.list]: updated this answer for <Ubuntu 24.04>. This issue comes up when you want to do either of:
```
sudo apt build-dep
sudo apt source
```
which don't work by default, and my answer explains how to do it from the GUI and CLI. The CLI method is specially important for <Docker> images. Since Ubuntu doesn't offer a stable CLI method for this, the method breaks from time to time and we have to find the new config file to edit.
* https://askubuntu.com/questions/248914/what-is-hardware-enablement-hwe/1546835#1546835p[What is hardware enablement (HWE)?]: I learned a bit better how Ubuntu structures its kernel releases for each <Ubuntu release>
\Image[https://web.archive.org/web/20250506101112if_/https://i.sstatic.net/3sqTiQlD.png]
{title=Linux kernel version used for each <Ubuntu release>}
{height=600}
{description=In particular, Ubuntu has HWE kernels which are updated kernels for older releases. E.g.
* 24.04.0 and 24.04.1 had kernel 6.8
* 24.04.2 moved up to kernel 6.11, the same one used in 24.10, to support newer hardware
}
{source=https://ubuntu.com/about/release-cycle}
Some of the main issues I had were:
* <compiling Linux kernel for Ubuntu> is extremely slow. I was used to compiling for embedded system with <Buildroot>, which finishes in minutes, but for Ubuntu is hours, presumably because they enable as many drivers as possible to make a single ISO work on as many different computers as possible, which makes sense, but also makes development harder
* my <QEMU> setup for Ubuntu was not quite as streamlined and I relearned a few things and set up <quickemu>. By chance I had recently come across <quickemu> for testing <OurBigBook> on <MacOS>, but I had to learn a bit how to set it up reasonably too
I'll make sure to add two weeks of OurBigBook work after May to make up for this.
= 60 new CIA website screenshots discovered on CQ Counter
{parent=Updates}
{scope}
{tag=CIA 2010 covert communication websites}
= I inspected CIA websites that have no Wayback Machine archive on CQ Counter to reveal new previously unseen screenshots
{synonym}
This is an update to the article: <CIA 2010 covert communication websites>{full}
While procrastinating I suddenly remembered that http://cqcounter.com/siteinfo/ has screenshots of many many old websites, and I decided to look at possible hits in known IP ranges for which the Wayback Machine archive was broken.
Luckily I had already maintained a clear list of known domains in IP ranges which had no or broken wayback machine archive, so I just went over those.
This led to finding 60 novel screenshots of previously examined domains that are in common CIA-style, thus confirming them as hits beyond reasonable doubt in my mind. This also publicly revealed for the first time how a few new websites looked like, and what was their content, and in particular the target language, which could sometimes not be easily determined from the domain name alone.
This novel CQ Counter screenshot interpretation, plus a few new random discoveries and a slight relaxation of fingerprint requisites described described below moves us to 473 hits up from the previous 397!
The newly found websites were all just soulless bulk or mildly cute like the vast majority of them, but I did find found a few new screenshots of <CIA 2010 covert communication websites/USA spying on its own allies>[CIA websites that targeted other democracies]:
* http://cqcounter.com/whois/www/affairesdumonde.com.html[affairesdumonde.com] (France)
* http://cqcounter.com/whois/www/romulusactualites.com.html[romulusactualites.com] (France)
* http://cqcounter.com/whois/www/ordenpolicial.com.html[ordenpolicial.com] (Spain)
* http://cqcounter.com/whois/www/vejaaeuropa.com.html[vejaaeuropa.com] (Brazil)
* http://cqcounter.com/whois/www/european-footballer.com.html[european-footballer.com] (Croatia)
I've also decided to now classify https://web.archive.org/web/20110424044637/http://www.garanziadellasicurezza.com:80/[garanziadellasicurezza.com] (Italy) as a hit due to various forms of supporting evidence being present. The archive is very broken however unfortunately.
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/cqcounter/affairesdumonde.com.gif?raw=true]
{title=2011 <CIA 2010 covert communication websites/cqcounter> archive of http://cqcounter.com/whois/www/affairesdumonde.com.html[affairesdumonde.com] targeting France}
{height=545}
{source=http://cqcounter.com/whois/www/affairesdumonde.com.html}
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/cqcounter/romulusactualites.com.gif?raw=true]
{title=2011 <CIA 2010 covert communication websites/cqcounter> archive of http://cqcounter.com/whois/www/romulusactualites.com.html[romulusactualites.com] targeting France}
{height=545}
{source=http://cqcounter.com/whois/www/romulusactualites.com.html}
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/cqcounter/ordenpolicial.com.gif?raw=true]
{title=2011 <CIA 2010 covert communication websites/cqcounter> archive of http://cqcounter.com/whois/www/ordenpolicial.com.html[ordenpolicial.com] targeting Spain}
{height=545}
{source=http://cqcounter.com/whois/www/ordenpolicial.com.html}
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/cqcounter/vejaaeuropa.com.gif?raw=true]
{title=2011 <CIA 2010 covert communication websites/cqcounter> archive of http://cqcounter.com/whois/www/vejaaeuropa.com.html[vejaaeuropa.com] targeting Brazil}
{height=545}
{source=http://cqcounter.com/whois/www/vejaaeuropa.com.html}
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/cqcounter/european-footballer.com.gif?raw=true]
{title=2011 <CIA 2010 covert communication websites/cqcounter> archive of http://cqcounter.com/whois/www/european-footballer.com.html[european-footballer.com] targeting Croatia}
{height=545}
{source=http://cqcounter.com/whois/www/european-footballer.com}
The fingerprint of "having a visually similar CQ Counter screenshot" is definitely weaker than a Wayback Machine archive as we only have a screenshot and can't inspect the HTML to find the communication mechanism. But when the screenshot is perfectly in CIA style and in a known IP range, the evidence is too strong and we'll consider it as a hit moving forward.
I'm also going to reclassify a few previously known domains in confirmed IP ranges as hits as hits either when:
* they have Wayback Machine archives with matching visual style
* they have broken Wayback Machine archives but with indication of comms or known HTML elements like rss-item
This is a slight moving of goalposts, but those cases just feel overwhelmingly probably.
I love how this project has led me to use whatever random sources come in hand! CQ Counter is the ONLY website that I know of besides the Wayback Machine that has historical screenshots of a huge number of domains. Their database is VERY complete. But they are so obscure!
They even have the old IP of the domain. But because they don't have reverse IP to domain reverse search, and are heavily CAPTCHAed preventing search engines from properly indexing them, we can't use them to fill in existing IP ranges... So the search for the most complete DNS database that doesn't cost 15k USD like DomainTools continues https://www.reddit.com/r/OSINT/comments/1j8uasm/does_domaintools_offer_historical_reverse_ip_ie/
Interestingly a large number of the websites with broken Wayback Machine are from regions outside of the USA, presumably being slower to load from Wayback Machine US-based servers makes he archives more likely to break.
Announced at:
* https://mastodon.social/@cirosantilli/114341595369273933
* https://x.com/cirosantilli/status/1912095004706677144
* https://www.linkedin.com/feed/update/urn:li:ugcPost:7317939076110671872/
* https://www.facebook.com/cirosantilli/posts/pfbid0fJsysJFNpQTmRcpHdVfSmNRYoUtCXRxoN5BNwGMFPwPLn1KYebVFBhqe8NQZBXqCl
= New IP ranges established
{parent=60 new CIA website screenshots discovered on CQ Counter}
I have come to realize that a few of the websites do seem to use virtual hosting, i.e. multiple domains per IP, and I put a bit more manual effort into looking at known possible IPs that had a relatively small number of domains in them.
This led either to finding a few new domains, or placing existing domains in the same IP as another domains.
From now on I'll consider any IP with more than two hits to be an "IP range".
The new finds around other pre-existing domains are:
* 199.19.110.7 (https://web.archive.org/web/20110209183352/http://theworldnewsfeeds.com/[theworldnewsfeeds.com]):
* https://web.archive.org/web/20110106000954/http://classymotors.net/[classymotors.net]
* https://web.archive.org/web/20100331193337/http://www.russiansportsworld.com/[russiansportsworld.com]
* https://cqcounter.com/whois/www/urbestbod.com.html[urbestbod.com]
* 207.150.191.68 (https://web.archive.org/web/20110201212849/http://technologypresstoday.com/[technologypresstoday.com])
* https://web.archive.org/web/20100512232600/http://kickofffootballnews.com/[kickofffootballnews.com]
* 216.93.248.194 (https://web.archive.org/web/20110201124007/http://esmundonoticias.com/[esmundonoticias.com])
* https://cqcounter.com/whois/www/tech-geek-news.com.html[tech-geek-news.com]
* 216.104.38.110 (https://web.archive.org/web/20110207144440/http://all-sport-headlines.com/[all-sport-headlines.com])
* https://web.archive.org/web/20110202203450/http://wahidfutbol.com/[wahidfutbol.com]
* https://web.archive.org/web/20110207183336/http://wildbirds-seasia.com/[wildbirds-seasia.com]
Furthermore, I now found new hits on nearby IPs of 209.162.192.49 https://web.archive.org/web/20100429002010/http://rastadirect.net/[rastadirect.net] which was given by Reuters, thus establishing a new IP range there. Apparently I had simply failed to check IPs around one of the possible reverse IPs for it. The new finds are:
* 209.162.192.44 https://web.archive.org/web/20110202185205/http://thejewelofsouthamerica.com/[thejewelofsouthamerica.com]
* 209.162.192.51 https://web.archive.org/web/20110208004120/http://yellow-chair-report.com/[yellow-chair-report.com]
* 209.162.192.57 https://web.archive.org/web/20101229082120/http://globalnewsreports.net/[globalnewsreports.net]
* 209.162.192.59 https://web.archive.org/web/20091219195929/http://easytravelsite.net/[easytravelsite.net]
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/thejewelofsouthamerica.com.jpg?raw=true]
{title=2010 <Wayback Machine> archive of https://web.archive.org/web/20110202185205/http://thejewelofsouthamerica.com/[thejewelofsouthamerica.com]}
{height=1024}
{source=https://web.archive.org/web/20110202185205/http://thejewelofsouthamerica.com/}
= Mass Deface III pastebin better understood
{parent=60 new CIA website screenshots discovered on CQ Counter}
I also understood a bit better the Mass Deface III pastebin https://pastebin.com/CTXnhjeS discovered by https://x.com/shakirov2036[Oleg Shakirov] which contains some hits: I think that the hits are purely coincidental when some hacker broke into "Condor Hosting" systems and then defaced several websites it contained, inadvertently also taking down some CIA websites along the day which is funny.
And this seems to be a small host chosen by the CIA, so it contained a disproportionate dense concentration of CIA hits. But the original hackers likely had no idea of what they did. https://www.zone-h.com/mirror/id/18994983 suggests that Iranian hacker group Sejeal was behind the defacing.
= Virtual IPs squeezed further on viewdns.info
{parent=60 new CIA website screenshots discovered on CQ Counter}
I also squeezed a few of the previously known IPs without clear range a bit harder on https://viewdns.info[], as I now understand that there do exist a few websites that share the same IPs. This led to X entirely new hits, and also me moving a few domains that were previously marked as "unknown range" to a specific IP when two or more domains were found in a given IP.
= whoisXMLAPI whois history squeezed further and better understood
{parent=60 new CIA website screenshots discovered on CQ Counter}
I also squeezed <whoisXMLAPI> harder but nothing much came out.
The vast majority of domains use <domainsbyproxy.com> privacy which does not seem to leak any information on their whois except dates which appear well spread out.
I did notice however that some of the sites are registered with <Network Solutions, LLC> and a few others in Godaddy without <domainsbyproxy.com>. These have names of people on them, and I did as many <whoisXMLAPI> searches for those names as I had the patience for.
A few had another known hit on the results, and a new hit domain came out of this: http://cqcounter.com/whois/www/rolling-in-rapids.com.html[rolling-in-rapids.com] which as it turns out has no Wayback machine archive, but does have a CQ Counter archive which allowed me to confirm the hit page style. That one was found by reverse searching for the registrant of `alljohnny.com`, "Glaze, L." on https://tools.whoisxmlapi.com/reverse-whois-search[] and its IP matches 65.218.91.9 from `welcometonyc.net`.
If anyone would like to donate 140 USD to dump into whoisXMLAPI I could dump all the known hit histories and have a look at them to see if anything else comes out on reverse search.
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/cqcounter/rolling-in-rapids.com.gif?raw=true]
{height=545}
{source=http://cqcounter.com/whois/www/rolling-in-rapids.com.html}
= Better understanding and understanding of IP range owners
{parent=60 new CIA website screenshots discovered on CQ Counter}
I also started to better note down the IP owner and location of each IP range from <viewdns.info> at <CIA 2010 covert communication websites/Hits with nearby IP hits>, as this is an important information which could offer further clues. All IPs in each range belong to the same provider, since IPs are generally bought in blocks. For example:
* 62.22.60.49 telecom-headlines.com was owned by the company UUNET and hosted from Spain, and the same is true for neighboring IPs such as:
* 62.22.60.48: currentcommunique.com
* 62.22.60.52: collectedmedias.com
* 63.131.229.12 cyberreportagenews.com was owned by the company ADHOST and hosted from Coeur d'Alene - United States. Interestingly US-based hosts also offer city-level information while foreign ones don't.
These don't necessarily tell us directly who the CIA hosted with, since in some cases hosting providers can indirectly rent out IPs from other providers, e.g. Heroku uses AWS. But it does suggest that some nearby IP ranges were done on the same hosting provider while others weren't.
= Possible cute internal information leaks on a few sites
{parent=60 new CIA website screenshots discovered on CQ Counter}
I'm not sure about this and it's not very useful, but the following were cute.
216.105.98.132 europeantravelcafe.com is a very likely hit that:
* had a "Plan Your Trip" link in 2010: https://web.archive.org/web/20100724024623/http://www.europeantravelcafe.com/ linking to an external website: https://secure-cert.net/~etc/transport.html
* and had the link removed in 2011: https://web.archive.org/web/20110201192245/http://europeantravelcafe.com/
This suggests that this was an internal site management link for the site operators which was later noticed and removed across versions, leaking the management method in the process.
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/www.europeantravelcafe.com-arrow.jpg?raw=true]
{title=2010 <Wayback Machine> archive of https://web.archive.org/web/20100724024623/http://www.europeantravelcafe.com/[www.europeantravelcafe.com]}
{description=The suspicious "Plan Your Trip" link that https://web.archive.org/web/20110201192245/http://europeantravelcafe.com/[was later removed] is highlighted with an arrow made by us.}
{height=1024}
{source=https://web.archive.org/web/20100724024623/http://www.europeantravelcafe.com/}
https://web.archive.org/web/20110207150839/http://webofcheer.com/[199.187.208.12 webofcheer.com] has an exceedingly weird HTML page title:
> pg1c
which feels like it could be a leak of an internal identifier for this website, or perhaps even worse, for the CIA program itself.
= Full list of newly found screenshots
{parent=60 new CIA website screenshots discovered on CQ Counter}
* http://cqcounter.com/whois/www/affairesdumonde.com.html[affairesdumonde.com]
* http://cqcounter.com/whois/www/american-historyonline.com.html[american-historyonline.com]
* http://cqcounter.com/whois/www/arborstribune.org.html[arborstribune.org]
* http://cqcounter.com/whois/www/atthemovies.biz.html[atthemovies.biz]
* http://cqcounter.com/whois/www/aviation-navigation.com.html[aviation-navigation.com]
* http://cqcounter.com/whois/www/bookmarksthis.com.html[bookmarksthis.com]
* http://cqcounter.com/whois/www/bosniakbusinessnews.com.html[bosniakbusinessnews.com]
* http://cqcounter.com/whois/www/cityworldnewsnow.com.html[cityworldnewsnow.com]
* http://cqcounter.com/whois/www/classic-rocktopia.com.html[classic-rocktopia.com]
* http://cqcounter.com/whois/www/classicalmusic4arab.com.html[classicalmusic4arab.com]
* http://cqcounter.com/whois/www/delacorne.com.html[delacorne.com]
* http://cqcounter.com/whois/www/dynamicworldnews.com.html[dynamicworldnews.com]
* http://cqcounter.com/whois/www/econfutures.com.html[econfutures.com]
* http://cqcounter.com/whois/www/efiinvestment.com.html[efiinvestment.com]
* http://cqcounter.com/whois/www/etherealinspirations.net.html[etherealinspirations.net]
* http://cqcounter.com/whois/www/european-footballer.com.html[european-footballer.com]
* http://cqcounter.com/whois/www/focusonbokeh.com.html[focusonbokeh.com]
* http://cqcounter.com/whois/www/foodwineandsuch.com.html[foodwineandsuch.com]
* http://cqcounter.com/whois/www/fuenteneta.com.html[fuenteneta.com]
* http://cqcounter.com/whois/www/garundipost.com.html[garundipost.com]
* http://cqcounter.com/whois/www/gazingvoyage.com.html[gazingvoyage.com]
* http://cqcounter.com/whois/www/global-headlines.com.html[global-headlines.com]
* http://cqcounter.com/whois/www/goldeportesnoticias.com.html[goldeportesnoticias.com]
* http://cqcounter.com/whois/www/greek-news.info.html[greek-news.info]
* http://cqcounter.com/whois/www/i7diver.com.html[i7diver.com]
* http://cqcounter.com/whois/www/ilat-news.com.html[ilat-news.com]
* http://cqcounter.com/whois/www/internationalwhiskylounge.com.html[internationalwhiskylounge.com]
* http://cqcounter.com/whois/www/itechnewstoday.com.html[itechnewstoday.com]
* http://cqcounter.com/whois/www/kioni-sailing.com.html[kioni-sailing.com]
* http://cqcounter.com/whois/www/large-format-news.com.html[large-format-news.com]
* http://cqcounter.com/whois/www/luxuryfive.net.html[luxuryfive.net]
* http://cqcounter.com/whois/www/news-unlimited.info.html[news-unlimited.info]
* http://cqcounter.com/whois/www/newsforthetech.com.html[newsforthetech.com]
* http://cqcounter.com/whois/www/newsidori.com.html[newsidori.com]
* http://cqcounter.com/whois/www/noticiasdenuestromundo.com.html[noticiasdenuestromundo.com]
* http://cqcounter.com/whois/www/nrgconsultingandnews.com.html[nrgconsultingandnews.com]
* http://cqcounter.com/whois/www/ordenpolicial.com.html[ordenpolicial.com]
* http://cqcounter.com/whois/www/pohandakhbar.com.html[pohandakhbar.com]
* http://cqcounter.com/whois/www/prebitinvestment.com.html[prebitinvestment.com]
* http://cqcounter.com/whois/www/premierstriker.com.html[premierstriker.com]
* http://cqcounter.com/whois/www/rolling-in-rapids.com.html[rolling-in-rapids.com]
* http://cqcounter.com/whois/www/romulusactualites.com.html[romulusactualites.com]
* http://cqcounter.com/whois/www/russiaupdate.com.html[russiaupdate.com]
* http://cqcounter.com/whois/www/screencentral.info.html[screencentral.info]
* http://cqcounter.com/whois/www/sportsman-elite.com.html[sportsman-elite.com]
* http://cqcounter.com/whois/www/talkingpointnews.info.html[talkingpointnews.info]
* http://cqcounter.com/whois/www/teclafinance.com.html[teclafinance.com]
* http://cqcounter.com/whois/www/the-golden-rule.info.html[the-golden-rule.info]
* http://cqcounter.com/whois/www/the-news-zone.com.html[the-news-zone.com]
* http://cqcounter.com/whois/www/the-tech-mind.com.html[the-tech-mind.com]
* http://cqcounter.com/whois/www/theventurenews.info.html[theventurenews.info]
* http://cqcounter.com/whois/www/todaysportscores.com.html[todaysportscores.com]
* http://cqcounter.com/whois/www/topfootballnewsonline.com.html[topfootballnewsonline.com]
* http://cqcounter.com/whois/www/travel-passage.com.html[travel-passage.com]
* http://cqcounter.com/whois/www/urbestbod.com.html[urbestbod.com]
* http://cqcounter.com/whois/www/urouttahere.com.html[urouttahere.com]
* http://cqcounter.com/whois/www/vejaaeuropa.com.html[vejaaeuropa.com]
* http://cqcounter.com/whois/www/waronfilmonline.com.html[waronfilmonline.com]
* http://cqcounter.com/whois/www/worldfeedstoday.com.html[worldfeedstoday.com]
* http://cqcounter.com/whois/www/worldnewsandtravel.com.html[worldnewsandtravel.com]
= New Bitcoin Base58 messages found due to a new paper: Bitcoin Burn Addresses: Unveiling the Permanent Losses and Their Underlying Causes
{parent=Updates}
{tag=Cool data embedded in the Bitcoin blockchain}
This is an update to the article: <Cool data embedded in the Bitcoin blockchain/Base58 messages>{full}.
While self Googling a bit, I found this paper <Cool data embedded in the Bitcoin blockchain/Bitcoin Burn Addresses: Unveiling the Permanent Losses and Their Underlying Causes> by Mohamed el Khatib and Arnaud Legout briefly cited <Cool data embedded in the Bitcoin blockchain>.
In that paper, they attempted to find <Base58> Bitcoin addresses that looked as if they were fake and used only in order to contain Base58 data.
While we had previously explored a few <Base58> messages, this was something that had been done mostly ad-hoc simply by looking at transactions with large amounts of unspent outputs. As a result, any smaller messages were missed.
By looking through the data produced by the those researchers, we managed to find many new <Base58> messages, and highlighted many of the most interesting early messages at: <Cool data embedded in the Bitcoin blockchain/Base58 messages>{full}.
The cutest new example is https://www.blockchain.com/explorer/transactions/btc/dea183908e40e0cebfee6a0d8362b299e07cf193fbc02ffd3308b43781eca208[tx dea183908e40e0cebfee6a0d8362b299e07cf193fbc02ffd3308b43781eca208] (2011-11-24) containing Eric Lombrozo's minimalistic wedding contract to Sandra Sandic:
``
1EricLombrozoXXXXXXXXXXXXXXXWACBVB
1969SandraSandicXXXXXXXXXXXXXvdEiU
``
Announced at:
* https://mastodon.social/@cirosantilli/114218387671424003
* https://x.com/cirosantilli/status/1904211594000715781
= 44 new CIA websites
{parent=Updates}
{tag=CIA 2010 covert communication websites}
This is an update to the article: <CIA 2010 covert communication websites>{full}
I found 44 new covert websites made by the CIA around 2010 bringing the total to 397!
Most websites were boring as usual, but one was slightly cooler: https://web.archive.org/web/20110207150839/http://webofcheer.com/[webofcheer.com] is a comedy fansite featuring Johnny Carson, Charles Chaplin, Rowan Atkins (of Mr. Bean fame), The Three Stooges and some other <Americans> no one knows about anymore. There must have been a massive Johnny Carson amongst the contractors at that time, given that we previously also knew about https://web.archive.org/web/20040113025122/http://alljohnny.com/[`alljohnny.com`], a site dedicated fully to him! Both of these sites also serve as some of the earliest examples we've got so far, dating back to 2004 and 2005.
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/webofcheer.com.jpg?raw=true]
{title=2011 <Wayback Machine> archive of https://web.archive.org/web/20110207150839/http://webofcheer.com/[webofcheer.com]}
{height=717}
{source=https://web.archive.org/web/20040113025122/http://alljohnny.com/}
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/webofcheer.com-scroll.jpg?raw=true]
{title=2011 <Wayback Machine> archive of https://web.archive.org/web/20110207150839/http://webofcheer.com/[webofcheer.com] scrolled to show Johnny Carson}
{height=717}
{source=https://web.archive.org/web/20040113025122/http://alljohnny.com/}
\Image[https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/screenshots/alljohnny.com.jpg?raw=true]
{title=2004 <Wayback Machine> archive of https://web.archive.org/web/20040113025122/http://alljohnny.com/[alljohnny.com]}
{description=This one was a previously known website featuring Johnny Carson.}
{height=717}
{source=https://web.archive.org/web/20040113025122/http://alljohnny.com/}
Another cool discovery is that I found the <#Getty Images> source of the #Jedi boy on their <Star Wars> themed site https://web.archive.org/web/20101230033220/http://starwarsweb.net/[starwarsweb.net]: https://web.archive.org/web/20101230033220/http://starwarsweb.net/ The photo can still be licensed today as of 2025: https://www.gettyimages.co.uk/detail/photo/little-jedi-royalty-free-image/172984439[]. I found it by searching for "jedi boy" on https://gettyimages.co.uk. The photo is credited to username `madisonwi`, presumably an alias of a photographer from <#Madison, Wisconsin>. Inspired by this I <reverse image searched> and found the source of many other stock images from other websites, and I pinged their authors whenever I could locate them e.g. https://x.com/cirosantilli/status/1899750172260806711[].
\Image[https://web.archive.org/web/20250312060328if_/https://media.gettyimages.com/id/172984439/photo/little-jedi.jpg?s=2048x2048&w=gi&k=20&c=vKTUlTKpYtPcL8wQobbZMQEhU1CZefV7VlXxDExqZn8=]
{title=Stock photo of a Jedi boy from Getty Images used on https://web.archive.org/web/20101230033220/http://starwarsweb.net/[starwarsweb.net]}
{height=700}
{source=https://www.gettyimages.co.uk/detail/photo/little-jedi-royalty-free-image/172984439}
\Image[https://raw.githubusercontent.com/cirosantilli/media/master/cia-2010-covert-communication-websites/screenshots/starwarsweb.net.jpg]
{title=2010 <Wayback Machine> archive of https://web.archive.org/web/20101230033220/http://starwarsweb.net/[starwarsweb.net]}
{height=1400}
There were two small advances that led to the discovery of new domains:
* while looking for a way to procrastinate I decided to scrape https://justdropped.com/drops/ for fun. That website lists expired domain names and see if it would yield any new results.
I had already <CIA 2010 covert communication websites/Expired domain trackers>[scrapped other expired domain websites before and used that data], and I hoped that this one would provide some new domain hits, even though it had very large overlap with the other websites I had scraped domains from previously.
Such domain name lists tend to contain all SCAM domains in existence, since those inevitably expire once the scammers are caught.
* even more importantly, I noticed by chance that I was being too strict on a small part of my fingerprinting which was excluding a few good domains, by removing any hits that had multiple archives of the <CIA 2010 covert communication websites/Communication mechanism>
With those two new developments, I then kicked off my pre-existing search pipelines searching for domain names with the word `news` on them, an amazingly efficient heuristic because many of the websites were disguised as news aggregators, and after a few hours theses new hits emerged. A few of those also led to the discovery of new IPs which then led to new domains.
One entirely new IP range was found around https://web.archive.org/web/20090420131359/http://fastnews-online.com/[fastnews-online.com] from 208.93.112.105 to 208.93.112.125. There were many domain names with very promising names in the range, but unfortunately for some reason most didn't have <Wayback Machine> Archives so I didn't count them as hits as per my guidelines.
\Image[https://raw.githubusercontent.com/cirosantilli/media/master/cia-2010-covert-communication-websites/screenshots/fastnews-online.com.jpg]
{title=2009 <Wayback Machine> archive of https://web.archive.org/web/20090420131359/http://fastnews-online.com/[fastnews-online.com]}
{height=800}
Also the newly found https://web.archive.org/web/20110228051404/http://todaysengineering.com/[todaysengineering.com] at 208.254.38.39 appears to form an IP range with the previously known https://web.archive.org/web/20110207210023/http://nejadnews.com/[nejadnews.com] at 208.254.38.56, but I couldn't find any other domains in the region with our current data sources.
\Image[https://raw.githubusercontent.com/cirosantilli/media/master/cia-2010-covert-communication-websites/screenshots/todaysengineering.com.jpg]
{title=2011 <Wayback Machine> archive of https://web.archive.org/web/20110228051404/http://todaysengineering.com/[todaysengineering.com]}
{height=800}
All other domains either slot into previously known IP ranges, or more commonly don't currently have a known IP, though they would likely just slot in existing ranges if we had better data.
Thanks to <#Jack Rhysider> from the <#Darknet Diaries> podcast for pointing me to the existing of the <CIA 2010 covert communication websites/Reuters article>[2022 Reuters article] that kickstarted my research on the subject!
One outcome of this update is that I've increased my <jq> level to better automate the maintenance of the https://github.com/cirosantilli/media/blob/master/cia-2010-covert-communication-websites/hits.json[hits.json] file were I store all the known websites in <JSON> format. I love that tool so much, I managed to merge two JSONs with it removing duplicates and then sort the JSON as desired. Beauty.
The full list of newly found websites is:
* https://web.archive.org/web/20110202110916/http://cellar-notes.com/[cellar-notes.com]
* https://web.archive.org/web/20110201114414/http://dailywellnessnews.com/[dailywellnessnews.com]
* https://web.archive.org/web/20110202185635/http://differentviewtoday.com/[differentviewtoday.com]
* https://web.archive.org/web/20110209045123/http://dryterrainnews.com/[dryterrainnews.com]
* https://web.archive.org/web/20100206221718/http://euronewsonline.net/[euronewsonline.net]
* https://web.archive.org/web/20090420131359/http://fastnews-online.com/[fastnews-online.com]
* https://web.archive.org/web/20110203041325/http://financecentraltoday.com/[financecentraltoday.com]
* https://web.archive.org/web/20101229090542/http://globalcitizennews.net/[globalcitizennews.net]
* https://web.archive.org/web/20101229090625/http://globalinvestmentnews.net/[globalinvestmentnews.net]
* https://web.archive.org/web/20110201180802/http://inkfreenews.com/[inkfreenews.com]
* https://web.archive.org/web/20110202031020/http://internationalnewsworthiness.com/[internationalnewsworthiness.com]
* https://web.archive.org/web/20110202054628/http://intoworldnews.com/[intoworldnews.com]
* https://web.archive.org/web/20100513182623/http://lasthournews.com/[lasthournews.com]
* https://web.archive.org/web/20100513190714/http://latinamericanewsbeat.com/[latinamericanewsbeat.com]
* https://web.archive.org/web/20100514160029/http://localtoglobalnews.com/[localtoglobalnews.com]
* https://web.archive.org/web/20100515102926/http://magneticfieldnews.com/[magneticfieldnews.com]
* https://web.archive.org/web/20100517070603/http://middle-east-newstoday.com/[middle-east-newstoday.com]
* https://web.archive.org/web/20100923090646/http://mideasttoday.net/[mideasttoday.net]
* https://web.archive.org/web/20110207171340/http://mydailynewsreport.com/[mydailynewsreport.com]
* https://web.archive.org/web/20110203000411/http://mynepalnews.com/[mynepalnews.com]
* https://web.archive.org/web/20130704042538/http://nbanewsroundup.com/[nbanewsroundup.com]
* https://web.archive.org/web/20110207210023/http://nejadnews.com/[nejadnews.com]
* https://web.archive.org/web/20110208004005/http://networkconnectionsite.com/[networkconnectionsite.com]
* https://web.archive.org/web/20110208063146/http://news-and-sports.com/[news-and-sports.com]
* https://web.archive.org/web/20101226034643/http://newsdelivered.net/[newsdelivered.net]
* https://web.archive.org/web/20110106212939/http://pondernews.net/[pondernews.net]
* https://web.archive.org/web/20110128181622/http://profile-news.com/[profile-news.com]
* https://web.archive.org/web/20110201122747/http://purlicue-news.com/[purlicue-news.com]
* https://web.archive.org/web/20110208085035/http://sandstormnews.com/[sandstormnews.com]
* https://web.archive.org/web/20110131054130/http://segomonews.com/[segomonews.com]
* https://web.archive.org/web/20110201184753/http://shadesofnews.com[shadesofnews.com]
* https://web.archive.org/web/20110201212849/http://technologypresstoday.com/[technologypresstoday.com/]
* https://web.archive.org/web/20110202221408/http://the-news-scene.com/[the-news-scene.com]
* https://web.archive.org/web/20110202151142/http://thefootball-life.com/[thefootball-life.com]
* https://web.archive.org/web/20091209022355/http://thefreshnews.com/[thefreshnews.com]
* https://web.archive.org/web/20110202221328/http://thenewsofpakistan.com/[thenewsofpakistan.com]
* https://web.archive.org/web/20110207201846/http://totallynewsnow.com/[totallynewsnow.com]
* https://web.archive.org/web/20080314234452/http://travelxtreme.net/[travelxtreme.net]
* https://web.archive.org/web/20110207150839/http://webofcheer.com/[webofcheer.com]
* https://web.archive.org/web/20110208191615/http://wiredworldnews.com/[wiredworldnews.com]
* https://web.archive.org/web/20101226225311/http://world-news-online.net/[world-news-online.net]
* https://web.archive.org/web/20110210004831/http://worldaroundyunnan.com/[worldaroundyunnan.com]
* https://web.archive.org/web/20110210051430/http://worldofonlinenews.com/[worldofonlinenews.com]
Announced at:
* https://mastodon.social/@cirosantilli/114156495883418926
* https://x.com/cirosantilli/status/1900249928653271334
* https://www.facebook.com/cirosantilli/posts/pfbid02LbrfezGmFik582d6H7ZEoCf9bwpU73vyivdGLVbbzWjejWLS5Rv9EjGNXBPQppUBl
* https://www.linkedin.com/posts/cirosantilli_httpslnkdineyu8qwc-i-found-44-new-covert-activity-7306015949374058496-X5zl/
= OurBigBook Project Update March 2025
{parent=Updates}
{scope}
= OurBigBook Project Update March 2024
{synonym}
This is a summary of the status of the <OurBigBook Project>, focusing notably on the past 9 months that I've been able to devote fully to it starting June 2024 notably due to the anonymous <Sponsor/1000 Monero donation> and other supporters.
I have 3 months left and after unless some crazy person gives more money, I'll go back to some generic programming job that could be done by many other people so that my wife won't kill me. Hopefully I'll find something in <quantum computing> or <AGI research> this time that is not too boring, but we'll see.
I should also note that I have raised my requirement for a second year full time from 100k USD to 200k USD, such that there are about only \b[144k USD] missing as of writing, a bargain. See also <sponsor>{full}. I have also set a \b[2M USD retirement goal] in case someone wants to free me to lurk after university students for the rest of my life. Creepy.
The reason for this increase is partly because I'm jealous watching my university peers getting relatively richer and richer than me. More seriously though, as I'm likely going to be looking for a job soon, I don't want to scare employers off too much thinking that it is likely that I'll be leaving in a few months too easily. Plus inflation and the natural lack of security that such endeavour brings.
\Video[https://www.youtube.com/watch?v=887jyJZnYkA]
{title=<OurBigBook Project> 2025 one year funded work debrief by <Ciro Santilli>}
{description=This video debriefs the end of the 1-year-funded <Sponsor/1000 Monero donation> work in 2025, in addition to some personal remarks and what might come next.}
= Metrics and rationales
{parent=OurBigBook Project Update March 2025}
Long story short, the project is so far a complete failure on the most important metric: number of regular users, which current sits at exactly one: myself.
There were notable users who found the project online and who actually tried to use the website for some content and provided extremely valuable feedback:
* http://ourbigbook.com/pioyi
* https://ourbigbook.com/sidstuff
Unfortunately after the period of a few weeks they stopped using it to follow their other priorities instead. Which is of course totally fine, however sad.
I still believe that the <OurBigBook Web> feature is a significant tech innovation that could make the website go big.
I also believe that the project gets many fundamentals of braindumping right, notably the infinitely deep table of contents without forced scoping, e.g.:
``
- Mathematics
- Calculus
``
does not make Calculus have an ID orr URL of `mathematics/calculus`, rather it's just `calculus`.
\Image[https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/x/hilbert-space-arrow.png]
{title=Internal cross file internal link uses only the leaf ID `hilbert-space`.}
{border}
{height=571}
But there is a fundamental difficulty in reaching critical mass to that self-sustaining point, as people don't seem to be convinced by these logical "my system is better" argument alone, as opposed to having them <Google> into stuff they need now and then understand that the project is awesome.
A closely related critical mass issue is that existing big multiuser knowledge base websites such as <Stack Overflow> and <Wikipedia> have a tremendous advantage on <PageRank>. No matter how useless a Wikipedia article about something is, it will always be on top of <Google> within a week of creation for title hits. And since the main goal of publishing your stuff is to get it seen, it makes much more sense for writers to publish on such existing websites whenever possible, because anywhere else it is way way less likely to be seen by anybody.
Even I end up writing way more on <Stack Overflow> than on <OurBigBook> as a programmer. But I still believe that there is a value to OurBigBook, for the usual reasons of:
* it allows you to organize a more global view of a subject, i.e. a book. Even I write answers on <Stack Overflow>, I also tend to organize links to these answers in a structured ways here, see e.g. big topics such as <SQL>
* <deletionism> and overly narrowness of allowed topics/style
Perhaps what saddens me the most is that even on <GitHub> stars/<Twitter>/Hacker news terms there is almost no interest in the project despite the fact that I consider that it has innovations, while many other <note taking apps> as well in the thousands of stars. Maybe I'm just delusional and all the tech that I'm doing is completely useless?
Part of the issue is probably linked to the fact that most other note taking apps focus on "help me organize my ideas so I can make more money" and often completely ignore "I want to publish my knowledge", and stuff that helps you make money is always easier to sell and promote.
OurBigBook on the other hand a huge focus on "I want to publish me knowledge". It aims almost single mindedly in being the best tool ever for that. However this doesn't make money for people, and therefore there are going to be way less potential users.
I do believe strongly that all it takes is a few users for the project to snowball. For some people, once you start braindumping, it is very addictive, and you never want to stop basically. So with only a few of those we can open large parts of undergrad knowledge to the world. But these people are few, and so far I haven't been able to find even a single one like me, and on top of that convince them that I have created the ultimate system for their knowledge publishing desires.
Another general lesson is that I should perhaps aimed for greater compatibility with existing systems such as <Obsidian>. Taking something that many people already know and use can have a huge impact on acceptance. E.g. anything that touches <Obsidian> can reach thousands of stars: https://github.com/KosmosisDire/obsidian-webpage-export[]. Note taking apps that aim for "<markdown>" compatibility also tend to fare better, even if in the end you inevitably have to extend the Markdown for some of your features. And <WYSIWYG>, which I want but don't have, is perhaps the ultimate familiarity.
Another issue compared to other platforms is that <OurBigBook> just came out late. <Obsidian> launched in 2020. <Roam Research> and <Trillium Notes> also came earlier. And it is hard to fight the advantage already gained by those on the "I'm going to take some personal notes" area. I do believe however that there a strong separation between "these are my personal notes" and "I want to publish these". Once you decide to publish your knowledge, you immediately start to write in a different way, and it is very hard to convert pre-existing "private" notes into ones suitable for public consumption.
= I ended up doing tech rather than content as usual
{parent=OurBigBook Project Update March 2025}
At first I had intended to create a lot more content for the world class university located where I lived, but I ended up not doing that and just improving the project tech instead.
There are a few reasons for this, good or bad:
* as a tech nerd, my natural tendency is to first sit down by myself and code to solve big general problems rather than go out and try to solve specific people's specific problems to obtain money and users
* at one point I got the feeling that helping students with a bunch of small courses might be useful, and that instead I might get more impact by instead by focusing on creating content for a next big thing area such as:
* <quantum computing>
* <AGI research>
because many of the courses are fundamentally useless by design due to misalignment between university and reality.
I'm still not sure what to do about that, but I do think I'll try to do a bit of course solving at least and see how it goes.
One thing I've learned first hand through <Ciro Santilli's Stack Overflow contributions> and <Linux Kernel Module Cheat> is that the barrier to make money from a useful open source learning project that benefits a large number of people a little bit is huge, perhaps infinite, and that it might be better to instead focus more intensely on fewer users. This insight pushes me more towards going for solving local courses.
Another consideration that supports going for courses is that being close to students is perhaps my only unfair advantage. There is likely no one else in the world in the same position that I'm at, with some "free time" to chill with undergrads and help them with 100% of my undivided attention and passion.
A point that pulls me towards the big tutorials however is that my time is almost up, and focusing on them would increase the chances that I will be work in those fields afterwards. This feeling may go against the best interests of the project, but it is perhaps an inevitable self preservation consideration unless someone decides to free me from that forever with the \b[2M] :-)
* the entry barrier to help students of a top university is rather high. The students are already extremely busy and pressured (this is pe), and if it is in the slightest hard to explain their problems to you because you are not fluent enough in their subject, they will find a faster way to obtain the knowledge and never come to you.
* I also did a bit of procrastinating with a few quick few exploration into cute programming projects. Nothing too crazy long however, just the usual. It's in my nature to have broad interests, and perhaps only such a person can make a <OurBigBook.com>. I'm not a fast worker. But I never stop. Once something is in my "this must be done or learnt list", I just keep coming back to it again and again until it happens.
The downsides of going for tech first are severe:
* you risk being misaligned with what users want and spend enormous amounts of time on useless features
* it is also rather demotivating that you are working hard on a really cool feature but you know that there are no users yet so no one will benefit from it, and that this feature alone is not enough to attract the users anyways
There are however counterpoints to these as for anything else:
* I'm a user and I'm always improving it for myself. If there are other people like me out there, they will love it. If there aren't, perhaps I'll never be able to do anything that caters for them well enough anyways.
* as the two users made me understand, once someone touches your thing, they expect it to be perfect, and their standards are extremely high. This is understandable in part given the large number of <note taking apps> in existence, and notably <WYSIWYG> ones. As such, there is some rationale for improving tech.
= How the tech improved
{parent=OurBigBook Project Update March 2025}
In any case, the outcome of that is that the tech has improved. And I have done a relatively good job of clearly publishing any "more user visible" improvements to https://docs.ourbigbook.com/news and social media such as
* https://mastodon.social/@OurBigBook
* https://x.com/OurBigBook
though it is important to note that there have been more than one "fix a hard bug" weeks that were not published because they would just bore readers.
During this period the main focus has been on improving <OurBigBook Web>, i.e. the dynamic website that powers <OurBigBook.com>. There are two reasons for that:
* Web is what has the <OurBigBook topics feature> for mind-melding, which is the killer feature of OurBigBook compared to other <note taking apps> and therefore deserves the highest levels of priority
Static website generation is an indispensable escape valve that ensures that your content can be published forever even if <OurBigBook.com> goes down one day, which it won't as long as I live. But the innovation is Web.
* static website generation was closer to good enough, but web was much further and is fundamentally harder.
I'm extremely satisfied with <OurBigBook> static website generation and haven't touched it as much. It wasn't easy to reach this state, but I'm there.
But Web is a different and much more complex beast.
Making CLI software that will run on a person's local computer under full trust and building a bunch of HTML from <lightweight markup> in bulk is one thing.
But making a public dynamic website that has to continuously maintain a coherent database state on granular updates, while giving users some trust but not enough for them to blow everything up is on a totally different level. See e.g. https://docs.ourbigbook.com/news/signup-ip-blacklist-vpn-detection-and-account-locking[the recent SPAM attack we've had to fend off].
\Image[https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/spam/crypto.png]
{title=Screenshot showing voting manipulated SPAM as the most highly upvoted article on <OurBigBook.com>}
{border}
{height=971}
{source=https://web.archive.org/web/20250228110911/https://ourbigbook.com/go/articles?sort=score}
And then there's also the issue of front-end being mega-hard to get right.
As a result, Web is now way less buggy and much more usable.
If you look through the list of Web updates, there is nothing specifically mind blowing. The core ideas have largely crystallized, and we are just trying to making them click. I have a few more punches up my sleeve, but the core is decided.
\Image[https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/search/full-text-calculus-fun-arrow.png]
{title=OurBigBook Web search}
{border}
{description=This is one of the many basic quality of life improvements that have been done on OurBigBook Web.}
{height=738}
{source=https://ourbigbook.com/go/articles?body=false&search=calculus%20fun}
\Image[https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/announce/modal.png]
{title=OurBigBook Web article announcement}
{description=Another cute new feature, you can send an email to your followers about a new amazing article you created.}
{height=349}
{source=https://ourbigbook.com/cirosantilli/chain-rule}
Web process has been somewhat slower than what I'd like. Of course, it is the case of any project that things are easily said than done. But there are two other main structural factors that have played into it:
* I have my first baby now, and we're learning how to deal with that on the fly.
For example, we could have put him on childcare a bit earlier, but due to inexperience we've kept him a bit longer than we maybe should have.
Things are well sorted out now, but not matter how good your support system is, at the end of the day, and more often night, it is you the parents that have to deal with a lot of inevitable baby issues. Unless you want them to turn into psychopaths and drug addicts that is, which I don't. I've reached the point of semi failure middle age that the baby feels like my best moonshot.
All of this sets a fundamental limit on how many hours you can work per week.
But at least with the donations I was able to work on OurBigBook at all. Because if it weren't for that, I would have to focus entirely on the generic job instead and OurBigBook would have been put on hold.
* the choice of Web stack. I was allured by <Next.js>. I can see the beauty and usefulness of a <Node.js> render front-end that also runs on backend and hydration. That is awesome.
But:
* <React> is insanely hard to learn and understand. Furthermore, it is also hard to understand the performance problem that it solves, and actually have a benchmark where this problem is solved faster than just delivering some HTML files with ad-hoc Js on top.
* the lack (or perhaps excess of shitty) actual web framework like <Ruby on Rails> and <Django (web-framework)> means that I have to rediscover the wheel many times over for all the essential support activities like testing, login and so one
At this point a rewrite is out of the question. I've managed to master things well enough to get a decent result, https://github.com/ourbigbook/ourbigbook/issues/361[and given up on the few things that I couldn't for the life of me achieve], after documenting them very well for posterity of course.
Aside from Web, there was only one thing that received a significant improvement, and that was the https://docs.ourbigbook.com/visual-studio-code[OurBigBook VS Code extension]. The extension is not perfect, and it is not the "final UI", which has to be some <WYSIWYG> implementation, and there are some fundamental limitations that cannot be overcome without patching VS Code itself. However, the extension is already extremely usable, and I'm writing this on it right now. Basics like syntax highlighting, jump to definition and autocomplete are very useful and usable.
\Image[https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/vscode/tree.png]
{title=Tree navigation in the <OurBigBook> Visual Studio Code extension}
{border}
{height=1100}
= What might be next
{parent=OurBigBook Project Update March 2025}
OK, I need to do content. I know :-) At the university I'm at, the only department that is open is the mathematics one. Both:
* physically, I'm sitting next to some students right now, though they don't yet know that their saviour is just next to them.
* in terms of publishing the course materials online. Many of them even have solution
All other courses extremely closed, notably Physics, which is the other course I'd consider. There are upsides and downsides for going for Mathematics:
* upside:
* maths doesn't change with time
* maths doesn't require experiments
* downside: most of it is useless compared to Physics
If I were free to choose, I might go for Physics instead. But maths isn't hard, and I think I'll just go with the hand I'm dealt this time to start with.
Tech wise, the big things are the following ones to which I have given different levels of architectural consideration (i.e. read: I'm afraid they'll be fucking hard and that I'll spend a month on yet another useless feature that won't help get a single user). I don't think I'll do those before at least a little bit of content, we'll see:
* <WYSIWYG>: this is not a question of if, but when and how. Even I miss it when dealing with images. I was particularly impressed by <Trillium Notes>, and might consider forking it or reusing some of its components
* perfect two way sync from web to local: https://github.com/ourbigbook/ourbigbook/issues/326[]
Currently, after much effort, publishing from local to web is extremely good.
But pulling back changes that you make on web UI locally is not really possible. A basic version can be made easily, but a great version requires some thought.
In particular, preventing accidental rewrite on simultaneous local + web edits require edit history to be in place.
The rationale here is that users would start editing on Web with a low entry barrier. And as they become more committed to the project, they would eventually transition to having editing most of their content locally from a desktop, with the exception of a few minor edits on the go when they are on a cell phone, and which we want to very easily and automatically be pulled back to local as soon as they open an editor on their laptop.
I.e. we want to add a downwards arrow to the following diagram:
\Image[https://raw.githubusercontent.com/ourbigbook/ourbigbook-media/master/feature/local-editing/bigb-publish-to-web-or-static-editor-logos.svg]
{height=600}
Smaller cute tech that I might do before content "real quick" include:
* move more into community tagging rather than just community topic-ing:
* https://github.com/ourbigbook/ourbigbook/issues/359
* https://github.com/ourbigbook/ourbigbook/issues/360
* automatic topic rendering for plaintext! https://github.com/ourbigbook/ourbigbook/issues/356[]. In particular this could open the doors for AI generated content.
Another thing I really want to do before time is up is to create a video summarizing my <philosophy of education>. I want it to be as fun and funny and sad as possible, with silly moving animated images and slides, not just me talking to the camera. Although all of the points I intend to talk about have undoubtedly been covered by others, it is something that I feel so strongly about that I would like to tell others about it more personally. If I start it it will likely take a few days to get done, and I'm not sure wha the final quality would be. It is a bit sad to not do "project work", but I think I'll end up doing it regardless. Class it under "fundraising" if you will, as it may help to find other like minded but rich people.
= Conclusion and feelings
{parent=OurBigBook Project Update March 2025}
It is a bit sad to work on a project that no one cares about. You're not sure if you're crazy or a visionary. And it is kind of lonely.
I sometimes wonder if I would be happy doing this for the rest of my life if I could. And if it would have any impact at all no matter for how long I do it. My feelings in that area go from slightly depressed to slightly excited about the potential a few times every week.
As we all know, living and making life choices means sacrificing other things that could have been. When I was in France in 2015, I started a masters course in AI/robotics with the idea of doing a <PhD> and <AGI research> later on but quit half way because I felt university was such a waste of time.
But come now the <AI boom>, and although I still believe <education is broken>, I might have been much better off financially/reputationally if I had withstood the bullshit followed that path. Instead I sacrificed that for <Being proud of low level programming is stupid>[nerding about low level programming] and <open educational content>.
It is hard to get such ideas off one's mind. But the fact is, for better or worse, I've started walking the path of educational reform and sacrificed others along the way, and this is the path that I'm further ahead than other people, and perhaps I should pursue it further to a possible conclusion. Also this path has the advantage that it is not fully exclusive from other academic endeavors as we will always need content about the new flashy things that keep coming up.
So yeah, it's hard, but here I am, and I'll go as far as I can without going into <Charles Bukowski> levels of personal sacrifice.
Announcements:
* https://mastodon.social/@cirosantilli/114133596150105395
* https://x.com/cirosantilli/status/1898784529986343262
* https://www.linkedin.com/feed/update/urn:li:share:7304550428942163970/
* https://www.facebook.com/cirosantilli/posts/pfbid02C94jbFwBofKnJaWhaGGV1aBkDr97fZ6Czdg56YShuqrXn5SXysGiBAARvzgpqxA8l
= Quick fun with the Common Crawl web graph
{parent=Updates}
https://github.com/cirosantilli/cirosantilli.github.io/issues/198[]. Previously at: https://stackoverflow.com/questions/31321009/best-more-standard-graph-representation-file-format-graphson-gexf-graphml/79467334#79467334 but <deletionism on Stack Overflow>[Stack Overflow fucking deleted the question].
I wanted to do a quick exploration of <open PageRank implementation and data>.
My general motivation for this is that a <PageRank>-like algorithm could be useful for more accurate user and article ranking on <OurBigBook>, see: <ourbigbook.com/PageRank-like ranking>{full}
But it could also be just generally cool to apply it to other <graph> datasets, e.g. for computing an <Wikipedia internal PageRank>.
A quick <Google> reveals only <Open PageRank>, but their methods are apparently closed source.
Then I had a look at the <Common Crawl web graph> data to see if I could easily calculate it myself, and... they already have it! See: <Common Crawl web graph official PageRank>{full}
Their graph dumps are in <BVGraph> <graph file format>, which is the native format of the <WebGraph (software)> framework, which implements the format and algorithms such as <PageRank>.
The only thing I miss is a command line interface to calculate the PageRank. That would be so awesome.
The more I look at it the more I love <Common Crawl>.
Announcements:
* https://mastodon.social/@cirosantilli/114070985511493835
* https://x.com/cirosantilli/status/1894777704517406852
In cc-main-2024-25-dec-jan-feb-domain-ranks.txt:
* `cirosantilli.com` was ranked ~453k
* `ourbigbook.com` was at ~606k
= Introductory video for Bitcoin inscription museum
{parent=Updates}
I finally took a day to edit the <Cool data embedded in the Bitcoin blockchain> section from <Aratu Week 2024 Talk by Ciro Santilli> into a proper YouTube video. The amount of effort that goes into every minute of video editing never ceases to amaze me.
\Video[https://www.youtube.com/watch?v=6XJ6wZBqgUo]
{title=My <Bitcoin inscription> museum by <Ciro Santilli>}
{height=600}
Announcements:
* https://mastodon.social/@cirosantilli/113764420506911687
* https://x.com/cirosantilli/status/1875157694270841024
* https://www.linkedin.com/posts/cirosantilli_my-bitcoin-inscription-museum-images-and-activity-7280924162838126592-BVLX/
* https://www.facebook.com/cirosantilli/posts/pfbid02kN3sVVTViekYsgyqmN1pdcTp81ca7rJSmofk7X3DkdXYL6Rb8tEd78LoLYw7dEMSl
= I was top user 25 on Stack Overflow in 2024
{parent=Updates}
In 2024 I was user \#25 with the most reputation gained on Stack Overflow.
This is up from \#38 in 2023 is even though I have answered less questions than before.
This is likely because <LLMs> have killed users that just answered lots of easy new questions, and favored those like me who only answer more important questions found through Google.
I was \#13 on the last quarter, so this is likely to go even higher in 2025. More details at: <Ciro Santilli's Stack Overflow contributions>{full}
Announcements:
* https://mastodon.social/@cirosantilli/113754291241197245
* https://x.com/cirosantilli/status/1874509852791492647
* https://www.linkedin.com/feed/update/urn:li:activity:7280275957393833985/
\Image[https://raw.githubusercontent.com/cirosantilli/media/refs/heads/master/ciro-santilli-stack-overflow-stats.png]
{title=<Ciro Santilli>'s <Stack Overflow> stats}
{description=Further methodology details at: <image Ciro Santilli's Stack Overflow stats>.}
{height=900}
= Generating test data for full text search tests
{parent=Updates}
I've been thinking lightly about adding full text search to <OurBigBook>.
For example, at https://docs.ourbigbook.com/news/article-and-topic-id-prefix-search article search was added, but it only finds if you search something that appears right at the start of a title, e.g. for:
``
Fundamental theorem of calculus
``
you'd get a hit for:
``
fundamental
``
but not for
``
calculus
``
To do this efficiently, we need full text search, which <PostgreSQL> implements.
But finding a clean way to generate test data for testing out the speedup was not so easy and exploration into this led me to publishing a few new slightly improved methods where Googlers can now find them:
* https://unix.stackexchange.com/questions/97160/is-there-something-like-a-lorem-ipsum-generator/787733#787733 I propose a neat random "sentence" generator using common CLI tools like <grep> and <sed> and the pre-installed Ubuntu dictionary `/usr/share/dict/american-english`:
``
grep -v "'" /usr/share/dict/american-english |
shuf -r |
paste -d ' ' $(printf "%4s" | sed 's/ /- /g') |
sed -e 's/^\(.\)/\U\1/;s/$/./' |
head -n10000000 \
> lorem.txt
``
* to achieve that, I also proposed two superior "join every N lines" method for the CLI: https://stackoverflow.com/questions/25973140/joining-every-group-of-n-lines-into-one-with-bash/79257780#79257780[], notably this <awk> poem:
``
seq 10 | awk '{ printf("%s%s", NR == 1 ? "" : NR % 3 == 1 ? "\n" : " ", $0 ) } END { printf("\n") }'
``
* https://stackoverflow.com/questions/3371503/sql-populate-table-with-random-data/79255281#79255281 I propose:
* a clean <PostgreSQL> random string stored procedure that picks random characters from an allowed character list
``
CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
string_agg(substr(characters, (random() * length(characters) + 1)::integer, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789- ')) as symbols(characters)
join generate_series(1, $1) on 1 = 1
$$ language sql;
``
* first generating <PostgreSQL> data as <CSV>, and then importing the CSV into PostgreSQL as a more flexible method. This can also be done in a streaming fashion from stdin which is neat.
``
python generate_data.py 10 | psql mydb -c '\copy "mytable" FROM STDIN'
``
* https://stackoverflow.com/questions/16020164/psqlexception-error-syntax-error-in-tsquery/79437030#79437030 regarding the safe generation of prefix search `tsquery` from user inputs without query errors, I've learned about `websearch_to_tsquery` and further highlighted a possible `tsquery -> text -> tsquery` approach that might be correct for prefix searches
* https://stackoverflow.com/questions/67438575/fulltext-search-using-sequelize-postgres/79439253#79439253 I put everything together into a minimal <Sequelize> example, read for usage in <OurBigBook>
Finally I did a writeup summarizing PostgreSQL full text search: <PostgreSQL full-text search>{full} and also dumped it at: https://www.reddit.com/r/PostgreSQL/comments/12yld1o/is_it_worth_using_postgres_builtin_fulltext/[] for good measure.
= Getting a list of all currencies from Wikidata with SPARQL
{parent=Updates}
https://opendata.stackexchange.com/questions/1560/how-can-i-get-a-list-of-currencies-from-wikidata/21839#21839
I've had a bit more fun with <SPARQL> and <Wikidata>.
This one was way harder than my previous fun with "find the oldest people who won a given prize" (<Nobel Prize>/<#Oscar>) https://mastodon.social/@cirosantilli/112689376315990248 because unlike those prizes where all the decisions are centralized, countries are much more complicated beasts, with changing currencies and international recognition.
This was a good experience to see a few ways in which Wikidata is inconsistent, with the same concept being expressed in multiple different ways, e.g. "end time" property of the current vs the superior "end time" qualifier.
Particularly bad is the notion of a "https://www.wikidata.org/wiki/Help:Ranking#Deprecated_rank[deprecated rank]", that should really not exist.
This is exactly the type of semi interactive data munching that I like to do, a bit in the same vein as <CIA 2010 covert communication websites> and <Cool data embedded in the Bitcoin blockchain>.
As you might imagine, the <secret services> use exactly this type of knowledge modelling to do their dirty business, e.g. https://github.com/gchq/Gaffer[Gaffer] by the <GCHQ>.
If only I weren't such a rebel, I'd be a perfect fit for the <intelligence agencies>.
This is the best monstrosity I had the patience to come up with:
``
SELECT
?currency
(GROUP_CONCAT(DISTINCT ?currencyIsoCode; SEPARATOR=", ") AS ?currencyIsoCodes)
?currencyLabel
(GROUP_CONCAT(DISTINCT ?countryLabel; SEPARATOR=", ") AS ?countries)
WHERE {
?country wdt:P31/wdt:P279* wd:Q6256. # is country
?country p:P38 ?countryHasCurrency.
?countryHasCurrency ps:P38 ?currency.
?countryHasCurrency wikibase:rank ?countryHasCurrencyRank.
OPTIONAL {
?currency p:P498 ?currencyHasIsoCode.
?currencyHasIsoCode ps:P498 ?currencyIsoCode.
}
FILTER NOT EXISTS {?country wdt:P576 ?countryAbolished}
FILTER NOT EXISTS {?currency wdt:P576 ?currencyAbolished}
FILTER NOT EXISTS {?currency wdt:P582 ?currencyEndTime}
FILTER NOT EXISTS {?countryHasCurrency pq:P582 ?countryHasCurrencyEndtime}
FILTER (?countryHasCurrencyRank != wikibase:DeprecatedRank)
FILTER (!bound(?currencyHasIsoCode) || ?currencyHasIsoCode != wikibase:DeprecatedRank)
# TODO makes query take timeout? Why? Needed to exclude PLZ.
FILTER NOT EXISTS {?currencyHasIsoCode pq:P582 ?currencyHasIsoCodeEndtime}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
?currency rdfs:label ?currencyLabel .
?country rdfs:label ?countryLabel .
}
}
GROUP BY ?currency ?currencyLabel
ORDER BY ?currencyIsoCodes ?currencyLabel
``
It got quite close to the ISO 4217 list.
I was drawn into this waste of time after I noticed that someone had managed to create the Wikipedia of <PsiQuantum> which I had tried earlier but got deleted: https://mastodon.social/@cirosantilli/113488891292906243[], and then I made the mistake of having a look at the <Wikidata> page of <PsiQuantum>.
\Image[https://files.mastodon.social/media_attachments/files/113/509/536/025/260/785/original/428906b92ab725f5.jpg]
{title=500,000 Transnistrian ruble banknote 1997 series}
{description=This is one of the most widely used currencies which does not have an ISO 4217 code.}
Announcements:
* https://mastodon.social/@cirosantilli/113509491731720236
* https://x.com/cirosantilli/status/1858846619359219848
I also had one more fun with: https://opendata.stackexchange.com/questions/15750/structured-data-for-nobel-prizes/21847#21847 getting some basic info about <Nobel Prize> winners, and noticed one, <John Sulston>, <2002 Nobel Prize in Physiology and Medicine> laureate, who likely has the wrong place of birth on his Nobel Prize profile: https://www.nobelprize.org/prizes/medicine/2002/sulston/facts/ which is funny. I suggested the change now. Edit they fixed it after I pointed it out:
* bad: https://web.archive.org/web/20241008022931/https://www.nobelprize.org/prizes/medicine/2002/sulston/facts/
* good: https://web.archive.org/web/20241127133204/https://www.nobelprize.org/prizes/medicine/2002/sulston/facts/
Another highlight was <#1913 Nobel Prize in Chemistry> laureate <#Alfred Werner> who born either in Mulhouse in Alsace, France, or in "Yo no sé qué me pasó" ("I don't know what happened to me" in <Spanish (language)>), a https://www.youtube.com/watch?v=83PeS6bKla0[1986 song by Mexican singer Juan Gabriel].
Announcements:
* https://mastodon.social/@cirosantilli/113528952716463018
* https://x.com/cirosantilli/status/1860088866335785187
\Image[https://upload.wikimedia.org/wikipedia/commons/f/fe/John_Sulston_%282008%29.jpg]
Also at https://opendata.stackexchange.com/questions/21849/how-to-get-a-list-of-all-nobel-prize-winners-who-never-had-a-doctorate-from-wiki/21850#21850[] I tried to get the list of <rebel without a PhD>[Nobel Prize laureates who don't have a PhD]. I think the query was correct, but <Wikidata> data is just too incomplete. Related:
* https://www.reddit.com/r/NoStupidQuestions/comments/mv85av/has_anybody_without_a_phd_ever_won_the_nobel/
* https://www.quora.com/Has-anyone-ever-won-a-Nobel-Prize-without-a-PhD
= CIA 2010 websites video
{parent=Updates}
I edited the VOD of the talk <Aratu Week 2024 Talk by Ciro Santilli> about the <CIA 2010 covert communication websites> a bit and published it at: https://www.youtube.com/watch?v=TFfuzZC5Qpc[].
\Video[https://www.youtube.com/watch?v=TFfuzZC5Qpc]
{title=How I found a <Star Wars> website made by the <CIA> by <Ciro Santilli>}
Announcements:
* https://mastodon.social/@cirosantilli/113356261451137096
* https://x.com/cirosantilli/status/1849035082037686768
* https://www.linkedin.com/feed/update/urn:li:share:7254800878476402688/
* https://www.facebook.com/cirosantilli/posts/pfbid0Pzo2DKNj9TpWzhFsNqKWxGujPbCaesCRWtH1MQ1mzXUZaBq1Ca9sacjUYjrkD79kl
= GitHub blocked the China Dictatorship bot
{parent=Updates}
https://github.com/cirosantilli/china-dictatorship/issues/1330
On September 2024, <GitHub> forbade our <China Dictatorship> auto-reply bot, the reason given is because they forbid comment reply bots in general. Though it was cool to see a junior support staff person giving out what obviously triggered the action:
> We've received a large volume of complaints from other users indicating that the comments and issues are unrelated to the projects they were working on.
before a more senior one took over.
Ciro was slightly saddened but not totally surprized by the bloodbath against him on the Reddit the threads he created:
* https://www.reddit.com/r/github/comments/1g7acv6/github_forbade_me_from_running_a_bot_that_would/ deleted by admins because
> We don't work for GitHub and we can't help you with your GitHub support problems. You'll just need to be patient.
which is stupid, obviously we should be able to discuss GitHub policies in that sub.
Also good highlight to user whoShotMyCow
> Has GitHub also forbidden you from, say, getting a job
Reply:
> No, a 120,000 USD donation did that: https://cirosantilli.com/sponsor#1000-monero-donation
Reply:
> Can't hate on the grind but I think you should also consider psychiatric help
<Many successful people are neurodiverse> comes to mind.
* https://www.reddit.com/r/China/comments/1g7aa6k/american_programming_website_github_forbade_me/[]: also deleted without reason
So we observe once again the stupidity of <deletionism> towards anything that is considered controversial. The West is discussion fatigued, and would rather delete discussion than have it.
We also se people against you having freedom to moderate your own repositories as you like it, with bots or otherwise. Giving up freedoms for nothing, because "bot is evil".
Announcements:
* https://mastodon.social/@cirosantilli/113330579182535172
* https://x.com/cirosantilli/status/1847391285495406947
= Does copyright transfer of papers to publishers affect when the paper enters the public domain?
{parent=Updates}
https://academia.stackexchange.com/questions/213576/do-copyright-transfer-of-papers-to-publishers-affect-when-the-paper-enters-the-p Do copyright transfer of papers to publishers affect when the paper enters the public domain since copyright belongs to a corporation and not persons?
I'm asking a law question for a change, because I enjoy skimming through important old papers and uploading parts of them where everyone can legally enjoy them.
Announcements:
* https://mastodon.social/@cirosantilli/113132173342955363
* https://x.com/cirosantilli/status/1834693576669720889
= Relationship between the Falun Mine and the discovery of new chemical elements
{parent=Updates}
I like the <#Falun Mine> for two reasons:
* some cool chemical discoveries have been made with a relation to the mine, notably #tantalum and <#selenium>, added a section to <Wikipedia>: https://en.wikipedia.org/w/index.php?title=Falun_Mine&oldid=1245374294#Discovery_of_new_elements I used the book <discovery Of The Elements by Mary Elvira Weeks> as my primary reference.
* it is the <Chinese> version of the <#Scunthorpe problem> due to a naming conflict with <Falun Gong>, a censored new religion that was banned in <China>
Announcements:
* https://mastodon.social/@cirosantilli/113143492454499727
= duty-machine news on china-dictatorship issues
{parent=Updates}
{tag=China Dictatorship}
Whenever a user creates an issue or comment on <China Dictatorship>, the bot now automatically creates a new issue with one of the latest news from Duty Machine: https://github.com/duty-machine/duty-machine
Sample created issue: https://github.com/cirosantilli/china-dictatorship/issues/1322 Script: https://github.com/cirosantilli/china-dictatorship/blob/ab6a46c511afaaf6c9e68ba8813c2b2cf9d9638c/action.js#L195
Duty Machine is a bot repo that automatically scrapes Chinese language news from major news outlets such as the <New York Times> or <#Radio Free Asia> which ensures that China Dictatorship news will always be new.
It's the war of the anonymous bots against the little pinks, part of asymmetric information warfare: https://cirosantilli.com/china-dictatorship/asymmetric-information-warfare
Announcements:
* https://mastodon.social/@cirosantilli/113073054553465619
* https://x.com/cirosantilli/status/1830910108479492500
Update September 2024: <GitHub blocked the China Dictatorship bot>
Update March 2025: duty-machine was DMCA'ed on February 2025: https://github.com/duty-machine/news by, surprise surprise, a Chinese copyright owner "Sanlian News" https://github.com/github/dmca/blob/master/2025/02/2025-02-27-sanlian-news.md I noticed because it broke my Action on another repo: https://github.com/orgs/community/discussions/154227 Announcements:
* https://mastodon.social/@cirosantilli/114188384569606918
* https://x.com/cirosantilli/status/1902291981964861882
= `facial_recognition` Python package
{parent=Updates}
{tag=Facial recognition}
https://superuser.com/questions/420885/is-there-a-face-recognition-command-line-tool/1852394#1852394 played with the `face_recognition` Python package: https://github.com/ageitgey/face_recognition Cute <CLI> API, but disappointing accuracy. Also at:
* https://stackoverflow.com/questions/13211745/detect-face-then-autocrop-pictures/37501314#37501314
* https://softwarerecs.stackexchange.com/questions/1988/floss-tools-for-facial-recognition/90995#90995
Thanks https://github.com/ageitgey[Adam Geitgey] for putting that repo up.
\Image[https://upload.wikimedia.org/wikipedia/commons/thumb/5/52/Barack_Obama_and_Donald_Trump.jpg/1024px-Barack_Obama_and_Donald_Trump.jpg]
Announcements:
* https://mastodon.social/@cirosantilli/112966200451796295
* https://x.com/cirosantilli/status/1824074987331780709
= Marie Curie papers
{parent=Updates}
{tag=Marie Curie}
Under <publication by Marie Curie>{full} I did a quick overview of the <papers> in which <Marie Curie> and collaborators publish the existence of new elements <polonium> and <radium>. Both are very understandable (except the <chemistry>), and have some cute terminology. I also cited those papers on her <Wikipedia> page: https://en.wikipedia.org/w/index.php?title=Marie_Curie&diff=1240252528&oldid=1238097626 Another good exercise in "old paper finding" + "Wikipedia markup/rules" as I looked at the <Comptes rendus de l'Académie des Sciences> a bit.
\Image[https://upload.wikimedia.org/wikipedia/commons/c/c8/Marie_Curie_c._1920s.jpg]
{height=600}
This was kickstarted by <YouTube> recommending me the following good video:
\Video[https://www.youtube.com/watch?v=dgBDvnqMkT4]
{title=The RaLa Experiment by Our Own Devices}
which led me into yet a quick <nuclear physics> binge. I shouldn't do this to myself. I also ended up writing some tentative answers on <Quora>:
* https://www.quora.com/During-fission-are-the-gamma-rays-always-exactly-the-same-frequency/answer/Ciro-Santilli[During fission, are the gamma rays always exactly the same frequency?]
* https://www.quora.com/How-many-times-stronger-is-plutonium-than-uranium/answer/Ciro-Santilli[How many times stronger is plutonium than uranium?]
Announcements:
* https://mastodon.social/@cirosantilli/112960157294243902
* https://x.com/cirosantilli/status/1823704595346153589
* https://www.facebook.com/cirosantilli/posts/pfbid02sWkV1vTwU14YseWrRk9Zn1Pc92MQqvw1peKYRUqxQth59WfWss8dzWgRhQL4Kdohl
= Text-to-speech software comparison
{parent=Updates}
{tag=Text-to-speech}
https://askubuntu.com/questions/501910/how-to-text-to-speech-output-using-command-line/1522885#1522885
I tried to use every single free offline <text-to-speech> engine that would run on <Ubuntu 24.04> without too much hassle to see if any of them sounded natural. #pico2wave was the overall winner so far, but it is not perfect.
I've been noticing a gap between the "<AI>" <SOTA> and what is actually packaged well enough to be usable by a general audience.
Also played a bit more with <#OpenAI Whisper>: https://askubuntu.com/questions/24059/automatically-generate-subtitles-close-caption-from-a-video-using-speech-to-text/1522895#1522895 Mind blowing performance and perfect packaging as well, kudos.
Announcements:
* https://mastodon.social/@cirosantilli/112938085076238718
* https://x.com/cirosantilli/status/1822272257349038417
= Older updates
{parent=Updates}
* https://en.wikipedia.org/wiki/Scott_Hassan I delved into a bit of <Wikipedia> drama on the page of <Scott Hassan>, initial coder of <Google Search>, which I created an am the main contributor.
Originally I had added some details about this messy divorce which saw coverage in major publications such as the <New York Times>: https://www.nytimes.com/2021/08/20/technology/Scott-Hassan-Allison-Huynh-divorce.html and Scott used puppets to remove those at several points in time over the years.
Those removals were then reverted by other editors, not myself, indicating that editors wanted the details there.
While preparing to finally decide this through moderation, I ended up finding that the divorce details should likely have been left out according to Wikipedia rules, because Scott is "relatively unknown" and a "low profile individual":
* https://en.wikipedia.org/w/index.php?title=Wikipedia:Biographies_of_living_persons&oldid=1235693634#People_who_are_relatively_unknown
* https://en.wikipedia.org/wiki/Wikipedia:Who_is_a_low-profile_individual
and so I ended up removing them myself.
This is yet once again <deletionism on Wikipedia> weakening the site, and making @OurBigBook stronger :-) Here is the uncensored one: <#Scott Hassan>
I spent time on this partly because I'm mildly obsessed with founding myths of companies, but also partly to better understand the moderation process of Wikipedia.
Posted at:
* https://mastodon.social/@cirosantilli/112908378712190057
* https://x.com/cirosantilli/status/1820377809836978496
* https://www.facebook.com/cirosantilli/posts/pfbid0yoj6be1hS2YVNuPhHXXeErZ11ExER6A9XCxLYG44dEK96hNTVpuzEJDXLJBmoah6l
* https://unix.stackexchange.com/questions/256138/is-there-any-decent-speech-recognition-software-for-linux/613392#613392 cool to see that the Vosk open source speech recognition software by https://twitter.com/alphacep now has a convenient command line interface called vosk-transcriber!
It allows you to just:
``
vosk-transcriber -m ~/var/lib/vosk/vosk-model-en-us-0.22 -i in.ogg -o out.srt -t srt
``
to extract a subtitle file out.srt from a .ogg audio input file.
Accuracy is a bit meh, but we'll take it!
Posted at:
* https://mastodon.social/@cirosantilli/112317205707461869
* https://twitter.com/cirosantilli/status/1782535572143140900
* https://askubuntu.com/questions/161515/speech-recognition-app-to-convert-mp3-voice-to-text/423849#423849
* https://video.stackexchange.com/questions/33531/how-to-remove-background-from-video-without-green-screen-on-the-command-line/37392#37392 tested this AI video background remover https://github.com/nadermx/backgroundremover by @nadermx. It had a few glitches, but I had fun.
\Image[https://ia600306.us.archive.org/10/items/ciro-santilli-selfie-in-cycling-kit-with-old-cottages-background-removal/Ciro_Santilli_selfie_in_cycling_kit_with_old_cottages_background_removal.gif]
https://unix.stackexchange.com/questions/233832/merge-two-video-clips-into-one-placing-them-next-to-each-other/774936#774936 I then learned how to stack videos side-by-side with ffmpeg to create this side-by-side demo. It also works for GIFs! https://stackoverflow.com/questions/30927367/imagemagick-making-2-gifs-into-side-by-side-gifs-using-im-convert/78361093#78361093
\Image[https://web.archive.org/web/20240422123110im_/https://i.stack.imgur.com/gkJnK.gif]
Posted at:
* https://twitter.com/cirosantilli/status/1781994384805822684
* https://mastodon.social/@cirosantilli/112308748237095260
* https://www.facebook.com/cirosantilli/posts/pfbid02SqbYcRBvVkfivXmqmWJ1cc1KjEkbZyC8EXkBqgzZisgFPcXdADEXzrKCucWJn8uQl
* https://archive.org/details/ciro-santilli-selfie-in-cycling-kit-with-old-cottages-background-removal
* Just found out that <Ciro Santilli's hardware/P14s>[my Lenovo ThinkPad P14s] has an infrared camera, and recorded a quick test video on <Ubuntu 23.10> with:
``
fmpeg -y -f v4l2 -framerate 30 -video_size 640x360 -input_format gray -i /dev/video2 -c copy out.mkv
``
* https://mastodon.social/@cirosantilli/112261675634568209
* https://twitter.com/cirosantilli/status/1778981935257116767
* https://www.facebook.com/cirosantilli/posts/pfbid027M3n2p8snE9otAWdHtJ3ig2AhrXoDGv4h68o1z8agHceQBbFHZpEoxg7KZbiWAgWl
* https://www.linkedin.com/feed/update/urn:li:activity:7184755892410576897/
* https://www.youtube.com/watch?v=o1ZeR6pmf6o
* https://commons.wikimedia.org/wiki/File:Infrared_video_of_Ciro_Santilli_waving_recorded_on_Lenovo_ThinkPad_P14s_with_FFmpeg_6.0_on_Ubuntu_23.10.webm
\Image[https://web.archive.org/web/20240413030921if_/https://i.sstatic.net/c5KbD2gY.gif]
{title=<Ciro Santilli> waving hello in infrared}
{height=400}
Ciro Santilli