This section contains the a list of cool things Ciro Santilli has been up to in chronological order, including small quick ones. Many/most of those are also posted on Ciro Santilli's accounts such as:
For a more theme-oriented version of the best results see: Section "The best articles by Ciro Santilli".
For OurBigBook Project updates see: docs.ourbigbook.com/news
This is an update to the article: Section "CIA 2010 covert communication websites"
I found 44 new covert websites made by the CIA around 2010 bringing the total to 397!
Most websites were boring as usual, but one was slightly cooler: webofcheer.com is a comedy fansite featuring Johnny Carson, Charles Chaplin, Rowan Atkins (of Mr. Bean fame), The Three Stooges and some other Americans no one knows about anymore. There must have been a massive Johnny Carson amongst the contractors at that time, given that we previously also knew about
alljohnny.com
, a site dedicated fully to him! Both of these sites also serve as some of the earliest examples we've got so far, dating back to 2004 and 2005.2011 Wayback Machine archive of webofcheer.com
. Source. 2011 Wayback Machine archive of webofcheer.com scrolled to show Johnny Carson
. Source. 2004 Wayback Machine archive of alljohnny.com
. Source. This one was a previously known website featuring Johnny Carson.Another cool discovery is that I found the Getty Images source of the Jedi boy on their Star Wars themed site starwarsweb.net: web.archive.org/web/20101230033220/http://starwarsweb.net/ The photo can still be licensed today as of 2025: www.gettyimages.co.uk/detail/photo/little-jedi-royalty-free-image/172984439. I found it by searching for "jedi boy" on gettyimages.co.uk. The photo is credited to username
madisonwi
, presumably an alias of a photographer from Madison, Wisconsin. Inspired by this I reverse image searched and found the source of many other stock images from other websites, and I pinged their authors whenever I could locate them e.g. x.com/cirosantilli/status/1899750172260806711.Stock photo of a Jedi boy from Getty Images used on starwarsweb.net
. Source. 2010 Wayback Machine archive of starwarsweb.net
. There were two small advances that led to the discovery of new domains:
- while looking for a way to procrastinate I decided to scrape justdropped.com/drops/ for fun. That website lists expired domain names and see if it would yield any new results.I had already scrapped other expired domain websites before and used that data, and I hoped that this one would provide some new domain hits, even though it had very large overlap with the other websites I had scraped domains from previously.Such domain name lists tend to contain all SCAM domains in existence, since those inevitably expire once the scammers are caught.
- even more importantly, I noticed by chance that I was being too strict on a small part of my fingerprinting which was excluding a few good domains, by removing any hits that had multiple archives of the Communication mechanism
With those two new developments, I then kicked off my pre-existing search pipelines searching for domain names with the word
news
on them, an amazingly efficient heuristic because many of the websites were disguised as news aggregators, and after a few hours theses new hits emerged. A few of those also led to the discovery of new IPs which then led to new domains.One entirely new IP range was found around fastnews-online.com from 208.93.112.105 to 208.93.112.125. There were many domain names with very promising names in the range, but unfortunately for some reason most didn't have Wayback Machine Archives so I didn't count them as hits as per my guidelines.
2009 Wayback Machine archive of fastnews-online.com
. Also the newly found todaysengineering.com at 208.254.38.39 appears to form an IP range with the previously known nejadnews.com at 208.254.38.56:, but I couldn't find any other domains in the region with our current data sources.
2011 Wayback Machine archive of todaysengineering.com
. All other domains either slot into previously known IP ranges, or more commonly don't currently have a known IP, though they would likely just slot in existing ranges if we had better data.
Thanks to Jack Rhysider from the Darknet Diaries podcast for pointing me to the existing of the 2022 Reuters article that kickstarted my research on the subject!
The full list of newly found websites is:
- cellar-notes.com
- dailywellnessnews.com
- differentviewtoday.com
- dryterrainnews.com
- euronewsonline.net
- fastnews-online.com
- financecentraltoday.com
- globalcitizennews.net
- globalinvestmentnews.net
- inkfreenews.com
- internationalnewsworthiness.com
- intoworldnews.com
- lasthournews.com
- latinamericanewsbeat.com
- localtoglobalnews.com
- magneticfieldnews.com
- middle-east-newstoday.com
- mideasttoday.net
- mydailynewsreport.com
- mynepalnews.com
- nbanewsroundup.com
- nejadnews.com
- networkconnectionsite.com
- news-and-sports.com
- newsdelivered.net
- pondernews.net
- profile-news.com
- purlicue-news.com
- sandstormnews.com
- segomonews.com
- shadesofnews.com
- technologypresstoday.com/
- the-news-scene.com
- thefootball-life.com
- thefreshnews.com
- thenewsofpakistan.com
- totallynewsnow.com
- travelxtreme.net
- webofcheer.com
- wiredworldnews.com
- world-news-online.net
- worldaroundyunnan.com
- worldofonlinenews.com
Announced at:
- mastodon.social/@cirosantilli/114156495883418926
- x.com/cirosantilli/status/1900249928653271334
- www.facebook.com/cirosantilli/posts/pfbid02LbrfezGmFik582d6H7ZEoCf9bwpU73vyivdGLVbbzWjejWLS5Rv9EjGNXBPQppUBl
- www.linkedin.com/posts/cirosantilli_httpslnkdineyu8qwc-i-found-44-new-covert-activity-7306015949374058496-X5zl/
This is a summary of the status of the OurBigBook Project, focusing notably on the past 9 months that I've been able to devote fully to it starting June 2024 notably due to the anonymous 1000 Monero donation and other supporters.
I have 3 months left and after unless some crazy person gives more money, I'll go back to some generic programming job that could be done by many other people so that my wife won't kill me. Hopefully I'll find something in quantum computing or AGI research this time that is not too boring, but we'll see.
I should also note that I have raised my requirement for a second year full time from 100k USD to 200k USD, such that there are about only 144k USD missing as of writing, a bargain. See also Section "Sponsor Ciro Santilli's work on OurBigBook.com". I have also set a 2M USD retirement goal in case someone wants to free me to lurk after university students for the rest of my life. Creepy.
The reason for this increase is partly because I'm jealous watching my university peers getting relatively richer and richer than me. More seriously though, as I'm likely going to be looking for a job soon, I don't want to scare employers off too much thinking that it is likely that I'll be leaving in a few months too easily. Plus inflation and the natural lack of security that such endeavour brings.
Long story short, the project is so far a complete failure on the most important metric: number of regular users, which current sits at exactly one: myself.
There were notable users who found the project online and who actually tried to use the website for some content and provided extremely valuable feedback:Unfortunately after the period of a few weeks they stopped using it to follow their other priorities instead. Which is of course totally fine, however sad.
I still believe that the OurBigBook Web feature is a significant tech innovation that could make the website go big.
I also believe that the project gets many fundamentals of braindumping right, notably the infinitely deep table of contents without forced scoping, e.g.:does not make Calculus have an ID orr URL of
- Mathematics
- Calculus
mathematics/calculus
, rather it's just calculus
.Internal cross file internal link uses only the leaf ID
hilbert-space
.But there is a fundamental difficulty in reaching critical mass to that self-sustaining point, as people don't seem to be convinced by these logical "my system is better" argument alone, as opposed to having them Google into stuff they need now and then understand that the project is awesome.
A closely related critical mass issue is that existing big multiuser knowledge base websites such as Stack Overflow and Wikipedia have a tremendous advantage on PageRank. No matter how useless a Wikipedia article about something is, it will always be on top of Google within a week of creation for title hits. And since the main goal of publishing your stuff is to get it seen, it makes much more sense for writers to publish on such existing websites whenever possible, because anywhere else it is way way less likely to be seen by anybody.
Even I end up writing way more on Stack Overflow than on OurBigBook as a programmer. But I still believe that there is a value to OurBigBook, for the usual reasons of:
- it allows you to organize a more global view of a subject, i.e. a book. Even I write answers on Stack Overflow, I also tend to organize links to these answers in a structured ways here, see e.g. big topics such as SQL
- deletionism and overly narrowness of allowed topics/style
Perhaps what saddens me the most is that even on GitHub stars/Twitter/Hacker news terms there is almost no interest in the project despite the fact that I consider that it has innovations, while many other note taking apps as well in the thousands of stars. Maybe I'm just delusional and all the tech that I'm doing is completely useless?
Part of the issue is probably linked to the fact that most other note taking apps focus on "help me organize my ideas so I can make more money" and often completely ignore "I want to publish my knowledge", and stuff that helps you make money is always easier to sell and promote.
OurBigBook on the other hand a huge focus on "I want to publish me knowledge". It aims almost single mindedly in being the best tool ever for that. However this doesn't make money for people, and therefore there are going to be way less potential users.
I do believe strongly that all it takes is a few users for the project to snowball. For some people, once you start braindumping, it is very addictive, and you never want to stop basically. So with only a few of those we can open large parts of undergrad knowledge to the world. But these people are few, and so far I haven't been able to find even a single one like me, and on top of that convince them that I have created the ultimate system for their knowledge publishing desires.
Another general lesson is that I should perhaps aimed for greater compatibility with existing systems such as Obsidian. Taking something that many people already know and use can have a huge impact on acceptance. E.g. anything that touches Obsidian can reach thousands of stars: github.com/KosmosisDire/obsidian-webpage-export. Note taking apps that aim for "markdown" compatibility also tend to fare better, even if in the end you inevitably have to extend the Markdown for some of your features. And WYSIWYG, which I want but don't have, is perhaps the ultimate familiarity.
Another issue compared to other platforms is that OurBigBook just came out late. Obsidian launched in 2020. Roam Research and Trillium Notes also came earlier. And it is hard to fight the advantage already gained by those on the "I'm going to take some personal notes" area. I do believe however that there a strong separation between "these are my personal notes" and "I want to publish these". Once you decide to publish your knowledge, you immediately start to write in a different way, and it is very hard to convert pre-existing "private" notes into ones suitable for public consumption.
At first I had intended to create a lot more content for the world class university located where I lived, but I ended up not doing that and just improving the project tech instead.
There are a few reasons for this, good or bad:
- as a tech nerd, my natural tendency is to first sit down by myself and code to solve big general problems rather than go out and try to solve specific people's specific problems to obtain money and users
- at one point I got the feeling that helping students with a bunch of small courses might be useful, and that instead I might get more impact by instead by focusing on creating content for a next big thing area such as: because many of the courses are fundamentally useless by design due to misalignment between university and reality.I'm still not sure what to do about that, but I do think I'll try to do a bit of course solving at least and see how it goes.One thing I've learned first hand through Ciro Santilli's Stack Overflow contributions and Linux Kernel Module Cheat is that the barrier to make money from a useful open source learning project that benefits a large number of people a little bit is huge, perhaps infinite, and that it might be better to instead focus more intensely on fewer users. This insight pushes me more towards going for solving local courses.Another consideration that supports going for courses is that being close to students is perhaps my only unfair advantage. There is likely no one else in the world in the same position that I'm at, with some "free time" to chill with undergrads and help them with 100% of my undivided attention and passion.A point that pulls me towards the big tutorials however is that my time is almost up, and focusing on them would increase the chances that I will be work in those fields afterwards. This feeling may go against the best interests of the project, but it is perhaps an inevitable self preservation consideration unless someone decides to free me from that forever with the 2M :-)
- the entry barrier to help students of a top university is rather high. The students are already extremely busy and pressured (this is pe), and if it is in the slightest hard to explain their problems to you because you are not fluent enough in their subject, they will find a faster way to obtain the knowledge and never come to you.
- I also did a bit of procrastinating with a few quick few exploration into cute programming projects. Nothing too crazy long however, just the usual. It's in my nature to have broad interests, and perhaps only such a person can make a OurBigBook.com. I'm not a fast worker. But I never stop. Once something is in my "this must be done or learnt list", I just keep coming back to it again and again until it happens.
The downsides of going for tech first are severe:There are however counterpoints to these as for anything else:
- you risk being misaligned with what users want and spend enormous amounts of time on useless features
- it is also rather demotivating that you are working hard on a really cool feature but you know that there are no users yet so no one will benefit from it, and that this feature alone is not enough to attract the users anyways
- I'm a user and I'm always improving it for myself. If there are other people like me out there, they will love it. If there aren't, perhaps I'll never be able to do anything that caters for them well enough anyways.
- as the two users made me understand, once someone touches your thing, they expect it to be perfect, and their standards are extremely high. This is understandable in part given the large number of note taking apps in existence, and notably WYSIWYG ones. As such, there is some rationale for improving tech.
In any case, the outcome of that is that the tech has improved. And I have done a relatively good job of clearly publishing any "more user visible" improvements to docs.ourbigbook.com/news and social media such as though it is important to note that there have been more than one "fix a hard bug" weeks that were not published because they would just bore readers.
During this period the main focus has been on improving OurBigBook Web, i.e. the dynamic website that powers OurBigBook.com. There are two reasons for that:As a result, Web is now way less buggy and much more usable.
- Web is what has the OurBigBook topics feature for mind-melding, which is the killer feature of OurBigBook compared to other note taking apps and therefore deserves the highest levels of priorityStatic website generation is an indispensable escape valve that ensures that your content can be published forever even if OurBigBook.com goes down one day, which it won't as long as I live. But the innovation is Web.
- static website generation was closer to good enough, but web was much further and is fundamentally harder.I'm extremely satisfied with OurBigBook static website generation and haven't touched it as much. It wasn't easy to reach this state, but I'm there.But Web is a different and much more complex beast.Making CLI software that will run on a person's local computer under full trust and building a bunch of HTML from lightweight markup in bulk is one thing.But making a public dynamic website that has to continuously maintain a coherent database state on granular updates, while giving users some trust but not enough for them to blow everything up is on a totally different level. See e.g. the recent SPAM attack we've had to fend off.
Figure 9. Screenshot showing voting manipulated SPAM as the most highly upvoted article on OurBigBook.com. Source.And then there's also the issue of front-end being mega-hard to get right.
If you look through the list of Web updates, there is nothing specifically mind blowing. The core ideas have largely crystallized, and we are just trying to making them click. I have a few more punches up my sleeve, but the core is decided.
OurBigBook Web search
. Source. This is one of the many basic quality of life improvements that have been done on OurBigBook Web.OurBigBook Web article announcement
. Source. Another cute new feature, you can send an email to your followers about a new amazing article you created.Web process has been somewhat slower than what I'd like. Of course, it is the case of any project that things are easily said than done. But there are two other main structural factors that have played into it:
- I have my first baby now, and we're learning how to deal with that on the fly.For example, we could have put him on childcare a bit earlier, but due to inexperience we've kept him a bit longer than we maybe should have.Things are well sorted out now, but not matter how good your support system is, at the end of the day, and more often night, it is you the parents that have to deal with a lot of inevitable baby issues. Unless you want them to turn into psychopaths and drug addicts that is, which I don't. I've reached the point of semi failure middle age that the baby feels like my best moonshot.All of this sets a fundamental limit on how many hours you can work per week.But at least with the donations I was able to work on OurBigBook at all. Because if it weren't for that, I would have to focus entirely on the generic job instead and OurBigBook would have been put on hold.
- the choice of Web stack. I was allured by Next.js. I can see the beauty and usefulness of a Node.js render front-end that also runs on backend and hydration. That is awesome.But:
- React is insanely hard to learn and understand. Furthermore, it is also hard to understand the performance problem that it solves, and actually have a benchmark where this problem is solved faster than just delivering some HTML files with ad-hoc Js on top.
- the lack (or perhaps excess of shitty) actual web framework like Ruby on Rails and Django means that I have to rediscover the wheel many times over for all the essential support activities like testing, login and so one
At this point a rewrite is out of the question. I've managed to master things well enough to get a decent result, and given up on the few things that I couldn't for the life of me achieve, after documenting them very well for posterity of course.
Aside from Web, there was only one thing that received a significant improvement, and that was the OurBigBook VS Code extension. The extension is not perfect, and it is not the "final UI", which has to be some WYSIWYG implementation, and there are some fundamental limitations that cannot be overcome without patching VS Code itself. However, the extension is already extremely usable, and I'm writing this on it right now. Basics like syntax highlighting, jump to definition and autocomplete are very useful and usable.
Tree navigation in the OurBigBook Visual Studio Code extension
. OK, I need to do content. I know :-) At the university I'm at, the only department that is open is the mathematics one. Both:All other courses extremely closed, notably Physics, which is the other course I'd consider. There are upsides and downsides for going for Mathematics:If I were free to choose, I might go for Physics instead. But maths isn't hard, and I think I'll just go with the hand I'm dealt this time to start with.
- physically, I'm sitting next to some students right now, though they don't yet know that their saviour is just next to them.
- in terms of publishing the course materials online. Many of them even have solution
- upside:
- maths doesn't change with time
- maths doesn't require experiments
- downside: most of it is useless compared to Physics
Tech wise, the big things are the following ones to which I have given different levels of architectural consideration (i.e. read: I'm afraid they'll be fucking hard and that I'll spend a month on yet another useless feature that won't help get a single user). I don't think I'll do those before at least a little bit of content, we'll see:
- WYSIWYG: this is not a question of if, but when and how. Even I miss it when dealing with images. I was particularly impressed by Trillium Notes, and might consider forking it or reusing some of its components
- perfect two way sync from web to local: github.com/ourbigbook/ourbigbook/issues/326Currently, after much effort, publishing from local to web is extremely good.But pulling back changes that you make on web UI locally is not really possible. A basic version can be made easily, but a great version requires some thought.In particular, preventing accidental rewrite on simultaneous local + web edits require edit history to be in place.The rationale here is that users would start editing on Web with a low entry barrier. And as they become more committed to the project, they would eventually transition to having editing most of their content locally from a desktop, with the exception of a few minor edits on the go when they are on a cell phone, and which we want to very easily and automatically be pulled back to local as soon as they open an editor on their laptop.I.e. we want to add a downwards arrow to the following diagram:
Smaller cute tech that I might do before content "real quick" include:
- move more into community tagging rather than just community topic-ing:
- automatic topic rendering for plaintext! github.com/ourbigbook/ourbigbook/issues/356. In particular this could open the doors for AI generated content.
Another thing I really want to do before time is up is to create a video summarizing my philosophy of education. I want it to be as fun and funny and sad as possible, with silly moving animated images and slides, not just me talking to the camera. Although all of the points I intend to talk about have undoubtedly been covered by others, it is something that I feel so strongly about that I would like to tell others about it more personally. If I start it it will likely take a few days to get done, and I'm not sure wha the final quality would be. It is a bit sad to not do "project work", but I think I'll end up doing it regardless. Class it under "fundraising" if you will, as it may help to find other like minded but rich people.
It is a bit sad to work on a project that no one cares about. You're not sure if you're crazy or a visionary. And it is kind of lonely.
I sometimes wonder if I would be happy doing this for the rest of my life if I could. And if it would have any impact at all no matter for how long I do it. My feelings in that area go from slightly depressed to slightly excited about the potential a few times every week.
As we all know, living and making life choices means sacrificing other things that could have been. When I was in France in 2015, I started a masters course in AI/robotics with the idea of doing a PhD and AGI research later on but quit half way because I felt university was such a waste of time.
But come now the AI boom, and although I still believe education is broken, I might have been much better off financially/reputationally if I had withstood the bullshit followed that path. Instead I sacrificed that for nerding about low level programming and open educational content.
It is hard to get such ideas off one's mind. But the fact is, for better or worse, I've started walking the path of educational reform and sacrificed others along the way, and this is the path that I'm further ahead than other people, and perhaps I should pursue it further to a possible conclusion. Also this path has the advantage that it is not fully exclusive from other academic endeavors as we will always need content about the new flashy things that keep coming up.
So yeah, it's hard, but here I am, and I'll go as far as I can without going into Charles Bukowski levels of personal sacrifice.
Announcements:
I wanted to do a quick exploration of open PageRank implementation and data.
My general motivation for this is that a PageRank-like algorithm could be useful for more accurate user and article ranking on OurBigBook, see: Section "PageRank-like ranking"
But it could also be just generally cool to apply it to other graph datasets, e.g. for computing an Wikipedia internal PageRank.
A quick Google reveals only Open PageRank, but their methods are apparently closed source.
Then I had a look at the Common Crawl web graph data to see if I could easily calculate it myself, and... they already have it! See: Section "Common Crawl web graph official PageRank"
Their graph dumps are in BVGraph graph file format, which is the native format of the WebGraph framework, which implements the format and algorithms such as PageRank.
The only thing I miss is a command line interface to calculate the PageRank. That would be so awesome.
The more I look at it the more I love Common Crawl.
Announcements:
In cc-main-2024-25-dec-jan-feb-domain-ranks.txt:
cirosantilli.com
was ranked ~453kourbigbook.com
was at ~606k
I finally took a day to edit the Cool data embedded in the Bitcoin blockchain section from Aratu Week 2024 Talk by Ciro Santilli: My Best Random Projects into a proper YouTube video. The amount of effort that goes into every minute of video editing never ceases to amaze me.
My Bitcoin inscription museum by Ciro Santilli
. Source. Announcements:
- mastodon.social/@cirosantilli/113764420506911687
- x.com/cirosantilli/status/1875157694270841024
- www.linkedin.com/posts/cirosantilli_my-bitcoin-inscription-museum-images-and-activity-7280924162838126592-BVLX/
- www.facebook.com/cirosantilli/posts/pfbid02kN3sVVTViekYsgyqmN1pdcTp81ca7rJSmofk7X3DkdXYL6Rb8tEd78LoLYw7dEMSl
In 2024 I was user #25 with the most reputation gained on Stack Overflow.
This is up from #38 in 2023 is even though I have answered less questions than before.
This is likely because LLMs have killed users that just answered lots of easy new questions, and favored those like me who only answer more important questions found through Google.
I was #13 on the last quarter, so this is likely to go even higher in 2025. More details at: Section "Ciro Santilli's Stack Overflow contributions"
Announcements:
Ciro Santilli's Stack Overflow stats
. Further methodology details at: Figure 13. "Ciro Santilli's Stack Overflow stats".I've been thinking lightly about adding full text search to OurBigBook.
For example, at docs.ourbigbook.com/news/article-and-topic-id-prefix-search article search was added, but it only finds if you search something that appears right at the start of a title, e.g. for:you'd get a hit for:but not for
Fundamental theorem of calculus
fundamental
calculus
To do this efficiently, we need full text search, which PostgreSQL implements.
But finding a clean way to generate test data for testing out the speedup was not so easy and exploration into this led me to publishing a few new slightly improved methods where Googlers can now find them:
- unix.stackexchange.com/questions/97160/is-there-something-like-a-lorem-ipsum-generator/787733#787733 I propose a neat random "sentence" generator using common CLI tools like
grep
andsed
and the pre-installed Ubuntu dictionary/usr/share/dict/american-english
:grep -v "'" /usr/share/dict/american-english | shuf -r | paste -d ' ' $(printf "%4s" | sed 's/ /- /g') | sed -e 's/^\(.\)/\U\1/;s/$/./' | head -n10000000 \ > lorem.txt
- to achieve that, I also proposed two superior "join every N lines" method for the CLI: stackoverflow.com/questions/25973140/joining-every-group-of-n-lines-into-one-with-bash/79257780#79257780, notably this awk poem:
seq 10 | awk '{ printf("%s%s", NR == 1 ? "" : NR % 3 == 1 ? "\n" : " ", $0 ) } END { printf("\n") }'
- to achieve that, I also proposed two superior "join every N lines" method for the CLI: stackoverflow.com/questions/25973140/joining-every-group-of-n-lines-into-one-with-bash/79257780#79257780, notably this awk poem:
- stackoverflow.com/questions/3371503/sql-populate-table-with-random-data/79255281#79255281 I propose:
- a clean PostgreSQL random string stored procedure that picks random characters from an allowed character list
CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$ select string_agg(substr(characters, (random() * length(characters) + 1)::integer, 1), '') as random_word from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789- ')) as symbols(characters) join generate_series(1, $1) on 1 = 1 $$ language sql;
- first generating PostgreSQL data as CSV, and then importing the CSV into PostgreSQL as a more flexible method. This can also be done in a streaming fashion from stdin which is neat.
python generate_data.py 10 | psql mydb -c '\copy "mytable" FROM STDIN'
- a clean PostgreSQL random string stored procedure that picks random characters from an allowed character list
- stackoverflow.com/questions/16020164/psqlexception-error-syntax-error-in-tsquery/79437030#79437030 regarding the safe generation of prefix search
tsquery
from user inputs without query errors, I've learned aboutwebsearch_to_tsquery
and further highlighted a possibletsquery -> text -> tsquery
approach that might be correct for prefix searches - stackoverflow.com/questions/67438575/fulltext-search-using-sequelize-postgres/79439253#79439253 I put everything together into a minimal Sequelize example, read for usage in OurBigBook
Finally I did a writeup summarizing PostgreSQL full text search: Section "PostgreSQL full-text search" and also dumped it at: www.reddit.com/r/PostgreSQL/comments/12yld1o/is_it_worth_using_postgres_builtin_fulltext/ for good measure.
This one was way harder than my previous fun with "find the oldest people who won a given prize" (Nobel Prize/Oscar) mastodon.social/@cirosantilli/112689376315990248 because unlike those prizes where all the decisions are centralized, countries are much more complicated beasts, with changing currencies and international recognition.
This was a good experience to see a few ways in which Wikidata is inconsistent, with the same concept being expressed in multiple different ways, e.g. "end time" property of the current vs the superior "end time" qualifier.
Particularly bad is the notion of a "deprecated rank", that should really not exist.
This is exactly the type of semi interactive data munching that I like to do, a bit in the same vein as CIA 2010 covert communication websites and Cool data embedded in the Bitcoin blockchain.
As you might imagine, the secret services use exactly this type of knowledge modelling to do their dirty business, e.g. Gaffer by the GCHQ.
If only I weren't such a rebel, I'd be a perfect fit for the intelligence agencies.
This is the best monstrosity I had the patience to come up with:It got quite close to the ISO 4217 list.
SELECT
?currency
(GROUP_CONCAT(DISTINCT ?currencyIsoCode; SEPARATOR=", ") AS ?currencyIsoCodes)
?currencyLabel
(GROUP_CONCAT(DISTINCT ?countryLabel; SEPARATOR=", ") AS ?countries)
WHERE {
?country wdt:P31/wdt:P279* wd:Q6256. # is country
?country p:P38 ?countryHasCurrency.
?countryHasCurrency ps:P38 ?currency.
?countryHasCurrency wikibase:rank ?countryHasCurrencyRank.
OPTIONAL {
?currency p:P498 ?currencyHasIsoCode.
?currencyHasIsoCode ps:P498 ?currencyIsoCode.
}
FILTER NOT EXISTS {?country wdt:P576 ?countryAbolished}
FILTER NOT EXISTS {?currency wdt:P576 ?currencyAbolished}
FILTER NOT EXISTS {?currency wdt:P582 ?currencyEndTime}
FILTER NOT EXISTS {?countryHasCurrency pq:P582 ?countryHasCurrencyEndtime}
FILTER (?countryHasCurrencyRank != wikibase:DeprecatedRank)
FILTER (!bound(?currencyHasIsoCode) || ?currencyHasIsoCode != wikibase:DeprecatedRank)
# TODO makes query take timeout? Why? Needed to exclude PLZ.
FILTER NOT EXISTS {?currencyHasIsoCode pq:P582 ?currencyHasIsoCodeEndtime}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
?currency rdfs:label ?currencyLabel .
?country rdfs:label ?countryLabel .
}
}
GROUP BY ?currency ?currencyLabel
ORDER BY ?currencyIsoCodes ?currencyLabel
I was drawn into this waste of time after I noticed that someone had managed to create the Wikipedia of PsiQuantum which I had tried earlier but got deleted: mastodon.social/@cirosantilli/113488891292906243, and then I made the mistake of having a look at the Wikidata page of PsiQuantum.
500,000 Transnistrian ruble banknote 1997 series
. This is one of the most widely used currencies which does not have an ISO 4217 code.Announcements:
I also had one more fun with: opendata.stackexchange.com/questions/15750/structured-data-for-nobel-prizes/21847#21847 getting some basic info about Nobel Prize winners, and noticed one, John Sulston, 2002 Nobel Prize in Physiology and Medicine laureate, who likely has the wrong place of birth on his Nobel Prize profile: www.nobelprize.org/prizes/medicine/2002/sulston/facts/ which is funny. I suggested the change now. Edit they fixed it after I pointed it out:
Another highlight was 1913 Nobel Prize in Chemistry laureate Alfred Werner who born either in Mulhouse in Alsace, France, or in "Yo no sé qué me pasó" ("I don't know what happened to me" in Spanish), a 1986 song by Mexican singer Juan Gabriel.
Announcements:
Also at opendata.stackexchange.com/questions/21849/how-to-get-a-list-of-all-nobel-prize-winners-who-never-had-a-doctorate-from-wiki/21850#21850 I tried to get the list of Nobel Prize laureates who don't have a PhD. I think the query was correct, but Wikidata data is just too incomplete. Related:
I edited the VOD of the talk Aratu Week 2024 Talk by Ciro Santilli: My Best Random Projects about the CIA 2010 covert communication websites a bit and published it at: www.youtube.com/watch?v=TFfuzZC5Qpc.
Announcements:
GitHub forbade our China Dictatorship auto-reply bot, the reason given is because they forbid comment reply bots in general. Though it was cool to see a junior support staff person giving out what obviously triggered the action:before a more senior one took over.
We've received a large volume of complaints from other users indicating that the comments and issues are unrelated to the projects they were working on.
Ciro was slightly saddened but not totally surprized by the bloodbath against him on the Reddit the threads he created:
- www.reddit.com/r/github/comments/1g7acv6/github_forbade_me_from_running_a_bot_that_would/ deleted by admins becausewhich is stupid, obviously we should be able to discuss GitHub policies in that sub.
We don't work for GitHub and we can't help you with your GitHub support problems. You'll just need to be patient.
Also good highlight to user whoShotMyCowReply:Has GitHub also forbidden you from, say, getting a job
Reply:No, a 120,000 USD donation did that: cirosantilli.com/sponsor#1000-monero-donation
Many successful people are neurodiverse comes to mind.Can't hate on the grind but I think you should also consider psychiatric help
- www.reddit.com/r/China/comments/1g7aa6k/american_programming_website_github_forbade_me/: also deleted without reason
So we observe once again the stupidity of deletionism towards anything that is considered controversial. The West is discussion fatigued, and would rather delete discussion than have it.
We also se people against you having freedom to moderate your own repositories as you like it, with bots or otherwise. Giving up freedoms for nothing, because "bot is evil".
Announcements:
academia.stackexchange.com/questions/213576/do-copyright-transfer-of-papers-to-publishers-affect-when-the-paper-enters-the-p Do copyright transfer of papers to publishers affect when the paper enters the public domain since copyright belongs to a corporation and not persons?
I'm asking a law question for a change, because I enjoy skimming through important old papers and uploading parts of them where everyone can legally enjoy them.
Announcements:
I like the Falun Mine for two reasons:
- some cool chemical discoveries have been made with a relation to the mine, notably tantalum and selenium, added a section to Wikipedia: en.wikipedia.org/w/index.php?title=Falun_Mine&oldid=1245374294#Discovery_of_new_elements I used the book discovery Of The Elements by Mary Elvira Weeks as my primary reference.
- it is the Chinese version of the Scunthorpe problem due to a naming conflict with Falun Gong, a censored new religion that was banned in China
Announcements:
Whenever a user creates an issue or comment on China Dictatorship, the bot now automatically creates a new issue with one of the latest news from Duty Machine: github.com/duty-machine/duty-machine
Sample created issue: github.com/cirosantilli/china-dictatorship/issues/1322 Script: github.com/cirosantilli/china-dictatorship/blob/ab6a46c511afaaf6c9e68ba8813c2b2cf9d9638c/action.js#L195
Duty Machine is a bot repo that automatically scrapes Chinese language news from major news outlets such as the New York Times or Radio Free Asia which ensures that China Dictatorship news will always be new.
It's the war of the anonymous bots against the little pinks, part of asymmetric information warfare: cirosantilli.com/china-dictatorship/asymmetric-information-warfare
Announcements:
superuser.com/questions/420885/is-there-a-face-recognition-command-line-tool/1852394#1852394 played with the
face_recognition
Python package: github.com/ageitgey/face_recognition Cute CLI API, but disappointing accuracy. Also at:Thanks Adam Geitgey for putting that repo up.
Announcements:
Under Section "Publication by Marie Curie" I did a quick overview of the papers in which Marie Curie and collaborators publish the existence of new elements polonium and radium. Both are very understandable (except the chemistry), and have some cute terminology. I also cited those papers on her Wikipedia page: en.wikipedia.org/w/index.php?title=Marie_Curie&diff=1240252528&oldid=1238097626 Another good exercise in "old paper finding" + "Wikipedia markup/rules" as I looked at the Comptes rendus de l'Académie des Sciences a bit.
This was kickstarted by YouTube recommending me the following good video:
which led me into yet a quick nuclear physics binge. I shouldn't do this to myself. I also ended up writing some tentative answers on Quora:
Announcements:
I tried to use every single free offline text-to-speech engine that would run on Ubuntu 24.04 without too much hassle to see if any of them sounded natural. pico2wave was the overall winner so far, but it is not perfect.
I've been noticing a gap between the "AI" SOTA and what is actually packaged well enough to be usable by a general audience.
Also played a bit more with OpenAI Whisper: askubuntu.com/questions/24059/automatically-generate-subtitles-close-caption-from-a-video-using-speech-to-text/1522895#1522895 Mind blowing performance and perfect packaging as well, kudos.
Announcements:
- en.wikipedia.org/wiki/Scott_Hassan I delved into a bit of Wikipedia drama on the page of Scott Hassan, initial coder of Google Search, which I created an am the main contributor.Originally I had added some details about this messy divorce which saw coverage in major publications such as the New York Times: www.nytimes.com/2021/08/20/technology/Scott-Hassan-Allison-Huynh-divorce.html and Scott used puppets to remove those at several points in time over the years.Those removals were then reverted by other editors, not myself, indicating that editors wanted the details there.While preparing to finally decide this through moderation, I ended up finding that the divorce details should likely have been left out according to Wikipedia rules, because Scott is "relatively unknown" and a "low profile individual":and so I ended up removing them myself.This is yet once again deletionism on Wikipedia weakening the site, and making @OurBigBook stronger :-) Here is the uncensored one: Scott HassanI spent time on this partly because I'm mildly obsessed with founding myths of companies, but also partly to better understand the moderation process of Wikipedia.
- unix.stackexchange.com/questions/256138/is-there-any-decent-speech-recognition-software-for-linux/613392#613392 cool to see that the Vosk open source speech recognition software by twitter.com/alphacep now has a convenient command line interface called vosk-transcriber!It allows you to just:
vosk-transcriber -m ~/var/lib/vosk/vosk-model-en-us-0.22 -i in.ogg -o out.srt -t srt
to extract a subtitle file out.srt from a .ogg audio input file.Accuracy is a bit meh, but we'll take it! - video.stackexchange.com/questions/33531/how-to-remove-background-from-video-without-green-screen-on-the-command-line/37392#37392 tested this AI video background remover github.com/nadermx/backgroundremover by @nadermx. It had a few glitches, but I had fun.unix.stackexchange.com/questions/233832/merge-two-video-clips-into-one-placing-them-next-to-each-other/774936#774936 I then learned how to stack videos side-by-side with ffmpeg to create this side-by-side demo. It also works for GIFs! stackoverflow.com/questions/30927367/imagemagick-making-2-gifs-into-side-by-side-gifs-using-im-convert/78361093#78361093Posted at:
- Just found out that my Lenovo ThinkPad P14s has an infrared camera, and recorded a quick test video on Ubuntu 23.10 with:
fmpeg -y -f v4l2 -framerate 30 -video_size 640x360 -input_format gray -i /dev/video2 -c copy out.mkv
- mastodon.social/@cirosantilli/112261675634568209
- twitter.com/cirosantilli/status/1778981935257116767
- www.facebook.com/cirosantilli/posts/pfbid027M3n2p8snE9otAWdHtJ3ig2AhrXoDGv4h68o1z8agHceQBbFHZpEoxg7KZbiWAgWl
- www.linkedin.com/feed/update/urn:li:activity:7184755892410576897/
- www.youtube.com/watch?v=o1ZeR6pmf6o
- commons.wikimedia.org/wiki/File:Infrared_video_of_Ciro_Santilli_waving_recorded_on_Lenovo_ThinkPad_P14s_with_FFmpeg_6.0_on_Ubuntu_23.10.webm
Figure 18. Ciro Santilli waving hello in infrared.