Big goals:
- the pursuit of AGI
- physics simulations, including scientific visualization software
- formalization of mathematics
Examples under cmake:
- cmake/hello: just print a message in CMake itself and exit. No compilation.
- cmake/hello_c: C hello world
- cmake/option: set() and option() basic examples
- cmake/multi_executable
- cmake/multi_file
- cmake/multi_file_recursive
- cmake/shared_lib_external
"automatic programming has always been a euphemism for programming in a higher-level language than was then available to the programmer" sums it up.
The ultimate high level is of course to program with: "computer, make money", which is the goal of artificial general intelligence.
Lowering means translating to a lower level representation.
Raising means translating to a higher level representation.
Decompilation is basically a synonym, or subset, of raising.
Saves preprocessor output and generated assembly to separate files.
- preprocessor:
- assembly:
Very hot stuff! It's like ISA-portable assembly, but with types! In particular, it also deals with calling conventions for us (since it is ISA-portable). TODO: isn't that exactly what C does? :-) LLVM IR vs C
Documentation: llvm.org/docs/LangRef.html
Example: llvm/hello.ll adapted from: llvm.org/docs/LangRef.html#module-structure but without double newline.
To execute it as mentioned at github.com/dfellis/llvm-hello-world we can either use their crazy assembly interpreter, tested on Ubuntu 22.10:
sudo apt install llvm-runtime
lli hello.ll
This seems to use puts from the C standard library.
Or we can lower it to assembly of the local machine:
sudo apt install llvm
llc hello.ll
which produces hello.s, and then we can assemble, link and run with gcc:
gcc -o hello.out hello.s -no-pie
./hello.out
or with clang:
clang -o hello.out hello.s -no-pie
./hello.out
hello.s uses the GNU GAS format, which clang is highly compatible with, so both should work in general.
Reproducible builds allow anyone to verify that a binary large object contains what it claims to contain!
Many plotting programs can be used to create mathematics illustrations. They just tend to have more data-oriented rather than explanatory-oriented output.
Some notable ones:
Ciro Santilli has some good related articles listed under: the best articles by Ciro Santillis.
Good library to render text in OpenGL, see also: stackoverflow.com/questions/8847899/opengl-how-to-draw-text-using-only-opengl-methods/36065835#36065835
The fact that they kept the standard open source makes them huge heroes, see also: closed standard.
Good modern OpenGL tutorial in retained mode with shaders, see also: stackoverflow.com/questions/6733934/what-does-immediate-mode-mean-in-opengl/36166310#36166310
Examples at: two-js/.
Feels good. Maybe not ultra featured, and could have more simple examples in docs, but still good.
One of the main features of Two.js appears to be the fact that it can natively render to either SVG and canvas, rather than creating SVG through DOM hacks as done by other projects.
One specific software project, typically with a single executable file format entry point.
As mentioned at Section "Computer security researcher", Ciro Santilli really tends to like people from this area.
Also, the type of programming Ciro used to do, systems programming, is particularly useful to security researchers, e.g. Linux Kernel Module Cheat.
The reason he does not go into this is that Ciro would rather fight against the more eternal laws of physics rather than with some typo some dude at Apple did last week and which will be patched in a month.
Ciro Santilli found out that he likes computer security researchers and vice versa.
It's a bit the same reason why he likes physicists: you can't bullshit with security.
You can't just talk nice and hope for people to believe you.
You can't not try to break things and just keep everyone happy in their false illusion of safety.
You can't do a half job.
If you do any of that, you will get your ass handed to you in a little gift bag.
All of this is closely linked to Ciro Santilli's self-perceived creative personality: being naughty and being creative are correlated.
A superstar security researcher with some major exploits from the 2000s.
Oh yeah, that felt good. A few months before he died.
Ermm, as of February 2021, I was able to update my 2FA app token with the password alone, it did not ask for the old 2FA.
So what's the fucking point of 2FA then? An attacker with my password would be able to login by doing that!
Is it that Google trusts that particular action because I used the same phone/known IP or something like that?
- youtu.be/IH0GXWQDk0Q?t=900 mentions that Alfred Charles Hobbs commented in 1853:
Rogues are very keen in their profession, and know already much more than we can teach them
Basically the opposite of security through obscurity, though slightly more focused on cryptography.
This is really good.
It allows the client to prepare a single request that gets all the data it wants to fill up a given webpage, rather than doing several separate requests.
So it only gets exactly what it needs, and in a single request.
Very sweet. This is the future of the web.
The author Ole Tange answers every question about it on Stack Exchange. What a legend!
This program makes you respect GNU make a bit more. Good old make with -j can not only parallelize, but also take into account a dependency graph.
Way too few people know about this. Spread the word.
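To illustrate why a dependency graph matters for parallelism, here is a minimal Python sketch. It is not how make is implemented: it only groups targets into batches whose members could be built at the same time, using the standard library graphlib module. The target names are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical build graph: each target maps to the targets it depends on.
deps = {
    "main.o": {"main.c"},
    "lib.o": {"lib.c"},
    "app": {"main.o", "lib.o"},
}

def parallel_batches(graph):
    """Return lists of targets; targets within one list do not depend on
    each other, so they could be built in parallel."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # everything whose dependencies are done
        batches.append(ready)
        ts.done(*ready)  # mark the whole batch as built
    return batches

batches = parallel_batches(deps)
for number, batch in enumerate(batches):
    print(number, batch)
```

A tool that only parallelizes a flat list of jobs has no equivalent of the batch boundaries: it cannot know that app must wait for both object files.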
This means that e.g. if you do an UPDATE query on multiple rows, and power goes out half way, either all update, or none update.
This is different from isolation, which considers instead what can or cannot happen when multiple queries are running in parallel.
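The all-or-none behaviour can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the account table is hypothetical, and a raised exception stands in for the power failure, which is a much gentler failure than a real crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 100)])
conn.commit()

def add_bonus(fail_half_way):
    # "with conn" commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute("UPDATE account SET balance = balance + 10 WHERE id = 1")
        if fail_half_way:
            raise RuntimeError("simulated crash between the two row updates")
        conn.execute("UPDATE account SET balance = balance + 10 WHERE id = 2")

def balances():
    return [row[0] for row in conn.execute("SELECT balance FROM account ORDER BY id")]

try:
    add_bonus(fail_half_way=True)
except RuntimeError:
    pass
after_failure = balances()  # the first row's update was rolled back too

add_bonus(fail_half_way=False)
after_success = balances()  # both rows updated together

print(after_failure, after_success)
```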
Determines what can or cannot happen when multiple queries are running in parallel.
See Section "SQL transaction isolation level" for the most common context under which this is discussed: SQL.
Software that implements some database system. E.g. PostgreSQL and MySQL are two (widely extended) SQL implementations.
List databases:
echo 'show dbs' | mongo
Delete database:
use mydb
db.dropDatabase()
or:
echo 'db.dropDatabase()' | mongo mydb
View collections within a database:
echo 'db.getCollectionNames()' | mongo mydb
Show all data from one of the collections: stackoverflow.com/questions/24985684/mongodb-show-all-contents-from-all-collections
echo 'db.collectionName.find()' | mongo mydb
Tested as of Ubuntu 20.04, there is no Mongo package available by default due to their change to Server Side Public License, which Debian opposed. Therefore, you have to add their custom PPA as mentioned at: docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
Per language:
How to decide if an ORM is decent? Just try to replicate every SQL query from nodejs/sequelize/raw/many_to_many.js on PostgreSQL and SQLite.
There is only a very finite number of possible reasonable queries on a two table many to many relationship with a join table. A decent ORM has to be able to do them all.
If it can do all those queries, then the ORM can actually do a good subset of SQL and is decent. If not, it can't, and this will make you suffer. E.g. Sequelize v5 is such an ORM that makes you suffer.
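As a rough idea of what "queries on a two table many to many relationship with a join table" means, here is a minimal raw SQL sketch run through Python's built-in sqlite3 module. The users, posts and likes tables are hypothetical and are not taken from nodejs/sequelize/raw/many_to_many.js; a decent ORM should be able to express at least queries of this shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Join table: one row per (user, post) like.
CREATE TABLE likes (
    user_id INTEGER NOT NULL REFERENCES users(id),
    post_id INTEGER NOT NULL REFERENCES posts(id),
    PRIMARY KEY (user_id, post_id)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO posts VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
INSERT INTO likes VALUES (1, 1), (1, 2), (2, 2);
""")

# Posts liked by a given user: walk through the join table.
liked_by_alice = [row[0] for row in conn.execute("""
    SELECT posts.title FROM posts
    INNER JOIN likes ON likes.post_id = posts.id
    INNER JOIN users ON users.id = likes.user_id
    WHERE users.name = ?
    ORDER BY posts.id
""", ("alice",))]

# Like count per post, keeping posts with zero likes thanks to LEFT JOIN.
like_counts = conn.execute("""
    SELECT posts.title, COUNT(likes.user_id) FROM posts
    LEFT JOIN likes ON likes.post_id = posts.id
    GROUP BY posts.id
    ORDER BY posts.id
""").fetchall()

print(liked_by_alice)
print(like_counts)
```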
The next thing to check are transactions.
Basically, all of those come up if you try to implement a blog hello world such as gothinkster/realworld correctly, i.e. without unnecessary inefficiencies due to your ORM on top of underlying SQL, and dealing with concurrency.
Ciro Santilli used to use file managers in the past.
But he finally converted to shell cd aliases that auto-ls: github.com/cirosantilli/dotfiles/blob/a51bcc324f0cff0eddd4c3bb8654ec223a0adb7b/home/.bashrc#L1058
The most powerful GUI file manager ever?? Infinite configurability??
Ciro Santilli wasted some time on it before he gave up on file managers altogether.
Ciro Santilli considered it before he stopped using file managers altogether, it is not bad.
A library to make games.
Ciro Santilli considered this as the basis for Ciro's 2D reinforcement learning games, but ultimately decided it was a bit too messy. Nice overall though.
The one true game engine!
Their project lead as of 2018 was pro-CCP: github.com/cirosantilli/china-dictatorship/blob/aa1176c57fc2929465294e520b43b50d44e202ba/communities-that-censor-politics.md
Originally by Keyhole Inc., which then became Google Maps, but the format seems standardized and has non-Google support, so should be OK.
Owned/developed by Google as of 2020.
Early on jumpstarted from several acquisitions, notably Keyhole Inc. and Where 2 Technologies.
Street View's go into the past mode is the dream of every archaeologist. Ciro can only dream of a magic street view that allows going back to earlier centuries and beyond... isn't it amazing to think that people in the future will have that ability to time travel back to around the year 2006? Ciro wonders how long Google will be able to keep storing data like that.
Thanks, CIA.
It is rare to find a project with such a ridiculously high importance over funding ratio.
E.g., as of 2020, their help login help.openstreetmap.org/ shows MyOpenID as an option, which was discontinued in 2014, and not Google OAuth.
They do still seem to have a bit more activity than gis.stackexchange.com/questions/tagged/openstreetmap on Stack Exchange.
Complaints:
- Transliteration is off by default!...... wiki.openstreetmap.org/wiki/Translation You just have to learn all scripts ever. Good luck with the Chinese characters. Genius.
- In order to see information about places, you have to click "Query features" on the toolbar first. Who made such a terrible UI? Direct click is much better, and so easy to implement.
- It is impossible to discern different types of paths and other walking path symbols, the symbols are too small, and just scale down to a line no matter how much you zoom in.
- Power lines are way too visible. While that is kind of cool, it is useless and distracting to most people most of the time.
- No street-level imagery...: help.openstreetmap.org/questions/1178/adding-photos
- No aerial imagery: help.openstreetmap.org/questions/6849/how-can-i-see-the-aerial-imagery-without-editing-the-map But that is kind of understandable, as that one might not be free.
- No restaurant ratings: help.openstreetmap.org/questions/64852/ratings-for-pois because it is "Subjective". OMG those people, such a huge value powerhouse wasted. Not just for restaurants, but for other things as well, e.g. sharing of good cycle circuits.
All of this is a shame, because they do have some incredible data that you cannot find easily on other maps because people just edited it up.
Kind of works! Notably, has the amazing cycling database offline for you, if you fall within the 6 area downloads. It is worth supporting these people beyond the 6 free downloads however.
Has some of the best map data available for the United Kingdom, but their data appears to be proprietary?
IDEs are absolutely essential for developing complex software.
The funny thing is that you don't notice this until someone shows it to you. But once you see it, there is no turning back, just like in the Steve Jobs "customers don't know what they want" quote.
Unfortunately, after the Fall of Eclipse (archive), the IDE landscape in 2019 is horrible and split between:
Programmers of the world: unite! Focus on one IDE, and make it work for all languages and all build systems. Give it all the features that Eclipse has, but none of the bugginess. Work with top project to make sure the IDE works for all top projects.
Projects of the world: support one IDE, with in-tree configuration. Complex integration is often required between the IDE and the build system, and successful projects must do that once for all developers. Either do this, or watch your complex project wither away.
Build tool maintainers: make it possible for IDEs to support your tool! E.g., implement JSON Compilation Database output so that IDEs can read the exact compiler commands from that, in order to automatically determine how files should be parsed! Or better, just use libllvm in your IDE itself as the main parser.
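As a rough illustration of what an IDE can extract from a JSON Compilation Database, here is a minimal Python sketch. The sample entries and paths are made up; real databases usually live in a compile_commands.json file, and each entry carries a directory, a file, and either a command string or an arguments list.

```python
import json
import shlex

# Tiny inline sample in the JSON Compilation Database format.
# Paths and flags are hypothetical.
sample = """
[
  {"directory": "/home/user/proj/build",
   "command": "cc -I../include -DDEBUG=1 -c ../src/main.c -o main.o",
   "file": "../src/main.c"},
  {"directory": "/home/user/proj/build",
   "arguments": ["cc", "-I../include", "-c", "../src/util.c", "-o", "util.o"],
   "file": "../src/util.c"}
]
"""

def flags_by_file(db_text):
    """Map each source file to the include and define flags it is built with."""
    result = {}
    for entry in json.loads(db_text):
        # An entry has either a shell "command" string or an "arguments" list.
        args = entry.get("arguments") or shlex.split(entry["command"])
        result[entry["file"]] = [a for a in args if a.startswith(("-I", "-D"))]
    return result

flags = flags_by_file(sample)
for path, file_flags in flags.items():
    print(path, file_flags)
```

With those flags the editor can parse each file with the same include paths and macros as the real build, instead of guessing.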
Ciro is evaluating some IDEs at: github.com/cirosantilli/ide-test-projects
However also at the same time very limited integration with vscode, that makes using it for VScode compatibility almost useless, e.g.:
- you can't reuse the syntax definitions!
Before we get a decent open source integrated development environment, what else can you do?
But also perfect for small one-off files when you don't have the patience to setup said IDE.
vim's defaults are atrocious for the 21st century! Vundle is reasonable as an ad-hoc package manager, but it can't set fixed versions of packages:
Vimscript unit testing!!!
Ciro Santilli contributed a bit to this, and was even given push rights, see also: Ciro Santilli's minor projects.
There is one major annoyance: you can't use ESC to leave the address bar focus, but using Tab as a workaround works:
Once upon a time (early 2010's), Eclipse dominated the IDE landscape and all was good. NetBeans was around too. And Java was still unmarred by Google LLC v. Oracle America, Inc..
But then something happened.
For some reason, Eclipse started to decay.
And the project that had once been a vibrant community of awesomeness, started to become... a zombie of its former self.
Bugginess started increasing. And not even hard to fix bugs. One liners that affect every user immediately after startup.
Sometimes, to Eclipse's defense they weren't "bugs". Just features that it became evident with time every programmer expected from a modern IDE.
But somehow the Eclipse community had a deep problem. A cancer. It had completely lost touch with user experience.
Perhaps it was due to the increasing interest of the several corporations that had adopted Eclipse as the base IDE for their proprietary solutions?
Perhaps.
Many users stuck to the IDE.
Some heroic efforts were made as plugins that drastically improved certain defects. The Darkest Dark plugin comes to mind.
But all those efforts required configuration. A setup time that most users simply don't have. The core devteam had become dumb and dead, unable to incorporate such changes.
This greatly opened up the space for other competing IDEs to come along. The "semi feature complete but at least easy to use and not so buggy" Visual Studio Code and the proprietary JetBrains IDEs being some of the most notable ones.
Using Eclipse as of the early 2020's is such a mixed experience. If you spend enough time to configure away the key bugginess, there are moments where you can feel "OMG, this feature is amazing".
But the effort is just too great, and soon another bug or obvious missing feature hits you and brings you back to reality.
Every young person uses VS Code now. Eclipse is dead, and there is no way back, usage will just continue dropping.
RIP, Eclipse. It wasn't meant to be.
Bibliography:
undo is broken beyond belief: github.com/VSCodeVim/Vim/issues/1490
It is especially bad on large projects, unless you carefully whitelist only the small source directories:
FFmpeg is the assembler of audio and video.
As a result, Ciro Santilli, who likes "lower level stuff", has had many many hours of image manipulation fun with this software, see e.g.:
- the "Media" section of the best articles by Ciro Santillis.
- Figure "Ciro knows how to convert videos to GIFs"
The older Ciro grows, the more he notices that FFmpeg can do basically any lower level audio video task. It is just an amazing piece of software, the immediate go-to for any low level operation.
FFmpeg was created by Fabrice Bellard, whom Ciro deeply respects.
Resize a video: superuser.com/questions/624563/how-to-resize-a-video-to-make-it-smaller-with-ffmpeg:
ffmpeg -i input.avi -filter:v scale=720:-1 -c:a copy output.mkv
The first number in scale is the width, and -1 makes the height follow from the input aspect ratio.
According to reverse engineering, FFmpeg is likely the backend of YouTube: streaminglearningcenter.com/blogs/youtube-uses-ffmpeg-for-encoding.html (archive)
Crop 20 pixels from the bottom of the image:
convert image.png -gravity South -chop 0x20 result.png
What happens when the underdogs get together and try to factor out their efforts to beat some evil dominant power, sometimes victoriously.
Or when startups use the cheapest stuff available and randomly become the next big thing, and decide to keep maintaining the open stuff to get features for free from other companies, or because they are forced by the Holy GPL.
Open source frees employees. When you change jobs, a large part of the specific knowledge you acquired about a closed source project with your blood and tears goes to the trash. When companies get bought, projects get shut down, and closed source code goes to the trash. What sane non desperate person would sell their life energy into such closed source projects that could die at any moment? Working on open source is the single most important non money perk a company can have to attract the best employees.
Open source is worth more than the mere pragmatic financial value of not having to pay for software or the ability to freely add new features.
Its greatest value is perhaps the fact that it allows people to study it, to appreciate the beauty of the code, and feel empowered by being able to add the features that they want.
That is why Ciro Santilli thought:
Life is too short for closed source.
But quoting Ciro's colleague S.:
Every software is open source when you read assembly code.
While software is the most developed open source technology available in the 2010's, due to the "zero cost" of copying it over the Internet, Ciro also believes that the world would benefit enormously from open source knowledge in all areas of science and engineering, for the same reasons as open source.
A more precise term for those in the know: open source software that also has a liberal license, for some definition of liberal.
Ciro Santilli defines liberal as: "can be commercialized without paying anything back" (but possibly subject to other restrictions).
He therefore does not consider Creative Commons licenses with NC to be FOSS.
For the newbs, the term open source software is good enough, since most open source software is also FOSS.
But when it's not, it's crucial to know.
This model can work well when there is a set of commonly used libraries that some developers often use together, but such that there isn't enough maintenance work for each one individually.
So what people do is to create a group that maintains all those projects, to try and get enough money to survive from the contributions done primarily for each one individually.
Examples:
Ciro Santilli's raison d'etre, one of his attempts: OurBigBook.com.
The outcome of closed knowledge is reverse engineering.
Projects:
- MIT OpenCourseWare
- several e-learning websites, e.g. OpenStax
- www.oeglobal.org/
Not everything is perfect.
One big problem of many big open source projects is that they are contributed to by separate selfish organizations, that have private information. Then what happens is that:
- people implement the same thing twice, or one change makes the other completely unmergeable
- you get bugs but can't share your closed source test cases, and then you can't automate tests for them, or clearly demonstrate the problem
- other contributors don't see your full semi secret important motivation, and may either nitpick too much or take too long to review your stuff
Another common difficulty is that open source maintainers may simply not care enough about their own project (maybe they did in the past but lost interest) to review external patches by people they don't know.
This is understandable: a new patch, is a new risk of things breaking.
Therefore, if you ever submit patches and they get ignored, don't be too sad. It just comes down to a question of maintenance cost, and means that you will waste some extra time on the next rebase. You just have to decide your goals and be cold about it:
- are you doing the right thing and going for a specific goal backward design? Then just fork, run as fast as possible towards a minimum viable product, and if you start to feel that rebase is costing you a lot, or feel you could get some open source fame for cheap, open reviews and see what upstream says. If they ignore you, politely tell yourself in your mind silently "fuck them", and carry on with the MVP
- otherwise, e.g. you just want to randomly help out, you have to ask them before doing anything big "how can I be of help". If I propose a patch for this issue, do you promise to review it?
Writing documentation in an open source project in which you don't have immediate push rights is another major pain due to code reviews. Code reviews tend to be much less subjective, because if you do something wrong, stuff crashes, runs slower, or you need more lines of code to reach the same goal. There are tradeoffs, but in a limited number. Documentation code reviews on the other hand, are an open invitation to infinite bike-shedding, since you can't "run" documentation through a standardized brain model. Much better is for one good documenter person to just make one cohesive Stack Overflow post, and ping others with more knowledge to review details or add any missing pieces :-)
Open source development model in which developers develop in private, and only release code to the public during releases.
Notable example project: Android Open Source Project.
This development model basically makes reporting bugs and sending patches a waste of time, because many of them will already have been solved, which is why this development model is evil.
Ciro Santilli can accept closed source on server products more easily than offline, because the servers have to be paid for somehow (by stealing your private data).
Closed source on offline products used by millions of people is evil, when you could just have those for free with open source software! Thus Ciro's hatred for Microsoft Windows and MacOS (at least userland, maybe).
The opposite of open source software.
How the hell are you supposed to develop an open source implementation of something that has a closed standard?
Not to mention open source test suites, that would be way too much to ask for, those always end up being made by some shady small companies that go bankrupt from time to time, see e.g. .
If you are going to do closed source, at least do it like this.
Basically the opposite of need to know for software.
These people are heroes. There's nothing else to say.
Amazing project, that basically makes a more searchable Wayback Machine.
A bit hard to use their data though, partly due to size, but also lack of free to use querying mechanisms, and how obtuse Amazon S3 is to use.
Notably, aws-cli with an account is the only reliable way, everything else is way too broken.
But still, their project is amazing.
The only out-of-the-box search they seem to have is: urlsearch.commoncrawl.org/ for domains/URLs. It is good, but there could be so much more... notably IPs.
They could also document the data shape a bit better.
Sample sizes can be found at: commoncrawl.org/2023/04/mar-apr-2023-crawl-archive-now-available/
To explore the data, after login:
aws s3 ls s3://commoncrawl/crawl-data/CC-MAIN-2013-20/
Copy the toplevel directory only:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/ . --recursive --exclude "*/*"
Copy some wet/wat files:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz .
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz .
Directory structure:
- cc-index.paths.gz (1K)
- cc-index-table.paths.gz (1K)
- segment.paths.gz (1.7K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/
crawl-data/CC-MAIN-2013-20/segments/1368696381630/
- index.html (2.3K)
- wat.paths.gz (98K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wat/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wat.gz
- wet.paths.gz (98K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz
crawl-data/CC-MAIN-2013-20/segments/1368696381249/wet/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.wet.gz
- warc.paths.gz (99K) Sample lines:
crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz
crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00001-ip-10-60-113-184.ec2.internal.warc.gz
- segments: directory with actual data
- 1368696381249: one of many segments, any meaning of name?
- CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wet.gz (142M, 334M unzipped). A tiny bit of metadata, and then plaintext content from the website, e.g. the second one:
WARC/1.0
WARC-Type: conversion
WARC-Target-URI: http://004eeb5.netsolhost.com/stephensilver.htm
WARC-Date: 2013-05-18T08:11:02Z
WARC-Record-ID: <urn:uuid:773b31ba-ddc6-47a5-ae24-d08141b9944d>
WARC-Refers-To: <urn:uuid:4b1bdbff-4926-4ced-86f6-072f5bb3837a>
WARC-Block-Digest: sha1:LQFSCR2LIJQYMPTXRHWU7HAPQTVSYS3A
Content-Type: text/plain
Content-Length: 12046

Stephen Silver is a journalist and editor who specializes in the areas of politics, pop culture, film and sports. He works as an editor with the North American Publishing Co. and as a film critic with The Trend, a local newspaper in the Philadelphia area.
No IP unfortunately.
- CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.wat.gz (329M, 1.4G unzipped). A lot of JSON metadata and no contents as desired. Contains IP! Some entries however are humongous with a ton of useless data, that's what bloats these so much:
WARC/1.0
WARC-Type: metadata
WARC-Target-URI: CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz
WARC-Date: 2013-11-22T14:51:12Z
WARC-Record-ID: <urn:uuid:ec54e493-8965-41be-b344-07596cc30b3a>
WARC-Refers-To: <urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>
Content-Type: application/json
Content-Length: 1180

{"Envelope":{"Format":"WARC","WARC-Header-Length":"274","Block-Digest":"sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR","Actual-Content-Length":"372","WARC-Header-Metadata":{"WARC-Type":"warcinfo","WARC-Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz","WARC-Date":"2013-11-22T14:51:12Z","Content-Length":"372","WARC-Record-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Type":"application/warc-fields"},"Payload-Metadata":{"Trailing-Slop-Length":"0","Actual-Content-Type":"application/warc-fields","Actual-Content-Length":"372","Headers-Corrupt":true,"WARC-Info-Metadata":{"robots":"classic","software":"Nutch 1.6 (CC)/CC WarcExport 1.0","description":"Wide crawl of the web with URLs provided by Blekko for Spring 2013","hostname":"ip-10-60-113-184.ec2.internal","format":"WARC File Format 1.0","isPartOf":"CC-MAIN-2013-20","operator":"CommonCrawl Admin","publisher":"CommonCrawl"}}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"453","Header-Length":"10","Inflated-CRC":"866052549","Inflated-Length":"650"},"Offset":"0","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}

WARC/1.0
WARC-Type: metadata
WARC-Target-URI: http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions
WARC-Date: 2013-05-18T05:48:54Z
WARC-Record-ID: <urn:uuid:d519658f-7a63-46c1-849b-4cd92332ddb8>
WARC-Refers-To: <urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f>
Content-Type: application/json
Content-Length: 1501

{"Envelope":{"Format":"WARC","WARC-Header-Length":"433","Block-Digest":"sha1:B2B6JDSGWCUQIIUGV54SXEE25RX4SANS","Actual-Content-Length":"302","WARC-Header-Metadata":{"WARC-Type":"request","WARC-Date":"2013-05-18T05:48:54Z","WARC-Warcinfo-ID":"<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>","Content-Length":"302","WARC-Record-ID":"<urn:uuid:cefd363b-1fec-4590-8305-4c6fab2e095f>","WARC-Target-URI":"http://%20jwashington@ap.org/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions","WARC-IP-Address":"165.1.125.44","Content-Type":"application/http; msgtype=request"},"Payload-Metadata":{"Trailing-Slop-Length":"4","HTTP-Request-Metadata":{"Headers":{"Accept-Language":"en-us,en-gb,en;q=0.7,*;q=0.3","Host":"ap.org","Accept-Encoding":"x-gzip, gzip, deflate","User-Agent":"CCBot/2.0","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"},"Headers-Length":"300","Entity-Length":"0","Entity-Trailing-Slop-Bytes":"0","Request-Message":{"Method":"GET","Version":"HTTP/1.0","Path":"/Content/Press-Release/2012/How-AP-reported-in-all-formats-from-tornado-stricken-regions"},"Entity-Digest":"sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ"},"Actual-Content-Type":"application/http; msgtype=request"}},"Container":{"Compressed":true,"Gzip-Metadata":{"Footer-Length":"8","Deflate-Length":"455","Header-Length":"10","Inflated-CRC":"453539965","Inflated-Length":"739"},"Offset":"453","Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}
Let's beautify one of them to see it better:
{
  "Envelope": {
    "Format": "WARC",
    "WARC-Header-Length": "274",
    "Block-Digest": "sha1:JCZOI4V3UOTXGIRLFMPLW4J2WPLAKGVR",
    "Actual-Content-Length": "372",
    "WARC-Header-Metadata": {
      "WARC-Type": "warcinfo",
      "WARC-Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz",
      "WARC-Date": "2013-11-22T14:51:12Z",
      "Content-Length": "372",
      "WARC-Record-ID": "<urn:uuid:cfeff436-7c4c-4119-aaa4-ec2ce27ad3e1>",
      "Content-Type": "application/warc-fields"
    },
    "Payload-Metadata": {
      "Trailing-Slop-Length": "0",
      "Actual-Content-Type": "application/warc-fields",
      "Actual-Content-Length": "372",
      "Headers-Corrupt": true,
      "WARC-Info-Metadata": {
        "robots": "classic",
        "software": "Nutch 1.6 (CC)/CC WarcExport 1.0",
        "description": "Wide crawl of the web with URLs provided by Blekko for Spring 2013",
        "hostname": "ip-10-60-113-184.ec2.internal",
        "format": "WARC File Format 1.0",
        "isPartOf": "CC-MAIN-2013-20",
        "operator": "CommonCrawl Admin",
        "publisher": "CommonCrawl"
      }
    }
  },
  "Container": {
    "Compressed": true,
    "Gzip-Metadata": {
      "Footer-Length": "8",
      "Deflate-Length": "453",
      "Header-Length": "10",
      "Inflated-CRC": "866052549",
      "Inflated-Length": "650"
    },
    "Offset": "0",
    "Filename": "CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"
  }
}
Fuck no IP addresses either. But other entries do have it, why not this one?
The reason these can be huge is the HTML-Metadata section which contains all outlinks! gist.github.com/Smerity/e750f0ef0ab9aa366558#file-bbc-pretty-wat-L34
- CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz (). Obtain:
aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2013-20/segments/1368696381249/warc/CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz .
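Going back to the WAT JSON payloads, here is a minimal Python sketch of pulling the record type and IP out of one. The two inline payloads are heavily abridged versions of the WAT records in this section, keeping only the keys used here. The helper name is made up, and splitting a real .wat.gz file into records is not shown.

```python
import json

# Abridged "request" payload: carries WARC-IP-Address.
request_payload = (
    '{"Envelope":{"Format":"WARC","WARC-Header-Metadata":{'
    '"WARC-Type":"request",'
    '"WARC-Target-URI":"http://%20jwashington@ap.org/Content/Press-Release/2012/'
    'How-AP-reported-in-all-formats-from-tornado-stricken-regions",'
    '"WARC-IP-Address":"165.1.125.44"}}}'
)

# Abridged "warcinfo" payload: describes the archive file itself, no IP.
warcinfo_payload = (
    '{"Envelope":{"Format":"WARC","WARC-Header-Metadata":{'
    '"WARC-Type":"warcinfo",'
    '"WARC-Filename":"CC-MAIN-20130516092621-00000-ip-10-60-113-184.ec2.internal.warc.gz"}}}'
)

def type_and_ip(payload):
    """Return (WARC-Type, WARC-IP-Address or None) for one WAT JSON payload."""
    meta = json.loads(payload)["Envelope"]["WARC-Header-Metadata"]
    return meta.get("WARC-Type"), meta.get("WARC-IP-Address")

request_info = type_and_ip(request_payload)
warcinfo_info = type_and_ip(warcinfo_payload)
print(request_info)
print(warcinfo_info)
```

In these two samples the IP shows up on the request record but not on the warcinfo record, which would be consistent with the beautified entry having no IP while other entries do.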
The original gangster.
This is the dream cheating software every student should know about.
It also has serious applications obviously. www.sympy.org/scipy-2017-codegen-tutorial/ mentions code generation capabilities, which sounds super cool!
The code in this section was tested on sympy==1.8 and Python 3.9.5.
Let's start with some basics. Fractions:
from sympy import *
sympify(2)/3 + sympify(1)/2
outputs:
7/6
Note that this is an exact value, it does not get converted to floating-point numbers where precision could be lost!
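To make the exact versus floating-point distinction concrete without SymPy, here is a small sketch using Python's standard library fractions module. It is only an analogy: SymPy's rationals are a separate implementation, but the contrast with floats is the same.

```python
from fractions import Fraction

# Same sum as above, once with exact rationals and once with floats.
exact = Fraction(2, 3) + Fraction(1, 2)
approx = 2 / 3 + 1 / 2

# Repeated addition exposes the rounding: ten times 0.1 is not exactly 1.0
# in binary floating point, while ten times 1/10 is exactly 1.
float_sum = sum([0.1] * 10)
exact_sum = sum([Fraction(1, 10)] * 10)

print(exact, approx)
print(float_sum == 1.0, exact_sum == 1)
```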
We can also do everything with symbols:
from sympy import *
x, y = symbols('x y')
expr = x/3 + y/2
print(expr)
outputs:
x/3 + y/2
We can now evaluate that expression object at any time:
expr.subs({x: 1, y: 2})
outputs:
4/3
How about a square root?
x = sqrt(2)
print(x)
outputs:
sqrt(2)
so we understand that the value was kept without simplification. And of course:
sqrt(2)**2
outputs:
2
Also:
sqrt(-1)
outputs:
I
I is the imaginary unit. We can use that symbol directly as well, e.g.:
I*I
gives:
-1
Let's do some trigonometry:
cos(pi)
gives:
-1
and:
cos(pi/4)
gives:
sqrt(2)/2
The exponential also works:
exp(I*pi)
gives:
-1
Now for some calculus. To find the derivative of the natural logarithm:
from sympy import *
x = symbols('x')
diff(ln(x), x)
outputs:
1/x
Just read that. One over x. Beauty.
Let's do some more. Let's solve a simple differential equation:
y''(t) - 2y'(t) + y(t) = sin(t)
Doing:
from sympy import *
x = symbols('x')
f, g = symbols('f g', cls=Function)
diffeq = Eq(f(x).diff(x, x) - 2*f(x).diff(x) + f(x), sin(x))
print(dsolve(diffeq, f(x)))
outputs:
Eq(f(x), (C1 + C2*x)*exp(x) + cos(x)/2)
which means:
y(t) = (C1 + C2*t)*exp(t) + cos(t)/2
To be fair though, it can't do anything crazy, it likely just goes over known patterns that it has solvers for, e.g. if we change it to:
diffeq = Eq(f(x).diff(x, x)**2 + f(x), 0)
it just blows up:
NotImplementedError: solve: Cannot solve f(x) + Derivative(f(x), (x, 2))**2
Sad.
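The general solution printed by dsolve in this section, (C1 + C2*x)*exp(x) + cos(x)/2, can be sanity-checked against y'' - 2y' + y = sin without SymPy. The sketch below uses derivatives worked out by hand (so they are an assumption of this example, not SymPy output) and checks numerically that the residual vanishes for a few arbitrary constants.

```python
import math

def f(x, c1, c2):
    # Candidate general solution.
    return (c1 + c2 * x) * math.exp(x) + math.cos(x) / 2

def df(x, c1, c2):
    # First derivative, computed by hand with the product rule.
    return (c1 + c2 + c2 * x) * math.exp(x) - math.sin(x) / 2

def d2f(x, c1, c2):
    # Second derivative, computed by hand.
    return (c1 + 2 * c2 + c2 * x) * math.exp(x) - math.cos(x) / 2

def residual(x, c1, c2):
    # Left-hand side minus right-hand side of y'' - 2y' + y = sin(x).
    return d2f(x, c1, c2) - 2 * df(x, c1, c2) + f(x, c1, c2) - math.sin(x)

# Largest residual over a small grid of points and constants.
worst = max(
    abs(residual(x / 4, c1, c2))
    for x in range(-8, 9)
    for c1 in (-3, 0, 1)
    for c2 in (-2, 0, 3)
)
print(worst < 1e-9)
```

The residual is only zero up to floating-point rounding, which is why the check uses a tolerance.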
Let's try some polynomial equations:
from sympy import *
x, a, b, c = symbols('x a b c')
eq = Eq(a*x**2 + b*x + c, 0)
sol = solveset(eq, x)
print(sol)
which outputs:
FiniteSet(-b/(2*a) - sqrt(-4*a*c + b**2)/(2*a), -b/(2*a) + sqrt(-4*a*c + b**2)/(2*a))
which is a not amazingly nice version of the quadratic formula. Let's evaluate with some specific constants after the fact:
sol.subs({a: 1, b: 2, c: 3})
which outputs:
FiniteSet(-1 + sqrt(2)*I, -1 - sqrt(2)*I)
Let's see if it handles the quartic equation:
x, a, b, c, d, e, f = symbols('x a b c d e f')
eq = Eq(e*x**4 + d*x**3 + c*x**2 + b*x + a, 0)
solveset(eq, x)
Something comes out. It takes up the entire terminal. Naughty. And now let's try to mess with it:
x, a, b, c, d, e, f = symbols('x a b c d e f')
eq = Eq(f*x**5 + e*x**4 + d*x**3 + c*x**2 + b*x + a, 0)
solveset(eq, x)
and this time it spits out something more magic:
ConditionSet(x, Eq(a + b*x + c*x**2 + d*x**3 + e*x**4 + f*x**5, 0), Complexes)
Oh well.
Let's try some linear algebra.
m = Matrix([[1, 2], [3, 4]])
Let's invert it:
m**-1
outputs:
Matrix([
[ -2, 1],
[3/2, -1/2]])
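A quick way to convince ourselves that the inverse above is right is to multiply it back. The sketch below does so with exact fractions from the Python standard library rather than SymPy. The `matmul` helper is written here just for illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

m = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
# The inverse printed by SymPy above.
m_inv = [[Fraction(-2), Fraction(1)], [Fraction(3, 2), Fraction(-1, 2)]]

product = matmul(m, m_inv)
print(product == [[1, 0], [0, 1]])  # True
```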
This section is about the file: python/sympy_cheat/logarithm_integral.py
python/sympy_cheat/logarithm_integral.py
#!/usr/bin/env python3
from sympy import *
x = symbols('x')
myli = integrate(sympify(1)/ln(x), x)
# It recognizes our definition as its own li! Beauty.
assert myli.equals(li(x))
for r in range(-2, 2):
    for i in range(-2, 2):
        print(f'{r} {i} {li(r + i*I).evalf()}')
Huge respect to these companies.
E.g. showing live data from a scientific instrument! TODO:
- superuser.com/questions/825588/what-is-the-easiest-way-of-visualizing-data-from-stdout-as-a-graph
- unix.stackexchange.com/questions/190337/how-can-i-make-a-graphical-plot-of-a-sequence-of-numbers-from-the-standard-input
- stackoverflow.com/questions/44470965/how-can-you-watch-gnuplot-realtime-data-plots-as-a-live-graph-with-automatic-up
- stackoverflow.com/questions/14074790/plotting-a-string-of-csv-data-in-realtime-using-linux
- stackoverflow.com/questions/11874767/how-do-i-plot-in-real-time-in-a-while-loop-using-matplotlib
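One possible approach, sketched under assumptions: read one number per line from a stream and redraw a Matplotlib line after each one. The function names `parse_number` and `live_plot` are made up for this example, and it assumes Matplotlib is installed with an interactive backend. It is not taken from any of the answers linked above.

```python
import sys

def parse_number(line):
    """Return the line as a float, or None if it is blank or not numeric."""
    try:
        return float(line.strip())
    except ValueError:
        return None

def live_plot(stream):
    """Plot numbers from a line-oriented stream as they arrive."""
    # Imported lazily so that parse_number works without Matplotlib installed.
    import matplotlib.pyplot as plt
    plt.ion()  # interactive mode: drawing does not block
    fig, ax = plt.subplots()
    (line,) = ax.plot([], [])
    ys = []
    for raw in stream:
        y = parse_number(raw)
        if y is None:
            continue  # skip lines that are not numbers
        ys.append(y)
        line.set_data(list(range(len(ys))), ys)
        ax.relim()           # recompute data limits
        ax.autoscale_view()  # rescale axes to the new limits
        plt.pause(0.01)      # let the GUI event loop redraw
    plt.ioff()
    plt.show()  # keep the window open once input ends
```

To try it, add a call to `live_plot(sys.stdin)` at the bottom of the script and pipe the instrument's output into it.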
By Ciro Santilli.
It does a huge percentage of what you want easily, and from the language that you want to use.
Couldn't handle exploration of large datasets though: Survey of open source interactive plotting software with a 10 million point scatter plot benchmark by Ciro Santilli
Examples:
- matplotlib/hello.py
- matplotlib/educational2d.py
- matplotlib/axis.py
- matplotlib/label.py
- Line style
- Subplots
- matplotlib/two_lines.py
- Data from files
- Specialized
Tested on Python 3.10.4, Ubuntu 22.04.
Tends to be Ciro Santilli's first attempt for quick and dirty graphing: github.com/cirosantilli/gnuplot-cheat.
domain-specific language. When it gets the job done, it is in 3 lines and it feels great.
When it doesn't, you Google for hours, and then you give up in frustration, and fall back to Matplotlib.
Couldn't handle exploration of large datasets though: Survey of open source interactive plotting software with a 10 million point scatter plot benchmark by Ciro Santilli
CLI hello world:
gnuplot -p -e 'p sin(x)'
A glitch is more precisely a software bug that is hard to reproduce. But it has also been used to mean a software bug that is not very serious.
Debugging sucks. But there's also nothing quite like that "oh fuck, that's why it doesn't work" moment, which happens after you have examined and placed everything that is relevant to the problem into your brain. You just can't see it coming. It just happens. You just learn what you generally have to look at so it happens faster.
Related:
This is a simple hierarchical plaintext notation Ciro Santilli created to explain programs to himself.
It is usually created by doing searches in an IDE, and then manually selecting the information of interest.
It attempts to capture intuitive information not only of the call graph itself, including callbacks, but of when things get called or not, by the addition of some context code.
For example, consider the following pseudocode:
f1() {
}
f2(i) {
  if (i > 5) {
    f1()
  }
}
f3() {
  f1()
  f2_2()
}
f2_2() {
  for (i = 0; i < 10; i++) {
    f2(i)
  }
}
main() {
  f2_2()
  f3()
}
Suppose that we are interested in determining what calls f1.
Then a reasonable call hierarchy for f1 would be:
f2(i)
  if (i > 5) {
    f1()

  f2_2()
    for (i = 0; i < 10; i++) {
      f2(i)

    main
    f3

f3()
  main()
Some general principles:
- start with a regular call tree
- to include context:
- remove any blank lines from the snippet of interest
- add it indented below the function
- and then follow it up with a blank line
- and then finally add any callers at the same indentation level
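The first principle, the regular call tree, can be produced mechanically. Below is a minimal sketch that does so for the pseudocode of this section. The `calls` dictionary was transcribed by hand from that pseudocode, and the function names `callers_of` and `hierarchy` are made up for this example. The context snippets still have to be added manually.

```python
# Static call graph of the pseudocode: caller -> functions it calls.
calls = {
    "f1": [],
    "f2": ["f1"],
    "f3": ["f1", "f2_2"],
    "f2_2": ["f2"],
    "main": ["f2_2", "f3"],
}

def callers_of(target):
    """Return the functions that call target directly, in sorted order."""
    return sorted(caller for caller, callees in calls.items() if target in callees)

def hierarchy(target, depth=0, path=()):
    """Return the inverted call tree of target as a list of indented lines."""
    lines = ["  " * depth + target]
    path = path + (target,)
    for caller in callers_of(target):
        if caller in path:
            continue  # recursion in the analysed program: do not loop forever
        lines.extend(hierarchy(caller, depth + 1, path))
    return lines

print("\n".join(hierarchy("f1")))
```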
One of the Holiest age old debugging techniques!
Git has some helpers to help you achieve bisection Nirvana: stackoverflow.com/questions/4713088/how-to-use-git-bisect/22592593#22592593
Obviously not restricted to software engineering alone, and used in all areas of engineering, e.g. Video "Air-tight vs. Vacuum-tight by AlphaPhoenix (2020)" uses it in vacuum engineering.
The cool thing about bisection is that it is a brainless process: unlike when using a debugger, you don't have to understand anything about the system, and it incredibly narrows down the problem cause for you. Not having to think is great!
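The brainless process can be sketched in a few lines. This is an illustration of the general idea, not of how Git implements it: the function name `bisect_first_bad` and the fake `is_bad` test are assumptions made for the example, and it assumes the first revision is good and the last one is bad.

```python
def bisect_first_bad(revisions, is_bad):
    """Find the first bad revision.

    revisions is ordered oldest to newest, revisions[0] is known good and
    revisions[-1] is known bad. Returns (first bad revision, tests run).
    """
    good, bad = 0, len(revisions) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(revisions[mid]):
            bad = mid   # the bug was introduced at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return revisions[bad], steps

# Pretend that revision 700 out of 1000 introduced the bug.
revisions = list(range(1000))
first_bad, steps = bisect_first_bad(revisions, lambda r: r >= 700)
print(first_bad, steps)
```

With 1000 revisions, at most 10 tests are needed, which is why bisection narrows things down so quickly.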
Nirvana!!!
What it adds on top of reverse debugging: not only can you go back in time, but you can do it instantaneously.
Or in other words, you can access variables from any point in execution.
TODO implementation? Apparently Pernosco is an attempt at it, though proprietary.
Just add GDB Dashboard, and you're good to go.
The best open source implementation as of 2020 seems to be: Mozilla rr.
- stackoverflow.com/questions/1206872/go-to-previous-line-in-gdb/46996380#46996380
- stackoverflow.com/questions/1470434/how-does-reverse-debugging-work/53063242#53063242
- stackoverflow.com/questions/3649468/setting-breakpoint-in-gdb-where-the-function-returns/46116927#46116927
- stackoverflow.com/questions/27770896/how-to-debug-a-rare-deadlock/50073993#50073993
- stackoverflow.com/questions/522619/how-to-do-bidirectional-or-reverse-debugging-of-programs/50074106#50074106 link only, marked as duplicate of go to previous line
- softwareengineering.stackexchange.com/questions/181527/why-is-reverse-debugging-rarely-used
Proprietary extension to Mozilla rr by rr lead coder Robert O'Callahan et al., started in 2016 after he quit Mozilla.
TODO what does it add to rr? GDB Nirvana?
The musical study of software engineering.
Ciro Santilli is obsessed by those in order to learn any new concept, not just for bug reporting.
This includes to learn more theoretical subjects like physics and mathematics.
Evil company that desecrated the beauty created by Sun Microsystems, and was trying to bury Java once and for all in the 2010s.
Their database is already matched by open source e.g. PostgreSQL, and ERP and CRM specific systems are boring.
Oracle basically grew out of selling one of the first SQL implementations in the late 70s, notably to the United States Government and particularly the CIA. They did deliver a lot of value in those early pre-internet days, but now open source is supplanting them, and will do so entirely.
Although Ciro Santilli is a bit past their era, there's an aura of technical excellence about those people. It just seems that they sucked at business. Those open source hippies. Erm, wait.
Bibliography:
- archive.org/details/sunburstascentof00hall Sunburst: the ascent of Sun Microsystems by Mark Hall (1990)
Video "1984 Macintosh advertisement by Apple (1984)" comes to mind.
TODO year. This was a reply to Microsoft anti-Linux propaganda it seems: www.ubuntubuzz.com/2012/03/truth-happens-redhats-legendary-reply.html
Transcript from: www.dailymotion.com/video/xw3ws
The world is flat. Earth is the centre of the universe. Fact - until proven otherwise.
Despite ignorance. Despite ridicule. Despite opposition. Truth happens.
Despite ignorance.
The telephone has too many shortcomings to be seriously considered as a means of communication. /Western Union 1876/
In 1899 the US Patent Commissioner stated, everything that can be invented has been invented.
Despite ridicule.
The phonograph has no commercial value at all. /Thomas Edison 1880/
The radio craze will die out in time. /Thomas Edison 1922/
The automobile has practically reached the limit of its development. /Scientific American 1909/
Despite it all truth happens.
Man will not fly for fifty years. /Orville Wright 1901/
The rocket will never leave the Earth's atmosphere. /New York Times 1936/
There is a world market for maybe five computers. /IBM's Thomas Watson 1943/
640K Ought to be enough for anybody. /Bill Gates 1981/
First they ignore you...
Linux is the hype du jour. /Gartner Group 1999/
Then they laugh at you...
We think of linux as competitor in the student and hobbyist market. But I really don't think in the commercial market we'll see it in any significant way. /Bill Gates 2001/
Then they fight you...
Linux isn't going away. Linux is a serious competitor. We will rise to this challenge. /Steve Ballmer 2003/
Then you win... /Mohandas Gandhi/
You are here.
Red Hat Linux. IBM.
Please, use AsciiDoc and one page to rule them all.
The mandatory xkcd: xkcd 927: Standards.
Of course, "Ciro Santilli" with quotes, since all of those are either taken directly from others, or had been previously formulated by others.
Some anecdotes.
Ciro Santilli never splits up functions unless there is more than one calling point. If you split early, the chances that the interface will be wrong are huge, and a much larger refactoring follows.
If you just want to separate variables, just use a scope e.g.:
int cross_block_var;
// First step.
{
    int myvar;
}
// Second step.
{
    int myvar;
}
In his lifetime, Ciro has seen and had to deal with two projects that had something like 3 to 10 separate Git repositories, all created and maintained by the same small group of developers of the same organization, even though one could not build without the other. Keeping everything in sync was Hell! Why not just have three directories inside a single repository with a single source of truth?
Another important case: Linux should have at least a C standard library, init system, and shell in-tree, like BSD Operating Systems, as mentioned at: Section "Linux".