Ciro Santilli $$ Sponsor Ciro $$ 中国独裁统治 China Dictatorship 新疆改造中心、六四事件、法轮功、郝海东、709大抓捕、2015巴拿马文件 邓家贵、低端人口、西藏骚乱

Satoshi uploader

| nosplit | ↑ parent "How to extract data from the Bitcoin blockchain" | words: 662 | descendant words: 694 | descendants: 1
By "Satoshi uploader" we mean the data upload script present in tx 4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17 of the Bitcoin blockchain.
The uploader, and its accompanying downloader, are Python programs stored in the blockchain itself. They are made to upload and download arbitrary data into the blockchain via RPC.
These scripts were notably used for: illegal content of block 229k. The script did not maintain its popularity much after this initial surge up loads, likely all done by the same user: there are very very few uploads done after block 229k with the Satoshi uploader.
Our choice of name as "Satoshi uploader" is copied from A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin by Matzutt et al. (2018) because the scripts are Copyrighted Satoshi Nakamoto on the header comment, although as mentioned at Hidden surprises in the Bitcoin blockchain by Ken Shirriff (2014) this feels very unlikely to be true.
A more convenient version of those scripts that can download directly from without the need for a full local node can be found at: by using the --satoshi option. E.g. with it you can download the uploader script with:
./ --satoshi 4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17
mv 4b72a223007eab8a951d43edc171befeabc7b5dca4213770c88e09ba5b936e17.bin
The scripts can be found in the blockchain at:
The uploader script uses its own cumbersome data encoding format, which we call the "Satoshi uploader format". The is as follows:
  • ignore all script operands and constants less than 20 bytes (40 hex characters). And there are a lot of small operands, e.g. the uploader itself uses format has a OP_1, data, OP_3, OP_CHECKMULTISIG pattern on every output script, so the OP_1 and OP_3 are ignored
  • ignore the last output, which contains a real change transaction instead of arbitrary data. TODO why not just do what with the length instead?
  • the first 4 bytes are the payload length, the next 4 bytes a CRC-32 signature. The payload length is in particular useful because of possible granularity of transactions. But it is hard to understand why a CRC-32 is needed in the middle of the largest hash tree ever created by human kind!!! It does however have the adavantage that it allows us to more uniquely identify which transactions use the format or not.
This means that if we want to index certain file types encoded in this format, a good heuristic is to skip the first 9 bytes (4 size, 4 CRC, 1 OP_1) and look for file signatures.
Let's try out the downloader to download itself. First you have to be running a Bitcoin Core server locally. Then, supposing .bitcon/bitoin.conf containing:
we run:
git clone git://
git -C python-bitcoinrpc checkout cdf43b41f982b4f811cd4ebfbc787ab2abf5c94a
pip install python-bitcoinrpc==1.0
BTCRPCURL=http://asdf:qwer@ \
  PYTHONPATH="$(pwd)/python-bitcoinrpc:$PYTHONPATH" \
  python3 \
worked! The source of the downloader script is visible! Note that we had to wait for the sync of the entire blockchain to be fully finished for some reason for that to work.
Other known uploads in Satoshi format except from the first few:
  • tx 89248ecadd51ada613cf8bdf46c174c57842e51de4f99f4bbd8b8b34d3cb7792 block 344068 see ASCII art
  • tx 1ff17021495e4afb27f2f55cc1ef487c48e33bd5a472a4a68c56a84fc38871ec contains the ASCII text e5a6f30ff7d43f96f61af05efaf96f869aa072b5a071f32a24b03702d1dcd2a6. This number however is not a known transaction ID in the blockchain, and has no Google hits.