wget -O all.html
cp all.html all-recode.html
recode html..ascii all-recode.html
awk '!seen[$0]++' all-recode.html > all-uniq.html
The awk step skips the gazillion repeated "mined by" messages.
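A minimal sketch of what the awk deduplication does, run on made-up input (the pool name and lines below are hypothetical, not taken from the website):

```shell
# seen[$0] is 0 the first time a line appears, so !seen[$0]++ is true
# only on the first occurrence. Later copies are dropped, and the
# original order of first occurrences is preserved (unlike sort -u).
deduped=$(printf '%s\n' \
  'Mined by ExamplePool' \
  'hello world' \
  'Mined by ExamplePool' \
  'hello world' \
  'another line' | awk '!seen[$0]++')
printf '%s\n' "$deduped"
```

This should print each distinct line once, in the order it first appeared.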
A lot of the stuff on that website appears to be cut off at the 20-character mark. As shown in Force of Will, this is possibly because they didn't use -w with strings -n20, so any text after a newline that was shorter than 20 characters got dropped.
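The suspected cut-off effect can be reproduced on a small made-up file, assuming GNU binutils strings (the sample text is hypothetical):

```shell
# Hypothetical sample: one long line followed by a short one.
sample=$(mktemp)
printf 'This first line is well over twenty characters\nshort tail\n' > "$sample"

# Without -w, the newline ends the string, and "short tail" is below
# the 20-character minimum, so it is silently dropped.
without_w=$(strings -n20 "$sample")

# With -w, newlines count as part of the string, so both lines are
# extracted together as a single string.
with_w=$(strings -n20 -w "$sample")

printf 'without -w:\n%s\n\nwith -w:\n%s\n' "$without_w" "$with_w"
rm -f "$sample"
```

If the website's extraction was done without -w, short continuation lines like "short tail" here would be missing in the same way.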
That website can be replicated by downloading the Bitcoin blockchain locally, then:
cd ~/.bitcoin/blocks
for f in blk*.dat; do strings -n20 -w "$f" | awk '!seen[$0]++' > "${f%.dat}.txt"; done
tail -n +1 *.txt
Remove most of the binary crap:
head -n-1 *.txt | grep -e '[. ]' | grep -iv 'mined by' | less
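A small sketch of how the two grep filters behave, using made-up lines (all sample strings and pool names are hypothetical):

```shell
# grep -e '[. ]' keeps only lines containing a period or a space,
# which throws away most binary-looking runs of characters.
# grep -iv 'mined by' then drops the miner signatures, ignoring case.
filtered=$(printf '%s\n' \
  'xK9#@!qqqqqqqqqqqqqqqq' \
  'Mined by ExamplePool 12345' \
  'A readable sentence with spaces.' \
  'MINED BY AnotherPool 67890' | grep -e '[. ]' | grep -iv 'mined by')
printf '%s\n' "$filtered"
```

Only the readable sentence should survive. The heuristic is crude: binary junk that happens to contain a space or a period still gets through, which is why it only removes most of the noise.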