The HTML of the index pages from the Wayback Machine was:
- dumped at: github.com/cirosantilli/media/tree/master/cia-2010-covert-communication-websites/html
- downloaded with: github.com/cirosantilli/media/tree/master/cia-2010-covert-communication-websites/download-html.sh. Note that there were many spurious errors, notably:
OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to web.archive.org:443
We just ran it multiple times until all errors were gone.
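The "run it again until the errors go away" approach can be sketched as a small retry wrapper. This is only an illustration and not the contents of download-html.sh: the function name `retry`, the `RETRY_DELAY` variable and the output file name in the usage comment are all hypothetical.

```shell
# retry MAX_ATTEMPTS CMD [ARGS...]
# Run CMD until it exits 0, giving up after MAX_ATTEMPTS tries.
# RETRY_DELAY (hypothetical knob) is the pause in seconds between tries.
retry() {
  local max_attempts=$1
  shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-1}"
  done
}

# Possible usage (not executed here):
#   retry 5 curl --fail --silent --show-error --output aquaswimming.html \
#     'https://web.archive.org/web/20091023041107/http://aquaswimming.com/'
```

Retrying per request rather than rerunning the whole script avoids refetching pages that already downloaded cleanly.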
The best way to analyse the HTML is to grab our dumps from: github.com/cirosantilli/cia-2010-websites-dump.
Some possibly interesting searches include:
- list all HTML comments, maybe something spicy was left over:
git grep '<!--'
- search for weird file extensions:
git ls-files | grep -Ev '\.(jpg|gif|html|txt|png|css|php|js|jar|cgi|htm|swf|ico|JPG|class|zip|sf)'
- have a look at the largest folders:
ncdu
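As a complement to the extension search above, here is a rough sketch that counts how often each extension occurs, so that the rare ones stand out. The helper name `ext_histogram` is hypothetical, and it assumes that anything after a `?` in a file name is a query string rather than part of the extension.

```shell
# ext_histogram: read file paths on stdin and print "COUNT EXT" lines,
# sorted by ascending count so the rarest extensions come first.
ext_histogram() {
  # 1. drop any query string, 2. keep only the basename,
  # 3. keep only names that have an extension, 4. keep only the extension.
  sed -e 's|?.*$||' -e 's|.*/||' |
    grep -E '\.[^.]+$' |
    sed -e 's|.*\.||' |
    sort | uniq -c | sort -n |
    awk '{ print $1, $2 }'
}

# Possible usage inside the dump repository (not executed here):
#   git ls-files | ext_histogram | head -n 20
```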
Some of the HTML files contain conditional comments e.g. web.archive.org/web/20091023041107/http://aquaswimming.com/ contains:
<!--[if IE 6]> <link href="swimstyleie6.css" rel="stylesheet" type="text/css"> <![endif]-->
Several of the non-English websites seem to have comments translating the content, e.g.:
./noticiasmusica.net/20101230165001/index.html:<h2>Alguns dos Melhores Sites Nacionais</h2><!--some of the best national sites (in music)-->
This feels like it could be a translation to help the technical webdev team know what is what.
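Comments of this kind, placed directly after a closing tag, can be pulled out with a filter along these lines. It is only a sketch: the helper name `inline_comments` is hypothetical, and it assumes the comment sits on the same line as the closing tag and contains no `>` character, as in the example above.

```shell
# inline_comments: read HTML on stdin and print the text of every comment
# that immediately follows a closing tag, e.g. </h2><!--some text-->.
inline_comments() {
  grep -Eo '</[a-zA-Z0-9]+><!--[^>]*-->' |
    sed -E 's|^</[^>]+><!--[[:space:]]*||; s|[[:space:]]*-->$||'
}

# Possible usage inside the dump repository (not executed here):
#   git grep -h '<!--' | inline_comments | sort -u
```

Conditional comments such as `<!--[if IE 6]>` are skipped by this pattern, because it requires the comment to close with `-->` before any `>` appears.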
Many of the RSS frame pages use:
<base target="_blank" />
which is a weird HTML tag that makes all links open in new tabs, e.g. web.archive.org/web/20110202124411/http://thecricketfan.com/home.html.
Various websites have pages with a .php extension. It feels likely that all websites were written in PHP.
Some sites use a feeds.php for the feeds, e.g. http://www.absolutebearing.net//absolutebearing_feeds/feeds.php?src=http%3A%2F%2Ffeeds2.feedburner.com%2FOceanyachtsinfo&desc=1
Some URLs existed with both .html and .php extensions, or were converted at some point:
allworldstatistics.com/20110207151941/comprehensivesources.html
allworldstatistics.com/20130818155225/comprehensivesources.php
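Pairs like the one above could be found automatically with something along these lines. This is a sketch under assumptions: the helper name `html_php_pairs` is hypothetical, paths are assumed to have the shape DOMAIN/TIMESTAMP/.../PAGE.EXT as in the two lines above, and only the exact extensions .html and .php are considered.

```shell
# html_php_pairs: read paths on stdin and print "DOMAIN PAGE" once for every
# page of a domain that was archived with both .html and .php extensions.
html_php_pairs() {
  awk -F/ '
    {
      sub(/^\.\//, "")                  # drop a leading "./" if present
      sub(/\?.*$/, "")                  # drop any query string
      file = $NF                        # last path component
      n = match(file, /\.(html|php)$/)
      if (n == 0) next                  # some other extension
      page = substr(file, 1, n - 1)
      ext = substr(file, n + 1)
      key = $1 " " page                 # assumes DOMAIN is the first component
      seen[key, ext] = 1
      if ((key, "html") in seen && (key, "php") in seen && !(key in done)) {
        done[key] = 1
        print key
      }
    }'
}

# Possible usage inside the dump repository (not executed here):
#   git ls-files | html_php_pairs
```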
A few of the PHP URLs have weird IDs in them like omktf, juqwt and qlaqft; we wonder what they mean:
./middle-east-newstoday.com/20100829004127/omktf/uirl.php?ok=461128
./newsandsportscentral.com/20100327130237/juqwt/eubcek.php?pe=747155
./pondernews.net/20100826031745/lldwg/qlaqft.php?fc=281298
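To look for more URLs of the same shape, one could match on the pattern the three examples share: a lowercase directory, a lowercase .php file name, a two-letter query key and a numeric value. The pattern below is inferred only from those three examples, so it is a guess that may miss variants, and the helper name `weird_php_ids` is hypothetical.

```shell
# weird_php_ids: read paths on stdin and, for those shaped like
#   [./]DOMAIN/TIMESTAMP/DIR/FILE.php?KK=DIGITS
# print "DOMAIN DIR FILE KK DIGITS"; all other lines are dropped.
weird_php_ids() {
  sed -nE 's|^(\./)?([^/]+)/[0-9]+/([a-z]+)/([a-z]+)\.php\?([a-z]{2})=([0-9]+)$|\2 \3 \4 \5 \6|p'
}

# Possible usage inside the dump repository (not executed here):
#   git ls-files | weird_php_ids
```

Splitting the matches into columns makes it easy to sort by directory, file name or key and check whether any value repeats across domains.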