The Wayback Machine has an endpoint for querying crawled pages called the CDX server. It is documented at: github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md.
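As an illustration of what a single lookup involves, here is a minimal Python sketch that builds a CDX query URL and turns the JSON output into one dict per capture. It assumes the public endpoint https://web.archive.org/cdx/search/cdx and the url, matchType, from, to, limit and output=json parameters described in the README. With output=json the first row is a header naming the fields. The helper names cdx_query_url and parse_cdx_json are hypothetical and are not taken from cdx.sh.

```python
import json
from urllib.parse import urlencode

# Assumption: the public CDX endpoint described in the README.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url, match_type="exact", limit=None, from_ts=None, to_ts=None):
    """Build a CDX query URL that asks for JSON output (hypothetical helper)."""
    params = {"url": url, "output": "json", "matchType": match_type}
    if from_ts is not None:
        params["from"] = from_ts  # yyyyMMddhhmmss, a prefix like "2010" also works
    if to_ts is not None:
        params["to"] = to_ts
    if limit is not None:
        params["limit"] = limit
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(body):
    """Turn the JSON output (header row + data rows) into a list of dicts."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []  # no captures for this URL
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

For example, cdx_query_url("example.com", limit=5) yields a URL containing url=example.com, output=json and limit=5, and each parsed record exposes keys such as "timestamp", "original" and "statuscode".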
The CDX server allows you to filter tens of thousands of candidate domains in a few hours, but hundreds of thousands would be too much. This is because you have to query exactly one URL at a time, and they possibly rate limit IPs. But there has been no IP blacklisting so far after several hours, so it's not that bad.
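Because queries go one URL at a time, a batch run is essentially a throttled loop. The following Python sketch assumes a fixed pause between requests and a doubling backoff on network or HTTP errors. The delay_s, retries and limit defaults are guesses, not documented Wayback Machine limits, and query_domains is a hypothetical helper, not how cdx.sh is implemented.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"  # assumed endpoint


def http_get(url, timeout=60):
    """Fetch a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def query_domains(domains, delay_s=1.0, retries=3, limit=100,
                  fetch=http_get, sleep=time.sleep):
    """Query the CDX server for each domain, strictly one request at a time.

    delay_s, retries and limit are hypothetical politeness settings.
    Yields (domain, body) pairs; body is None if every attempt failed.
    """
    for i, domain in enumerate(domains):
        if i:
            sleep(delay_s)  # pause between consecutive domains
        url = CDX_ENDPOINT + "?" + urlencode(
            {"url": domain, "output": "json", "limit": limit})
        body = None
        backoff = delay_s
        for attempt in range(retries):
            try:
                body = fetch(url)
                break
            except urllib.error.URLError:  # HTTPError is a subclass
                backoff *= 2
                if attempt < retries - 1:
                    sleep(backoff)  # wait longer before retrying
        yield domain, body
```

Injecting fetch and sleep keeps the loop testable offline; in a real run the defaults perform the HTTP request and the actual pause.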
Once you have a heuristic to narrow down some domains, you can use this helper: cia-2010-covert-communication-websites/cdx.sh to drill them down from tens of thousands to hundreds or thousands.
We then post-process the results of cdx.sh with cia-2010-covert-communication-websites/cdx-post.sh to drill them down from thousands to dozens, and manually inspect everything.
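The exact filters applied by cdx-post.sh are not described here. As a purely hypothetical example of this kind of post-processing, the Python sketch below keeps only domains that have at least one HTTP 200 capture and whose first capture falls inside a chosen year window. The 2008 to 2014 window is an illustrative assumption, not the criterion used by the real script. The input is a mapping from domain to that domain's CDX records (dicts with the CDX field names "timestamp" and "statuscode"). Since CDX timestamps are 14-digit yyyyMMddhhmmss strings, a plain string comparison against a year prefix is enough.

```python
def summarize(records):
    """Return (first_ts, last_ts, n_ok) for one domain's CDX records."""
    ts = sorted(r["timestamp"] for r in records)
    n_ok = sum(1 for r in records if r.get("statuscode") == "200")
    return (ts[0], ts[-1], n_ok) if ts else (None, None, 0)


def shortlist(results, start="2008", end="2014"):
    """Hypothetical filter: keep domains with a 200 capture whose first
    capture lies in [start, end). The window is an assumption."""
    keep = []
    for domain, records in sorted(results.items()):
        first, last, n_ok = summarize(records)
        # n_ok == 0 short-circuits, so first is never None when compared
        if n_ok and start <= first < end:
            keep.append((domain, first, last, n_ok))
    return keep
```

The survivors, with their first and last capture dates, are the kind of short list that is practical to check by hand.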
From then on, you can just manually inspect the hits in your browser.