Grepping the 2013 DNS Census first for the overused CGI comms subdomains `secure.` and `ssl.` leaves 200k lines. Grepping those further for the overused keyword "news" led to these hits:
- secure.worldnewsandent.com,2012-02-13T21:28:15,208.254.40.117
- ssl.beyondnetworknews.com,2012-02-13T20:10:13,66.104.175.40

Also tried, but failed: `sports`:
- secure.motorsportdealers.com,2012-04-10T20:19:09,64.73.117.38 web.archive.org/web/20110501000000*/motorsportdealers.com
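
The keyword grep above can be sketched in Python as a streaming filter. This is a minimal sketch assuming each census row has the `domain,timestamp,ip` layout of the hits quoted above; the function name `grep_census` is hypothetical, and a plain `grep` does the same job.

```python
import csv
from typing import Iterable, Iterator

# Overused CGI comms subdomain prefixes, per the text above.
PREFIXES = ("secure.", "ssl.")


def grep_census(lines: Iterable[str], keyword: str) -> Iterator[str]:
    """Yield census rows whose name starts with secure./ssl. and contains keyword.

    Assumes each row looks like 'domain,timestamp,ip', as in the hits above.
    """
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        domain = row[0].strip().lower()
        if domain.startswith(PREFIXES) and keyword in domain:
            yield ",".join(row)
```

For example, passing an open census extract (a hypothetical `dns_census_2013.csv`, opened with `newline=""`) and the keyword `"news"` would stream out matching rows without loading the whole file into memory.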

OK, after the initial successes with `secure.`, we went a bit more data intensive:
- took all `secure.*` and `ssl.*` URLs in the 2013 DNS Census: 70k entries
- cleaned them up a bit, e.g. kept only `.com` or `.net`. This left only 30k entries
- looped over all of them in the archive CDX (Wayback Machine CDX scanning), searching for those that also have captured URLs ending in `.cgi`: `web.archive.org/cdx/search/cdx?url=$domain&matchType=domain&filter=urlkey:.*.cgi&to=20140101000000`. This took an afternoon, but did not trigger any rate limit block.
- this left about 1000, which we looped over on the Web Archive with a script. We then manually opened any that had the pattern of very few hits, all between 2010 and 2013, and checked those for a visual/thematic style match. Careful not to make more than 15 requests per minute, or else you get a 5 minute blacklist!

New results: only one...:
- secure.driversinternationalgolf.com,2012-02-13T10:42:20,208.254.42.205
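
The CDX filtering pipeline described above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: all function names are hypothetical; an empty CDX response body is taken to mean "no matching captures"; `few_hits_2010_2013` and its `max_captures=20` threshold are an illustrative automation of what was actually a manual check; and `Throttle` enforces the 15 requests per minute limit mentioned for the Wayback page step (the CDX loop itself was not rate limited).

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def keep_com_net(domains: Iterable[str]) -> Iterator[str]:
    """Cleanup step: keep only .com and .net names, normalized to lowercase."""
    for d in domains:
        d = d.strip().lower()
        if d.endswith((".com", ".net")):
            yield d


def cdx_url(domain: str) -> str:
    """CDX query from the text: URL keys matching .*.cgi, captured before 2014."""
    params = {
        "url": domain,
        "matchType": "domain",
        "filter": "urlkey:.*.cgi",
        "to": "20140101000000",
    }
    # safe=":*" leaves the filter expression unescaped, as in the quoted URL.
    return CDX_ENDPOINT + "?" + urlencode(params, safe=":*")


def fetch_text(url: str) -> str:
    """Plain HTTP GET returning the response body as text."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


def domains_with_cgi(
    domains: Iterable[str], fetch: Callable[[str], str] = fetch_text
) -> Iterator[str]:
    """Yield the domains whose CDX query returns at least one capture line."""
    for d in domains:
        if fetch(cdx_url(d)).strip():  # assumed: empty body means no captures
            yield d


def few_hits_2010_2013(cdx_text: str, max_captures: int = 20) -> bool:
    """Hypothetical stand-in for the manual "very few hits, 2010 to 2013" check.

    Assumes CDX's default space-separated output, whose second field is the
    14-digit capture timestamp. max_captures=20 is an illustrative guess.
    """
    stamps = [line.split()[1] for line in cdx_text.splitlines() if line.strip()]
    return 0 < len(stamps) <= max_captures and all(
        "2010" <= s[:4] <= "2013" for s in stamps
    )


class Throttle:
    """Keep calls at or under per_minute per minute (15 for Wayback page visits)."""

    def __init__(self, per_minute: int = 15, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least `interval` seconds have passed since the last call."""
        now = self.clock()
        if self.last is not None and now - self.last < self.interval:
            self.sleep(self.interval - (now - self.last))
            now = self.clock()
        self.last = now
```

In use, one would chain `domains_with_cgi(keep_com_net(...))` over the census names, then call `Throttle().wait()` before each Web Archive page request. With 15 requests per minute the throttle spaces requests 4 seconds apart.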

After the 2013 DNS Census virtual host cleanup heuristic keyword searches, we later understood why there were so few hits here: for some reason, the 2013 DNS Census didn't capture the `secure.` subdomains of many of the domains it had. A shame, because if it had, this method would have yielded many more results.