[PSA] Warning: What Memex Crawlers are doing on the Dark Web (and what they can find)

So, I have seen a few posts about the timing attacks revealed at Hack in the Box this month, but what really surprises me is that no one has mentioned the news about the Memex search engines. There were posts in April when these were first announced, but two days ago one of them, PunkSPIDER, finished crawling all of the Tor hidden services (no, that isn't a typo). The surprising part? It took them only 7 hours, and there are only around 7000 of them.

This information comes from a Forbes article posted two days ago: http://www.forbes.com/sites/thomasbrewster/2015/06/01/dark-web-vulnerability-scan/

PunkSPIDER is one of the many DARPA-backed (aka US government) programs under the name Memex. These programs aim to build more intelligent web crawlers and were originally intended for the Deep Web: the more obscure websites that major search engines like Google miss. (For example, NASA is backing one of the Memex crawlers to look for rocket designs that might be useful to them.) PunkSPIDER is one of the more interesting and controversial crawlers because of what it does. It automatically crawls websites and their pages, 'poking' them to look for common (and some not-so-common) vulnerabilities such as SQLi, XSS, etc. The controversial part is that it posts every weakness it finds to a public, searchable database. The stated goals are threefold:
- make sure that certain organizations (cough cough NSA) can't keep these security holes to themselves for their own benefit;
- limit how much individual hackers can profit from them;
- push the affected websites to patch them.

This is where the story gets interesting. Two days ago they reworked their engine to crawl the Hidden Services on Tor. The Forbes article says they crawled the entirety of the Hidden Services in just 7 hours. However, this isn't accurate: they stated that only 2100 sites responded to the HTTPS requests, so only those were scanned. That is about a third, and it is unclear whether the other ~5000 are offline for good or simply weren't available at the time. Either way, they found about 100 flaws across 50 vulnerable sites. According to the article, the programmer of PunkSPIDER says this is much lower than for normal clearweb services. He suspects this is because many of the sites are simple static HTML pages. I also want to give a shoutout to the OpSec and security measures of the sites' designers, because I believe that is another reason for the low percentage of vulnerabilities.
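For a rough sense of scale, here is the arithmetic on the approximate figures quoted above. This is a back-of-the-envelope sketch: the inputs are the round numbers from the post, not exact counts.

```python
# Approximate figures as reported in the post: ~7000 known .onion sites,
# 2100 responded, 50 had vulnerabilities, ~100 flaws in total.
known_sites = 7000
responding = 2100
vulnerable = 50
flaws = 100

unresponsive = known_sites - responding        # sites that never answered
response_rate = responding / known_sites        # share of known sites actually scanned
vuln_rate = vulnerable / responding             # share of scanned sites with at least one flaw
flaws_per_vulnerable_site = flaws / vulnerable  # average flaws on an affected site

print(f"unresponsive: {unresponsive}")
print(f"response rate: {response_rate:.0%}")
print(f"vulnerable share of scanned sites: {vuln_rate:.1%}")
print(f"flaws per vulnerable site: {flaws_per_vulnerable_site:.1f}")
```

So "about a third" is really 30% (4900 sites unaccounted for), and roughly 2.4% of the sites that answered had at least one flaw, averaging about two flaws each.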

The scary part, however, concerns how the results are published. The PunkSPIDER community website offers a searchable database of the vulnerabilities found on the clear web. The security holes found on the Dark Web, by contrast, have been "filtered" for the time being, until they decide what to do with the information. This means that they know of 50 .onion websites with **significant** vulnerabilities, and *they can do whatever they want with them*.

The programmer has already stated that they know of *"at least one"* [emphasis added] .onion site (related to kitten orgies or some type of "weird child porn") that they "don't want the website administrator to fix...before someone in law enforcement hacks it." I assume most of us here can agree that a child porn website like that should be taken down by LE. Still, this leaves them free to play "Judge" over Dark Web sites. They get to decide which vulnerabilities to alert admins about, which to hand over to LE, and which to post on their searchable community database. That last option puts a target on those sites second only to Mr. Nice Guy's at the moment.

While I think PunkSPIDER is a great step in the right direction for exposing vulnerabilities, I don't believe they should be able to pick and choose which sites to turn in, or which to point hackers toward.

My opinions aside, I wanted to post this to inform the members of our community about a possible security issue that hasn't been brought up yet.

TL;DR: PunkSPIDER crawls websites looking for vulnerabilities, and two days ago it scanned the Tor Hidden Services. Of the 2100 sites that responded (out of ~7000 known .onion sites), they found 50 vulnerable sites with about 100 flaws in total. They are not releasing these flaws on their database as they do with clearweb sites. They have already admitted they plan to hand the flaws of at least one site (a child porn site) over to LE.

Edit: Formatting


Comments


[8 Points] set-the-record:

It only took them 7 hours to do so, and there are only around 7000 of them.

I just want to set the record straight a little. This crawler DID NOT crawl all hidden services; it crawled all the published hidden service addresses they could find online. I run about 20 hidden services that are not published, and not one received a single request over the last two weeks. That's right, not one request in two weeks.

/u/Fuck_the_admins posted this in /r/tor about the claim of crawling ALL hidden services:

"All"

Onion addresses are 16-character base32 strings. That's more than 1.2 septillion possible addresses (32^16 = 2^80 = 1,208,925,819,614,629,174,706,176) to scan.

They may have scanned a number of publicly advertised hidden services, but they have certainly not scanned all HSs on tor.

As with most things, the kids of /r/darknetmarkets are a few days late and just run with the headlines like they're not sensationalized. Get a clue.
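The septillion figure is simply 32 possible characters in each of 16 positions. The sketch below shows why brute-forcing "all" hidden services is a non-starter. It assumes the 16-character (v2) address format in use at the time, and the probe rate is purely hypothetical, chosen only for illustration.

```python
# v2 onion addresses: 16 characters from the base32 alphabet (a-z, 2-7),
# so each character carries 5 bits.
BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"
ADDRESS_LENGTH = 16

address_space = len(BASE32_ALPHABET) ** ADDRESS_LENGTH
assert address_space == 2 ** (5 * ADDRESS_LENGTH)  # 32^16 == 2^80

print(f"{address_space:,}")  # 1,208,925,819,614,629,174,706,176

# Hypothetical probe rate (an assumption, not a measured figure):
# one million addresses checked per second, nonstop.
probes_per_second = 1_000_000
seconds_per_year = 365 * 24 * 3600
years = address_space / (probes_per_second * seconds_per_year)
print(f"{years:.2e} years to enumerate every address")
```

Even at that optimistic rate, enumeration would take tens of billions of years. This is why any real crawler has to start from addresses that are already published somewhere.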


[6 Points] alex_from_punkspider:

Hi there,

I'm the original creator of PunkSPIDER and I wanted to clarify a few annoying pieces of misinformation here (mostly in comments).

First of all, PunkSPIDER is 100% self-funded; the only outside funding we ever received was from a Kickstarter campaign a few years ago. It is not, and has never been, funded by DARPA. My company, Hyperion Gray, happens to do work for DARPA related to web crawling (yes, some of it for hidden services), but PunkSPIDER is not part of that work.

Second, regarding which sites we choose to share with law enforcement: we're only sharing one. It's a child porn site, and I think all of us non-pieces-of-shit here can get behind that decision. All other vulnerabilities will be shared publicly, as is the mission of our project.

Regarding the scan, it did in fact start from a big lookup of published hidden services and crawl out from them. Yes, there's probably a better methodology; this is still early work in this domain.

No, we didn't just run a standard web app pen-testing tool. We wrote our own; it's on GitHub and it's called MassWeb.

No, we're not part of the NSA.

We're really open about this project, all the way down to sharing the source code. Please feel free to ask me anything about it!

Alex
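Alex describes the scan as a lookup of published hidden services plus a crawl out from them. The sketch below shows that general seed-and-crawl idea as a breadth-first search over an in-memory link graph. It is an illustration only, not PunkSPIDER's or MassWeb's code, and the site names are hypothetical placeholders. It shows why unpublished services, like the ones set-the-record runs, never see a request.

```python
from collections import deque


def crawl_from_seeds(seeds, links):
    """Breadth-first crawl over a link graph.

    `links` maps each site to the sites it links to. Only sites reachable
    from some seed are ever visited; a site that nobody publishes or links
    to is never discovered, no matter how long the crawl runs.
    """
    visited = set()
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        if site in visited:
            continue
        visited.add(site)
        for neighbour in links.get(site, ()):
            if neighbour not in visited:
                queue.append(neighbour)
    return visited


# Hypothetical toy graph with placeholder names, not real addresses.
links = {
    "directory": ["market-a", "forum-b"],
    "market-a": ["forum-b"],
    "forum-b": ["blog-c"],
    "blog-c": [],
    "private-d": ["market-a"],  # links out, but nothing links to it
}
all_sites = set(links)
found = crawl_from_seeds(["directory"], links)
print(sorted(found))              # ['blog-c', 'directory', 'forum-b', 'market-a']
print(sorted(all_sites - found))  # ['private-d']
```

The key point is that a crawler only expands outward from what it already knows. Coverage is bounded by the seed list and by who links to whom, not by the size of the address space.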


[3 Points] None:

Well, that's shitty


[5 Points] moosealerter:

DARPA (aka US government)

I want to clarify this a bit, especially for the non-US people here. DARPA is the Defense Advanced Research Projects Agency. It is part of the Department of Defense and has a HUGE budget: about $2B last time I looked.

Their charter is to research "blue sky" tech that might someday benefit the DoD and, to a lesser extent, the world at large. They work by issuing a "Broad Agency Announcement" for research in a particular area. University and private researchers submit bids, and the winners are funded to do the research. Sometimes military researchers get funded too, but that is infrequent, mainly because Darpa doesn't think military researchers are very good. Darpa itself is quite small: mainly just the program managers and support staff. Interestingly enough, most of Darpa's PMs are university professors who work there for a few years and then go back to academia. That's mainly because you can't get the expertise you need from government workers, and also because they want a lot of turnover.

Like most research, the vast majority of what Darpa does never amounts to anything.

Trivia: In The X-Files, Fox Mulder found a secret underground tunnel from the Pentagon to Darpa. That would be a long freaking tunnel: Darpa is in Ballston, VA.

I used to work alongside Darpa in a sister agency. If anyone has any questions, AMA.


[3 Points] MarkMerrill1102:

Is this crawler open source?


[2 Points] darknetpotter:

Concerning. Ultimately we have to put our trust in the markets, though, so let's hope none of the decent ones have any vulnerabilities to begin with. And let's hope PunkSPIDER aren't dicks and only report harmful sites like CP, and maybe weapons/hitman services, rather than drug markets.


[2 Points] fainting_g0at:

Just think about this for a second. Now just think about what we can expect the NSA to have.


[2 Points] thenewproblem:

DARPA also invented Tor, no? Seems only logical they would give us a way to search Tor?


[1 Points] youtakesally:

Where will this list be posted?


[1 Points] We_Are_Never_Safe:

-comment overwritten-


[1 Points] thegreatestescape1:

Sounds like they just ran your standard web app pen-testing tool. I bet 9/10 of those vulns are false positives that they never investigated further.


[1 Points] william_junior:

Ok. But as a market admin, the crawling part isn't really what you're interested in, is it? The pen testing is. And if you point any good pen-testing software at your site through Tor and fix whatever it turns up, you shouldn't need to worry too much, I guess.

Of course, any such effort is again prone to decloaking attacks. So obviously care needs to be taken about where to launch such tests from.

Edit: on second thought, you don't even need Tor. Copy your site, minus the data, to another location. If the pen tester is open source, just run it on localhost. Lock down the machine (as in disconnect it) so no data is leaked. Collect the report, then wipe the machine and go fix things. Just a thought.


[1 Points] fainting_g0at:

This is the kind of thing that spooks dark site operators into exit scamming. If you're still keeping your coins in a site wallet, pull them now.


[1 Points] None:

Why the fuck are rocket designs being posted to the inter....you know what, nevermind.


[0 Points] consumerrr:

Disclaimer: I know nothing about this stuff, so this might be a stupid question. Setting aside activity happening at a given moment (say, people logging into a market during a monitored timeframe), what are the chances something like this could uncover other data? I mean data that's been wiped (like the address you've given for an order), or stuff from the past (all the bitcoin transactions that originated in legit clearnet accounts and ended up in a seller's wallet).