Tracking Stolen Data on the Dark Web

By Adam Ely, IANS Faculty

The Challenge: Finding and Removing Stolen Data Before Clients or Media Catch Wind

The security team for a large health insurer wants to start monitoring the web so it is alerted if and when the company’s sensitive customer data shows up on unapproved sites or the dark web. The issue came to light shortly after an internal incident response (IR) tabletop exercise highlighted this deficiency in its process. The team would like not only to be alerted when stolen data is found, but also to be able to quickly suppress such data before it is discovered by clients or publicized by the media. Specifically, the team asks:

  • If our company is breached and hackers put our stolen data on the web, how can we locate it and how can we ensure it gets taken down quickly?
  • Do any firms specialize in suppressing breached protected health information (PHI) or personally identifiable information (PII)?

A Very Under-Served Space

Vendors currently under-serve this space because it’s a relatively new concern. Even many progressive, forward-leaning organizations haven’t yet focused on this issue, although some are starting to acknowledge it. Since there’s not much out there in terms of products and services, security organizations are having to cobble together a few different strategies to address it.

Watch for Data Leaks

Some services, such as PwnedList.com, will help you monitor for data leaks. PwnedList searches data posted on the dark web, indexes it and lets you subscribe to search for your particular records. It’s a good place to start because if you have an email address used within the company, you can run queries on it, and if you get hits, it’s an indicator you’ve been compromised. Another service is offered by Hold Security. It searches the dark web forums and deep web sites, gathers the data, indexes it and then searches for indicators that could identify a company. In the past, these services were marketed primarily to individuals, but some are beginning to address the corporate market.

The challenge with all of these services, however, is that the records posted on the dark web are usually up for sale, and hackers don’t tend to post the whole data set or release a lot of information about where the data originated. For example, hackers might post that they have a database of medical records and social security numbers from people in three states, but they might not disclose enough to identify which company that data came from. It’s pretty challenging, then, to find this stuff before it hits the real web, but those services can help.

Leverage Your DLP

Some organizations that have data loss prevention (DLP) tools in place are leveraging them to help discover lost data. For example, tools like Bishop Fox’s Google Hacking Diggity Project will let you search the web in an automated fashion, looking for specific patterns. If you are using DLP, this search is made easier because most DLP tools require that data be in a certain, unique format that can then be searched.
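
As a rough illustration of what pattern-based searching might look like, the Python sketch below scans a piece of retrieved text for formatted identifiers. The member ID format (HX- followed by eight digits), the pattern names and the find_indicators function are all assumptions made for this example. They are not taken from any DLP product or from the Bishop Fox tooling, and a real deployment would likely reuse the patterns the DLP tool already enforces.

```python
import re

# Hypothetical formats, assumed for illustration only.
PATTERNS = {
    "member_id": re.compile(r"\bHX-\d{8}\b"),          # assumed internal ID format
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-shaped number
}


def find_indicators(text):
    """Map each pattern name to the sorted, de-duplicated matches found in text."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            hits[name] = matches
    return hits


if __name__ == "__main__":
    sample = "dump: HX-00412345, 123-45-6789, HX-00412345, order 12-345"
    print(find_indicators(sample))
```

The more distinctive the format, the fewer false positives a search like this is likely to produce. A generic nine-digit pattern will match far more unrelated content than a company-specific prefix.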

Try a Honey Token

Many organizations are beginning to seed their sensitive internal databases with fictitious accounts — or honey tokens — that can help serve as an early alert system and validate that a certain data set did indeed come from you. This is used primarily in customer-facing portals. To attackers, the fake accounts look real. They have a login history, have been active within the past 30 days, etc., but they pose no real risk.

Once a database is seeded with the fictitious accounts, security staffers can then monitor the associated email addresses for incoming email, login attempts, etc. If someone tries to log in using your fake portal credentials or if three or four of the fictitious accounts suddenly start getting spam email, you know you’ve probably had a data breach.
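
The monitoring logic described above could be sketched roughly as follows. This is a minimal Python example under stated assumptions: the account addresses, the (event_type, account) tuple format and the honey_token_alerts function are hypothetical, and a real setup would pull events from mail and authentication logs rather than an in-memory list.

```python
from collections import Counter

# Hypothetical seeded accounts, assumed for illustration only.
HONEY_ACCOUNTS = {
    "j.marsh@example.com",
    "t.okafor@example.com",
    "l.reyes@example.com",
}


def honey_token_alerts(events, spam_threshold=3):
    """Return alert strings for a list of (event_type, account) tuples.

    Any login attempt on a honey account raises an alert. Inbound email
    raises an alert once at least spam_threshold distinct honey accounts
    have received mail.
    """
    alerts = []
    inbound = Counter()
    for event_type, account in events:
        if account not in HONEY_ACCOUNTS:
            continue  # ignore activity on real accounts
        if event_type == "login":
            alerts.append(f"login attempt on honey account {account}")
        elif event_type == "email":
            inbound[account] += 1
    if len(inbound) >= spam_threshold:
        alerts.append(f"{len(inbound)} honey accounts received unsolicited email")
    return alerts


if __name__ == "__main__":
    events = [
        ("email", "j.marsh@example.com"),
        ("login", "real.user@example.com"),
        ("email", "t.okafor@example.com"),
        ("email", "l.reyes@example.com"),
    ]
    for alert in honey_token_alerts(events):
        print(alert)
```

The threshold of three distinct accounts mirrors the "three or four" rule of thumb. A single stray email to one fake address is treated as noise, while any login attempt is treated as significant on its own.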

There are a few caveats, however. For example, you wouldn’t want to insert fictitious records across every single database, since that gets messy quickly. Not all data is equally sensitive, and if you have certain data sets you know analysts constantly run reports on, you may not want to place fictitious data there.

Some companies work around this issue by creating the fictitious records across a certain range of customer IDs and then excluding those ID ranges when running reports. Others don’t worry about excluding the data, since an extra 100 records in the scheme of things won’t tip the balance either way. Because the fictitious records don’t affect data processing, these companies simply leave them in place.
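
The ID-range approach might look something like the Python sketch below. The reserved range (900000 through 900099), the record layout and the exclude_honey_records function are assumptions chosen for illustration, not a description of any particular company's schema.

```python
# Hypothetical reserved block of 100 customer IDs for fictitious records.
HONEY_ID_RANGE = range(900_000, 900_100)


def exclude_honey_records(records):
    """Drop records whose customer_id falls inside the reserved range."""
    return [r for r in records if r["customer_id"] not in HONEY_ID_RANGE]


if __name__ == "__main__":
    records = [
        {"customer_id": 104_221, "claims": 3},
        {"customer_id": 900_042, "claims": 1},  # seeded fictitious record
        {"customer_id": 512_907, "claims": 5},
    ]
    report_rows = exclude_honey_records(records)
    print(sum(r["claims"] for r in report_rows))
```

Keeping the range in one shared definition means every report applies the same exclusion, which reduces the chance that a fictitious record slips into one analyst's numbers but not another's.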

Still, it’s a good idea to think about the full lifecycle of that data and how it’s used internally.

You should also focus on the data and data fields that hackers find most attractive. Today, hackers know that if they try a full database dump during a breach, most companies’ IDS/IPS will pick it up fast and they’ll be out of luck. Smart attackers tend to siphon off the data slowly because it’s far easier to exfiltrate small data sets without being noticed. When seeding the data, you should focus on the more lucrative fields, such as the name, address, login credential, group ID number, etc. Random medical codes will be stolen far less often.

Search Unique Identifiers

Instead of using fictitious data in a honeypot, some companies try to use real data unique to the data set when searching the dark web. There are two schools of thought here. Some argue that since it’s a unique piece of data that nobody knows about, it’s a valuable indicator and can be used. Others say that if you do enough searches over time, those data points get indexed and someone could piece them together and associate them with a certain person or company. For example, how comfortable would you feel continually searching Google for your name and the last four digits of your social security number? Probably not very.

However, if you have an indicator that can’t be traced back to a real person — a header in a file or some code indicator — you can probably use it and search it. If it’s relatable to a real person, however, you probably shouldn’t. It’s an edge case, but something to be aware of.
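
One way to apply this rule of thumb is to screen candidate search terms before they are ever sent to an external search engine. The Python sketch below is illustrative only: the two patterns, the sample file header and the safe_to_search function are assumptions, and a pattern check like this cannot prove a term is untraceable. It only catches obviously personal values.

```python
import re

# Patterns suggesting a term could be tied to a real person
# (illustrative, not exhaustive).
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # SSN-shaped number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email address
]


def safe_to_search(term):
    """Return True when the term matches none of the PII-like patterns."""
    return not any(p.search(term) for p in PII_PATTERNS)


if __name__ == "__main__":
    candidates = [
        "X-Export-Batch: 7f3a9c",  # hypothetical file header
        "jane.doe@example.com",
        "123-45-6789",
    ]
    print([term for term in candidates if safe_to_search(term)])
```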

Suppressing Data Once It’s Posted

There are very few companies or services focusing on the suppression side of this equation. Some IR companies, like Mandiant, do this 17 times a day and have put some real resources behind it. They have folks who know who to contact at Pastebin, Google, Yahoo, etc., and can tell you what the legal process is. Sometimes, they will actually handle parts of it for you, although they typically will push most of it back on your legal team.

If you have an IR firm on retainer, you should bring them in as part of your IR planning process and use their expertise to write up a playbook. If they say you should notify Google, ask them for the right email and contact information, etc. Find out how to go to the Department of Justice (DoJ) or get a subpoena and have that all spelled out. Create templates for the whole process so that if a breach happens and your data gets out there, you have a real playbook and can just plug in the right players to contact.

One thing to realize is that if you’ve had a breach, your legal team will be busy. If you can, assign a member of legal ahead of time to spearhead the suppression effort and lead the fight to have the data taken down. If your legal department doesn’t have that bandwidth, it’s a good idea to retain outside counsel, especially with a legal firm that specializes in technology and data loss. Have that relationship in place, so you can have them at the ready.

It’s also wise to build relationships with law enforcement ahead of time, especially the local FBI. If something goes wrong, the FBI really wants to help and can quickly get you in touch with the right people.

Overall, the suppression process is complex, and many web companies won’t pull down data because they have to toe the line on free speech. You really have to struggle with some ISPs, especially those on the dark web. And if you’re dealing with ISPs across multiple jurisdictions, it gets even more difficult and time-consuming. Most web companies won’t even talk to you unless they have a court order, a police report or a subpoena for the data.

Be Careful When Threatening Lawsuits

Some companies decide that their first option is to sue the company hosting the stolen data, but that can be tricky. I’ve seen sites respond very differently to legal threats. For example, when I worked at Salesforce.com, sometimes our customers would host stolen data. Victims would contact us and say that while they know we don’t own the data and had no part in the issue, we shouldn’t be allowing it and they would threaten legal action. Salesforce.com is not a very litigious company, but when faced with threats like that, it would pull out all the stops and counter-sue like crazy.

When I worked at Disney, the response was different. Disney sues people all day long for misuse of things like the Mickey Mouse trademark, but if a victim approached it saying some data was maliciously posted in a game or something Disney hosted, Disney would work to ensure the matter was resolved expediently.

Creativity Required

Overall, finding and removing stolen data posted to the dark web is a difficult challenge. Not many vendors are focusing on the issue yet, so security organizations need to get creative in how they go about solving it. For now, combining vendor services like PwnedList and Hold Security with homegrown strategies like honey tokens and suppression playbooks is the best bet.

Any views or opinions presented in this document are solely those of the Faculty and do not necessarily represent the views and opinions of IANS. Although reasonable efforts will be made to ensure the completeness and accuracy of the information contained in our written reports, no liability can be accepted by IANS or our Faculty members for the results of any actions taken by the client in connection with such information, opinions, or advice.
