The Deep Web: Big Data As A Risk Management Tool

By Rick Draper

Managing security-related risks requires information. Managing those risks effectively requires the right information, available at the right time and presented in a meaningful format. How do you get it?

Open source intelligence, or OSINT, was being used in risk management by governments and big business around the world well before the internet was even conceived. However, the World Wide Web has made ridiculous amounts of information available to everyone. It is just a matter of defining your requirements and knowing what you want to do with the information you obtain, so that you can turn that information into worthwhile intelligence.

The Internet in 3D
The internet, like most human sources of information, has a number of dimensions: information that we are happy for others to know, and may even want to publicise; things that we are happy enough to share, but only in a qualified way; and, of course, those dark secrets that should never be publicly known. The three dimensions of the internet that are relevant to our search for data to inform risk management are the Daily Web, the Deep Web and the Dark Web.

The Daily Web, more commonly referred to as the “surface web” or “indexable web”, is how most of us access content on the internet every day. If we are interested in finding information on a specific subject, we almost certainly turn to Google, Yahoo or Bing. There are over 45,000 Google searches every second!

Ever wondered how search engines determine what gets displayed in the search results and what does not? For a link to a webpage to be shown at all in the search results, the page must first have been ‘indexed’ by the search engine. That means that someone told the search engine about the page directly (there are techniques for this), or the page was available as a link on another page that was itself indexed. There is a whole industry around getting pages indexed and appearing first in the search results – Google “Search Engine Optimisation” if you are interested.
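How a page comes to be indexed can be illustrated with a minimal sketch. It assumes a tiny, made-up set of pages and links (none of the page names come from a real site) and treats indexing as nothing more than following links outward from pages the search engine already knows about. Real search engines are far more sophisticated, so this is an illustration of the principle only.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["products", "about"],
    "products": ["widget"],
    "about": [],
    "widget": [],
    "members-only": ["widget"],  # no other page links TO this one
}


def discover(seed_pages, links):
    """Return every page reachable by following links from the seed pages."""
    indexed = set(seed_pages)
    queue = deque(seed_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in indexed:
                indexed.add(target)
                queue.append(target)
    return indexed


# The crawler has only been told about "home".
indexed = discover(["home"], LINKS)
not_indexed = set(LINKS) - indexed
```

In this sketch the `members-only` page is never discovered, because no indexed page links to it and nobody submitted it directly. Adding it to the seed list corresponds to telling the search engine about the page.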

The Deep Web is a term coined by the founder of BrightPlanet, a company that pioneered harvesting data from the internet. The “Deep Web” is that layer of the internet that is not indexed by, or directly accessible via, the regular search engines. It is impossible to reliably estimate the size of the Deep Web, but it is thought to be many thousands of times the size of the surface web. It includes, for example, web content accessible only after logging into paid subscription or membership accounts – such as www.asisonline.org, asial.com.au and www.spaal.com.au.

There are many approaches that can be used to restrict Deep Web content from being indexed by publicly available search engines, or accessed by links in other webpages that seek to expose the content (a technique called deep linking). However, it is important to note that just because content has not been indexed does not mean that it cannot be harvested using tools designed for the purpose. There are techniques available to unlock a substantial amount of Deep Web content to supplement OSINT from the surface web – some of which will be discussed later in this article.
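One widely used approach, not named in the article itself, is a robots.txt file, which asks well-behaved crawlers to stay away from parts of a site. The sketch below uses Python's standard `urllib.robotparser` module. The rules, the `ExampleBot` crawler name and the example.com addresses are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an illustrative site.
RULES = """\
User-agent: *
Disallow: /members/
""".splitlines()

parser = RobotFileParser()
parser.parse(RULES)

# A crawler that honours the rules checks each address before fetching it.
public_ok = parser.can_fetch("ExampleBot", "https://example.com/news/latest.html")
members_ok = parser.can_fetch("ExampleBot", "https://example.com/members/report.html")
```

Here `public_ok` is true and `members_ok` is false. The check is performed by the crawler itself, so the file works as a request rather than a lock: a harvesting tool that chooses not to consult it is not technically prevented from retrieving the page.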

The Dark Web is sometimes referred to as a subset of the Deep Web and, to the extent that the content in the Dark Web is not indexed by commercial search engines, this is valid. However, the Dark Web is actually (and intentionally) more difficult to access, unless you know the techniques needed to reach the content. For example, a publicly accessible webpage might appear innocuous enough on first inspection, but clicking on a part of the page may reveal an otherwise hidden form field. Entering a valid passphrase into that form causes the content on the page to change completely, revealing the hidden secrets. Welcome to the Dark Web, where all manner of content is hidden in plain sight.

Some Dark Web content is even further obscured through anonymising networks, such as the TOR (The Onion Router) Network. TOR uses a series of virtual tunnels to conceal information about both the user and the website, which would otherwise be available over conventional internet routing. TOR was originally conceived and deployed for US military purposes, but is now known to be widely used to provide the anonymity sought when engaging in illegal or otherwise questionable activities.

Law enforcement agencies invest a great deal of time and effort tracking down the dark websites that are, in effect, supporting crime. Probably the most famous of those taken offline by the FBI was Silk Road, which reappeared not long afterwards as Silk Road 2. As you might expect, these sites facilitate the sale of illegal drugs and firearms, distribution of trade secrets, and money laundering, along with enabling horrific examples of human depravity. However, there are other dimensions of the Dark Web that are important to understand from a corporate security perspective, including issues around the sale of counterfeit goods and diverted/stolen property, and fraud. These will be discussed in more detail below.

It would be remiss not to point out that there are legitimate uses for TOR and the dark web; they just tend to be overshadowed by the nefarious activities that these technologies also support.

What is Big Data and what is in it for me?
The term “Big Data” is used to describe a collection of information that is so large or complex that it becomes challenging to process and use in a meaningful way. What is “big” to some organisations may be routine for others, so the term is context dependent. For the purposes of this discussion, big data is simply a collection of large, and mostly unstructured, datasets that have the potential to reveal linkages and relationships that can aid understanding and inform further analysis.

The data and information available on all three dimensions of the internet come in many forms, from traditional websites and web-enabled databases through to streams of social media. Most of these sources are being used to varying extents by organisations in managing security-related risks. But there are even more benefits to be gained by leveraging insights available through harvesting data from multiple sources.

Public and private sector organisations are taking advantage of deep web harvesting services that effectively operate in parallel with the traditional search engines. These services go much further than indexing webpages, by actually extracting surface web and deep web content so that it can be analysed on the fly and comparisons made of content over time. In some cases, it is even possible for these services to harvest content from dark web sources.
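Comparing content over time can be sketched in a few lines. The example below assumes that the text of a page has already been harvested on two different days (the snapshots are invented for illustration). It uses Python's standard `hashlib` to detect that something changed and `difflib` to show what changed. Commercial harvesting services will work differently; this only illustrates the idea.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_lines(old, new):
    """Return the lines removed ('- ') or added ('+ ') between two harvests."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical snapshots of the same page harvested on two different days.
monday = "Designer handbags\nPrice: $900\nShips from: Sydney"
tuesday = "Designer handbags\nPrice: $90\nShips from: Sydney"

has_changed = fingerprint(monday) != fingerprint(tuesday)
changes = changed_lines(monday, tuesday)
```

Storing only the fingerprint for each harvest is enough to flag that a page has changed. Keeping the full text as well allows the specific change, here a sudden price drop, to be brought to an analyst's attention.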

The risk management uses for this type of big data collection and analysis range from detecting potential disease outbreaks so that staff travel advisories can be issued, through to identifying related websites selling counterfeit designer brand products, so that targeted action can be taken. The power of being able to pull data from literally tens of thousands of sources on a daily, or even hourly, basis should not be underestimated.

The key to being able to use all this information effectively is inherent within the stages of the traditional intelligence cycle:

  1. Planning and direction.
  2. Collection.
  3. Processing/collation.
  4. Analysis and production.
  5. Dissemination.

While you might not know what information you are actually going to get, having a very clear plan and direction for harvesting the data is essential. When it comes to big data, the value comes through semi-automating analytics and the application of visualisation tools that enable the important relationships to be brought to the attention of a human who can take the analysis further.
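As a rough illustration of semi-automated analytics, the sketch below takes text that has already been collected from several sites, extracts email addresses from it, and reports any address that appears on more than one site. The site names, page text and the simple email pattern are all assumptions made for this example. A shared contact detail is only a lead for a human analyst to follow up, not proof of a relationship.

```python
import re
from collections import defaultdict

# Deliberately simple email pattern; adequate for this illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical harvested pages: site name -> extracted text.
HARVESTED = {
    "bags-outlet.example": "Genuine designer bags. Contact sales@fastship.example",
    "luxury4less.example": "Questions? Email sales@fastship.example today",
    "local-bakery.example": "Orders: hello@local-bakery.example",
}


def shared_contacts(pages):
    """Map each email address to the sites using it, keeping only shared ones."""
    sites_by_email = defaultdict(set)
    for site, text in pages.items():
        for email in EMAIL.findall(text):
            sites_by_email[email.lower()].add(site)
    return {email: sites for email, sites in sites_by_email.items() if len(sites) > 1}


links = shared_contacts(HARVESTED)
```

In terms of the intelligence cycle, `HARVESTED` stands in for collection, the extraction step for processing, and the grouping for a first pass at analysis. The result, two apparently separate sites sharing one contact address, is the kind of relationship that would then be passed to an analyst.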

While big data is not something that every security manager needs for their risk management program, it is important to become familiar with what all three dimensions of the internet have to offer.

What about Social Media?
The next article in this series deals specifically with social media and how it can be used in security risk management. For now, it should be recognised that, for the most part, social media is part of the deep web and can provide insights into a wide range of security-related risks.

Many organisations now require staff, and even contractors, to provide details of their social media accounts. Naturally, privacy concerns are often raised in relation to such requirements, but the fact remains that utterances by staff on social media can have implications for employers.

Security managers should familiarise themselves with social media and deep web harvesting tools that they can use tactically for situational awareness, and as sources for OSINT to support their risk management programs. In employing any of these strategies, it is essential that effective policies and procedures be developed and implemented to ensure that OSINT and social media are leveraged in the most effective manner, while having regard to human resource and reputational risks.

Rick Draper is the Principal Advisor and Managing Director at Amtac Professional Services Pty. Ltd. Rick has over 30 years of experience in the security industry – the last 21 years as a consultant. He is also an adjunct senior lecturer in security management and crime prevention at Griffith University, and a member of the ASIS Loss Prevention and Crime Prevention Council. Rick has been involved in the development of web applications and data management since the 1990s, including the development of a range of tools to assist security managers. You can contact him on rick.draper@amtac.net