“Bad data + biased algorithms = big delusion,” says David Venable

Adam Bannister

Editor, IFSEC Global


Adam Bannister is editor of IFSEC Global. A former managing editor at Dynamis Online Media Group, he has been at the helm of the UK's leading fire and security publication since 2014.
February 2, 2017


Former US National Security Agency (NSA) intelligence officer David Venable has extensive experience in computer network exploitation and cryptography, among other cybersecurity disciplines.

Now VP of cybersecurity at Masergy Communications, he is responsible for protecting global network infrastructure and advising multinational companies about protecting their digital assets.

We met up with Venable to discuss his views on data privacy, the misuse of big data, the impact of machine learning, cybersecurity resilience in organisations big and small, and his recent appearance at Black Hat Europe.

IFSEC Global: Hi David. Please tell us a bit about what you spoke about at Black Hat Europe…

David Venable: So the definition of big data is having so much data that you don’t know what to do with it, essentially. As we’re developing different techniques and ways to learn from it, we’re starting to see some shifts, and some interesting results.

IBM put out a study estimating that $3.1tn was lost in the US alone because of decisions made on bad data.

There are a number of ways bad data can arise. One of the seemingly less interesting areas within data science is how to properly parse data and put it together.

But bad data could also come from corrupted records, bad sources and so on. And it can come from bad conclusions drawn from the data.

Cathy O’Neil recently published a book called Weapons of Math Destruction… People tend to treat data as though it were simply fact.

The problem is, the algorithms we use to then do something with that data are designed by people with biases, and these algorithms tend to reflect their creator’s biases. And there are a number of ways this could happen, many of which are so subtle that they’re very hard to detect.

So the typical big data formula is a bunch of data plus a bunch of algorithms equals big data. But based on real world results, we’re looking at bad data and biased algorithms, which I’m terming ‘big delusion’.

And this can affect you individually. You could be denied a loan. You could be linked to a terrorist cell. Things as simple, or as bad, as that.

It’s an interesting position for humanity to be in, when everything you do produces data and all of that data is collected and passed on. So that was largely the topic of my talk, along with techniques you can use to prevent selected inaccuracies from being tied to public personas, and things like this.

IG: Is a lot of that to do with managing your social media output?

DV: Everything from social media to driving to walking to which cell tower your phone is connected to.

What concerns me is, have you seen North by Northwest, the Hitchcock classic? Cary Grant’s character is at this conference having some coffee. He spills his coffee right at the time the intercom is announcing there is a phone call for a George Kaplan, who happens to be a non-existent person created by spies.

He happens to get up and walk towards the lobby phone as this announcement goes off. People who are trying to find out who this Kaplan person is then think it’s Cary Grant’s character, and this spirals into this really fantastic story of mistaken identity.

IG: It’s the age-old assumption: Y followed X, therefore X must have caused Y…

DV: Exactly. So this could easily happen today: this person was talking to these people, his cell phone was in this location at this time, and by piecing things together it would be fairly easy to make wildly inaccurate assumptions about someone.
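To make the North by Northwest scenario concrete, here is a minimal Python sketch of a deliberately naive co-location rule: anyone whose phone appears on the same cell tower in the same hour as someone else is treated as linked to them. The records, names and tower IDs are invented for illustration, and this is not how any particular agency or company is known to analyse data; it only shows how easily such a rule produces a false link.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (person, cell_tower_id, hour_of_day). All values are invented.
sightings = [
    ("suspect", "tower_17", 14),
    ("contact", "tower_17", 14),
    ("bystander", "tower_17", 14),  # simply nearby, like Cary Grant's character
    ("contact", "tower_03", 9),
    ("suspect", "tower_03", 9),
]


def naive_links(records):
    """Count how often each pair of people shares a (tower, hour) slot.

    Deliberately naive: co-location is treated as evidence of association.
    """
    slots = defaultdict(set)
    for person, tower, hour in records:
        slots[(tower, hour)].add(person)
    links = defaultdict(int)
    for people in slots.values():
        for pair in combinations(sorted(people), 2):
            links[pair] += 1
    return dict(links)


links = naive_links(sightings)
print(links[("bystander", "suspect")])  # 1: the bystander is now "linked" to the suspect
print(links[("contact", "suspect")])    # 2
```

The bystander ends up connected to the suspect purely by being in the same place at the same time, which is exactly the kind of wildly inaccurate assumption Venable describes.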

IG: You worked for the NSA so I guess you’ve had experience with wrestling with these conundrums?

DV: That’s true to an extent. I’ll say that a lot of this technology didn’t exist when I worked there. I’d rather not go into much depth on that.

IG: So you’ve also done work with Fortune 500 companies, haven’t you?

DV: Yes. Masergy provides professional services and consultancy. We look at a company’s security, whether in the form of a penetration test or a review of its overall security programme, and find areas that can be corrected, improved or optimised.

IG: What would you say the biggest shortcoming is in computer systems in Fortune 500 or business generally?

DV: One of the biggest problems is not thinking of security from the ground up. This is starting to change, but we’re still seeing a lot of the impacts of design with security tacked on as an afterthought.

I’d say the other one is looking at security in isolated pieces rather than from a holistic, global perspective, and that’s starting to change as well.

IG: Most categories of physical crime have been falling steadily since the 70s, yet it very much feels like the criminals have the upper hand in the virtual world…

DV: I don’t think we’ll ever get the upper hand in that sense. When you’re small and fast it’s easy to do things. One or two hackers with sophisticated tools, a lot of mobility and the ability to choose when and how to engage tend to have the upper hand.

On the other hand, business has far more resources. As we’re seeing behavioural analysis and machine learning applied to big data around security, we’re starting to see some really sophisticated techniques for stopping these attacks – if not before they happen, then certainly at the beginning stages.

And with the cyber kill chain, if you take out any one stage, you prevent the exploitation or compromise.

IG: What’s the cyber kill chain?

DV: A series of steps an attacker must complete before they can carry out a compromise. It’s things like reconnaissance, identifying issues and finding ways to use those issues to the attacker’s benefit.

If you can stop any of those steps, the overall attack fails.
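The "stop any one step" logic can be illustrated with a short Python sketch. The stage names below are taken from Lockheed Martin's Cyber Kill Chain model, which is an assumption on our part since Venable names only reconnaissance, identifying issues and exploiting them; the per-stage probabilities are hypothetical and treated as independent for simplicity.

```python
import math

# Stage names follow Lockheed Martin's Cyber Kill Chain model (an assumption:
# the interview itself names only reconnaissance, identifying issues and exploiting them).
KILL_CHAIN = [
    "reconnaissance",
    "weaponization",
    "delivery",
    "exploitation",
    "installation",
    "command_and_control",
    "actions_on_objectives",
]


def run_attack(blocked_stages):
    """Walk the stages in order; the attack halts at the first blocked stage.

    Returns (succeeded, stages_completed_before_stopping).
    """
    completed = []
    for stage in KILL_CHAIN:
        if stage in blocked_stages:
            return False, completed
        completed.append(stage)
    return True, completed


def chain_success_probability(stage_probs):
    """Overall success chance, assuming each stage succeeds independently."""
    return math.prod(stage_probs[stage] for stage in KILL_CHAIN)


# Blocking delivery (for example, with an email filter) stops every later stage.
print(run_attack({"delivery"}))  # (False, ['reconnaissance', 'weaponization'])

probs = {stage: 0.9 for stage in KILL_CHAIN}  # hypothetical per-stage odds
print(round(chain_success_probability(probs), 3))  # 0.478
probs["installation"] = 0.0  # one stage fully blocked
print(chain_success_probability(probs))  # 0.0
```

Because the attacker's overall chance is a product of per-stage chances, even an attacker who is 90% likely to clear each of seven stages succeeds less than half the time, and fully blocking any single stage drives the whole product to zero.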

IG: So not too dissimilar to how you might plan a physical attack…

DV: Exactly the same.

IG: Going back to the earlier point about falls in traditional crimes, that is at least partly to do with advances in technology… The risk-benefit analysis for robbing a bank is very different now compared to 30-40 years ago because of CCTV etc…

DV: It’s one thing to walk into a liquor store with a gun; it’s fairly easy to be caught that way. But if you’re in a country that doesn’t care, and you route your attack through four countries that won’t work together to attack another country that has no visibility of who is doing this or why, it’s much harder for anyone to find you.

IG: So have you encountered physical security systems, like CCTV, access control and intruder alarms in your line of work?

DV: There’s lots of facial recognition software being applied to CCTV footage and there have been some creative ways to evade having your face recognised.

So there are glasses that can not only stop your face being detected, but can actually make the software detect someone else. It’s pretty interesting.

Then in the US, some companies have been giving police departments free cameras to put on their police cars. The cameras recognise licence plates and send this information off to check whether the vehicle has been stolen or has any outstanding warrants, and then an alert pops up.

But what it does is create a data point that this vehicle was in that location at this time. So even without a GPS or any kind of LoJack or tracking, there’s still potential for your location to be used in this massive Orwellian [situation].
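The licence-plate example can be sketched in a few lines of Python. This is a hypothetical model rather than any vendor's system: the PlateRead record, the hot list and the retention policy (keep every read, hit or not) are assumptions chosen to show how a simple stolen-vehicle check can double as a location log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlateRead:
    plate: str
    location: str
    timestamp: str  # ISO 8601, so string order matches time order


HOT_LIST = {"ABC123"}  # hypothetical stolen-vehicle / warrant list
read_log = []          # assumed policy: every read is retained, hit or not


def process_read(read):
    """Store the read, then report whether it matches the hot list."""
    read_log.append(read)
    return read.plate in HOT_LIST


def movements(plate):
    """Rebuild a vehicle's whereabouts, in time order, from retained reads."""
    reads = sorted(read_log, key=lambda r: r.timestamp)
    return [(r.timestamp, r.location) for r in reads if r.plate == plate]


process_read(PlateRead("XYZ789", "Main St & 5th Ave", "2017-01-30T08:02"))
alert = process_read(PlateRead("ABC123", "Main St & 5th Ave", "2017-01-30T08:05"))
process_read(PlateRead("XYZ789", "Hospital car park", "2017-01-30T12:45"))

print(alert)                # True: only ABC123 triggers an alert
print(movements("XYZ789"))  # [('2017-01-30T08:02', 'Main St & 5th Ave'), ('2017-01-30T12:45', 'Hospital car park')]
```

The driver of XYZ789 never triggered an alert, yet their movements are still recoverable from the log, without any GPS or tracking device involved.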

IG: Do you think concerns about state snooping are sometimes overwrought?

DV: The people I worked with in government I would trust implicitly.

Even if one in 100 people there would do negative things with the data [they collect], there is an amazing amount of oversight. If you abuse it you go to prison. What we’re seeing now is that companies are developing similar levels of access to personal data – but without that oversight. Without any real checks and balances, and that’s what scares me.

I am a privacy advocate. I certainly don’t want government overstepping its bounds. But companies don’t really have those bounds. And they’re very free to do things government would never do.

IG: So have you worked with small or medium-sized businesses as well as Fortune 500 blue chips?

DV: We’re starting to see more small to mid-sized businesses becoming concerned with this.

And they’re now able to afford world-class solutions. For a long time you couldn’t [if you were a small business].

That’s changed a lot. And a lot of that is due to machine learning techniques – not quite AI – and, certainly in our own products, a lot more automation that reduces the cost, which then enables small and mid-sized businesses to have similar security to enterprise [level businesses]. We’re seeing quite a lot of interest that just didn’t exist a few years ago.

IG: What are your thoughts on the internet of things? That’s essentially multiplying the vectors of possible cyber attack…

DV: This is a pretty fascinating topic. One, you have a lot more centralisation of the internet, for example in its DNS infrastructure. And two, you have devices being built with weak security. This has been a major topic in the security industry for decades.

For years the question has been: Who is going to hack a refrigerator? Who will hack a light bulb? And now we have an answer.

The security industry has been saying this for years, but it’s easy to dismiss when you’re talking about a microwave oven. Now it’s happened I’m hoping it will serve as a wake-up call.

I’ll tie this to something similar that scares me. SCADA industrial control systems, the devices that control pipelines, bridges, dams and all this sort of thing, were designed without security in mind, much like these IoT devices are. A lot of that equipment is still out there, and 30 years later we’re still dealing with the problems associated with it.

So that’s my fear: that IoT could go that way, where 30 years from now we’re still dealing with the ramifications of bad security design. I would like to see that go a different route. Hopefully it will.

IG: Developers bandy the term AI around a lot these days. Is that term being misused or overhyped in some situations?

DV: It is absolutely overhyped. In fact, my colleague Mike Stute, our chief scientist, gave a talk at Black Hat several months ago called ‘Breaking the Machine Learning Hype Cycle’. One of his main points was that when you hear someone talking about AI, you know something is off.

Machine learning has a lot more legitimacy.

IG: Finally, going back to your Black Hat talk: do you have any practical tips on how to protect yourself against misuse or misinterpretation of your data?

DV: The main tip I can give your average user: use different passwords for different accounts. That will decrease the chance for your information to be exploited and targeted.

And I don’t mean using your pretty secure password for some types of accounts and your really secure password for others. Use a different password for each account.

This can be difficult, but there are a number of tools out there that make it easier.
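As a rough illustration of the "different password for every account" advice, this Python sketch generates an independent random password per account using the standard library's secrets module, then compares how many accounts a single leaked password opens when passwords are reused versus unique. The account names and the reused password are hypothetical; in practice a password manager does the generating and storing for you.

```python
import secrets
import string

# 52 letters + 10 digits + 32 punctuation characters = 94 symbols
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def new_password(length=20):
    """Return a random password built with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def exposed_by_leak(vault, leaked_account):
    """List the accounts an attacker could open by replaying one leaked password."""
    leaked = vault[leaked_account]
    return [account for account, pw in vault.items() if pw == leaked]


# Hypothetical accounts: one password reused everywhere...
reused = {"email": "Summer2016!", "bank": "Summer2016!", "shopping": "Summer2016!"}
# ...versus a fresh random password per account.
unique = {account: new_password() for account in reused}

print(exposed_by_leak(reused, "shopping"))  # ['email', 'bank', 'shopping']
print(exposed_by_leak(unique, "shopping"))  # ['shopping']
```

With the 94-character alphabet used here, each 20-character password carries roughly 131 bits of entropy. Some sites restrict which symbols they accept, so the alphabet may need trimming.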

The number two tip is to use multi-factor authentication, where you type in your password and then enter a temporary password, or something like that.
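Many authenticator apps generate that temporary password as a time-based one-time password (TOTP, RFC 6238), which applies the HOTP algorithm from RFC 4226 to the current 30-second time window. The sketch below is a minimal standard-library implementation for illustration only, assuming HMAC-SHA1 and six digits (the common defaults); the key shown is the RFC's published test key, not something to use in practice.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226), using HMAC-SHA1."""
    msg = struct.pack(">Q", counter)  # counter as 8-byte big-endian integer
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # "dynamic truncation": low 4 bits pick a 4-byte window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, at=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over the current time window."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


# RFC 6238's published SHA-1 test key; a real secret typically arrives via the service's QR code.
test_key = b"12345678901234567890"
print(totp(test_key, at=59, digits=8))  # 94287082
print(totp(test_key))                   # six digits that change every 30 seconds
```

The server and the user's app derive the same code independently from a shared secret and the clock, so a stolen password on its own is not enough to log in.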
