Tracking cybersecurity threats with Machine Learning to spot patterns: part one

Most people are familiar with having antivirus software on their computer or mobile device. This is an obvious application of cybersecurity. Of course, other defences are needed, such as strong passwords, but an antivirus tool is there to tell the user if their system, program or app has been compromised by an attacker. Antivirus software works by looking for known virus patterns (“definitions”) in program files, hence it’s essential that the database of known patterns is kept up to date.

The disadvantage of such software is that it can only protect against threats its developers already know about. There is always a delay between a new virus appearing and its definition being added to the threat database, and a further delay before that updated database reaches each device via a software update to the antivirus tool.

There are classes of cybersecurity threats that are not covered by antivirus tools. Often programs contain vulnerabilities caused by poor design or poor engineering. As these vulnerabilities are discovered, updates to fix these problems are issued. That’s why it’s important to keep both systems and apps up to date.

For vehicles and other "real-world" systems, these kinds of weekly or monthly updates can be difficult. You don't want your car to start a system update while you are driving!

In Secure-CAV, we are taking a different approach. We start with the assumption that a cyber attack is trying to make a vehicle do something “abnormal”. Therefore, we try to build a picture of what constitutes normal behaviour, and we look for anomalies that may be the signature of malicious interference. For example, we are monitoring the signals on the CAN bus that connects all the major electronic components in a vehicle. This anomaly-based approach has two advantages: we don’t need to make regular updates to a database of virus definitions and we don’t need to keep updating the system to fix vulnerabilities.
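To make the idea concrete, here is a minimal sketch of anomaly detection against a learned profile of normal behaviour. It is not the Secure-CAV implementation: the signal (vehicle speed), the sample values and the z-score threshold are all illustrative assumptions, and real monitoring would decode values from live CAN frames rather than a hard-coded list.

```python
from statistics import mean, stdev

# Hypothetical speed samples (mph) observed during normal driving.
# In practice these would be decoded from CAN bus traffic.
normal_speeds = [28, 30, 31, 29, 30, 32, 28, 29, 31, 30]

# Build a simple statistical profile of "normal" for this signal.
mu = mean(normal_speeds)
sigma = stdev(normal_speeds)

def is_anomalous(value, threshold=3.0):
    """Flag a reading whose z-score exceeds the threshold."""
    return abs(value - mu) / sigma > threshold

print(is_anomalous(30))   # False: well within the learned profile
print(is_anomalous(120))  # True: far outside the learned profile
```

Note that nothing here needs a database of known attack signatures: the detector only needs a model of what "normal" looks like.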

Clearly, however, it’s a challenge to build up a statistical picture of the normal behaviour on a system interconnect such as a CAN bus. We can see information from system components, such as the odometer, as well as actions from the driver, such as the indicator being switched on. Driving at 30 mph in a town is different to driving at 70 mph on a motorway. Both are different again to waiting in a queue at traffic lights. And of course, there are differences in how different drivers interact with the vehicle. All of these (and many more) create a huge range of different scenarios contributing to the picture of normal behaviour.

This leaves us with a very difficult problem to solve. Secure-CAV is a first, important step. Aside from the problem of recognising anomalous behaviour, there are significant engineering challenges: any detection method must fit onto a chip, and therefore has to be small and consume little electrical power.

The role of the University of Southampton in the Secure-CAV project is to develop classification algorithms for abnormal system behaviour. Because the anomalous behaviour can be complex, we cannot just look for signatures as we would in an antivirus tool – we need to identify deeper patterns. Thus, we are using machine learning approaches to learn normal behaviour and to recognise anomalies. Machine learning in this environment presents its own challenges, which we will look at in part 2 of this blog.
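One reason simple signatures are not enough is that individual signal values can each look normal while their combination is not. As a hedged illustration (not the project's actual algorithm), a one-class nearest-neighbour detector can score how far a multi-signal observation sits from previously seen normal behaviour; the (speed, engine RPM) feature pairs below are invented for the example.

```python
import math

# Hypothetical (speed mph, engine rpm) pairs observed during normal
# driving. Real training data would be extracted from CAN traffic.
normal = [(0, 800), (10, 1200), (30, 1800), (50, 2200), (70, 2600)]

def anomaly_score(sample, training=normal):
    """Distance to the nearest normal example; larger = more unusual."""
    return min(math.dist(sample, x) for x in training)

# 70 mph is plausible, and an idle-speed RPM is plausible, but the
# combination is not, and could suggest an injected or spoofed message.
print(anomaly_score((30, 1800)))  # matches a normal example exactly
print(anomaly_score((70, 800)))   # jointly anomalous combination
```

This captures the key difference from signature matching: the detector learns the structure of normal behaviour and scores deviations from it, rather than checking against a list of known-bad patterns.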
