Best practices for strengthening machine learning security


Machine learning security is business critical

Machine learning security has the same goal as all cybersecurity measures: reducing the risk of sensitive data being exposed. If a bad actor tampers with your machine learning model or the data it uses, that model could produce incorrect results that at best undermine the benefits of machine learning and at worst negatively impact your business or customers.

“Executives should care because there’s nothing worse than doing the wrong thing very quickly and confidently,” said Zach Hanif, vice president of machine learning platforms at Capital One. And while Hanif works in a regulated industry – financial services – requiring additional levels of governance and security, he says any business adopting ML should take the opportunity to examine its security practices.

Devon Rollins, vice president of cyber engineering and machine learning at Capital One, adds, “Protecting business-critical applications requires a level of differentiated protection. It’s safe to assume that many implementations of machine learning tools at scale are critical given the role they play for businesses and how they directly impact outcomes for consumers.”

New security considerations to keep in mind

Although the best practices for securing machine learning systems are similar to those for any software or hardware system, the wider adoption of machine learning also presents new considerations. “Machine learning adds another level of complexity,” explains Hanif. “This means that organizations need to consider the multiple points in the machine learning workflow that can represent entirely new vectors.” These core elements of the workflow include the ML models, the documentation and systems around those models and the data they use, and the use cases they enable.

It is also imperative that ML models and supporting systems are designed with security in mind right from the start. It is not uncommon for engineers to rely on freely available open source libraries developed by the software community rather than coding every single aspect of their program. These libraries are often designed by software engineers, mathematicians, or academics who may not be as familiar with writing secure code. “The people and skills needed to develop high-performance or cutting-edge ML software may not always intersect with security-focused software development,” adds Hanif.

According to Rollins, this highlights the importance of sanitizing the open source libraries used for ML models. Developers should consider confidentiality, integrity, and availability as a framework to guide information security policy. Confidentiality means that data assets are protected from unauthorized access; integrity means that data remains accurate and has not been tampered with; and availability ensures that authorized users can readily access the data they need for the job at hand.
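One concrete integrity control, sketched below in Python, is to verify a downloaded library archive or pretrained model file against a digest the team has already reviewed before the pipeline is allowed to load it. The file name and hash here are placeholders rather than real artifacts, and the check itself is a minimal illustration, not a complete supply-chain defense.

```python
import hashlib
from pathlib import Path

# Digests of third-party artifacts a security review has already approved.
# Both the file name and the hash below are placeholders for illustration.
APPROVED_DIGESTS = {
    "ml_helpers-1.4.2.tar.gz": "0" * 64,  # replace with the reviewed SHA-256
}

def is_approved(artifact: Path) -> bool:
    """Return True only if the file's SHA-256 matches the reviewed digest."""
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    return APPROVED_DIGESTS.get(artifact.name) == digest

artifact = Path("downloads/ml_helpers-1.4.2.tar.gz")
if artifact.exists() and not is_approved(artifact):
    raise RuntimeError(
        f"{artifact.name} does not match its reviewed digest; refusing to install"
    )
```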

Additionally, ML input can be manipulated to compromise a model. One risk is inference manipulation: changing input data to trick the model. Because ML models interpret data differently than the human brain does, data can be altered in ways that are imperceptible to humans but still change the results. For example, compromising a computer vision model might require changing only one or two pixels in an image of a stop sign used by that model. The human eye still sees a stop sign, but the model may no longer classify it as one. Alternatively, an attacker can probe a model by submitting a series of different inputs and observing the outputs, thereby learning how the model works. By watching how incoming data affects the system, Hanif explains, outside actors can figure out how to cloak a malicious file so it's not detected.
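As a toy illustration of this kind of input manipulation, the sketch below uses a deliberately tiny linear scorer as a stand-in for a vision model; the 4x4 "image", the weights, and the "stop sign" label are all invented for the example. Because the toy model leans heavily on a single pixel, changing just that pixel flips its answer while the other fifteen pixels stay untouched.

```python
import numpy as np

# A deliberately tiny stand-in for a vision model: a linear scorer over a
# 4x4 "image". Score > 0 means "stop sign". Purely illustrative, not a real
# detector or a real attack.
weights = np.full((4, 4), 0.1)
weights[0, 0] = 3.0                      # the model relies heavily on one pixel

def classify(image: np.ndarray) -> str:
    return "stop sign" if float(np.sum(weights * image)) > 0 else "not a stop sign"

clean_image = np.ones((4, 4))            # a clean "stop sign"
print(classify(clean_image))             # -> stop sign   (score = 4.5)

# An attacker who has probed the model's sensitivities alters only the one
# pixel the model depends on, leaving the other fifteen untouched.
tampered_image = clean_image.copy()
tampered_image[0, 0] = -1.0

print(classify(tampered_image))          # -> not a stop sign   (score = -1.5)
```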

Another risk vector is the data used to train the system. A third party can “poison” the training data so that the machine learns something incorrectly. As a result, the trained model will make mistakes – for example, automatically identifying all stop signs as yield signs.
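To make the idea concrete, here is a minimal, hand-built sketch of label poisoning. The feature vectors and the nearest-neighbor "model" are invented stand-ins chosen so the effect is easy to see; a real pipeline and a real attack would be far more subtle.

```python
import numpy as np

# Tiny hand-made training set: each row is a 2-D feature vector standing in
# for an image; label 1 = "stop sign", label 0 = "yield sign". Illustrative only.
X_train = np.array([
    [ 2.0,  2.0], [ 2.2,  1.9], [ 1.8,  2.1],   # stop signs
    [-2.0, -2.0], [-2.1, -1.8], [-1.9, -2.2],   # yield signs
])
y_train = np.array([1, 1, 1, 0, 0, 0])

def predict_1nn(X: np.ndarray, y: np.ndarray, query: np.ndarray) -> int:
    # 1-nearest-neighbour "model": return the label of the closest training example.
    return int(y[np.argmin(np.linalg.norm(X - query, axis=1))])

query = np.array([2.05, 2.05])                      # a new stop-sign image
print(predict_1nn(X_train, y_train, query))         # -> 1 (stop sign)

# Poisoning: someone with write access to the training data slips in a few
# stop-sign-like examples mislabeled as "yield sign" before training runs.
X_poisoned = np.vstack([X_train, [[2.06, 2.04], [1.95, 2.07]]])
y_poisoned = np.concatenate([y_train, [0, 0]])

print(predict_1nn(X_poisoned, y_poisoned, query))   # -> 0 (yield sign)
```

In a real attack, the poisoned examples would typically be crafted to blend into the rest of the training data, which is what makes this kind of tampering hard to spot during review.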


