Increased Data Security Using ‘EzPC’ In The Machine Learning Model Validation Process

On Jan 16, 2022

Artificial intelligence (AI) has revolutionized many industries over the last decade, from manufacturing and logistics to agriculture and transportation. Examples include improving predictive analytics on the manufacturing floor and making microclimate predictions so that farmers can respond in time to save their crops. AI adoption is projected to accelerate in the coming years, which makes an efficient adoption process that protects data privacy all the more important.

Firms that want to incorporate AI into their workflows first go through a model validation process. They test AI models from different vendors, typically on a test dataset the organization provides, before choosing the one that best matches their needs. Unfortunately, both of the approaches to model validation currently available are inadequate, because each one risks exposing data.

One approach is for the AI provider to share its model with the organization, which then validates it on its own test dataset. However, the provider then risks disclosing its intellectual property, which it naturally wants to protect. The second approach, which is equally risky, is for the organization to share its test dataset with the AI provider. This is problematic on two levels. First, it risks exposing a dataset that contains sensitive information.

Second, the AI vendor could use the test dataset to train its model, “over-fitting” the model to the test data to produce convincing results. A test dataset only gives a fair measure of how a model performs if the model was never trained on that data. Today, these issues are handled through complex legal agreements that can take months to draft and execute, significantly delaying AI adoption.

The risk of data exposure and the need for legal agreements are amplified in healthcare, where the test dataset consists of patient data, which is highly sensitive. Both parties must adhere to strict privacy standards. In addition to proprietary intellectual property, the vendor’s AI model may also encode sensitive patient information from the training data used to build it. This creates a difficult situation.

On the one hand, healthcare organizations want to adopt AI quickly because of its vast potential in applications such as identifying patient health risks, predicting and diagnosing illnesses, and delivering tailored health interventions. On the other hand, the number of AI providers in healthcare is growing rapidly (now over 200), so the accumulated legal paperwork for AI validation becomes daunting.

EzPC stands for Easy Secure Multi-Party Computation.


Easy Secure Multi-party Computation (EzPC) aims to speed up the AI model validation process while protecting the privacy of both the dataset and the model. This open-source framework is the result of a collaboration among researchers in cryptography, programming languages, machine learning (ML), and security. EzPC is built on secure multiparty computation (MPC), a set of cryptographic protocols that lets several parties jointly compute a function on their private data without revealing that data to one another or to any other party. These properties make AI model validation a natural use case for MPC.
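To make the MPC idea concrete, here is a minimal sketch of two-party additive secret sharing, one of the standard building blocks of MPC protocols. It is a generic textbook illustration, not EzPC's implementation. The modulus 2^64, the party roles, and the input values are assumptions chosen for illustration. Each party ends up holding one random-looking share of each input, and the parties can add shared values locally without revealing them.

```python
import secrets

MOD = 2**64  # shares live in the ring of integers modulo 2^64 (illustrative choice)


def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares; either share on its own is uniformly random."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(s0: int, s1: int) -> int:
    """Recombine both shares to recover the secret value."""
    return (s0 + s1) % MOD


# Party A (say, the hospital) and party B (say, the AI vendor) each share a private input.
# A keeps a0 and sends a1 to B; B keeps b1 and sends b0 to A.
a0, a1 = share(1234)  # A's private value
b0, b1 = share(5678)  # B's private value

# Addition needs no communication: each party adds the shares it holds.
sum_share_A = (a0 + b0) % MOD  # computed by A
sum_share_B = (a1 + b1) % MOD  # computed by B

# The sum appears only when both result shares are combined; neither input is revealed.
print(reconstruct(sum_share_A, sum_share_B))  # 6912
```

Addition by a public constant and multiplication by a public constant are also local operations. Multiplying two secret values, however, requires interaction between the parties.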

However, although MPC has been around for over four decades, it is rarely used in practice, because building scalable and efficient MPC protocols requires deep cryptographic expertise. Moreover, while MPC works well for small or simple stand-alone functions, combining the many functions that ML applications require is far more complicated. Without specialized expertise, the result is also inefficient.
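Multiplication shows why composing many operations is costly in MPC. The sketch below uses Beaver multiplication triples, a classic textbook technique. It is not necessarily the protocol EzPC uses. It also assumes a trusted dealer hands out the triples, which is a simplifying assumption, since real protocols generate them cryptographically. Both parties are simulated in one process. Each secret-by-secret multiplication consumes a fresh triple and a round of communication, so a neural network with many multiplications and nonlinear layers becomes expensive without careful protocol design.

```python
import secrets

MOD = 2**64  # illustrative ring size


def share(x):
    """Split x into two additive shares modulo MOD."""
    r = secrets.randbelow(MOD)
    return [r, (x - r) % MOD]


def open_value(shares):
    """Both parties publish their shares and add them (a communication step)."""
    return sum(shares) % MOD


def beaver_triple():
    """Trusted-dealer preprocessing (simplifying assumption): shares of random a, b and c = a*b."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def mul(x_sh, y_sh):
    """Multiply two secret-shared values using one fresh Beaver triple."""
    a_sh, b_sh, c_sh = beaver_triple()
    # Communication round: open d = x - a and e = y - b. They are masked by the
    # random a and b, so revealing them leaks nothing about x or y.
    d = open_value([(x_sh[i] - a_sh[i]) % MOD for i in range(2)])
    e = open_value([(y_sh[i] - b_sh[i]) % MOD for i in range(2)])
    # Local step, using the identity x*y = c + d*b + e*a + d*e.
    z_sh = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % MOD for i in range(2)]
    z_sh[0] = (z_sh[0] + d * e) % MOD  # the public d*e term is added by one party only
    return z_sh


def dot(xs_sh, ys_sh):
    """Secret-shared dot product, as in a linear layer: one triple per multiplication."""
    acc = [0, 0]
    for x_sh, y_sh in zip(xs_sh, ys_sh):
        z_sh = mul(x_sh, y_sh)
        acc = [(acc[i] + z_sh[i]) % MOD for i in range(2)]
    return acc


inputs = [share(v) for v in [1, 2, 3]]   # e.g. the hospital's data
weights = [share(v) for v in [4, 5, 6]]  # e.g. the vendor's model weights
print(open_value(dot(inputs, weights)))  # 32
```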

EzPC addresses these issues by letting developers, not just cryptography specialists, use MPC as a building block in their applications while still achieving high performance. EzPC rests on two innovations. The first is a modular compiler called CrypTFlow, which takes TensorFlow or Open Neural Network Exchange (ONNX) code for ML inference as input. It generates C-like code that can then be compiled to several different MPC protocols. The compiler is “MPC-aware” and optimizing, which helps ensure that the resulting MPC protocols are efficient and scalable. The second innovation is a suite of high-performance cryptographic protocols for securely computing complex ML functions.
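MPC protocols of this kind generally operate on integers modulo 2^ℓ rather than on floating-point numbers. An MPC-aware compiler therefore has to map a model's floating-point arithmetic onto fixed-point integers. The following plaintext sketch illustrates that encoding. It reflects an assumption about the general approach rather than CrypTFlow's actual code, and the 64-bit ring and 12 fractional bits are hypothetical parameters. After each multiplication the result carries twice the fractional bits and must be truncated. In real MPC, that truncation is itself a secure protocol run on shares.

```python
BITS = 64
MOD = 1 << BITS
SCALE = 12  # fractional bits (hypothetical; a compiler would pick this per model)


def encode(x: float) -> int:
    """Map a real number to a fixed-point ring element (two's complement modulo 2^64)."""
    return round(x * (1 << SCALE)) % MOD


def to_signed(v: int) -> int:
    """Interpret a ring element as a signed integer."""
    return v - MOD if v >= MOD // 2 else v


def decode(v: int) -> float:
    """Map a fixed-point ring element back to a real number."""
    return to_signed(v) / (1 << SCALE)


def fixed_mul(u: int, v: int) -> int:
    """Multiply two fixed-point values, then truncate back to SCALE fractional bits."""
    prod = to_signed((u * v) % MOD)  # product carries 2*SCALE fractional bits
    return (prod >> SCALE) % MOD  # arithmetic shift rescales (rounds toward -infinity)


# A tiny "linear layer": one output neuron with two weights.
weights = [0.5, -1.25]
inputs = [2.0, 0.75]
acc = 0
for w, x in zip(weights, inputs):
    acc = (acc + fixed_mul(encode(w), encode(x))) % MOD
print(decode(acc))  # 0.0625
```

The number of fractional bits trades precision against headroom. More bits give finer resolution, but intermediate products then overflow the 64-bit ring sooner.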

Source: https://www.microsoft.com/en-us/research/blog/ezpc-increased-data-security-in-the-ai-model-validation-process/

EzPC in action: AI validation across many institutions in medical imaging

Stanford University researchers built an internationally recognized DenseNet-121 AI model with 7 million parameters, trained on the CheXpert dataset to identify specific lung conditions from chest X-rays. Meanwhile, researchers at CARING created a labeled test dataset of 500 patient images. The goal was to evaluate the CheXpert model’s accuracy on CARING’s test dataset while keeping both the model and the test data confidential.

EzPC enabled the first-ever secure validation of a production-grade AI model, demonstrating that data sharing is not required for accurate AI model validation. The performance overheads of the secure validation were also reasonable and practical for the application. Secure inference on a single test image took 15 minutes across two conventional cloud virtual machines, almost 3,000 times longer than testing an image without the added security EzPC provides. Running all of the test images took five days and cost $100. Had all of the test images been processed in parallel, the complete assessment would have taken only 15 minutes. The details of this real-world case study are in the paper “Multi-institution encrypted medical imaging AI validation without data sharing.”
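A quick back-of-the-envelope check shows that the reported figures are consistent with one another. The only inputs are the numbers stated above: 500 images, 15 minutes per image, a slowdown of roughly 3,000 times, and $100 in total. The per-image plaintext time and per-image cost are derived estimates, not reported measurements. The sequential and fully parallel scenarios are simplifying assumptions.

```python
# Figures stated in the case study
images = 500
secure_minutes_per_image = 15
slowdown_vs_plaintext = 3000  # "almost 3,000 times", so derived values are approximate
total_cost_usd = 100

# Sequential run (assumption): images processed one after another on the same pair of VMs.
sequential_minutes = images * secure_minutes_per_image
sequential_days = sequential_minutes / (60 * 24)

# Derived estimates (not reported figures)
plaintext_seconds_per_image = secure_minutes_per_image * 60 / slowdown_vs_plaintext
cost_per_image_usd = total_cost_usd / images

# Fully parallel run (assumption): each image on its own pair of VMs,
# so wall-clock time equals a single secure inference.
parallel_minutes = secure_minutes_per_image

print(f"sequential: {sequential_minutes} min = {sequential_days:.1f} days")
print(f"plaintext inference: about {plaintext_seconds_per_image:.1f} s per image")
print(f"cost: about ${cost_per_image_usd:.2f} per image")
print(f"parallel: {parallel_minutes} min")
```

Sequential processing of the 500 images comes to 7,500 minutes, or about 5.2 days. This matches the reported five days.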

Standardizing privacy technologies and applications outside of healthcare will be important in the future.

EzPC makes MPC technology practical for complex AI workloads, which could make it a game-changer for data collaboration. Organizations across all industries can select the best AI models for their use cases while protecting data and model confidentiality.

Furthermore, this technology could inform the drafting of the complex legal agreements that the AI model validation process currently requires. Beyond AI model validation, EzPC can be used in other scenarios where data privacy is critical, such as phishing detection, tailored radiotherapy, speech-to-keywords, and analytics.

EzPC is available on GitHub under an MIT license. Discover the most recent advances on the EzPC research project page, where you can read the related papers and watch videos to learn more.

Project: https://www.microsoft.com/en-us/research/project/ezpc-easy-secure-multi-party-computation/

Github: https://github.com/mpc-msri/EzPC

Related Paper: https://eprint.iacr.org/2017/1109.pdf

Reference: