Data and Analytics, Detection and Diagnostics, Proof-of-Concept, Technology and Market
GEMstone 2.0: Secure Interrogation of Genomic Databases — Project Report
Genomic data are becoming increasingly valuable as we develop methods to utilize the information at scale and gain a greater understanding of how genetic information relates to biological function. Advances in synthetic biology and the low cost of sequencing are increasing the amount of privately held genomic data. As the quantity and value of private genomic data grow, so does the incentive to acquire and protect such data, which in turn creates a need to store and process these data securely.
This project explores the capabilities, limitations, and opportunities of secure computation techniques applied to DNA sequence comparisons. Using homomorphic encryption, a software-based approach that allows computation on encrypted data, we developed the Secure Interrogation of Genomic DataBases (SIG-DB) protocol. SIG-DB searches a database of genomic sequences with an encrypted query sequence, revealing neither the query sequence to the database owner nor any database sequence to the querier. Our results show that the SIG-DB algorithm returns an accurate assessment of the similarity of queries to databases of interest. We compared computational runtime and information leakage between a fully homomorphic approach using the Microsoft SEAL library and a partially homomorphic approach using the Paillier cryptosystem. To our knowledge, SIG-DB is the first application to combine locality-sensitive hashing with homomorphic encryption to allow generalized sequence-to-sequence comparisons of genomic data.
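To make the idea concrete, the following is a minimal sketch of how locality-sensitive hashing and an additively homomorphic scheme can be combined. It is not the SIG-DB protocol itself, whose details are not given here. It assumes MinHash over k-mers as the locality-sensitive hash and a toy Paillier implementation with small, insecure key sizes. All function names (`minhash`, `owner_compare`, `querier_score`, and so on) and parameters are hypothetical and chosen for illustration only. The database owner computes a blinded difference for each signature position under encryption, and the querier decrypts and counts the zeros.

```python
import hashlib
import secrets

# --- Locality-sensitive hashing: MinHash over k-mers (illustrative choice) ---
def kmers(seq, k=4):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def minhash(seq, num_hashes=32, k=4, bits=16):
    """Signature with one small integer per seeded hash (minimum over all k-mers)."""
    sig = []
    for seed in range(num_hashes):
        lowest = min(
            int.from_bytes(hashlib.sha256(f"{seed}:{km}".encode()).digest()[:4], "big")
            for km in kmers(seq, k)
        )
        sig.append(lowest % (1 << bits))
    return sig

# --- Toy Paillier (generator g = n + 1); key size is far too small for real use ---
def keygen(p=2**31 - 1, q=2**61 - 1):  # two Mersenne primes, demo only
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n))   # public key n, private key (phi, mu)

def encrypt(n, m):
    r = secrets.randbelow(n - 1) + 1
    return (1 + (m % n) * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, priv, c):
    phi, mu = priv
    return (pow(c, phi, n * n) - 1) // n * mu % n

# --- Hypothetical two-party comparison of MinHash signatures ---
def querier_encrypt(n, sig):
    return [encrypt(n, v) for v in sig]

def owner_compare(n, enc_query, db_sig):
    """Database owner: return Enc(blind * (q_i - d_i)) without seeing any q_i."""
    n2 = n * n
    out = []
    for c, d in zip(enc_query, db_sig):
        diff = c * encrypt(n, -d) % n2        # ciphertext product = plaintext sum
        blind = secrets.randbelow(n - 1) + 1  # random nonzero scalar hides d_i
        out.append(pow(diff, blind, n2))      # ciphertext power = scalar multiple
    return out

def querier_score(n, priv, responses):
    """Querier: a decrypted zero means the signature position matched."""
    matches = sum(1 for c in responses if decrypt(n, priv, c) == 0)
    return matches / len(responses)

if __name__ == "__main__":
    n, priv = keygen()
    query = minhash("ACGTACGTTAGCCGATAGGCTA")
    db_entry = minhash("ACGTACGTTAGCCGATAGGCTT")
    responses = owner_compare(n, querier_encrypt(n, query), db_entry)
    print("estimated similarity:", querier_score(n, priv, responses))
```

In this sketch the querier learns which signature positions matched, which is one form of information leakage; a real protocol would need to weigh that leakage against runtime, as the comparison between the SEAL and Paillier approaches suggests.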
We also explored an alternative approach that uses hardware-based secure computation, specifically Intel® Software Guard Extensions (SGX). We were unable to complete a prototype at this time because the technology is still immature. However, our findings indicate that SGX could enable a cloud-based secure computation system with, in theory, minimal information leakage and similarity-scoring execution times nearly equivalent to those of plaintext comparisons. Much research remains to be done to fully understand the operational and security limitations of such a system.
We briefed government stakeholders on our prototype and findings at a recent B.Next event. Attendees expressed strong support for continued work on homomorphic encryption for secure interrogation of genomic databases. They provided valuable feedback on the tool and described numerous use cases from their own work that this approach could transform. Although the algorithm was developed specifically for microbial genomics comparisons, SIG-DB could be useful in other areas, including healthcare, human genomics, and collaborations between organizations.