An effective response to a disease outbreak requires the rapid identification of pathogen and source.

Exploiting genomic information has become an important component of effective biothreat agent identification, characterization, and attribution. The necessary bioinformatic analyses require known genomic data against which to compare the agent’s genomic data. However, more and more genomic data is becoming privately held. To truly understand where an agent came from, or its important features (e.g., virulence, alternative hosts, and environmental stability), the biodefense community will likely need to leverage the genomic data that resides in these private databases. This may be especially important when a truly novel agent is discovered and near-neighbors need to be identified. The security requirements that apply to biothreat agent information and active investigations limit the direct sharing of genomic information with outside parties. Private entities, for their part, are often unable to share access to their databases due to privacy and legal issues. Fortunately, technology options exist that enable secure computations to be executed while fulfilling data privacy requirements.

We developed the Secure Interrogation of Genomic Databases (SIG-DB) algorithm to enable the interrogation of a privately held database with a sequence of interest to determine the presence of similar sequences, without compromising the query or database information. This method was confirmed to be functional and evaluated using wild-type and in silico mutated versions of Escherichia coli and Staphylococcus aureus genomic sequences obtained from the NCBI RefSeq database.

This is the poster that was presented at the 2018 annual biothreats meeting, hosted by the American Society for Microbiology (ASM).

Genomic data are becoming increasingly valuable as we develop methods to utilize the information at scale and gain a greater understanding of how genetic information relates to biological function.  Advances in synthetic biology and the low cost of sequencing are increasing the amount of privately held genomic data.  As the quantity and value of private genomic data grow, so does the incentive to acquire and protect such data, which in turn creates a need to store and process these data securely.

This project explores the limitations, opportunities, and capabilities of secure computation techniques applied to DNA sequence comparisons. Using homomorphic encryption (a software-based encryption approach), the Secure Interrogation of Genomic DataBases (SIG-DB) protocol was developed to enable searches of databases (DB) of genomic sequences with an encrypted query sequence, without revealing the query sequence to the database owner or any of the database sequences to the querier. Our results show that the SIG-DB algorithm returns an accurate assessment of the similarity of queries to databases of interest. The computational runtime and information leakage were compared between a fully homomorphic approach using the Microsoft SEAL cryptosystem and a partially homomorphic approach using the Paillier cryptosystem. SIG-DB is the first application that we are aware of to take advantage of locality-sensitive hashing and homomorphic encryption to allow generalized sequence-to-sequence comparisons of genomic data.
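To make the idea concrete, the following is a minimal sketch of how locality-sensitive hashing and an additively homomorphic scheme can be combined for a private similarity check. It is not the SIG-DB protocol itself, whose details are not given here. It assumes MinHash over k-mers as the locality-sensitive hash and a toy textbook Paillier implementation with deliberately small primes; the function names (`minhash`, `encrypt_query`, `blind_compare`, `similarity`) and all parameters are hypothetical and chosen only for illustration.

```python
import hashlib
import secrets
from math import gcd

# Toy Paillier key built from two known Mersenne primes (ASSUMPTION: demo only;
# these are far too small for real security).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)                          # valid because g = n + 1


def rand_unit():
    """Random value in [1, N) that is coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            return r


def encrypt(m):
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return (1 + (m % N) * N) * pow(rand_unit(), N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def minhash(seq, k=8, num_hashes=32):
    """MinHash signature: per seed, the smallest 32-bit hash over all k-mers."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def h(seed, kmer):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        return int.from_bytes(digest[:4], "big")

    return [min(h(seed, km) for km in kmers) for seed in range(num_hashes)]


def encrypt_query(seq):
    """Querier side: encrypt each signature slot before sending it."""
    return [encrypt(v) for v in minhash(seq)]


def blind_compare(enc_query, db_seq):
    """Database side: return Enc(r_i * (q_i - d_i)) for each signature slot."""
    blinded = []
    for c, d in zip(enc_query, minhash(db_seq)):
        diff = c * encrypt(-d) % N2                  # Enc(q_i - d_i)
        blinded.append(pow(diff, rand_unit(), N2))   # Enc(r_i * (q_i - d_i))
    return blinded


def similarity(blinded):
    """Querier side: a slot decrypts to 0 exactly when the hashes matched."""
    return sum(decrypt(c) == 0 for c in blinded) / len(blinded)


if __name__ == "__main__":
    query = "ACGTTGCAAGGCTTAACCGGTTACGATCGATCGGATCCTAGGCTAAGCTTAGGCCATATCG"
    mutated = query[:30] + "T" + query[31:]  # single in silico substitution
    for name, db_seq in [("identical", query), ("mutated", mutated)]:
        score = similarity(blind_compare(encrypt_query(query), db_seq))
        print(name, score)
```

In this sketch the database owner only ever handles ciphertexts of the query signature, and the querier only recovers, per slot, either zero or a randomly blinded value. The fraction of zero slots estimates the Jaccard similarity of the two k-mer sets. A real deployment would need properly sized keys, shuffling of the returned slots, and a careful leakage analysis of the kind the project describes.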

We also explored an alternative approach that uses hardware-based secure computation, specifically Intel® Software Guard Extensions (SGX). We were unable to complete a prototype at this time due to the immaturity of the technology. However, our research findings indicate that SGX has the potential to enable a cloud-based secure computation system with, in theory, minimal information leakage and similarity-scoring execution times nearly equivalent to plaintext comparisons. Much research remains to be done to fully understand the operational and security limitations of the system.

We briefed government stakeholders on our prototype and findings at a recent B.Next event.  Attendees expressed strong support for continued work on homomorphic encryption for secure interrogation of genomic databases.  The participants provided valuable feedback on the tool and numerous use cases they encounter that could be transformed by this approach.  Although the algorithm was developed specifically for microbial genomics comparisons, SIG-DB could be useful for a number of applications, including healthcare, human genomics, organizational collaborations, and more.

Improvements in DNA sequencing technologies (which determine the order of the four constituent bases – A, G, C and T – in which the genetic code is written) have the potential to transform how we detect and respond to infectious disease. Defense against pathogens, whether naturally occurring diseases or intentional biological attacks, depends critically on our ability to detect and identify pathogens in order to accurately diagnose, properly triage and treat those infected, and gauge the extent and dynamics of an outbreak. We review here the progression of DNA sequencing technologies since the 1970s, the potential impact of sequencing on detecting and managing epidemics, and other applications that will support and expand innovations in sequencing technology.

Leveraging multiple data variables is critical for effective infectious disease outbreak management. Contextual, actionable data provided to decision makers is often best analyzed and conveyed visually. However, limited human resources and collaborative platforms create many challenges for the effective use of data. B.Next executed a project to explore, evaluate, and demonstrate how and to what extent information technology capabilities might empower public health analysts with limited or no coding experience to create enhanced data visualizations during an infectious disease outbreak. Dynamic visualizations with multiple variables have proven beneficial for situational awareness in past epidemic management operations. The popular tools available to produce interactive visualizations require either financial resources beyond what is often available to state and local public health organizations, or the coding expertise to use open source libraries in languages such as JavaScript, R, and Python. This report outlines the methods and results of a B.Next project with Plotly, a company that creates open source tools for visualization, to further develop their existing web-based interface so that non-coders can create interactive cross-filtering visualizations with multivariate datasets. Plotly enhanced their Chart Studio tool to enable cross-filtering functionalities with nine different charts in a dashboard display. An evaluation of Plotly’s enhancement was completed by both public health subject matter experts and IQT Labs personnel with expertise in infectious disease and visualization tools. The evaluation suggests that the cross-filtering functionality within Plotly provides new capabilities accessible to non-coders. However, Plotly needs additional improvements to compete with more intuitive interfaces such as Tableau.

We live in an era of increasingly frequent and impactful infectious disease outbreaks. Naturally occurring outbreaks will have significant regional and international security implications for the foreseeable future, given the negative impacts (i.e., death, societal disruption, and economic costs) of such events. We also live in an information era. Integrating novel and available data technologies into public health practice will improve situational awareness, help shape outbreak interventions more precisely, facilitate faster and more efficient response activities, and save lives. To realize these efficiencies, federal, state and local public health agencies need a fundamentally more aggressive and systematic adoption, use and coordination of data technologies to provide essential information for tailoring interventions during an outbreak. Current and emerging data technologies can help tackle the next epidemic.

Medical laboratory analysis is moving steadily and quickly into molecular biology, in which tests can assess health based on DNA sequence, or on changes in the amount of certain proteins in blood. There are tests currently available to detect the levels of over 100 different proteins in human blood, changes in which have been correlated with changes in health or disease. Nearly all of these tests are called immunoassays because they rely on the use of immune system proteins called antibodies as the essential ingredient in the test.

Antibodies have the property of binding tightly to a specific molecular target (and only that target), giving immunoassay tests the specificity required to detect target proteins in a sea of proteins.

The most commonly used immunoassay is called ELISA (Enzyme-Linked Immuno-Sorbent Assay), which is used by medical labs and researchers to detect specific proteins of interest in liquid samples. The human genome encodes about 25,000 proteins. Of these proteins, no more than 10 percent are in sufficient concentration to be reliably measured with conventional ELISA. What clinical insights lie within the other 90 percent? Quanterix’s revolutionary Simoa technology unlocks a world of insight into disease detection, diagnosis, and patient treatment while meeting the demands of today’s laboratory.

Anyone who has had blood drawn at a doctor’s office is familiar with much of the information that results from its analysis, such as glucose, cholesterol, lipids, and cell counts. In the past two decades, two trends have given rise to a revolution in the next generation of diagnostic technologies.

First, molecular biology is increasingly finding molecules in bodily fluids (such as blood and saliva) and tissues that provide information about a patient’s health state to guide the decisions made by a physician (“biomarkers”). Second, engineering advances have enabled these tests to be performed with equipment that is smaller, less expensive, and requires less sample for analysis. These trends are creating market opportunities that, for DNA sequencing technologies alone, are predicted to exceed $20 billion with opportunities beyond niche diagnostic markets. This is encouraging more entrepreneurs to join this rapidly developing market by applying advanced analytics and novel sensors to increase the speed, precision, and accuracy for interrogating human samples, while also reducing per-analysis costs.

The same advances in diagnostic technology are finding use outside of the clinical laboratory. Less expensive, more powerful diagnostics allow public health researchers to screen larger groups of people to determine if a population is at-risk of developing or harboring disease. Such broad applications of genomic- and proteomic-based diagnostics to public health are as vulnerable as any research to flaws that prevent their reproducibility, as has been reported recently. While there are many factors that may contribute to irreproducibility in the use of diagnostic technology, there are some sources of data variability and bias that can be mitigated when engineering the workflow from human sample to molecular answers on a population scale.

Background – This paper reports on a February 28, 2017 Roundtable Discussion convened by B.Next, an IQT Lab.

Several companies are developing DNA sequencing devices that can enable users to sequence DNA outside the traditional laboratory setting.  Among them, Oxford Nanopore is perhaps the most well-known.  The advent of portable sequencing devices opens up a wide variety of potential use cases that range from point-of-care medical diagnostics to on-site agricultural pest analysis.  It will soon be common for scientists to study animal and plant genetics and the structure of microbial communities close to where these species are found in nature.  In the realm of managing epidemics, the current state of portable sequencing technology presents potential opportunities to accelerate the collection of pathogen genomic sequence data during an outbreak.  Distributed sufficiently broadly, portable sequencers could function as “sensors” that help detect the spread and evolution of a pathogen.

The Roundtable included experts from industry, academia, finance and several USG agencies who manufacture, consume, invest in, or develop use cases for sequencing applications as they relate to disease outbreaks.  The discussion took place over a single day, included invited presentations from four participants plus prepared remarks from three others (see below), and was held on a not-for-attribution basis. (The participants agreed to allow IQT to publish a summary of key insights from the meeting.  In addition, participants named below consented to allow us to use their names in this report.)

We are living through a period of what may justly be called a revolution in our understanding of living organisms and how they operate. One eminent biologist has called the present period “the Golden Age of Biological Science.” As is often the case when scientific understanding of natural phenomena advances, new technologies – tools intended to “harness that understanding to human purpose”[1] – quickly follow. The biorevolution has generated and benefited from an array of new technologies that endow us with new powers to manipulate microbes, plants, animals, and even whole biosystems. These capabilities will be hugely beneficial to humankind, and will be useful across a wide range of applications, from medicine to agriculture to food production and materials science. But as with all powerful technologies, biotechnologies are “dual-use” and can be used for malignant purposes.

[1] Arthur, W. B. (2009). The nature of technology: What it is and how it evolves. New York: Free Press.

Infectious disease has been a topic of nearly constant media attention in the last two years due to the outbreak of Ebola virus in West Africa. The outbreak, which has continued to smolder long into 2015 (and likely through 2016), has raised questions that are central to our broader concerns about infectious disease both overseas and in the United States (and that illustrate further that the distinction between over there and here is largely illusory). Why do outbreaks of infectious disease occur? Can we predict them? How do they spread? How can we respond to outbreaks more effectively? What is the role of technology in this response? In this article, we consider how far we have come in understanding infectious disease, point out current issues, and identify technology trends that will drive the next generation of solutions.