
ETH Zurich Computer Scientists Create a ‘DNA Search Engine’
On Oct. 8, 2025, computer scientists at ETH Zurich announced that they have developed a digital tool capable of searching millions of published DNA records in a matter of seconds. The tool could significantly accelerate research into antibiotic resistance and unknown pathogens.
Until now, biomedical researchers have needed massive computing power and other resources to search through this volume of DNA sequences and compare it with their own sequences, which made efficient searches of such vast amounts of data practically impossible.
The ETH scientists have developed a method that greatly shortens and simplifies this search. Their digital tool, “MetaGraph”, searches the raw data of all DNA or RNA sequences stored in the databases, much like a conventional Internet search engine. Researchers enter a sequence of interest as full text into a search field and, within seconds or minutes depending on the query, find out where it has appeared before.
According to the researchers’ study, MetaGraph is also comparatively inexpensive. An index of all publicly available biological sequences would fit on a few computer hard drives, and even large queries should cost no more than 0.74 US dollars per megabase.
Because the DNA search engine is both precise and efficient, it could help accelerate genetic research, for example on little-studied pathogens or during new pandemics. The tool could also become a catalyst for research into antibiotic resistance, for example by helping to find resistance genes in the databases, or useful viruses that can destroy bacteria, known as bacteriophages.
In the study, published on October 8 in the journal Nature, the ETH researchers show how MetaGraph works: the tool indexes the data and stores it in compressed form. It does so using complex mathematical graphs that organize the data, somewhat like the rows and columns of a spreadsheet program such as Excel. “Mathematically speaking, it is a huge matrix with millions of columns and trillions of rows,” says Gunnar Rätsch of ETH Zurich.
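To make the matrix picture more concrete, here is a minimal Python sketch of the general idea behind k-mer-based sequence search, under simplifying assumptions. Each row is a short substring of length k (a "k-mer"), each column is a sample, and a cell is set when that k-mer occurs in that sample. A query is split into its k-mers, and samples are ranked by the fraction of those k-mers they contain. This is not MetaGraph's code: the tool itself builds on de Bruijn graphs and heavily compressed matrix representations. The function names, the k of 5, the 0.8 threshold and the toy sequences below are all hypothetical.

```python
from collections import defaultdict

K = 5  # hypothetical k-mer length; real indexes use much longer k-mers


def kmers(seq: str, k: int = K):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples: dict[str, str], k: int = K):
    """Build a sparse k-mer x sample matrix (hypothetical helper).

    Rows are k-mers, columns are sample indices; each row stores only the
    columns whose cell is set, i.e. the samples in which the k-mer occurs.
    """
    names = list(samples)
    rows: dict[str, set[int]] = defaultdict(set)
    for col, name in enumerate(names):
        for km in kmers(samples[name], k):
            rows[km].add(col)
    return names, dict(rows)


def search(query: str, names, rows, k: int = K, min_fraction: float = 0.8):
    """Return (sample, fraction) pairs for samples containing at least
    min_fraction of the query's k-mers, best matches first."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        return []
    hits = [0] * len(names)
    for km in query_kmers:
        # Look up the matrix row for this k-mer and count a hit per column.
        for col in rows.get(km, ()):
            hits[col] += 1
    results = [
        (names[col], count / len(query_kmers))
        for col, count in enumerate(hits)
        if count / len(query_kmers) >= min_fraction
    ]
    return sorted(results, key=lambda r: r[1], reverse=True)


# Toy example with made-up sequences.
samples = {
    "sample_A": "ACGTACGTGGCATTACG",
    "sample_B": "TTGCAACGTACGTTTTT",
    "sample_C": "GGGGCCCCAAAATTTT",
}
names, rows = build_index(samples)
print(search("ACGTACGTGG", names, rows))  # prints [('sample_A', 1.0)]
```

Here sample_B shares 4 of the query's 6 k-mers, so it only appears if the threshold is lowered (for example `min_fraction=0.5`). Storing only the set cells keeps the matrix sparse; a production index compresses it much further.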
The ETH researchers first presented MetaGraph in 2020 and have been improving it continuously ever since. The tool is already available for queries. It provides a full-text search engine for millions of DNA and RNA sequence sets, as well as proteins, from viruses, bacteria, fungi, plants, animals and humans. At present, just under half of the sequence data sets available worldwide are indexed. According to Gunnar Rätsch, the rest should follow by the end of the year. Because MetaGraph is open source, it could also be of interest to pharmaceutical companies with large amounts of internal research data.
Source: ETH News