Squeegee Helps To Detect Contamination in Low Microbial Biomass Microbiomes – Technology Networks

Posted under Programming, Technology by James Steward

One of the major challenges in microbiome science has been distinguishing a potential environmental contaminant from a true, bona fide microbiome signal in low biomass studies, that is, studies of samples that contain little microbial DNA, such as breast milk, placenta or amniotic fluid. For instance, it can be challenging to differentiate the DNA of a microbe in a sample from remnant contaminant DNA that came from a sampling kit, an extraction kit or the environment.

While researchers normally include negative controls from the equipment or environment and use algorithmic tools to identify microorganisms present in the environment, not all datasets come with negative controls. Researchers at Baylor College of Medicine and Rice University developed a new contamination detection tool to establish reproducibility in the identification and analysis of the microbes. Their findings were recently published in Nature Communications.

“We teamed up with our collaborators at Rice University to develop and test a computational tool we called Squeegee,” said Dr. Kjersti Aagaard, professor of obstetrics and gynecology at Baylor and Texas Children’s Hospital. “The premise of Squeegee is that we can use a computer analysis pipeline to help us detect ‘breadcrumbs’ of contaminants that would be anticipated to be common between the microbiome found in all human (or other mammalian) hosts and the sampling or lab environment.”
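To make the premise in the quote concrete, here is a minimal Python sketch of the general idea: a taxon that shows up consistently across otherwise unrelated sample types is a candidate contaminant. This is not Squeegee's actual algorithm. The function name, the thresholds and the placeholder taxon labels are assumptions chosen for illustration only.

```python
from collections import defaultdict


def shared_taxa(samples, min_sample_types=2, min_prevalence=0.5):
    """Return taxa that are prevalent in at least `min_sample_types` sample types.

    `samples` is a list of (sample_type, taxa) pairs, where `taxa` is the set
    of taxon labels detected in one sample. Both thresholds are illustrative
    assumptions, not values taken from Squeegee.
    """
    # Group the per-sample taxon sets by sample type.
    by_type = defaultdict(list)
    for sample_type, taxa in samples:
        by_type[sample_type].append(set(taxa))

    # Count, for each taxon, the number of sample types in which it is prevalent.
    types_where_prevalent = defaultdict(int)
    for taxa_sets in by_type.values():
        tally = defaultdict(int)
        for taxa in taxa_sets:
            for taxon in taxa:
                tally[taxon] += 1
        for taxon, count in tally.items():
            if count / len(taxa_sets) >= min_prevalence:
                types_where_prevalent[taxon] += 1

    return {t for t, k in types_where_prevalent.items() if k >= min_sample_types}


# Hypothetical toy input: sample types echo the article, taxon labels are placeholders.
example = [
    ("placenta", {"taxon_A", "taxon_B"}),
    ("placenta", {"taxon_A"}),
    ("breast_milk", {"taxon_A", "taxon_C"}),
    ("breast_milk", {"taxon_A", "taxon_C"}),
]
candidates = shared_taxa(example)
```

In this toy input only `taxon_A` is prevalent in both sample types, so it is the only taxon flagged. A real tool would also need to account for taxa that are genuinely shared between body sites, which this sketch does not attempt.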

The Aagaard Lab at Baylor has conducted IRB-approved and NIH-funded research over the last decade, producing a number of rich datasets, drawn from a large number of participants, that are particularly low in biomass and include many negative controls. They teamed up with researchers at Rice’s Treangen Lab to test Squeegee on real-world datasets from human studies that had contamination controls from different environments and DNA extraction kits. They looked at the false positive rate, the recall and how accurately Squeegee could predict and flag these environmental contaminants in the absence of negative controls.

“We were able to show that Squeegee was capable of having a high weighted recall and a very low false-positive rate in these ground truth datasets,” said Dr. Michael Jochum, postdoctoral research associate in the Department of Obstetrics and Gynecology at Baylor.
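For readers unfamiliar with the metrics mentioned above, the following Python sketch shows how recall and false positive rate can be computed when the taxa found in negative controls are treated as ground truth. The function and variable names are hypothetical, and weighting recall by each contaminant's relative abundance is an assumption made for illustration; the published paper should be consulted for the exact definitions the authors used.

```python
def evaluate(predicted, truth, candidates, abundance=None):
    """Score a predicted contaminant set against a ground-truth set.

    predicted:  taxa flagged as contaminants by the tool
    truth:      taxa observed in negative controls (treated as true contaminants)
    candidates: every taxon that could have been flagged
    abundance:  optional {taxon: relative abundance} used to weight recall
                (an assumed weighting scheme, for illustration only)
    """
    predicted, truth = set(predicted), set(truth)
    candidates = set(candidates) | predicted | truth

    true_positives = predicted & truth
    false_positives = predicted - truth
    negatives = candidates - truth  # taxa that are not true contaminants

    recall = len(true_positives) / len(truth) if truth else 0.0
    fpr = len(false_positives) / len(negatives) if negatives else 0.0

    if abundance is None:
        weighted_recall = recall
    else:
        total = sum(abundance.get(t, 0.0) for t in truth)
        found = sum(abundance.get(t, 0.0) for t in true_positives)
        weighted_recall = found / total if total else 0.0

    return {
        "recall": recall,
        "weighted_recall": weighted_recall,
        "false_positive_rate": fpr,
    }


# Hypothetical toy data with placeholder taxon labels.
scores = evaluate(
    predicted={"taxon_A", "taxon_C"},
    truth={"taxon_A", "taxon_B"},
    candidates={"taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_E"},
    abundance={"taxon_A": 0.9, "taxon_B": 0.1},
)
```

In this toy case the tool finds one of two true contaminants (recall 0.5), but the one it finds accounts for most of the contaminant abundance, so the weighted recall is higher than the plain recall. This illustrates why a weighted figure can be more informative when a few contaminants dominate.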

According to Jochum, Squeegee improves the overall reliability of metagenomic sequencing analysis results in low biomass studies. The new contamination identification tool is capable of identifying batch effects, flagging them as potential contaminants. Given the focus and expertise of the Aagaard lab in studying these sparse microbial environments, this is a tool that they have added to their toolbox for ongoing and future studies.

“Squeegee is a first-of-its-kind tool for the microbiome science community, and it is freely available for use,” Aagaard said. The source code for Squeegee is publicly available at https://gitlab.com/treangenlab/squeegee

Reference: Liu Y, Elworth RAL, Jochum MD, Aagaard KM, Treangen TJ. De novo identification of microbial contaminants in low microbial biomass microbiomes with Squeegee. Nat Commun. 2022;13(1):6799. doi:10.1038/s41467-022-34409-z

This article has been republished from the following materials. Note: material may have been edited for length and content. For further information, please contact the cited source.
