Developing a bioinformatics pipeline for processing environmental DNA metabarcoding sequencing data

Iva Sabolić1*, Lucija Markulin1, Teja Petra Muha2, Barbara Jenko2, Uršula Prosenc Zmrzljak2

1eDNA Labs, Labena d.o.o., Jaruščica 7, 10000 Zagreb, Croatia

2BIA Separations CRO, Labena d.o.o., Teslova ulica 30, 1000 Ljubljana, Slovenia

iva.sabolic [at] labena.hr

Abstract

Environmental DNA (eDNA) is DNA present in an environmental sample, originating from any biological material released by organisms living in that environment. This DNA can be isolated, amplified, sequenced, and analyzed to examine the taxonomic richness and abundance of different organism groups in the targeted environment. eDNA metabarcoding methods thus offer a unique opportunity to systematically streamline and scale up regular biological assessments across many different environments of interest.

Recently, as part of a project funded by the European Structural and Investment Funds, Labena d.o.o. established a modern laboratory in Zagreb focused on research and the provision of services in the field of eDNA. In collaboration with the Ruđer Bošković Institute, we have been developing eDNA-based tests for water quality analysis. As part of the standardization and optimization of the sample-to-results eDNA analysis process, we developed a custom bioinformatics pipeline to facilitate efficient and effective analysis of eDNA sequencing data.

The pipeline is written in Bash and uses several different algorithms to filter, trim, merge, denoise, and classify targeted eDNA sequences. It includes Python scripts that automatically download, filter, and format data available on various online platforms, which facilitates the curation of the custom reference databases needed for taxonomic classification of targeted organism groups. The pipeline also generates a user-friendly, interactive report containing step-by-step sample statistics for both the wet-lab and dry-lab stages, together with graphical representations of the main results. The report is built with R Markdown and the Plotly and DataTables libraries. The pipeline is containerized with Docker, which simplifies environment building and pipeline deployment.
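
The filter-and-format part of reference database curation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the header layout (accession, a space, then a semicolon-separated lineage), the length window, the ambiguous-base limit, the output header format, and all function names are assumptions made for illustration, and the accessions and sequences are toy data. The download step is omitted so that the example runs without network access.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Record(NamedTuple):
    accession: str
    taxonomy: str  # semicolon-separated lineage (assumed header layout)
    sequence: str


def _make_record(header: str, chunks: List[str]) -> Record:
    # Split '>ACC lineage' at the first space; join wrapped sequence lines.
    accession, _, taxonomy = header.partition(" ")
    return Record(accession, taxonomy.strip(), "".join(chunks).upper())


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield records from FASTA text with headers like '>ACC lineage'."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def curate(records: Iterable[Record], min_len: int = 100,
           max_len: int = 500, max_ambiguous: int = 0) -> Iterator[Record]:
    """Keep records inside the length window with at most `max_ambiguous`
    non-ACGT bases, dropping exact (taxonomy, sequence) duplicates."""
    seen = set()
    for rec in records:
        if not (min_len <= len(rec.sequence) <= max_len):
            continue
        ambiguous = sum(base not in "ACGT" for base in rec.sequence)
        if ambiguous > max_ambiguous:
            continue
        key = (rec.taxonomy, rec.sequence)
        if key in seen:
            continue
        seen.add(key)
        yield rec


def format_fasta(records: Iterable[Record]) -> str:
    """Write records with an 'accession;tax=lineage' header (assumed format)."""
    return "".join(
        f">{r.accession};tax={r.taxonomy}\n{r.sequence}\n" for r in records
    )


# Toy input: one good record, one duplicate, one with Ns, one too short.
raw = """>AB000001 Eukaryota;Chordata;Salmo trutta
ACGTACGTAC
>AB000002 Eukaryota;Chordata;Salmo trutta
ACGTACGTAC
>AB000003 Eukaryota;Chordata;Esox lucius
ACGTNNGTAC
>AB000004 Eukaryota;Chordata;Esox lucius
ACG
"""
curated = list(curate(parse_fasta(raw), min_len=5, max_len=20))
print(format_fasta(curated))
```

In this toy run only the first record survives: the second duplicates it, the third contains ambiguous bases, and the fourth is shorter than the minimum length. In practice, the thresholds would depend on the marker and the organism group being targeted.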

Keywords: environmental DNA, pipeline, reference databases, containerization
