Created using Ideogram 2.0 Turbo with the prompt, "Close up macro shot of a glass vial containing murky wastewater sample in a laboratory setting. Shot with Canon EOS R5, 100mm macro lens, f2.8, soft natural lighting from window, shallow depth of field with sharp focus on water droplets."

METAGENE-1: A 7B Parameter Model for Detecting Pathogens in Wastewater

Prime Intellect and USC researchers have released METAGENE-1, an open-source 7B-parameter model trained on 1.5 trillion base pairs of DNA and RNA sequenced from wastewater. The goal is to spot dangerous pathogens before they spread.

The team built METAGENE-1 specifically for biosurveillance and pandemic monitoring. It analyzes short DNA fragments from wastewater samples to detect potential threats early. This matters because wastewater testing gives a broader view of which pathogens are circulating in a population than individual testing does.

According to the team, the model achieves state-of-the-art results on genomic benchmarks focused on detecting human pathogens. Prime Intellect made the model fully open source. You can find the code, model weights, and technical details on their GitHub and Hugging Face pages.

What makes METAGENE-1 powerful is its training data. The Nucleic Acid Observatory provided 1.5 trillion base pairs of DNA and RNA sequences from wastewater samples. This massive dataset lets the model learn complex patterns in genetic code that signal the presence of known and emerging pathogens.

The USC research team, led by Oliver Liu and Jason Wiemels, worked with Prime Intellect’s engineers to design the model architecture. They used an autoregressive transformer suited to short genetic sequences of 100 to 300 base pairs.
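
To make the autoregressive framing concrete, here is a minimal sketch in plain Python. It is not the team's pipeline: it assumes one token per nucleotide (the real tokenizer may well differ), and the helper names in_length_range, encode, and next_token_pairs are hypothetical. It keeps only reads in the 100 to 300 base pair range and lists the (prefix, next token) pairs that an autoregressive model is trained to predict.

```python
from typing import Iterator, List, Tuple

MIN_LEN, MAX_LEN = 100, 300  # read-length range quoted in the article
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}  # assumed toy vocabulary


def in_length_range(read: str) -> bool:
    """True if the read falls inside the 100-300 bp window."""
    return MIN_LEN <= len(read) <= MAX_LEN


def encode(read: str) -> List[int]:
    """Map each base to an integer id; unknown symbols become N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in read.upper()]


def next_token_pairs(tokens: List[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (prefix, next token) pairs: an autoregressive model
    predicts token i from tokens 0..i-1."""
    for i in range(1, len(tokens)):
        yield tokens[:i], tokens[i]


# Three synthetic reads of 120, 40 and 400 bp; only the first is in range.
reads = ["ACGT" * 30, "ACGT" * 10, "ACGT" * 100]
kept = [r for r in reads if in_length_range(r)]
pairs = list(next_token_pairs(encode(kept[0])))
```

A read of n tokens yields n - 1 prediction targets, so even short fragments give the model many training signals.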

If you want to try METAGENE-1 yourself, check out:
Website: metagene.ai
Paper: metagene.ai/metagene-1-paper.pdf
GitHub: github.com/metagene-ai/metagene-pretrain
Hugging Face: huggingface.co/metagene-ai
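
As a starting point, the sketch below loads the weights with the Hugging Face transformers library and asks the model to continue a read. Treat it as an assumption-laden example rather than official usage: the repository name in MODEL_ID is a guess based on the organization page above, the helpers clean_read and continue_read are hypothetical, and running continue_read downloads a 7B-parameter model, so check the model card for the exact identifier and hardware requirements first.

```python
MODEL_ID = "metagene-ai/METAGENE-1"  # assumed repository name; verify on the Hugging Face page


def clean_read(read: str) -> str:
    """Remove all whitespace and upper-case a read before tokenization."""
    return "".join(read.split()).upper()


def continue_read(read: str, max_new_tokens: int = 32) -> str:
    """Generate a continuation of one read (downloads the model weights)."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(clean_read(read), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling continue_read("ACGTTGCA...") with a real read would return the input followed by the model's predicted continuation. For pathogen detection you would more likely build on the model's outputs with a task-specific classifier, as described in the paper.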

This release shows how AI can help detect and prevent disease outbreaks by monitoring genetic signals in wastewater. The open-source nature of METAGENE-1 means researchers worldwide can build on this work to improve pathogen detection.