AI tools to identify unknown small molecules in the human body

A machine-learning platform to illuminate the chemical dark matter in mass spectrometry-based metabolomics

NIH-funded research · Princeton University · NIH-11195000

This project builds AI that learns to recognize and name the many unknown small molecules found in blood, urine, and tissues so doctors and researchers can better understand exposures and metabolism.

Quick facts

Grant type: NIH-funded research
Study type: NIH-funded research
Funding institution: Princeton University (NIH-funded)
Lab location: 1 site (Princeton, United States)
Project ID: NIH-11195000 on NIH RePORTER

What this research studies

Researchers are creating a software platform that reads mass spectrometry data from biological samples and uses machine learning to propose likely chemical identities for previously unrecognized signals. The team will train and benchmark models against libraries of known chemicals, filter out lab artifacts and contaminants, and refine predictions using large human metabolomics datasets. The tools are meant to be user-friendly so that clinicians and labs can apply them to blood, urine, or tissue samples to help explain abnormal results or environmental exposures. Over time, this work aims to make metabolomic results easier to interpret and more clinically useful.

Who could benefit from this research

Good fit: People who provide biological samples for metabolomics (blood, urine, stool, or tissue) or who are enrolled in research that collects mass spectrometry data would be the most relevant participants or data contributors.

Not a fit: Patients who do not have samples analyzed by mass spectrometry or whose conditions are unrelated to metabolism or chemical exposures may see no direct benefit from this work.

Why it matters

Potential benefit: If successful, the platform could help identify previously unknown molecules in patient samples, improving the detection of exposures and biomarkers and supporting more personalized treatment insights.

How similar studies have performed: Earlier computational metabolomics tools have improved identification of some molecules, but many signals remain unlabeled, and applying advanced AI language-model techniques to this problem is relatively new.

Where this research is happening

Princeton, United States

Researchers

About this research

  1. This is an active NIH-funded research project — typically early-stage science, not a clinical trial accepting patient enrollment.
  2. Some NIH-funded labs run parallel clinical studies or seek volunteers for related work. To check, contact the principal investigator or institution listed above.
  3. For full project details, budget, and progress reports, visit the official NIH RePORTER page below.

Last reviewed 2026-06-13 by the Find a Trial editorial team. Information on this page is for educational purposes and is not medical advice. Always consult qualified healthcare professionals about clinical trial participation.