Machine-learning tools to better understand how genes and the environment affect health

Next-Generation Algorithms in Statistical Genetics Based on Modern Machine Learning

NIH-funded research · Cornell University · NIH-11159437

Researchers are building new computational methods that use very large genetic and health datasets to predict disease risk and find genetic causes of illness.

Quick facts

Grant type: NIH-funded research
Study type: NIH-funded research
Funding institution: Cornell University (NIH-funded)
Lab location: 1 site (Ithaca, United States)
Project ID: NIH-11159437 on NIH RePORTER

What this research studies

This project develops modern machine-learning algorithms to analyze millions of human genomes alongside clinical data and environmental information. It focuses on modeling genetic variation, reanalyzing genome-wide association study data, and predicting health risk from genes plus environment. The team will create open-source software for tasks like genetic imputation, haplotyping, identifying likely causal variants, and computing genetic risk scores. These tools are meant to help researchers and clinicians turn big genetic datasets into insights that could inform prevention and personalized care.
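
As a rough illustration of one task named above, computing a genetic risk score, the sketch below uses the common additive form: a weighted sum of how many copies of each risk variant a person carries. This is not the project's own method or software. The function name, the weights, and the example values are hypothetical and chosen only to show the arithmetic.

```python
from typing import Sequence


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of allele dosages.

    dosages: copies of the risk allele per variant (typically 0, 1, or 2).
    weights: per-variant effect sizes (hypothetical values in this sketch).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three variants with made-up effect sizes.
example_weights = [0.12, -0.05, 0.30]
example_dosages = [2, 1, 0]
score = genetic_risk_score(example_dosages, example_weights)
print(f"risk score: {score:.2f}")
```

In practice the weights would come from a genome-wide association study, and the machine-learning methods this project describes would likely go beyond a simple weighted sum, for example by also modeling environmental information.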

Who could benefit from this research

Good fit: People who can provide or share genetic data and medical records—especially those with well-documented diseases or family histories—would be the most relevant contributors to this work.

Not a fit: Patients without available genetic or clinical data, or whose conditions are driven mainly by non-genetic factors, are less likely to see direct benefits in the near term.

Why it matters

Potential benefit: If successful, the work could produce more accurate genetic risk predictions and clearer identification of disease-related variants, both of which could support more personalized prevention and treatment strategies.

How similar studies have performed: Related statistical-genetics and machine-learning efforts have improved risk prediction and variant discovery, but applying large-scale modern ML across millions of genomes and clinical records is relatively new and still being tested.

Where this research is happening

Ithaca, United States

About this research

  1. This is an active NIH-funded research project — typically early-stage science, not a clinical trial accepting patient enrollment.
  2. Some NIH-funded labs run parallel clinical studies or seek volunteers for related work. To check, contact the principal investigator or institution listed above.
  3. For full project details, budget, and progress reports, visit the official NIH RePORTER page for this project (Project ID NIH-11159437).
Conditions: Disease, Disorder
Last reviewed 2026-06-09 by the Find a Trial editorial team. Information on this page is for educational purposes and is not medical advice. Always consult qualified healthcare professionals about clinical trial participation.