According to two recent research papers published in Nature Genetics, newly created artificial intelligence (AI) systems were able to predict the three-dimensional (3D) structure of DNA, as well as the function of its regulatory elements, based only on raw sequence.
Predicted 3D structure for a segment of human genomic DNA. Image credit: UT Southwestern Medical Center.
Study author Jian Zhou, Ph.D., assistant professor in the Lyda Hill Department of Bioinformatics at UTSW, believes these tools could provide new insight into how genetic mutations cause disease, as well as new knowledge of how genetic sequence affects the spatial organization and function of chromosomal DNA in the nucleus.
Dr. Zhou is a member of the Harold C. Simmons Comprehensive Cancer Center, a Cancer Prevention and Research Institute of Texas (CPRIT) Scholar, and a Lupe Murchison Foundation Scholar in Medical Research.
Together, these two programs provide a more complete picture of how changes in DNA sequence, even in non-coding regions, can have dramatic effects on its spatial organization and function.
Jian Zhou, Assistant Professor, Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center
The instructions for building proteins are encoded in only about 1% of human DNA. Recent studies have revealed that a large portion of the remaining noncoding genetic material contains regulatory components that control the expression of coding DNA, such as promoters, enhancers, silencers, and insulators.
According to Dr. Zhou, it is not clear how DNA sequence determines the functions of most regulatory elements.
Dr. Zhou and his colleagues at Princeton University and the Flatiron Institute created the Sei deep learning model to better understand these regulatory elements. Sei categorizes these non-coding DNA fragments into 40 “sequence classes,” or functional roles, such as an enhancer of gene activity in stem cells or brain cells.
More than 97% of the human genome is covered by these 40 sequence classes, which were derived from approximately 22,000 datasets from previous research on genome regulation. In addition, Sei can score any sequence according to its expected activity in each of the 40 sequence classes and predict how mutations would affect those activities.
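The idea of predicting how a mutation changes a sequence's activity can be illustrated with a minimal sketch: score the reference sequence, score the mutated sequence, and report the per-class difference. Everything named here is an assumption made for illustration. The scorer `toy_class_scores`, the two class labels, and the helper names are hypothetical stand-ins and do not reproduce Sei's model, its 40 sequence classes, or its software interface.

```python
from typing import Callable, Dict

Scorer = Callable[[str], Dict[str, float]]


def toy_class_scores(seq: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model (NOT Sei).

    Returns made-up scores for two illustrative classes:
    GC fraction and the density of the motif "TGAC".
    """
    seq = seq.upper()
    n = max(len(seq), 1)
    gc_fraction = sum(base in "GC" for base in seq) / n
    motif_density = seq.count("TGAC") / n
    return {"promoter_like": gc_fraction, "enhancer_like": motif_density}


def apply_substitution(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq: str, pos: int, alt: str, scorer: Scorer) -> Dict[str, float]:
    """Per-class change in score: scorer(mutated) minus scorer(reference)."""
    ref_scores = scorer(seq)
    alt_scores = scorer(apply_substitution(seq, pos, alt))
    return {name: alt_scores[name] - ref_scores[name] for name in ref_scores}


if __name__ == "__main__":
    # Replace the G at position 3, which disrupts the toy "TGAC" motif.
    print(variant_effect("ATTGACAT", 3, "A", toy_class_scores))
```

In a real setting the scorer would be a trained deep learning model rather than a hand-written rule, but the comparison of reference and mutated sequence would follow the same pattern.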
By applying Sei to human genetics data, the researchers were able to characterize the regulatory architecture of 47 traits and disorders listed in the UK Biobank database and explain how mutations in regulatory elements contribute to specific pathologies.
These capabilities can help in the systematic study of relationships between changes in genomic sequence and diseases and other traits. The results were published in August 2022.
In May 2022, Dr. Zhou announced the creation of a separate tool called Orca, which uses DNA sequences to predict the 3D arrangement of chromosomes.
Using existing datasets of DNA sequences and structural data from previous research that captured the molecule's folds, twists, and turns, Dr. Zhou trained the model to connect sequence to structure and evaluated its ability to predict structure at different length scales.
The results revealed that Orca accurately predicted both small and large DNA structures based on their sequences, including for sequences carrying mutations linked to a variety of medical disorders, such as a form of leukemia and limb deformities.
The researchers’ use of Orca also allowed them to develop new theories about how DNA sequence affects both local and large-scale 3D structure.
Sei and Orca, which are publicly accessible on web servers and as open source code, will be used by Dr. Zhou and his team to further investigate the role of genetic mutations in causing the molecular and physical manifestations of diseases. This research could one day lead to new treatment options for these conditions.
The National Institutes of Health (DP2GM146336), CPRIT (RR190071), and the UT Southwestern Endowed Fellows Program in Medical Sciences supported the Orca study.
Journal references:
Chen, K. M., et al. (2022) A sequence-based global map of regulatory activity to decipher human genetics. Nature Genetics. doi:10.1038/s41588-022-01102-2.
Zhou, J. (2022) Sequence-based modeling of three-dimensional genome architecture from the kilobase to the chromosome scale. Nature Genetics. doi:10.1038/s41588-022-01065-4.