Cancer, a formidable adversary, often exploits the very building blocks of our cells. But what if we could understand how these building blocks go rogue and contribute to the disease?
Scientists at St. Jude Children's Research Hospital are delving into the intricate world of disordered proteins and their role in cancer, specifically focusing on how they form abnormal structures called biomolecular condensates. These condensates, essentially concentrated "droplets" of proteins, DNA, or RNA, can disrupt normal cellular functions and fuel cancer development. But the exact mechanisms behind this process have remained elusive… until now.
Fusion oncoproteins, which arise when two genes merge and the resulting protein gains new abilities, are a key player in this story. One such ability is the formation of these condensates. The researchers focused on intrinsically disordered regions (IDRs) within these proteins: unstructured segments that are often involved in condensate formation. They wanted to know: Do these IDRs drive the formation of cancer-causing condensates?
To answer this, the team developed a machine-learning model called IDR-Puncta ML. They trained it using experimental data on IDRs from fusion oncoproteins to predict the behavior of other such regions. The results, published in Science Advances, are fascinating.
The model predicted that only about 12% of all human IDRs are capable of forming condensates. These condensate-forming IDRs are primarily found within proteins strongly linked to RNA-related functions. This finding underscores the complexity of condensate formation and provides a valuable resource for studying cancer and RNA biology.
This research builds upon a 2023 study in Nature Communications, which predicted condensate formation by fusion oncoproteins. This latest work goes deeper, focusing on the IDRs and their role in this process.
"We reasoned that IDRs were associated with condensate formation in a significant portion of droplet-forming fusion oncoproteins," explains Dr. Richard Kriwacki, a corresponding author of the study. "This allowed us to use our data science tools to understand the sequence features underlying these results, providing insight into the role of these flexible protein regions in human biology."
The team combined machine learning with a robust experimental pipeline to develop IDR-Puncta ML. "We built a dataset by testing different IDRs from various fusions and experimentally validated whether they could independently form droplets," said co-first author Dr. Snigdha Maiti. "Based on this dataset, we created a machine learning model to predict if other IDRs with similar amino acid sequence features could also form condensates."
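The idea Dr. Maiti describes, labeling IDRs by experiment and then predicting new ones from amino acid sequence features, can be illustrated with a toy sketch. This is not the IDR-Puncta ML model: the three features, the nearest-centroid classifier, and every sequence below are assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical feature groups; the study's real features are not given in this article.
AROMATIC = set("FWY")
CHARGED = set("DEKR")


def sequence_features(seq):
    """Return three simple composition fractions for one IDR sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    return [
        sum(counts[a] for a in AROMATIC) / n,  # aromatic fraction
        sum(counts[a] for a in CHARGED) / n,   # charged fraction
        (counts["G"] + counts["S"]) / n,       # glycine/serine fraction
    ]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def train(labeled):
    """labeled: list of (sequence, forms_droplets). Returns one centroid per class."""
    pos = [sequence_features(s) for s, y in labeled if y]
    neg = [sequence_features(s) for s, y in labeled if not y]
    return centroid(pos), centroid(neg)


def predict(model, seq):
    """True if the sequence sits closer to the droplet-forming centroid."""
    pos_c, neg_c = model
    f = sequence_features(seq)
    d_pos = sum((a - b) ** 2 for a, b in zip(f, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(f, neg_c))
    return d_pos < d_neg


# Made-up sequences standing in for experimentally tested IDRs.
toy_dataset = [
    ("GYSGYSGRGGYSQ", True),
    ("SYGQGSYGGRGFS", True),
    ("EEKKEEKKDDEEK", False),
    ("PAPLPEKPAEDLP", False),
]
model = train(toy_dataset)
print(predict(model, "GYGSGRGYSGFGS"))
print(predict(model, "EEDDKKEEPLKKE"))
```

A real model would use far more training examples, richer features, and held-out validation; the sketch only shows how sequence composition can be turned into numbers that a classifier compares.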
The model predicted condensate formation with more than 90% accuracy, which allowed the team to extend its predictions to the entire human proteome. Strikingly, only about 12% of IDRs were predicted to form condensates.
"This suggests that condensate-forming IDRs are likely related to specific functions," noted co-first author Dr. Swarnendu Tripathi.
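The two headline numbers measure different things: accuracy compares predictions against experimental results, while the 12% figure is simply the share of IDRs the model flags as condensate-forming. A minimal sketch of both calculations follows; the label lists are invented toy values chosen to mirror the reported figures, not data from the study.

```python
def accuracy(predicted, observed):
    """Fraction of predictions that match the experimental observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def predicted_fraction(predictions):
    """Share of regions predicted to form condensates."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return sum(predictions) / len(predictions)


# Toy validation set: 10 IDRs, one misclassified (invented values).
observed = [True, False, True, False, False, True, False, False, True, False]
predicted = [True, False, True, False, False, True, False, False, True, True]
print(accuracy(predicted, observed))

# Toy "proteome": 3 of 25 IDRs predicted positive (invented values).
proteome_predictions = [True] * 3 + [False] * 22
print(predicted_fraction(proteome_predictions))
```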
Their hypothesis was confirmed when they discovered that the proteins with these IDRs were largely involved in RNA biology. "We found that IDRs driving condensate formation are within proteins that have specific cellular functions, such as RNA processing, splicing, and regulation of RNA metabolism," said co-first author Dr. David Baggett. "This implies that while some IDRs can form condensates independently, others might need help from other regions of the protein."
The research suggests that not all IDRs are created equal. Some can form condensates on their own, while others need assistance. This raises questions about the specific roles of different IDRs and how they interact with other parts of the protein.
IDR-Puncta ML is freely available and provides a crucial platform for understanding the molecular mechanisms behind condensate formation and its links to disease. As Dr. Baggett points out, "Understanding how fusion oncoproteins drive cancer is the first step toward developing treatments, because we can't treat what we don't understand." This research gets to the core of why these proteins and the condensates they form have the effects they do, which is a prerequisite for correcting those effects.