Presented works
#1065824

A novel database of protein-RNA interactions

Authors: Luana Luiza Bastos, Diego César Batista Mariano, Raquel Melo Minardi
Presenter: Luana Luiza Bastos • luizabastos.luana9@gmail.com
Abstract:
RNA (ribonucleic acid) plays an essential role in gene expression and protein synthesis and is fundamental to several biological processes. Likewise, RNA-binding proteins play a crucial role in a wide range of processes, mainly associated with the regulation of gene expression. In recent years, the relevance of RNA in the development of innovative therapies has driven the need for comprehensive and up-to-date databases that explore protein-RNA interactions. However, the lack of specialized databases represents a significant challenge for researchers and drug developers. We propose the development of RNApedia, a curated and specialized database for protein-RNA complexes, with user-friendly access through a web interface. The process started with the evaluation of data available in the Protein Data Bank (PDB). We selected structures containing at least one RNA chain and one protein chain, restricting the maximum resolution to 5 Å. Larger complexes were filtered to keep only one RNA chain and one protein chain per PDB file, with a maximum distance of 6 Å between them. We eliminated redundancies by discarding complexes with identical sequences and removed water molecules, hydrogens, and crystallographic artifacts to ensure data quality. To model interactions, we classified RNA atoms with the Luna tool and amino acid atoms with the nAPOLI classification criteria, and we applied the distance criteria used in the VTR tool. Then, we calculated the Accessible Surface Area (ASA) to evaluate the size of the binding interfaces. The RNA classification was complemented by the RNACentral API, a database specialized in non-coding RNAs. Correlated sequences were categorized based on their functional descriptions. We also integrated experimental dissociation constant (Kd) data from PDBbind to support affinity and functional analysis of the complexes in RNApedia. We cataloged 34,057 complexes containing one RNA chain and one protein chain. 
Of these, 4,956 were classified using RNACentral, and 1,052 have correlated experimental Kd data. To facilitate searching, we plan to organize the data using clustering techniques, allowing the identification of complexes with similar binding sites. In addition, we will implement the R2DT tool for visualization of secondary structures and Infernal 1.0 for RNA type classification, providing more accurate and detailed analyses. In the future, we intend to correlate experimental data from the ProNAB database with RNApedia, expanding analytical possibilities for researchers interested in targeted therapies and new drug development. The web platform will allow simplified and intuitive access to the data, promoting advances in the understanding of protein-RNA complexes. We believe that RNApedia will significantly contribute to the understanding of the molecular mechanisms underlying protein-RNA interactions, contributing to the development of innovative and targeted therapies.
Keywords: database, RNA, proteins, complexes
★ Running for the Qiagen Digital Insights Excellence Awards
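The 6 Å distance filter described in the abstract above can be illustrated with a minimal sketch. It assumes that "distance between chains" means the smallest atom-to-atom distance, which the abstract does not state; the function names and the toy coordinates are hypothetical and are not part of RNApedia.

```python
import math

def min_interchain_distance(protein_atoms, rna_atoms):
    """Smallest Euclidean distance (in Å) between any protein atom and any RNA atom."""
    return min(math.dist(p, r) for p in protein_atoms for r in rna_atoms)

def is_interacting_pair(protein_atoms, rna_atoms, cutoff=6.0):
    """True when the two chains come within `cutoff` Å of each other (assumed criterion)."""
    return min_interchain_distance(protein_atoms, rna_atoms) <= cutoff

# Toy coordinates (hypothetical, not taken from any PDB entry).
protein = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
rna_near = [(4.5, 4.0, 0.0)]   # 5.0 Å from the second protein atom
rna_far = [(20.0, 0.0, 0.0)]   # 18.5 Å from the closest protein atom

print(is_interacting_pair(protein, rna_near))  # True
print(is_interacting_pair(protein, rna_far))   # False
```

A real pipeline would read the coordinates from PDB or mmCIF files and would likely use a spatial index rather than this all-against-all loop.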
#1065827

Analyzing thermostability patterns in proteins used in the production of second-generation biofuels

Authors: Diego Mariano, Raquel Minardi
Presenter: Diego Mariano • diogohenks@hotmail.com
Abstract:
Biofuels are renewable energy sources derived from organic biomass. In the production of second-generation biofuels, fermentable sugars are extracted from biomass through the catalytic action of three enzyme complexes: endoglucanases, exoglucanases, and β-glucosidases. Endoglucanases (E.C. 3.2.1.4) break down cellulose into oligosaccharides of varying lengths, which are then cleaved by exoglucanases (E.C. 3.2.1.91) into disaccharides like cellobiose. Finally, β-glucosidases (E.C. 3.2.1.21) hydrolyze cellobiose into glucose, the primary sugar used in fermentation for biofuel production. Thus, thermostable β-glucosidases, capable of functioning at elevated temperatures, hold significant potential for industrial applications. However, the structural determinants of their thermostability are not well understood. In this study, we employed graph-based structural signatures to analyze the three-dimensional structures of β-glucosidases obtained from thermophilic organisms (organisms capable of surviving in high-temperature environments). We collected β-glucosidase sequences from the UniProt database (https://www.uniprot.org) and three-dimensional models from the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk). Our dataset was composed of 1,717 enzyme structures (890 from thermophilic and 827 from non-thermophilic organisms), divided into training and test subsets. The dataset was then imported into Orange Data Mining. Using the CatBoost machine learning algorithm, we achieved an accuracy of 0.816 and an F1-score of 0.817 on the test dataset. These results indicate that the proposed signature model provides valuable insights into the structural features underlying thermostability and could guide the design of more efficient enzymes for biofuel production.

Acknowledgements: The authors would like to thank the research funding agency FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais).
Keywords: Structural bioinformatics, Biofuels, Machine learning
★ Running for the Qiagen Digital Insights Excellence Awards
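For readers less familiar with the two metrics reported in the abstract above, the sketch below shows how accuracy and F1-score are derived from a binary confusion matrix. The counts are hypothetical and were chosen only to make the arithmetic easy to follow; they are not the study's results.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and F1-score for a binary classifier, from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of true positives that are recovered
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Hypothetical test set: "thermophilic" is treated as the positive class.
acc, f1 = accuracy_and_f1(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

When the two classes are roughly balanced, as in the dataset described (890 versus 827 structures), accuracy and F1-score tend to be close to each other.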
#1065831

EVALUATING THE ROLE OF KSRP PROTEINS IN KINETOPLASTIDS USING MOLECULAR DOCKING

Authors: Leandro Morais, Sheila Cruz Araujo, Diego César Batista Mariano, Leonardo Henrique França de Lima, Raquel Melo Minardi
Presenter: Leandro Morais • leandrozno@gmail.com
Abstract:
Kinetoplastids are a group of flagellated protozoa that include trypanosomatids, which are responsible for neglected diseases such as Chagas disease and leishmaniasis. These diseases affect millions of people worldwide, primarily in low-income regions, and are associated with high morbidity and significant social and economic burdens. The Kinetoplastid Specific Ribosomal Protein (KSRP) is a key point of divergence between the ribosomal structures of kinetoplastids and other eukaryotes. KSRP is crucial for maintaining ribosome integrity and is essential for protein synthesis, making it a promising therapeutic target. However, current treatments for kinetoplastid-related diseases are limited due to issues such as low efficacy, resistance, toxicity, and the need for parenteral administration. This highlights the urgent need for new, effective therapies. Despite the importance of KSRP, its interactions with ribosomal RNA and potential inhibitors remain poorly understood. In this study, we applied structural bioinformatics techniques to investigate KSRP and its potential as a drug target. We collected sequences from public databases such as the Protein Data Bank (PDB) and Propedia. The KSRP structure from Trypanosoma cruzi (PDB ID: 5OPT) was retrieved from the Propedia database. We analyzed the structure to identify amino acid residues of KSRP in proximity to ribosomal RNA. These residues were categorized into functional domains, including RRM1, RRM2, LINKER, C-TERMINAL, and their combinations. Binding site analyses were performed to identify relevant complexes, and normalized residue frequencies were calculated to prioritize critical regions for drug targeting. Additionally, docking experiments using HDOCK, HPEPDOCK, and AlphaFold3 were conducted to assess interactions with candidate peptides. These tools utilized automated alignments and advanced rotational sampling, providing insights into the potential development of peptide inhibitors targeting KSRP. 
By disrupting KSRP function, such inhibitors could compromise ribosome integrity, ultimately leading to the parasite's death. This work provides a foundation for the development of novel therapeutic strategies against kinetoplastid-related diseases.
Acknowledgements: The authors would like to thank the research funding agencies FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and CIIA Saúde for their support.
Keywords: Structural bioinformatics, Docking, KSRP, Peptides
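The abstract above mentions normalized residue frequencies per domain without giving the normalization. One plausible reading, sketched below, divides the number of RNA-proximal residues in each domain by the total number of RNA-proximal residues. This is an assumption, and the domain labels in the toy list are illustrative data, not results from the study.

```python
from collections import Counter

def normalized_frequencies(labels):
    """Share of observations per label, so that all shares sum to 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical domain assignment of eight RNA-proximal residues.
contact_domains = ["RRM1", "RRM1", "RRM2", "LINKER",
                   "RRM1", "C-TERMINAL", "RRM2", "RRM1"]

freqs = normalized_frequencies(contact_domains)
for domain, share in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{domain}: {share:.3f}")
```

Other normalizations, for example dividing by domain length, would rank the domains differently, so the choice should follow the authors' actual definition.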
#1065833

Comparative Analysis of the Regularizing Effect of Batch Normalization in Multilayer Neural Networks - Case Study Applied to Propedia

Authors: Lucas Moraes dos Santos, Raquel Melo Minardi
Presenter: Lucas Moraes dos Santos • moraes.lsantos@gmail.com
Abstract:
This study investigates the role of Batch Normalization (BN) as a regularization technique in Deep Neural Networks applied to the functional classification of peptides based on the peptide-protein interface (Pep-PI). In this case, we focus on comparing the effectiveness of BN with Dropout, a widely recognized regularization technique, using Propedia, a gold standard database developed for rational structure-based peptide design. The architectures analyzed include Multilayer Perceptrons (MLP) and Convolutional Neural Networks (CNN), both of which use biologically relevant input representations to characterize the structural and functional complexity of protein-peptide interactions.
To encode interactions at the atomic level, MLPs used graph-based structural signatures, representing macromolecular structures numerically and allowing the identification of structural similarities and differences. On the other hand, CNNs employed distance maps derived from the interfacial regions of protein-peptide complexes, encoding spatial relationships crucial for understanding functional interfaces. By integrating these complementary representations, the models were trained on biologically meaningful inputs that reflect the inherent complexity of the classification task.
BN, known for its ability to stabilize training and normalize layer activations, was analyzed not only as an optimization technique but also as a form of regularization. Unlike Dropout, which works by randomly deactivating units, BN mitigates the problem of internal covariate shift by stabilizing the range of activation values and allowing the use of higher learning rates. This results in faster convergence and less reliance on other regularization techniques, such as weight decay. Furthermore, unlike Dropout, which can temporarily reduce the expressive capacity of the model during training, BN keeps all units active, taking full advantage of the network's capacity.
The experimental results demonstrated the superiority of BN in both architectures. In MLPs, BN achieved a significantly lower validation error (0.0228 ± 0.0015) than Dropout (0.0326 ± 0.0012) with a batch size of 32, highlighting its robustness and generalization capabilities. In CNNs, BN also outperformed Dropout, with reduced errors during training (0.0094 ± 0.0019) and testing (0.0078). These results demonstrate that BN not only speeds up training but also acts as an efficient regularizer without sacrificing model performance. This is particularly important in the context of structural bioinformatics, where datasets such as Propedia often involve high-dimensional data constrained by limited sample sizes.
This study establishes BN as a superior regularization technique for training deep learning models, with improved generalization and robustness compared to conventional approaches such as Dropout. Future work can explore these findings in other datasets by integrating advanced architectures such as SE(3)-equivariant models, or by combining BN with generative models to design peptides with specific functionalities. These advances promise to optimize the modeling of protein-peptide interactions and drive the development of more effective therapeutic interventions.
Acknowledgements: Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).
Keywords: Batch Normalization, Deep Neural Networks, Regularization
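To make the contrast between the two techniques in the abstract above concrete, here is a minimal training-mode sketch of each operation in plain Python. It illustrates the standard definitions only and is not the authors' implementation; the learnable scale and shift (`gamma`, `beta`) are left at their initial values, and the fixed random seed is an arbitrary choice for reproducibility.

```python
import math
import random

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift (training mode)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

def dropout(units, rate=0.5, rng=None):
    """Inverted dropout: zero each unit with probability `rate`, rescale the survivors."""
    rng = rng or random.Random(0)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in units]

activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)      # every value is kept, only rescaled
dropped = dropout(activations, rate=0.5)  # some values are zeroed at random

print(normalized)
print(dropped)
```

The normalized values have approximately zero mean and unit variance, while Dropout leaves the surviving activations scaled by 1/(1 - rate) so that their expected value is unchanged.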
#1065838

AN INTERACTIVE WEB TOOL FOR EXPLORATORY ANALYSIS OF INTERATOMIC CONTACTS IN PROTEINS

Authors: Rafael Pereira Lemos, Diego César Batista Mariano, Sabrina de A. Silveira, Raquel Melo Minardi
Presenter: Rafael Pereira Lemos • rafaellemos42@gmail.com
Abstract:
Contacts are defined as spatial relationships between molecules based on their distance, often
serving as prerequisites for characterizing molecular interactions and bonds. Methods for
calculating contacts are typically categorized as either cutoff-dependent, which rely on
Euclidean distances, or cutoff-independent, which utilize Delaunay and Voronoi tessellations.
While cutoff-dependent methods are recognized for their simplicity, completeness, and
reliability, traditional implementations remain computationally expensive, posing significant
scalability challenges in the current Big Data era of bioinformatics. To address these
limitations, we previously introduced COCaDA, a Python-based command-line tool for
large-scale protein interatomic contact cutoff optimization using alpha-carbon (Cα) distance
matrices. COCaDA demonstrated superior performance compared to other methods,
achieving faster computation times than advanced data structures like k-d trees while being
simpler to implement and entirely customizable. COCaDA classifies contacts based on seven
different types: hydrogen and disulfide bonds; hydrophobic, attractive, repulsive, and salt
bridge interactions; and aromatic stackings. Building on this foundation, we now present
COCaDA-Web, an interactive web-based platform that allows users to dynamically explore
and visualize protein contact data for entries in the Protein Data Bank (PDB). The platform
integrates the COCaDA Database, containing precomputed contact data for over 200,000
proteins and 800 million contacts, with weekly updates to include newly released entries.
Users can search entries by PDB ID, full protein name, or residue count. Detailed information
is provided for each entry, including atom nomenclature (chain, residue, and atom),
interatomic distances in angstroms, localization (intra-chain or inter-chain), and contact type.
An interactive 3D visualization highlights selected contacts, and the complete list of contacts
is available for download in .csv format. Future developments include user-defined
processing of custom entries, flexible distance cutoffs, user-tailored contact conditions, and
the ability to generate PyMOL session files with annotated contacts. By extending the
COCaDA algorithm to a user-friendly web interface, COCaDA-Web ensures accessibility for
users without programming expertise while facilitating exploratory and large-scale analyses
of interatomic contacts directly within a browser. COCaDA-Web is freely available at
https://bioinfo.dcc.ufmg.br/cocada-web.

Acknowledgements: The authors would like to thank the research funding agencies FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and the CIIA-Saúde (Centro de Inovação em Inteligência Artificial para a Saúde) for their support.
Keywords: Interatomic contacts, Web tool, Structural bioinformatics.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
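The general idea of using alpha-carbon distances to avoid unnecessary atom-level comparisons, mentioned in the COCaDA abstract above, can be sketched as follows. This is an illustration of the principle only, not COCaDA's code: both cutoff values, the data layout, and the toy coordinates are hypothetical, and contact typing (hydrogen bond, salt bridge, and so on) is omitted.

```python
import math
from itertools import combinations

def atom_contacts(residues, ca_cutoff=20.0, atom_cutoff=4.5):
    """Find atom pairs within `atom_cutoff` Å, skipping residue pairs with distant CA atoms.

    `residues` maps a residue label to a dict of atom name -> (x, y, z);
    every residue is assumed to contain a "CA" atom.
    """
    contacts = []
    for (name_a, atoms_a), (name_b, atoms_b) in combinations(residues.items(), 2):
        # Cheap residue-level filter: far-apart alpha carbons cannot yield atom contacts.
        if math.dist(atoms_a["CA"], atoms_b["CA"]) > ca_cutoff:
            continue
        for atom_a, xyz_a in atoms_a.items():
            for atom_b, xyz_b in atoms_b.items():
                d = math.dist(xyz_a, xyz_b)
                if d <= atom_cutoff:
                    contacts.append((name_a, atom_a, name_b, atom_b, round(d, 2)))
    return contacts

# Toy structure with made-up coordinates.
residues = {
    "A:GLY1": {"CA": (0.0, 0.0, 0.0), "O": (1.2, 0.0, 0.0)},
    "A:SER30": {"CA": (5.0, 0.0, 0.0), "OG": (3.9, 1.0, 0.0)},
    "A:LYS50": {"CA": (40.0, 0.0, 0.0), "NZ": (42.0, 0.0, 0.0)},
}

contacts = atom_contacts(residues)
for contact in contacts:
    print(contact)
```

In this toy case only the GLY1/SER30 pair passes the Cα filter, so the atoms of LYS50 are never compared individually, which is where the savings come from on large structures.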
#1065839

AN APPROACH FOR EVALUATING PROTEIN BINDING SITE INTERACTIONS BY INTEGRATING INTERFACE CONTACTS AND ACCESSIBLE SURFACE AREA

Authors: Ana Luísa Araújo Bastos, Rafael Pereira Lemos, Raquel Melo Minardi
Presenter: Ana Luísa Araújo Bastos • analuisaab123@gmail.com
Abstract:
Understanding the structural and energetic dynamics of protein interactions is essential for advancing fields such as drug discovery, protein engineering, and biomolecular research. In the Big Data era of bioinformatics, there is a growing need for efficient, robust, and scalable methods capable of handling large datasets, such as those generated by high-throughput analyses. This study presents an approach for large-scale evaluation of potential binding sites in proteins, combining information on the variation of accessible surface area (ΔASA) and protein interface interatomic contacts. We aim to produce a reliable scoring metric that can guide such large-scale analyses. To create this metric, we utilized an in-house command-line tool, COCaDA, to efficiently detect interface contacts in large-scale datasets. Seven contact types were considered: hydrogen and disulfide bonds; hydrophobic, attractive, repulsive, and salt bridge interactions; and aromatic stackings. To assess the energetic contributions of each interaction type, we assigned relative weightings based on their typical interaction strengths. The overall contribution of each residue was then determined by multiplying its interatomic contacts by their respective weights. ASA values were calculated using Naccess with a probe size of 1.4 Å, approximating the radius of a water molecule. Residues with relative side-chain ASA values above 0.5 were classified as exposed. ΔASA is then determined by comparing ASA values before and after interaction: a positive ΔASA indicates residue occlusion after interaction, while a negative ΔASA suggests increased exposure. Each residue's final score is derived by weighting contact information and ASA values, each initially contributing 50%. 
Thus, a high score reflects significant residue changes likely due to protein interface interactions, and clusters of high-scoring residues may indicate stronger binding sites than low-scoring ones. Our ongoing work involves applying and refining this approach for chimeric vaccine candidates generated by artificial intelligence models, in order to quickly identify the best models. Our proposed approach is versatile and can be seamlessly integrated into other large-scale pipelines, facilitating efficient analysis of protein interactions in diverse research contexts.

Acknowledgements: The authors would like to thank the research funding agencies FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and the CIIA-Saúde (Centro de Inovação em Inteligência Artificial para a Saúde) for their support.
Keywords: Protein Contacts; Accessible Surface Area; Structural Bioinformatics.
★ Running for the Qiagen Digital Insights Excellence Awards
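The per-residue score described in the abstract above combines weighted contacts and ΔASA in equal parts. The sketch below shows one way such a score could be computed. The abstract does not list the weights or say how the two terms are brought to a common scale, so the weight table, the division by a maximum value, and all function names are assumptions made for illustration.

```python
# Hypothetical relative weights per contact type (the abstract gives no values).
CONTACT_WEIGHTS = {
    "hydrogen_bond": 1.0,
    "disulfide_bond": 2.0,
    "hydrophobic": 0.5,
    "attractive": 1.0,
    "repulsive": -1.0,
    "salt_bridge": 1.5,
    "aromatic_stacking": 1.0,
}

def contact_score(contact_counts):
    """Sum of (number of contacts of a type) x (weight of that type) for one residue."""
    return sum(CONTACT_WEIGHTS[kind] * n for kind, n in contact_counts.items())

def delta_asa(asa_unbound, asa_bound):
    """Positive when the residue becomes buried after the interaction."""
    return asa_unbound - asa_bound

def residue_score(contact_counts, asa_unbound, asa_bound,
                  max_contact, max_delta_asa, w_contact=0.5, w_asa=0.5):
    """Equal-weight mix of the two terms, each scaled by an assumed dataset maximum."""
    contact_term = contact_score(contact_counts) / max_contact
    asa_term = delta_asa(asa_unbound, asa_bound) / max_delta_asa
    return w_contact * contact_term + w_asa * asa_term

score = residue_score({"hydrogen_bond": 2, "hydrophobic": 4},
                      asa_unbound=60.0, asa_bound=15.0,
                      max_contact=8.0, max_delta_asa=90.0)
print(score)
```

Keeping the two weights as parameters reflects the abstract's wording that each term "initially" contributes 50%, which suggests the balance may be tuned later.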
#1065842

AN APPROACH FOR TEACHING MOLECULAR BIOLOGY IN HIGH SCHOOL USING BIOINFORMATICS

Authors: Helena Lott Costa, Raquel Melo Minardi, Diego César Batista Mariano
Presenter: Helena Lott Costa • helenalottc@gmail.com
Abstract:
The advancement of technology is revolutionizing the way education is delivered in high schools. Recognizing this, the Brazilian government enacted Law No. 13.415/2017, which restructured high school education by introducing electives and new approaches to learning. One possible area of focus is bioinformatics, a field that requires professionals skilled in both computing and biological sciences. However, teaching bioinformatics to young students is a significant challenge. To address this, the integration of computing logic into high school curricula emerges as a promising solution. By aligning with the goals of the new high school model, this approach helps bridge the gap between students and the demands of the modern labor market. In this context, a programming course focused on bioinformatics was designed specifically for high school students. The pilot project, launched in 2024 in Belo Horizonte, Brazil, aimed to combine programming and molecular biology. The course employed teaching methodologies, such as Inquiry-Based Learning (IBL) and gamification, to engage students effectively. Activities were structured into quarterly stages, where students learned programming through Scratch and applied their skills to molecular biology projects, including DNA transcription simulations. The initiative proved to be highly successful, motivating students and enhancing their understanding of molecular biology concepts. It also demonstrated the potential of interdisciplinary education to make complex subjects more accessible and engaging. Looking ahead, expanding this project to other schools is essential to assess its broader educational impact. By scaling the methodology, more students can benefit from this innovative approach, fostering critical thinking, problem-solving skills, and interest in STEM fields. Additionally, partnerships with educational institutions and technology companies could enhance the course's resources and reach, further preparing students for future academic and professional challenges. This initiative highlights the importance of combining technology and education to equip students with the skills needed in an increasingly digital world.

Acknowledgment: The authors would like to thank the research funding agencies Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior Brasil (CAPES) - Finance Code 001.
Keywords: High School, Bioinformatics, Inquiry-Based Learning (IBL), Gamification, Scratch.
#1065884

MUTATIONS IN THE CFTR GENE: FUNCTIONALITY, THERAPIES, AND GLOBAL REPRESENTATIVENESS

Authors: Bárbara Zuccolotto Schneider G. Parreira, Alisson Clementino da Silva, Ronison Alves Guimarães, Joicymara Santos Xavier
Presenter: Bárbara Zuccolotto Schneider G. Parreira • barbarazuccolotto@gmail.com
Abstract:
Cystic fibrosis (CF) is a genetic disease caused by mutations in the CFTR (Cystic Fibrosis Transmembrane Conductance Regulator) gene, which impair the function of the encoded protein. These mutations represent significant challenges in the development of effective therapies due to variability in their functional, structural, and thermodynamic characteristics, as well as differences in responsiveness to available CFTR modulators. This systematic review consolidated data on CFTR gene mutations, analyzing their functional and structural impacts, changes in protein stability and molecular dynamics, as well as the efficacy of therapies such as potentiators, correctors, and therapeutic combinations. A total of 2,366 articles published up to December 2024 were screened from databases such as PubMed, Scopus, Embase, Cochrane Library, BVS, and Web of Science, using specific keywords like "rare mutations," "CFTR gene," "functional analysis," "thermodynamic characteristics," and "CFTR modulators." The methodology followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, with data analysis conducted using tools like Rayyan and VOSviewer. The studies were categorized based on experimental techniques, functional parameters, and responses to CFTR modulators. Conflicts in study selection were resolved by consensus between the lead author and two reviewers. The results identified new mutations potentially responsive to CFTR modulators, as well as gaps in the literature related to the investigation of rare mutations from underrepresented regions. Additionally, approaches integrating molecular modeling and machine learning showed potential to overcome limitations of traditional experimental studies. 
This review contributes to the identification and analysis of mutations, fostering the development of personalized therapies and promoting greater equity in the investigation and access to treatments for underserved populations.

Acknowledgements: The authors would like to thank the research funding agency FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais).
Keywords: rare mutations, CFTR gene, cystic fibrosis, CFTR modulators, protein stability
#1065906

Heat-Stressed Broilers Benefit from Probiotic Diets: A Metagenomic Study on Gut Microbiota and Production Efficiency

Authors: Thayra Gomes dos Santos, Hassan Martins Magalhães, Ana Clara De Araújo Teles, Saulo Reis Nery Santos, Milca da Silva Figueiredo Siqueira, Bertram Brenig, Sandeep Tiwari, Vasco A de C Azevedo, Rodrigo Dias de Oliveira Carvalho
Presenter: Thayra Gomes dos Santos • thayra.gomes@hotmail.com
Abstract:
Brazil is a global leader in agribusiness, ranking as the largest chicken meat exporter and third-largest producer. However, poultry farming faces multiple challenges. Reducing antibiotic use is essential to combat bacterial resistance. Additionally, the impacts of global warming, such as heat stress in broilers, damage animal health and lead to production losses. The poultry industry is therefore exploring nutritional alternatives such as probiotics. A recent study investigated four probiotic strains (Escherichia sp., Lactobacillus sp., Lactococcus sp., and Saccharomyces sp.) in Cobb-500 broilers under heat stress, comparing the results with a commercial probiotic and the antibiotic bacitracin. It showed that chickens on diets with probiotics had higher slaughter weights and better feed conversion. However, the impact on gut microbiota remains unexplored. This study aimed to understand the influence of probiotics on the gut microbiota of broilers under heat stress, using metagenomic analysis to provide insights into the regulation of inflammatory processes and energy metabolism and possibly identify biomarkers for screening chicken health. The study performed in silico analyses of fecal DNA sequences obtained by shotgun paired-end sequencing. The samples were divided as follows: T1 - basal diet (BD); T2 - BD with a commercial probiotic of Bacillus subtilis DSM17299; T3 - BD with zinc bacitracin; T4-T7 - BD supplemented, respectively, with the next-generation probiotics L. lactis NCDO 2118, L. delbrueckii 327, E. coli CEC 15, or S. boulardii; and T8 - BD supplemented with a blend of the four next-generation probiotics (T4 to T7). Our findings revealed significant variations in microbiome diversity among the study samples. Specifically, broilers supplemented with probiotics demonstrated higher microbiome diversity than those fed diets containing bacitracin or the commercial probiotic Bacillus subtilis. 
Moreover, treatment with bacitracin reduced the populations of several bacterial genera, notably Ligilactobacillus and Romboutsia, while treatment with the probiotic blend increased those taxa and exhibited the most favorable outcomes in terms of microbiome diversity and function. In conclusion, our findings suggest that probiotic supplementation enhances microbiome diversity more effectively than bacitracin or commercial probiotics, potentially offering more efficient and sustainable strategies for poultry farming. However, further studies are required to test this hypothesis.
Acknowledgements: The authors would like to thank the research funding agency FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais) for its support.
Keywords: Intestinal health, Functional nutrition, Sustainable poultry
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1065913

PAN-VACCINOMICS: DEVELOPING A COMMON MULTIVALENT VACCINE AGAINST MULTIPLE BACTERIAL SEXUALLY TRANSMITTED INFECTIONS

Authors: Lucas Gabriel Gomes, Sandeep Tiwari, Arun Kumar Jaiswal, Vasco A de C Azevedo
Presenter: Lucas Gabriel Gomes • lucasgabriel388@gmail.com
Abstract:
The term sexually transmitted infections (STIs) describes several infectious diseases that are transmitted mainly through sexual contact and can be caused by a variety of etiological agents, including viruses, bacteria, protozoans, and fungi. Of those, eight are responsible for a significant number of cases: four viral infections (HIV, HPV, herpes, and hepatitis B) and four bacterial or protozoan infections (Treponema pallidum, Neisseria gonorrhoeae, Chlamydia trachomatis, and Trichomonas vaginalis). They represent a significant drain on global healthcare systems, with the World Health Organization (WHO) estimating a million new cases of curable STIs per day. Despite extensive efforts to control and eradicate STIs, they continue to be endemic in the developing world and have been a rising threat even in the developed world. Because no effective vaccine is available for most of them, prophylactic efforts to control STIs rely on incentives for sexual education and condom use. These measures have limited success in vulnerable populations. Given the need to develop effective vaccines against STIs, bioinformatics has become an important tool. This study aimed to develop a new multivalent vaccine, capable of simultaneously protecting against multiple infections, by analyzing the genomes of the main bacterial STI pathogens. The Reverse Vaccinology approach was used to identify vaccine candidates within the shared proteome of these pathogens. From this shared proteome, 4 proteins were identified as putative vaccine candidates, while 87 other proteins were identified as putative drug targets for subsequent virtual drug screening. Subsequently, it was possible to identify the immunogenic epitopes within these vaccine candidates to construct a new multi-epitope vaccine.

Acknowledgements: Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG)
Keywords: Sexually Transmitted Infections, immunoinformatics, reverse vaccinology
#1065915

DEVELOPMENT OF A MACHINE LEARNING BASED WORKFLOW FOR IDENTIFICATION OF ANTIMICROBIAL PEPTIDES SEQUENCES IN GENOMIC DATA

Authors: Madson Allan de Luna Aragão, Rafael Lucas da Silva, João Pacifico Bezerra Neto, Carlos André dos Santos Silva, Ana Maria Benko-Iseppon
Presenter: Madson Allan de Luna Aragão • madsondeluna@gmail.com
Abstract:
Antimicrobial resistance represents a major global public health challenge, driving the need for innovative therapeutics to combat drug-resistant pathogens. Antimicrobial peptides (AMPs) are a promising class of candidates due to their broad-spectrum efficacy, structural diversity, and lower tendency to induce resistance. However, discovering AMPs within genomic data can be challenging, as their physicochemical properties often overlap with those of non-AMP biomolecules. Here, we introduce a Python-based pipeline integrating supervised machine learning (ML) to identify and classify AMP sequences. A dataset of 8,736 non-redundant protein sequences was retrieved from UniProt, restricted to Swiss-Prot entries to ensure quality. AMP sequences were identified based on known antimicrobial functions, while negative (non-AMP) sequences were selected from similar proteins lacking reported antimicrobial activity. An 80:20 train-test split (n=6,989 and n=1,747, respectively) was applied. Eighty-one physicochemical descriptors were calculated using the Peptides.py library and refined via statistical correlation-based feature selection. Six supervised ML models were built using scikit-learn: Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Random Forest (RF), Decision Trees (DT), Naive Bayes (NB), and Artificial Neural Networks (ANN). Each model underwent hyperparameter tuning and was evaluated using accuracy, precision, recall, F1-score, AUC-ROC, confusion matrices, and Matthews Correlation Coefficient (MCC). Correlation analyses among descriptors revealed expected relationships, such as that between molecular weight and sequence length. The inverse relationship between hydrophobicity and the Boman index indicated that excessively hydrophobic peptides might not be optimal for pathogenic membrane binding. Feature importance analyses varied notably among models. 
DT assigned significant importance (1.0) to molecular weight, while KNN showed less sensitivity (0.16) to this descriptor, relying on distance-based measures. SVM demonstrated moderate sensitivity (0.33), with feature importance influenced by the kernel function, but not explicitly assigned. The isoelectric point, net charge, and hydrophobicity consistently played significant roles, with their importance varying between 0.4 and 1.0 across different models, highlighting their influence on peptide-membrane interactions and relevance to the analyzed protein class. Results indicated SVM and ANN as the leading models, consistently achieving higher accuracy, precision, recall, and AUC-ROC, with all metrics ranging from 90% to 95%. SVM displayed the highest recall (90%), suggesting excellent sensitivity in detecting AMPs, while ANN demonstrated notable precision (92%), reducing false positives. DT exhibited lower AUC-ROC (82%) and MCC (61%) compared to other models, suggesting a higher likelihood of overfitting due to complex classification boundaries. NB also showed reduced MCC (63%), potentially due to its assumption of feature independence. On the other hand, SVM and ANN showed the best values for AUC-ROC (95%) and MCC (SVM = 79%, ANN = 80%). This workflow provides a simplified, efficient platform for accurate AMP detection in genomic data, allowing users to choose the most suitable model, with emphasis on SVM and ANN. By facilitating new AMP identification, this pipeline plays a significant role in discovering innovative antimicrobial agents, including those with pharmaceutical potential.
Acknowledgements: The authors acknowledge the support of FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais), UFMG (Universidade Federal de Minas Gerais), UFPE (Universidade Federal de Pernambuco) and LNCC (Laboratório Nacional de Computação Científica).
Palavras-chave: Pipeline, AMP, ML, SVM, KNN, ANN, Omics data, Hydrophobicity
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
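The abstract above reports the Matthews Correlation Coefficient (MCC) next to accuracy and AUC-ROC. As a minimal sketch of how MCC follows from a binary confusion matrix, the function below uses the standard formula. The function name and the example counts are hypothetical and are not taken from the authors' pipeline or results.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not the study's results).
example = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {example:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which may explain why it separates the models more sharply than the other metrics reported.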
#1065926

A GENERATIVE DIFFUSION PROTEIN MODEL FOR THERMOSTABLE ENZYMES USED IN BIOFUEL PRODUCTION

Autores: Adenilson Arcanjo de Moura Junior,Diego César Batista Mariano,Raquel Minardi
Apresentador: Adenilson Arcanjo de Moura Junior • arcanjomjr@gmail.com
Resumo:
Generative protein sequence models have emerged as a powerful tool in the field of
computational biology, offering significant potential for designing novel proteins that fulfill
specific functional or structural requirements. These models address a wide array of
biological challenges, including enzyme engineering, drug design, and the development of
synthetic proteins for industrial and therapeutic applications. In recent years, diffusion
models have gained attention as a groundbreaking deep learning approach for data
generation. These models, inspired by principles of non-equilibrium thermodynamics, have
demonstrated state-of-the-art performance in creating high-quality images and videos.
Prominent examples include models like MidJourney, DALL-E 3, and Sora, which are
capable of producing visually stunning and contextually accurate outputs. The success of
diffusion models in image and video synthesis has sparked interest in their application to
other domains, including the generation of biological data. In this work, we implemented a
generative diffusion model specifically tailored for designing sequences of
β-glucosidases, essential enzymes used in second-generation
biofuel production. Generating new β-glucosidase sequences capable of operating
at high temperatures has great potential for industrial applications. The generated protein
sequences were evaluated through in silico analysis. These evaluations focused on assessing
key biochemical and structural properties, such as thermostability and folding characteristics,
to determine how closely the generated proteins resemble their natural counterparts and to
select potential proteins that exhibit increased tolerance to heat. Our results indicate that the
proteins generated by the model exhibit biochemical and structural features comparable to
those of naturally occurring β-glucosidases, with several examples demonstrating increased
thermostability. This highlights not only the ability of diffusion models to produce realistic
protein sequences but also their potential to optimize specific properties, such as heat
tolerance, within a given protein family. These findings underscore the potential of diffusion
models as powerful tools for accelerating advancements in protein engineering and industrial
biotechnology.

Acknowledgements: The authors would like to thank the research funding agency
FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais).
Palavras-chave: Diffusion model, Protein Language Model, Biofuel
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1065930

ANALYSIS OF CONSERVED RESIDUES RELATED TO NUCLEOTIDE SPECIFICITY IN NUCLEOTIDYL TRANSFERASES

Autores: Angie Lissette Atoche Puelles,Aline Mendes da Rocha,Victor Mendonça de Rezende Fabri,Raquel Melo Minardi
Apresentador: Angie Lissette Atoche Puelles • angie.puelles@gmail.com
Resumo:
Nucleotidyl transferases (NTases) are enzymes responsible for transferring nucleoside monophosphates to specific substrates, playing essential roles in diverse metabolic pathways. This study used structural bioinformatics approaches to investigate conserved residues in NTases, focusing on positions that may influence specificity for nucleotides such as adenosine triphosphate (ATP) and cytidine triphosphate (CTP) in different families of this enzyme superfamily. Protein sequences were retrieved from public databases such as Pfam and UniProt, considering three main families: PF00483, PF01909, and PF02348. Functional annotations and literature reviews were used to classify proteins by their ligands. Redundant sequences were removed, and multiple sequence alignments were performed to analyze conservation patterns within and across families. Residue conservation analysis was conducted using PFStats, employing entropy and stereochemistry metrics to identify highly conserved positions. Conservation scores were calculated for residues within specific protein groups and across all sequences. These scores prioritized residues based on their relevance to nucleotide specificity. The process involved identifying residues highly conserved within a group and comparing them to conservation patterns across all groups. The output of this workflow was a set of residue positions potentially linked to ligand specificity. This approach identified 12 potentially ATP-specific residues, 10 of which were exclusive. For CTP, 37 residues were identified in the PF00483 group without exclusivity, while 48 unique residues were found in the PF02348 group. However, distinguishing ATP- and CTP-binding proteins based on conserved residues remains an open question. The results underline challenges associated with structural variability and ligand promiscuity in NTases but also suggest that some conserved residues near binding sites may play important roles in enzymatic function. Future studies will focus on refining sequence alignments and integrating structural data to enhance the identification of specificity-determining residues.
Acknowledgments: The authors would like to thank the research funding agency FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais).
Palavras-chave: nucleotidyl transferases, conservation analysis, ligand specificity, structural bioinformatics
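The abstract above describes entropy-based conservation scoring of alignment columns. The sketch below shows one generic way to compute it, using Shannon entropy per column. It is not PFStats' actual scoring, which the abstract does not detail; the function names, the gap handling, and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conserved_positions(alignment: list[str], max_entropy: float) -> list[int]:
    """Return 0-based column indices whose entropy is at most max_entropy."""
    columns = ("".join(col) for col in zip(*alignment))
    return [i for i, col in enumerate(columns) if column_entropy(col) <= max_entropy]


# Toy alignment with hypothetical sequences, for illustration only.
toy_alignment = ["GKSAT", "GKTAT", "GRSLT", "GKSVT"]
print(conserved_positions(toy_alignment, max_entropy=0.0))
```

Running the same function on a single ligand group and then on the pooled alignment, and keeping the positions that are conserved only within the group, would approximate the within-group versus across-group comparison the abstract describes.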
#1065933

COMPENSATORY MUTATIONS AND EPISTASIS: A REVIEW OF ADVANCES AND PERSPECTIVES FOR PREDICTION

Autores: Alisson Clementino da Silva,Bárbara Zuccolotto Schneider G. Parreira,Ronison Alves Guimarães,Joicymara Santos Xavier
Apresentador: Alisson Clementino da Silva • aliclementesilva@gmail.com
Resumo:
Predicting mutations that impact protein function and fitness is a cornerstone of modern biotechnology, therapeutic development, and protein engineering. Recent advancements in Artificial Intelligence (AI)-driven technologies have led to the creation of numerous methods for assessing the effects of mutations on protein stability—a key property influencing structure, function, solubility, and expression. In this context, the prediction of compensatory mutations—mutations that restore functional phenotypes—holds promise for advancing genetic therapies and biotechnological innovations. These mutations are crucial for understanding evolutionary adaptation and drug resistance. However, the complexity of epistatic interactions, which depend on relationships between different positions within a protein sequence, poses significant challenges for accurate predictions. This systematic review explored molecular parameters associated with protein mutations that can be integrated into models to better understand epistasis and predict compensatory mutations. By screening studies published up to December 2024 using keywords such as "compensatory mutation," "protein," and "machine learning," a total of 230 articles were identified from PubMed, the primary repository for relevant literature. The methodology adhered to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The findings highlighted the importance of parameters such as binding affinity, structural features, epistatic couplings, and evolutionary data in predicting compensatory mutations. Despite these insights, the high computational cost of capturing epistatic interactions remains a significant hurdle. Promising advancements include the integration of deep learning with evolutionary and thermodynamic data, which could significantly enhance prediction models. 
Additionally, combining high-throughput mutational scanning with ancestral protein reconstruction techniques offers potential for more precise modeling of epistasis and improved compensatory mutation predictions. Collaborative efforts between computational biology, machine learning, and data science experts have been pivotal, underscoring the multidisciplinary nature of progress in this field.


Acknowledgements: The authors would like to thank the research funding agency FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais).
Palavras-chave: compensatory mutation, epistasis, mutation effects on protein, machine learning
#1065972

Leptin reduction as a modulatory factor of the immune system during aging

Autores: Juan Miranda Peixoto,Polyana Mota de Carvalho Baptista,Eloisa Helena Medeiros Cunha,Felipe Caixeta Moreira,Monique Macedo Coelho,Ana Caetano Faria,Tatiani Uceli Maioli
Apresentador: Juan Miranda Peixoto • juanmpeixotoofc@hotmail.com
Resumo:
Healthy aging is a critical challenge for modern societies, necessitating a deeper understanding of its biological and immunological underpinnings. Aging is characterized by "inflammaging," a state of chronic low-grade inflammation driven by shifts in adipokine and cytokine profiles. This study focuses on leptin, an adipokine with metabolic and immunomodulatory functions, analyzing its interaction with inflammatory mediators and body composition across three age groups: 40–60 years (n=12), 61–75 years (n=15), and over 75 years (n=14). Data analysis was conducted in R, leveraging an array of packages for preprocessing, visualization, and statistical testing. Key packages included dplyr and tidyr (data manipulation), corrplot (correlation analysis), ggplot2 (data visualization), rstatix and pwr (statistical tests and power analysis). The integration of ELISA and Luminex datasets into a unified R-based pipeline enabled the comprehensive exploration of correlations between 27 inflammatory mediators, leptin, and body composition markers. Correlation matrices, Spearman tests, and dendrograms revealed significant associations, while Kruskal-Wallis and ANOVA tests validated group differences. Visualization tools such as dot plots and bar plots effectively highlighted these findings. The results demonstrated a marked decline in leptin levels in individuals over 75 years, independent of changes in body composition and biochemistry exams, suggesting a compensatory role for leptin in aging-related inflammation. Additionally, distinct immune signatures, such as elevated CCL3 and reduced CCL5 levels in the oldest cohort, underscore the interplay between metabolism and immune regulation. R proved indispensable for managing and analyzing complex datasets, offering reproducibility, scalability, and powerful visualization tools. 
Its versatility facilitated the integration of bioinformatics with statistical methods, advancing our understanding of age-related immune and metabolic dynamics. These insights pave the way for biomarker discovery and targeted therapeutic strategies.
Palavras-chave: Leptin, healthy aging, elderly.
#1065990

IDENTIFICATION OF GENE CONVERSION EVENTS IN HORSE IMMUNOGLOBULIN REPERTOIRE

Autores: JULIANA EDELVACY LIMA PINTO,JOÃO HENRIQUE DINIZ BRANDÃO GERVÁSIO,CARLENA TAHINA NAVAS,Joseph Chi-fung Ng,Taciana Conceição Manso,Liza Figueiredo Felicori
Apresentador: JULIANA EDELVACY LIMA PINTO • juliana_lima8@hotmail.com
Resumo:
The humoral immune response is critical for protecting the body against pathogens and establishing immunological memory after vaccination. This response depends on a diverse antibody repertoire generated through somatic recombination of immunoglobulin (Ig) gene segments in B lymphocytes. Antibodies typically consist of two light (L) chains and two heavy (H) chains linked by disulfide bridges, each containing conserved constant (C) and variable (V) regions. The variable regions are responsible for specific antigen interactions. In addition to V(D)J recombination, other mechanisms such as somatic hypermutation, gene conversion (GC), and class switch recombination further enhance antibody diversity, with the enzyme activation-induced cytidine deaminase (AID) playing a central role. Among these processes, GC is one of the least studied. It was initially described in chicken light chains, which have only one V and one J gene and therefore limited diversity through V(D)J recombination. There, GC compensates for this limitation by introducing segments from 25 pseudogenes (PGs) into functional (FN) genes, creating a more diverse repertoire. However, despite being well characterized in chickens and rabbits, GC remains poorly characterized in other species, such as horses, which are particularly important in Brazil for the production of antibodies used in serum therapy. In horses, more than 80% of the immunoglobulin heavy variable chain (IGHV) repertoire derives from only three gene segments, raising the hypothesis that PGs could contribute to diversity through GC. This study aimed to identify GC events in the IGHV repertoire of horses. For this, Ig sequencing data from eight horses, obtained using the Illumina platform, were processed using the pRESTO toolkit and annotated through IMGT/High-V-QUEST. GC events were identified using a version of BrepConvert modified to improve output resolution. The results revealed that most GC events occurred at the end of complementarity-determining region 2 (CDR2) and the beginning of framework region 3 (FR3), suggesting a preferential region for these events. This differs from what was observed in chickens, which showed a preference for the end of complementarity-determining region 3 (CDR3). This was further confirmed by analyzing a pair of genes, IGHV4-21 (FN) and IGHV3-46 (PG), where GC events were predominantly located between nucleotides 190 and 300. Most GC events spanned 3-4 nucleotides and were located near an AID motif. The most used PGs were IGHV3-46, IGHV4-53 and IGHV4-25, and the FN genes with the highest count of GC events were IGHV4-21, IGHV4-22 and IGHV4-29. Interestingly, the most used FN genes were the three gene segments previously described as composing more than 80% of the horse IGHV repertoire. These findings suggest that GC events significantly contribute to the diversity of the equine antibody repertoire by targeting specific regions within IGHV. Therefore, this study contributes to a deeper understanding of the dynamics of GC in horses, highlighting the complexity of antibody diversification and providing a basis for future studies on the equine immune response and its implications for defense against pathogens.
Acknowledgements: Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).
Palavras-chave: Gene Conversion, Immunoglobulin, Antibody Repertoire, Diversity, Pseudogenes.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1065993

ANCYLOSTOMIASIS: A BIOINFORMATIC APPROACH TO UNDERSTANDING ACEES-2

Autores: Letícia Gomes de Pontes,Estela Mariana Guimarães Lourenço,Ryan Fernandes Vieira De Souza,J. Miguel Ortega
Apresentador: Letícia Gomes de Pontes • leticiapontesproteomica@gmail.com
Resumo:
Ancylostomiasis, caused by parasitic hookworms, poses a significant global health burden, particularly in developing countries. Ancylostoma ceylanicum, which infects both humans and animals, is a valuable model for studying disease development. AceES-2, a highly immunogenic protein secreted by adult worms, has been the focus of our investigation. The high-resolution crystal structure of AceES-2 reveals a fold similar to tissue inhibitors of metalloproteinases (TIMPs) and complement factors C3 and C5, suggesting a crucial role in parasite-host interactions. In this study, we employed a bioinformatics approach to elucidate the mechanism of action of AceES-2 and investigate its evolutionary relationships. Through conserved domain analysis, multiple sequence alignment (using MEGA and the MUSCLE algorithm), phylogenetic tree construction, and molecular modeling (Swiss-Model and PyMOL), we identified conserved and variable regions in the AceES-2 sequence, constructed a 3D structural model, and analyzed the physicochemical properties of residues, providing insights into the function and evolution of this protein. Based on the eight sequences available in NCBI, bioinformatic analysis of the AceES-2 protein (approximately 106 amino acids) revealed four conserved domains in all AceES-2 sequences, suggesting an essential function for the protein. Multiple sequence alignment with the MUSCLE algorithm identified two highly conserved positions, which may be related to enzymatic function. The constructed phylogenetic tree indicated patterns of evolutionary relationship between different species, suggesting that the AceES-2 protein plays an important role in enzymatic function. The 3D model of the AceES-2 protein, generated by Swiss-Model, revealed a globular structure with a potential active site located on the surface of the enzyme. Analysis of the protein residues showed a distribution of polar and nonpolar residues with few hydrogen bonds, suggesting that the protein may interact with oligoproteins. This study has some limitations. The small number of sequences available in NCBI may limit the generalization of the results. Additionally, the 3D model generated by Swiss-Model is a prediction and should be experimentally validated. In summary, this study contributes to the understanding of the structure and function of the AceES-2 protein, an important target for the development of new drugs against ancylostomiasis. The results obtained provide a solid basis for future experimental studies.
Palavras-chave: bioinformatics, protein structure, enzymatic function, molecular modeling
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1065994

PHENOTYPIC AND GENOMIC CHARACTERIZATION OF VIRULENCE OF Staphylococcus chromogenes ISOLATES FROM CAPRINE MASTITIS

Autores: Gabryel Bernardo Vieira de Lima,Jônathas Moreno Silva de Souza,Diego Lucas Neres Rodrigues,Juan Carlos Ariute,Bertram Brenig,Vasco A de C Azevedo,Mateus Matiuzzi da Costa,Flávia Figueira Aburjaile
Apresentador: Gabryel Bernardo Vieira de Lima • gb.gabryelbernardo@gmail.com
Resumo:
The dairy industry is one of the most relevant in the international market. Milk has extensive and versatile uses, and much of the world consumes it daily. Brazilian production accounts for only 7% of the global total, because it is mainly consumed within the country. Even so, production occurs on a large scale and faces several problems, one of the most important being bacterial mastitis, which affects the animals and consequently their lactation, causing millions in monetary losses. This scenario refers to the cattle industry; in other dairy sectors these values vary, because the bacterial species that affect each type of animal differ widely. In goat milk production, most treatments are difficult for farmers, many of whom produce for subsistence, so the more virulent the bacteria, the more complicated the treatment. Therefore, the goal of this work was to identify the phenotypic and genotypic virulence characteristics of nine new isolates of S. chromogenes from caprine mastitis, sequenced on a HiSeq 2500 with a 2x150 bp library, and to compare them with all genomes of the species deposited in the NCBI database. Genomic virulence was evaluated with the software PanViTa v1.1.5 using VFDB (Virulence Factor DataBase), and the qualitative phenotypic analysis used the Congo Red method. Phenotypically, the bacteria followed the norm for the group, with biofilm production in all nine isolates. Genotypically, interestingly, none of the isolates or public genomes showed any gene related to biofilm production. The isolates had three core virulence-related genes, clpP, lip and tufA, whose mechanisms are linked to stress survival, exoenzyme activity and adherence, respectively; the gene cap8O, a core gene in all genomes of this study, is responsible for immune modulation. This discrepancy could be explained by electrical charge: Congo Red is anionic and normally adheres to the extracellular matrix produced during biofilm formation, but cap8O encodes a type 8 capsule that is produced under stress and is also negatively charged, which could make it a target for the Congo Red dye and thus give a positive biofilm result. Furthermore, these isolates do not represent a major pathogenic threat to goat milk production, considering their low phenotypic virulence and limited genotypic virulence repertoire.

Acknowledgements: The authors would like to thank the research funding agency
FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais).
Palavras-chave: biofilm; bioinformatics; genes; pathogenicity.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066007

DESIGNING ESTERASE ENZYMES FOR EFFICIENT PYRETHROID BIODEGRADATION: A STRUCTURE-BASED COMPUTATIONAL STRATEGY

Autores: Camila Akemi Oliveira Yamada,Mariana Quezado
Apresentador: Camila Akemi Oliveira Yamada • camila_akemi@outlook.com
Resumo:
Pyrethroids, synthetic pesticides derived from the natural compound pyrethrin, are often used
to control agricultural pests and disease vectors. However, their persistence in the
environment has raised ecological and health concerns as they disrupt the nitrogen cycle,
harm local flora, and are toxic to aquatic organisms. In addition, prolonged human exposure
has been linked to endocrine disruption and carcinogenesis. Despite these concerns, there are
still few effective methods for degrading pyrethroids. To address this challenge,
structure-based computational approaches have become essential for rationally designing
molecules, allowing for precise control over their structural and functional properties. In this
study, we use deep learning-based protein design methods (ProteinMPNN and LigandMPNN)
to develop new pyrethroid-hydrolyzing enzymes. Pyrethroid-hydrolases belong to the
α/β-hydrolase (ABH) fold superfamily, characterized by a core domain composed of parallel
β-strands surrounded by α-helices and a catalytic triad of serine, histidine, and
aspartate/asparagine. Structural and functional variation of this protein superfamily is
typically dependent on the lid domain. Because these enzymes share the classical function of hydrolyzing ester and
peptide bonds, it is hard to classify each protein and distinguish them by catalytic role.
Furthermore, despite being very similar, ABH family members often share low sequence
identity. Using a known enzyme structure that degrades pyrethroids (PDB ID: 5Y57), we
generated new protein sequences and investigated structural changes. For instance, changes at
the entrance of the active site could increase the specificity for small pyrethroids. Future work
will include docking experiments with cypermethrin and deltamethrin, two widely used
pyrethroids in Brazil, to evaluate how these structural changes affect binding affinity and
substrate specificity. As a proof-of-concept, we will evaluate and characterize their affinity for
their targets using circular dichroism and nuclear magnetic resonance (NMR) spectroscopy.
This approach holds promise for developing bioremediation strategies to mitigate the effects
of pyrethroid contamination on the environment.

Acknowledgments: The authors would like to thank the research funding agency FAPEMIG
(Fundação de Amparo à Pesquisa do Estado de Minas Gerais) and CNPq (Conselho Nacional
de Desenvolvimento Científico e Tecnológico)
Palavras-chave: pyrethroid degradation, bioremediation, protein design
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066008

COMPARATIVE GENOMICS ANALYSIS IN THE STUDY OF ORNITHORHYNCHUS ANATINUS GENE EVOLUTION

Autores: Ryan Fernandes Vieira De Souza,Tetsu Sakamoto,José Miguel Ortega
Apresentador: Ryan Fernandes Vieira De Souza • ryanfernandes44@gmail.com
Resumo:
The platypus (Ornithorhynchus anatinus) is an oviparous mammal of the order Monotremata, within the clade Dipnotetrapodomorpha, which includes lungfish, amphibians, reptiles, birds and mammals. It combines unique features such as a duck-like bill, egg laying, milk production and a body covered in the hair characteristic of mammals. The analysis of its proteome, compared to other vertebrates, allows us to explore the evolutionary processes behind these characteristics and investigate the evolutionary proximity of its genes to other lineages. This study explores the comparative genomics of the platypus by obtaining and analyzing the complete proteomes of Dipnotetrapodomorpha organisms. A total of 17,390 complete platypus amino acid sequences were obtained from the Reference Proteomes division of UniProt, along with 528 complete proteomes from the Dipnotetrapodomorpha clade, totaling 8,346,822 complete amino acid sequences. These proteomes, excluding the platypus proteome, were used to build a database with taxonomic coding using Diamond; the platypus proteome was then aligned against this database using BLASTp with the --top 10 option. The clustering of orthologous genes was carried out using Diamond's Cluster tool with 30% coverage and 50% identity, which identifies centroids and groups the input sequences. The very-sensitive option was used to increase sensitivity and include hits with an identity below 40% and short alignments. Lowest Common Ancestor (LCA) analyses were performed, and the results were cross-referenced with the laboratory's Taxallnomy table to retrieve taxonomic information. This table has names for all taxa, interpolating some that do not exist in the NCBI Taxonomy, such as “Cla_of_Testudinata” for the turtle class. When the clades grouped by Diamond did not include mammalian sequences, the platypus sequences were selected for analysis. Phylogenetic analysis with the TaxOnTree tool and functional enrichment via Gene Ontology were carried out manually. As a result, the 17,390 platypus genes were analyzed for the occurrence of organisms grouped with the top 10 option; 4,407 genes showed hits with LCA in the Mammalia class, as expected, followed by 113 hits with LCA in Amphibia, 60 in “Cla_of_Testudinata”, 17 in Aves, 10 in Lepidosauria and 2 in “Cla_of_Crocodylia”. Some of these genes, which did not have mammalian sequences in the top 10, were analyzed using phylogenetic reconstruction. Manual analysis of the trees confirmed the results obtained by Diamond. To identify the grouping of paralogs, Diamond's cluster function was run. The cluster with the highest number of platypus sequences (181), with LCA in Theria, is associated with pheromone receptor activity [GO:0016503]. In addition, two clusters related to eggshell structure and formation were identified ([GO:0035804]; [GO:0035805]), containing four and one platypus sequences, respectively, both with LCA in Amniota. The manually curated phylogenies of the platypus genes in these clusters confirm that the “Cla_of_Testudinata” group is the closest sister taxon to Mammalia, indicating that turtle species share the most recent common ancestor with mammals. Thus, with regard to genes related to egg formation, the platypus should be considered to have characteristics closer to turtles than to birds.
Palavras-chave: Comparative Genomics, Gene Evolution, Phylogeny
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
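The LCA step in the abstract above assigns each platypus gene the deepest taxon shared by all of its top hits. A minimal sketch of that idea follows, assuming each hit is represented by its root-to-tip lineage. The function name and the simplified lineages are hypothetical and do not reproduce the Diamond or Taxallnomy implementation.

```python
from typing import Optional


def lowest_common_ancestor(lineages: list[list[str]]) -> Optional[str]:
    """Deepest taxon shared by all root-to-tip lineages, or None if none is shared."""
    shared = None
    # zip stops at the shortest lineage; walk down the ranks while all hits agree.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break
        shared = taxa_at_rank[0]
    return shared


# Simplified, hypothetical lineages for the top hits of one query protein.
mammal_hits = [
    ["Amniota", "Mammalia", "Theria", "Homo sapiens"],
    ["Amniota", "Mammalia", "Theria", "Mus musculus"],
    ["Amniota", "Mammalia", "Monotremata", "Tachyglossus aculeatus"],
]
mixed_hits = mammal_hits + [["Amniota", "Cla_of_Testudinata", "Chelonia mydas"]]

print(lowest_common_ancestor(mammal_hits))
print(lowest_common_ancestor(mixed_hits))
```

A single non-mammalian hit moves the LCA up to a broader clade, which is why the size of the hit list (here, the top 10) affects how many genes receive an LCA in Mammalia.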
#1066014

IDENTIFICATION OF PROTEIN-ENCODING GENES OF BIOTECHNOLOGICAL INTEREST IN BOVINE RUMEN METAGENOMES

Autores: Emerson Willian Danzer,Gabriel da Rocha Fernandes
Apresentador: Emerson Willian Danzer • emersonwdanzer@gmail.com
Resumo:
Metagenomics enables the analysis of genetic material from microbial communities without the need for laboratory cultivation, which is essential for complex environments such as the bovine rumen, where many microorganisms cannot be traditionally cultured. The rumen, a digestive compartment in cattle, harbors a microbiome capable of breaking down plant fibers into volatile fatty acids essential for nutrition. Exploring the rumen through metagenomics reveals genes with crucial functions for animal nutrition and/or biotechnological applications. In this study, 23 bovine rumen metagenome samples from the EMBL-EBI database (ERP112000) were analyzed to identify protein-coding genes of biotechnological interest. The samples were filtered using the Trimmomatic software and aligned against the Bos taurus and Zea mays genomes using STAR to remove contaminant reads. Contigs were assembled with MetaSPAdes, discarding scaffolds shorter than 1,000 base pairs. Genes were predicted using Prodigal and MetaGeneMark. A custom script generated consensus sequences of the predicted genes, which were clustered with MMseqs2 into non-redundant sequences with at least 90% identity and 80% coverage. Functional classification of the genes was performed using DIAMOND against the UniRef90 database, followed by analysis with the UniProt and InterPro APIs, which provided cross-references to various databases. The genetic context of the reads was evaluated using Salmon, with the non-redundant sequences as a reference. Using IDs from multiple databases, we identified ECs related to cellulose, hemicellulose, and lignin degradation pathways, which have industrial applications in second-generation ethanol production. The initial filtering removed, on average, 13.57% of reads per sample. Mapping against Zea mays and Bos taurus resulted in average read losses of 0.0059% and 0.058%, respectively, with a cumulative loss of 0.064% during this step. After contig assembly, the 1,000 base pair filter was applied, yielding an average of 70,202 contigs per sample, ranging from 26,767 to 92,647 assembled sequences. Gene prediction identified an average of 230,920 genes with Prodigal and 241,080 with MetaGeneMark, with the latter's higher count attributed to its less restrictive criteria for metagenomes. The custom script reduced the gene set by approximately 29% by removing artifact genes, resulting in 3,873,399 consensus genes, which were subsequently clustered into 1,947,651 non-redundant sequences for quantification with Salmon. Comparing the non-redundant genes to the UniRef90 database yielded 1,236,795 matches, of which 138,609 corresponded to ECs associated with lignocellulose degradation pathways. This methodological approach effectively filtered noise from raw data, reduced artifacts, and generated a representative set of non-redundant genes. The detection of genes associated with lignocellulose degradation highlights the bovine rumen's potential as a source of biocatalysts for biofuel production, contributing to advancements in rumen understanding and bioprospecting.
We thank FAPEMIG for their support.
Palavras-chave: Metagenomics, bovine rumen, biofuels, lignocellulose, genes
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
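The counts reported in the abstract above can be turned into per-step yields with simple arithmetic. The short sketch below does only that. The variable and function names are illustrative, and the figures are the ones stated in the abstract, not new data.

```python
# Counts reported in the abstract above.
consensus_genes = 3_873_399
non_redundant = 1_947_651
uniref90_matches = 1_236_795
lignocellulose_ec = 138_609


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


steps = [
    ("non-redundant / consensus genes", non_redundant, consensus_genes),
    ("UniRef90 matches / non-redundant", uniref90_matches, non_redundant),
    ("lignocellulose ECs / UniRef90 matches", lignocellulose_ec, uniref90_matches),
]
for label, part, whole in steps:
    print(f"{label}: {percent(part, whole):.1f}%")

# Feed and host read removal: the two average losses add up to the cumulative figure.
cumulative_loss = 0.0059 + 0.058
print(f"cumulative read loss: {cumulative_loss:.3f}%")
```

On these figures, clustering keeps roughly half of the consensus genes, and about one in nine UniRef90 matches maps to an EC associated with lignocellulose degradation.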
#1066019

HORIZONTAL GENE TRANSFER SHAPED THE EVOLUTION OF NICOTIANAMINE SYNTHASE IN PLANTS

Autores: Matheus Lewi Cruz Bonaccorsi de Campos,Helena Ferreira Gomes,Wenderson Felipe Costa Rodrigues,Felipe Klein Ricachenevsky,Joni Esrom Lima,Luiz Eduardo Vieira Del Bem
Apresentador: Matheus Lewi Cruz Bonaccorsi de Campos • matheusdrlewi@outlook.com
Resumo:
Iron (Fe) is a vital element for the physiological functions of nearly all living organisms, including plants, where it is essential for photosynthesis. To maintain iron homeostasis, plants have developed mechanisms to regulate its uptake from the soil and ensure proper distribution among tissues and organelles. Nicotianamine synthase (NAS) is a key enzyme in this process, catalyzing the synthesis of nicotianamine (NA), a critical metabolite for metal chelation and transport in fungi, bacteria, and plants. NAS plays a central role in both iron acquisition strategies in plants: in Strategy I, it facilitates iron mobilization and transport between tissues, while in Strategy II, it is required for the biosynthesis of phytosiderophores, essential for iron uptake from the soil. Investigating NAS provides insights into the evolutionary history of plants, their adaptation to diverse ecological niches, and potential applications in agriculture. Our study aimed to investigate the distribution and evolutionary history of NAS genes within the Archaeplastida supergroup. Using a comprehensive approach, we initiated our analysis with an HMMER search to identify NAS homologous proteins within a dataset encompassing complete predicted proteomes of 53 representative Archaeplastida species. Additional sequences were retrieved through BlastP searches against the NR database, expanding the dataset to over 800 candidate proteins from bacteria, non-plant eukaryotes, and viruses. Redundant and partial sequences, along with pseudogenes, were filtered out, and the remaining proteins were aligned using MAFFT v3.2.1. Phylogenetic relationships were inferred using IQTree v1.6.12, and statistical analyses were conducted in RStudio v2022.07.2. Our findings uncovered a total of 61 NAS sequences distributed among Rhodophyta and Embryophyta, while Glaucophyta, Chlorophyta, Charophyta, and some Bryophytes displayed no detectable sequences. 
The Archaeplastida NAS phylogenetic reconstruction revealed a generally expected topology for the group, with exceptions including absent taxa and the clustering of monocots as a sister clade to gymnosperms rather than eudicots. Furthermore, statistical analyses showed no significant variation in the frequency of these genes among the analyzed genomes, including Poaceae, where a higher frequency had been hypothesized. Notably, the inclusion of NAS sequences from non-plant organisms disrupted the phylogenetic framework, revealing distinct temporal origins for NAS in plants. Our data suggest that NAS likely originated in bacteria and first appeared in the plant lineage with seedless tracheophytes. The presence of NAS in certain Bryophyta and Rhodophyta lineages appears to have distinct origins, as they clustered with fungal taxa, distant from tracheophytes. These findings suggest multiple horizontal gene transfer events at different points in plant evolutionary history, shaping the genomic landscape of extant plant NAS. In conclusion, our study highlights the pivotal role of horizontal gene transfer in the evolution of NAS genes in plants, providing insights into the origin and evolution of iron uptake mechanisms and their impact on plant physiology.
Thanks to FAPEMIG for financial support.
Palavras-chave: Iron-transport, Nicotianamine, Plant-evolution
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066055

COMPARATIVE GENOMICS OF AEROMONAS HYDROPHILA STRAINS FROM FISH FARMS LOCATED IN BRAZIL

Autores: Rebeca Dias Serafim Lima,Douglas Vinícius Dias Carneiro,Diego Lucas Neres Rodrigues,Siomar de Castro Soares,Bertram Brenig,Vasco A de C Azevedo,MATEUS MATIUZZI DA COSTA,Flávia Figueira Aburjaile
Apresentador: Rebeca Dias Serafim Lima • serafim.rebecadias@gmail.com
Resumo:
The Gram-negative bacterium Aeromonas hydrophila is a pathogen of concern for aquaculture, especially for fish production. This bacterium causes motile Aeromonas septicemia in commercially important fish, generating great economic losses. A. hydrophila is known for its pathogenic capacity and for its resistance to different classes of antibiotics, including the latest-generation classes. Despite studies on the genomic characteristics of A. hydrophila, Brazilian isolates remain little explored. Therefore, the objective of the present work was to perform a comparative genomic study of A. hydrophila strains from fish farms located in the northeastern region of Brazil. Thirty-four isolates of A. hydrophila were sequenced, obtained from fish such as Oreochromis niloticus, Phractocephalus hemioliopterus, and Lophiosilurus alexandri, from the microcrustacean Dendrocephalus brasiliensis, and from a water sample. The quality of the raw data was assessed using the FastQC tool. Trimmomatic was used to remove adapters and low-quality reads from the raw data, the Unicycler tool was used to assemble the genomes, and QUAST was used to assess the quality of the assembly. CheckM2 was used to assess the presence of contamination in the sequences, and the BUSCO tool was used to assess the completeness of the genomes. Finally, the species were identified using pyani, and the annotation was performed using PROKKA. For the comparative genomics study, GIPSy was used to investigate the presence of genomic islands, and the PanViTa tool was used to identify genes associated with virulence and antibiotic resistance traits. The 34 genomes were identified as A. hydrophila, with a minimum identity above 95%. 
Genes associated with 10 different classes of antibiotics and genes associated with 3 different antibiotic resistance mechanisms were found, namely antibiotic target alteration, antibiotic inactivation and efflux pump. In addition, 13 genes common to the study strains associated with antibiotic resistance were also found, with 4 genes common to all genomes compared. A total of 218 genes associated with pathogenicity were found. In addition, genes related to 8 different virulence mechanisms were found among the genomes compared. Pathogenicity islands were also found in 32 genomes compared, but contrary to expectations, no antibiotic resistance islands were found. These findings reinforce the pathogenic capacity of A. hydrophila as well as the multifactorial virulence found in this bacterium already pointed out in previous studies. In addition, the results provide relevant information regarding the presence of genes and factors associated with antibiotic resistance in different classes of Brazilian strains, and reinforce the concern regarding the genomic monitoring of A. hydrophila in Brazil and worldwide. Further studies using other Brazilian isolates of A. hydrophila are needed to assess the evolution of this pathogen over time and alert the production sector about the risks caused by the indiscriminate use of antibiotics in aquaculture.
Palavras-chave: virulence, resistance, septicemia, bacteria, aquaculture
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066075

Investigation of pathogenicity among strains of Serratia marcescens complex

Autores: Douglas Vinícius Dias Carneiro,Diego Lucas Neres Rodrigues,Pedro Alexandre Sodrzeieski,Leandro Cardoso Ribeiro,Juan Carlos Ariute,Liliane Moreira Donato Moura,Ana Maria Benko-Iseppon,MAURICIO CLAUDIO HORTA,MATEUS MATIUZZI DA COSTA,Flávia Figueira Aburjaile
Apresentador: Douglas Vinícius Dias Carneiro • dougbuick@gmail.com
Resumo:
Bacteria of the genus Serratia are free-living Gram-negative proteobacteria belonging to the Yersiniaceae family. Their main habitats are aquatic environments, such as rivers and marshes, and they can associate with many different animals and plants. In urban environments they are commonly found in sewers and especially in hospitals: Serratia marcescens was the sixth most frequently isolated microorganism in patients with pneumonia admitted to European Intensive Care Units. Subcutaneous infections by S. marcescens have also been reported in animals such as dogs, as well as in the elderly and in those with a chronic illness. In this context, S. marcescens is one of the main species of nosocomial pathogens in the world. This study therefore aims to identify, using pan-genomic methods, the resistance and virulence profile of strains of the S. marcescens complex (S. marcescens, Serratia nematodiphila, Serratia ureilytica and Serratia nevei), species that are grouped together because of their high degree of genomic similarity. Using the PanViTa tool, 103 virulence genes and 100 resistance genes were identified. The main virulence mechanisms of the complex are motility, immune modulation and effector delivery system. Antibiotic efflux and antibiotic inactivation were identified as the main resistance mechanisms and make up the standard resistance profile of the complex. Even though the complex's main virulence mechanism is motility, which helps it to live freely in aquatic environments, the number of efflux pump genes with a complete structural complex is cause for concern. The presence of different types of efflux pump complexes that provide resistance to various antibiotic classes shows that strains of the S. marcescens complex are of great importance in the context of infection control. 
Therefore, studies addressing the multidrug resistance of the complex with an approach to preventing and combating multidrug-resistant strains are essential in public health.
Palavras-chave: One Health, Serratia marcescens, antimicrobial resistance, comparative genomics.
#1066079

GENOMICS OF UROPATHOGENIC ESCHERICHIA COLI (UPEC) FROM CYSTITIS PATIENTS

Autores: Jônathas Moreno Silva de Souza,Gabryel Bernardo Vieira de Lima,Ulisses De Pádua Pereira,MATEUS MATIUZZI DA COSTA,Flávia Figueira Aburjaile,Vasco A de C Azevedo,Bertram Brenig
Apresentador: Jônathas Moreno Silva de Souza • jonathas.morenobio@gmail.com
Resumo:
Escherichia coli is a member of the Enterobacteriaceae family and is one of the main causes
of harm in dogs, including cystitis, the clinical symptoms of which can be increased
frequency of urination, pain when urinating, fever and bloody urine, which can lead to the
animal's death. This bacterium is a member of the ESKAPE group, whose isolates cause
harm and death owing to their resistance to the antibiotics and other drugs
administered to eliminate them, and it has become a worldwide concern in the area of
One Health. The incidence rate of diseases caused by E. coli varies according to country,
economy and host. The inappropriate and excessive use of antibiotics has promoted
resistance to them by bacteria, which adapt to avoid the presence of the chemical compounds
in their cells and consequently their elimination. The aim of this study was to understand the
resistance and virulence characteristics of uropathogenic E. coli isolated from dogs. Twenty
isolates of uropathogenic E. coli (UPEC) from dogs affected by cystitis, collected from a baseline
sample, were used. Their total DNA was then extracted and sequenced using Illumina HiSeq
2500 (2x150bp). The quality of the sequencing was analyzed with FastQC v0.12.1 and the
assembly was performed with Unicycler v0.4.8. The quality of the assembly was assessed using QUAST
v5.0.2 and the species was confirmed using JSpecies. Structural and functional annotation
was carried out using Prokka v1.14.6. PanViTa v1.1.5 was then used to identify resistance
and virulence genes using the CARD (Comprehensive Antibiotic Resistance Database) and
VFDB (Virulence Factor Database) databases. A total of 84 resistance genes and 282
virulence genes were found. Among the virulence genes, 20 core and 29 accessory genes
were related to metabolic factors. In addition, 15 core and 12 accessory genes
were related to motility, and 21 core genes were related to adherence. The genes
related to metabolism and nutrition included fep (A, E, C, G and D), which act in the
acquisition of iron. The fim genes (H, G, F, D, C, I, A and E) act in the adherence process, aiding
in the colonization of the pathogen. Motility is associated with the flhD, flhC, fliE and fliC
genes. Forty-eight genes are related to resistance mechanisms such as antibiotic efflux, which renders the
drug ineffective; these include acrA, acrB, evgS, evgA, emrY, emrB, emrA, mdtN,
Ecol_mdfA, Ecol_acrA and TolC. These genes mostly work together with TolC to release the
compound outside the bacterial cell. Finally, most virulence genes are accessory
(166), while 76 belong to the core genome, which is a strong factor in the permanence and spread of
the bacterium, thus promoting its proliferation. The results suggest the importance of
genomic monitoring for resistant and virulent bacteria, and its essential role as an ally in
animal health. Acknowledgments: The authors would like to thank the research funding agency
FAPEMIG (Fundação de amparo à pesquisa do estado de Minas Gerais).
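The core/accessory split reported above can be illustrated with a small presence/absence example. This is a minimal sketch under the usual pan-genome definition (core = present in every genome, accessory = present in some but not all); it is not PanViTa's implementation, and the isolate names and gene assignments below are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_core_accessory(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split genes into core (in all genomes) and accessory (in only some)."""
    genomes = list(presence.values())
    if not genomes:
        return set(), set()
    all_genes = set().union(*genomes)
    core = set.intersection(*genomes)
    return core, all_genes - core


# Hypothetical presence/absence data for three isolates.
presence = {
    "isolate_A": {"fimH", "fepA", "fliC", "acrB"},
    "isolate_B": {"fimH", "fepA", "acrB"},
    "isolate_C": {"fimH", "fepA", "fliC"},
}
core, accessory = split_core_accessory(presence)
```

Here fimH and fepA are core because all three toy isolates carry them, while fliC and acrB are accessory.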
Palavras-chave: Enterobacteriaceae; Genome; Cystitis; Bacterial resistance; Bioinformatics.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066083

Genomic surveillance of Klebsiella pneumoniae isolated from dogs in Veterinary Hospitals in Brazil

Autores: Henrique Da Silva Vieira,Max Roberto Batista de Araújo,Vasco A de C Azevedo,Bertram Brenig,Flávia Figueira Aburjaile
Apresentador: Henrique Da Silva Vieira • henriquedasilvavieira3@gmail.com
Resumo:
Klebsiella pneumoniae is a nosocomial pathogen capable of infecting animals and humans,
and is of great relevance in the context of the One Health approach. In May 2024, the WHO
published a list of priority pathogens due to increasing antimicrobial resistance and global
health risk, placing K. pneumoniae at the critical level of concern. This work aimed to carry
out genomic monitoring of K. pneumoniae isolated from two dogs from different veterinary
hospitals in Brazil, comparing them with 171 genomes isolated from animals on different
continents, in order to assess the spread of antimicrobial resistance, virulence factors and
molecular epidemiology. The two isolates were sequenced on HiSeq2500, assembled with
Unicycler and quality assessed with FastQC, CheckM and BUSCO. Functional annotation
was performed with Prokka, and pan-virulome and pan-resistome analyses were performed
with PanViTa. Molecular typing was carried out with pyMLST. A total of 164
resistance-related genes and 116 virulence-related genes were predicted. Isolates LBIHP4
and LBIHP7 had 55 and 54 antimicrobial resistance genes, respectively. In addition, the
resistance and virulence genes identified in silico are related to a total of 28 different antimicrobial classes
and 11 different virulence mechanisms. The antibiogram revealed resistance to ampicillin,
amoxicillin/clavulanic acid, gentamicin, sulfamethoxazole-trimethoprim, ceftriaxone,
cefotaxime, ciprofloxacin, levofloxacin, cefepime, piperacillin/tazobactam and enrofloxacin
for isolate LBIHP4, while isolate LBIHP7 showed resistance to all these antibiotics, except
for piperacillin/tazobactam (intermediate). Both were sensitive to florfenicol.
As for the virulence of the isolates, LBIHP4 and LBIHP7 had 80 and 99 virulence-related
genes, respectively. Most of the genes are associated with adherence, nutritional/metabolic
factor, effector delivery system, exotoxin, immune modulation and biofilm, while the
resistance genes involve efflux pump, antibiotic target alteration, reduced permeability and
antibiotic inactivation. Molecular typing revealed that the LBIHP4 genome forms an ST11
Clonal Complex (CC) with 36 other genomes from animal isolates around the world. ST11
is recurrently identified in Brazil and represents a high risk associated with the spread of
carbapenemase-producing K. pneumoniae (KPC), including hypervirulent and
hypermucoviscous strains. The LBIHP7 genome, in turn, was identified as ST3249, which has
not yet been described in the literature for K. pneumoniae and is reported for the first time in this
study. In view of this, the results reveal an alarming pattern of resistance and virulence in the
isolates, which puts the clinical management of hospital infections at risk and raises the possibility of
dissemination of resistant pathogens between animals and humans.

Acknowledgements: Fundação de Amparo à Pesquisa do Estado de Minas Gerais
(FAPEMIG)
Palavras-chave: Molecular epidemiology; Multidrug resistance; ESKAPE
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066094

Uncovering Novel Gene Networks in Adipose Tissue Macrophages of Obese Individuals

Autores: Felipe Caixeta Moreira,Tatiani Uceli Maioli
Apresentador: Felipe Caixeta Moreira • caixeta.felipe@gmail.com
Resumo:
Obesity is a global health crisis characterized by chronic low-grade inflammation, particularly in adipose tissue. Macrophages play a crucial role in this process, undergoing significant gene expression changes. Understanding the molecular mechanisms driving these changes is critical for developing effective therapeutic strategies. This study aimed to identify key genes that modulate macrophage function in obesity using a network-based approach. Using RNA sequencing (RNA-seq) data from white adipose tissue macrophages sourced from the GEO database, we constructed co-expression networks with the WGCNA package to identify modules of highly correlated genes. Within these modules, hub genes with high connectivity were identified as potential regulators of macrophage function. Validation of these hub genes was conducted using single-cell RNA-seq data from public datasets of mouse and human macrophages to confirm their expression across species. Our analysis identified previously recognized hub genes linked to macrophage function or obesity, including Stmn1, Cadm1, Asph, Myo1e, Sms, Pianp, Ndufs5, Ftl1, and 180027I7Rik. Importantly, we also uncovered four novel hub genes, Gcsh, Rcan1, Ndufb4, and Ywhaq, not previously associated with macrophage function or obesity. These novel genes demonstrated high connectivity within the co-expression network and were consistently expressed in macrophages across both human and mouse datasets. Their discovery highlights their potential role in the molecular mechanisms driving macrophage-mediated inflammation in obesity. This study demonstrates the power of network-based approaches to identify key genes involved in macrophage function during obesity. The identification of novel hub genes, such as Gcsh, Rcan1, Ndufb4, and Ywhaq, provides valuable insights into the molecular mechanisms underlying macrophage dysfunction in obesity and suggests potential therapeutic targets for combating this complex disease.
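The notion of a hub gene as a highly connected node can be illustrated with a small sketch of an unsigned, soft-thresholded co-expression network, in which the adjacency between two genes is |Pearson correlation| raised to a power beta and a gene's connectivity is the sum of its adjacencies. The study itself used the WGCNA R package; the Python code below only mirrors that idea. The value beta = 6, the gene names, and the expression values are assumptions chosen for illustration.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def connectivity(expr: Dict[str, List[float]], beta: int = 6) -> Dict[str, float]:
    """Sum of soft-thresholded adjacencies |cor|**beta for each gene."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            adjacency = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += adjacency
            k[gj] += adjacency
    return k


# Hypothetical expression profiles across five samples.
expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene_c": [1.1, 2.0, 2.9, 4.2, 5.0],
    "gene_d": [5.0, 1.0, 4.0, 2.0, 3.0],
}
k = connectivity(expr)
ranked = sorted(k, key=k.get, reverse=True)  # most connected genes first
```

In this toy network gene_d is weakly correlated with the others and ends up with the lowest connectivity, so it would not be a hub candidate. A real analysis would compute connectivity within each module rather than across all genes.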
Palavras-chave: Obesity, Macrophages, Co-expression, Gene, Network
#1066097

Mathematical Modeling of Metabolic Interactions Across Domains: A Study of the Nutritional Relationships Between Salinibacter ruber and Haloquadratum walsbyi.

Autores: ALEX CENTENO,Sara Cuadros Orellana,Carolina Marchant Fuentes,Aristóteles Góes Neto
Apresentador: ALEX CENTENO • alcenme@gmail.com
Resumo:
Metabolic interactions are ubiquitous in nature, with the potential to structure microbial communities into complex functional systems, beyond random aggregates. Notable examples of these interactions include syntrophy and nutritional mutualism, both widely present in thalassohaline ecosystems (>9% NaCl), where they provide microorganisms with adaptive strategies to overcome nutritional limitations. In this study, we present a mathematical model designed to predict the growth rates of Salinibacter ruber (S. ruber) and Haloquadratum walsbyi (Hqr. walsbyi) in monoculture and coculture scenarios, using genome-scale metabolic reconstructions of reference strains. The current model of metabolic interaction suggests a unidirectional syntrophic relationship from S. ruber to Hqr. walsbyi, where the former produces and the latter consumes dihydroxyacetone, mediated by the metabolism of glycerol as a carbon source. However, there is limited evidence regarding the metabolic contribution of Hqr. walsbyi to the ecosystem, particularly towards S. ruber. We hypothesize that, in the presence of glycerol, both microorganisms establish a bidirectional mutualistic metabolic interaction. The proposed model introduces significant innovations in parameter identification and estimation, adopting a more general structure that captures both fixed and dynamic effects. This approach surpasses traditional additive and multiplicative frameworks, addressing issues related to incidental parameters. Furthermore, it explicitly incorporates unobserved heterogeneity, representing latent metabolic factors that influence microbial growth and have not been accounted for in previous models. For instance, the potential contribution of Hqr. walsbyi to S. ruber is explicitly addressed. Another key feature of the model is its ability to correct for biases associated with endogeneity and simultaneity in the growth rates of both microorganisms. 
These innovations enhance the model's capacity to handle experimental variability, providing more robust parameter estimates and enabling a detailed analysis of metabolic interactions, their impact on growth rates, and their role within the ecosystem. Furthermore, the developed model is evaluated against current approaches available in the literature. This model is expected not only to outperform existing approaches in terms of precision and fit but also to provide a deeper and more quantitative understanding of the metabolic interactions between these microorganisms. Additionally, it will enable a more detailed characterization of microbial dynamics in extreme environments, enhancing predictive analysis and opening new opportunities to study metabolic cooperations under controlled conditions.
Palavras-chave: Mathematical modeling, metabolic interactions, Haloquadratum walsbyi, Salinibacter ruber
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066107

APPLICATION OF DEEP LEARNING MODELING APPROACH AND GENETIC ALGORITHMS FOR ENHANCED ANTIVIRAL PEPTIDE OPTIMIZATION: A SARS-COV-2 SPIKE PROTEIN CASE STUDY

Autores: Frederico Chaves Carvalho,Diego Mariano,Raquel Minardi
Apresentador: Frederico Chaves Carvalho • fcc073@gmail.com
Resumo:
Antiviral peptide design is a promising approach for therapeutic intervention, particularly in personalized medicine and in combating rapidly evolving viruses such as SARS-CoV-2. This study introduces a novel pipeline that combines AlphaFold, used as a proxy for docking, with a genetic algorithm (GA) to optimize peptides targeting the receptor-binding motif (RBM) of the SARS-CoV-2 Spike protein. The process began with an initial population of 100 peptides, selected from the Propedia database, which were docked against the Spike protein using PyRosetta’s FlexPepDock protocol and scored with the REF2015 scoring function. Peptides were ranked based on binding affinity, and those exhibiting at least 30% RBM occupation were retained. Successive generations were produced via mutation and crossover operations on the top-performing peptides, employing tournament selection and a modified elitism strategy to maintain diversity. This strategy removed peptides that remained the best performers for more than five generations, promoting exploration of novel solutions. Simulations of 50 generations with populations of 50 peptides each showed a consistent trend toward lower binding energies. AlphaFold was employed as a proxy for docking to predict the structures of the peptide-protein complexes efficiently, with subsequent scoring conducted using PyRosetta’s REF2015 scoring function. The GA-driven process resulted in peptides with improved binding affinities compared to optimization workflows relying solely on the FlexPepDock protocol for docking. Furthermore, AlphaFold exhibited superior performance in discovering peptides with favorable binding characteristics and converged faster. This integrated strategy showcases the potential of combining deep learning models and traditional optimization algorithms for peptide design, enabling robust and efficient optimization of antiviral candidates. 
The findings highlight the utility of AlphaFold not only for structural modeling but also as a pivotal component in computational pipelines for therapeutic peptide development. Future work could explore alternative scoring functions and experimental validation to further enhance peptide optimization.
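The genetic algorithm loop described in this abstract (tournament selection, crossover, mutation, and an elitism rule that retires a peptide once it has been the best for more than five generations) can be sketched as follows. This is not the authors' pipeline: `toy_score` is a hypothetical stand-in for the AlphaFold plus REF2015 scoring step (lower is better), and the mutation rate, tournament size, peptide length, and population size are assumed values.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_GENERATIONS_AS_BEST = 5  # abstract: champions are removed after more than five generations


def toy_score(peptide: str) -> float:
    """Hypothetical stand-in for a docking score; lower is better."""
    return -float(sum(peptide.count(aa) for aa in "WFY"))


def tournament(population, scores, rng, k=3):
    """Pick the best-scoring peptide among k randomly drawn contenders."""
    return min(rng.sample(population, k), key=scores.__getitem__)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length peptides."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(peptide, rng, rate=0.1):
    """Replace each residue with a random amino acid with probability rate."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in peptide)


def evolve(population, generations, rng):
    best, streak, history = None, 0, []
    for _ in range(generations):
        scores = {p: toy_score(p) for p in population}
        champion = min(population, key=scores.__getitem__)
        streak = streak + 1 if champion == best else 1
        best = champion
        history.append(scores[champion])
        # Modified elitism: keep the champion unless it has led for too long.
        next_gen = [] if streak > MAX_GENERATIONS_AS_BEST else [champion]
        while len(next_gen) < len(population):
            parent_a = tournament(population, scores, rng)
            parent_b = tournament(population, scores, rng)
            next_gen.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_gen
    return population, history


rng = random.Random(42)  # fixed seed so the toy run is reproducible
start = ["".join(rng.choice(AMINO_ACIDS) for _ in range(10)) for _ in range(20)]
final, history = evolve(start, generations=15, rng=rng)
```

Dropping a long-standing champion trades some short-term fitness for diversity, which is the motivation the abstract gives for the modified elitism rule.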

Acknowledgements: The authors would like to thank the research funding agency FAPEMIG (Fundação de Amparo à Pesquisa do Estado de Minas Gerais).
Palavras-chave: Peptide optimization, AlphaFold, SARS-CoV-2
★ Running for the Qiagen Digital Insights Excellence Awards
#1066151

SYSTEMATIC REVIEW OF AUTOMATED TOOLS FOR GENOMIC SURVEILLANCE WITH STRONG POTENTIAL FOR PREDICTIVE MODELING

Autores: Ronison Alves Guimarães,Bárbara Zuccolotto Schneider G. Parreira,Alisson Clementino da Silva,Joicymara Santos Xavier
Apresentador: Ronison Alves Guimarães • ronialvesart@gmail.com
Resumo:
Arboviruses are RNA viruses with high mutation rates due to their need to alternate between vertebrate and arthropod hosts for replication and transmission. These mutations generate genetic diversity, potentially leading to new serotypes that evade existing immune responses. The rapid evolution of arboviruses underscores the need for continuous genomic surveillance to identify emerging variants that may threaten public health. This requires urgent research on sequencing new strains, assessing their susceptibility to current vaccines and treatments, and monitoring their geographic spread. Such insights enable policymakers and public health officials to implement effective response strategies. These emerging outbreaks present significant challenges to public health surveillance and response, particularly due to the lack of automated systems capable of near real-time genomic data integration and analysis. This gap hinders efforts to track serotype distributions, monitor outbreak hotspots, and predict emerging threats. Current methods often require researchers to manually process genomic and epidemiological data, leading to inefficiencies and delays in actionable insights. This systematic review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and using supporting tools such as Rayyan and VOSviewer. The search was performed in databases such as PubMed and Web of Science, using keywords such as genomic analysis, arbovirus, and predictive, resulting in an initial sample of 553 articles published up to January 2025. After the rigorous application of inclusion and exclusion criteria, 185 articles were selected for detailed analysis. The study analysis highlighted the crucial importance of interoperability between analytical tools and automated pipelines for acquiring, cleaning, classifying, and integrating genomic data. 
These resources have proven to be essential for serotype tracking, investigating new genotypes, monitoring geographic spread, and supporting real-time public health decision-making. However, the review also revealed challenges in implementing automated tools capable of generating relevant results for predictive modeling. Although various tools are available, gaps remain in their ability to meet the demands of arbovirus epidemiological surveillance. Some of the identified obstacles include a lack of standardization, the complexity of integration with existing systems, and the need for specialized training.

Acknowledgements: Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) and Inform Africa Research Study Group.
Palavras-chave: Genomic Data, Public Health, Predictive, Arbovirus Epidemiology, Pipeline
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066163

IDENTIFICATION OF POTENTIAL Mt-DHFR INHIBITORS THROUGH VIRTUAL SCREENING OF BraCoLi LIBRARY

Autores: Beatriz Murta Rezende Moraes Ribeiro,Maria Fernanda Liphaus Almeida Negreli,Rafaela Salgado Ferreira
Apresentador: Beatriz Murta Rezende Moraes Ribeiro • bia.murta@hotmail.com
Resumo:
Tuberculosis, caused by Mycobacterium tuberculosis, is associated with a high mortality rate. The current treatment regimen involves the use of four drugs, is prolonged, and is associated with considerable adverse effects. The emergence of multidrug-resistant bacterial strains highlights the urgent need for new therapeutic strategies. Dihydrofolate reductase (DHFR) is a key enzyme in folate biosynthesis and a promising drug target. An additional binding pocket (glycerol pocket) in M. tuberculosis DHFR (Mt-DHFR), absent in human DHFR, enables the design of selective inhibitors for this target. The aim of this study was to identify potential Mt-DHFR inhibitors from the BraCoLi library through virtual screening. A docking protocol was established using DOCK6.11. The protocol's effectiveness was retrospectively evaluated by assessing its ability to reproduce the binding poses of experimentally determined ligands available in the Protein Data Bank (PDB), through cross-docking studies, and its capacity to identify true ligands using enrichment curve analysis. An optimized protocol was developed using the PDB structure 4KNE, achieving nearly 50% reproduction of the binding mode in cross-docking studies and demonstrating good enrichment performance with an AUC of 0.77 and a logAUC at 10% of the dataset of 0.71. The BraCoLi library, comprising approximately 2,000 compounds, was screened using this protocol, resulting in the selection of thirteen compounds for further biochemical evaluation. The selected compounds were chosen based on several filters, including their fit within the active site, the presence of hydrogen bonding interactions, interactions with glycerol site residues, and other relevant interactions with the target. These compounds will be subjected to confirmatory biochemical assays with Mt-DHFR.
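The enrichment AUC mentioned above measures how well docking scores separate known ligands from decoys. A minimal sketch, assuming that more negative scores indicate better predicted binding, computes it as the fraction of ligand/decoy pairs in which the ligand scores better (ties count as half). The scores below are invented for illustration and the logAUC metric is not covered.

```python
from typing import Sequence


def roc_auc(ligand_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """Probability that a random ligand outranks a random decoy (lower score = better)."""
    wins = 0.0
    for ligand in ligand_scores:
        for decoy in decoy_scores:
            if ligand < decoy:
                wins += 1.0
            elif ligand == decoy:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))


# Hypothetical docking scores (arbitrary energy-like units).
ligands = [-9.1, -8.4, -6.0]
decoys = [-8.0, -7.2, -5.5, -5.0]
auc = roc_auc(ligands, decoys)  # 10 of 12 pairs favour the ligand
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported value of 0.77 in context.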
Palavras-chave: Mt-DHFR, virtual screening, medicinal chemistry
#1066168

DISTRIBUTED COMPUTING APPLIED TO GENOMIC DATA FOR VARIANT ANNOTATION

Autores: Ivan Gomes da Cruz,Glen Jasper Yupanqui García,Eduardo Martin Tarazona Santos
Apresentador: Ivan Gomes da Cruz • ivangomes.trabalho@gmail.com
Resumo:
This project proposes a computational framework aimed at improving the efficiency, scalability, and accuracy of the annotation process for human genetic variants, considering the exponential growth in genomic data volume. The rapid development of advanced bioinformatics tools has introduced significant challenges for traditional systems, particularly in terms of performance, scalability, and data integration. The research focuses on the MASSA tool (Multi-Agent System for SNP Annotation), originally developed to annotate human genetic variants based on multi-agent systems, and seeks to overcome performance limitations identified in large-scale datasets. The proposed approach leverages cutting-edge Big Data techniques, such as the MapReduce paradigm, to efficiently distribute and parallelize computational tasks, significantly reducing processing time and optimizing resource utilization. Additionally, the project proposes the development of automated pipelines for seamless incorporation, validation, and updating of genomic databases. Furthermore, it seeks to integrate MASSA with the DANCE (Disease-Ancestry Network) tool. This integration will enable real-time visualization of genetic variant profiles through an intuitive, interactive web interface, developed using modern front-end libraries, ensuring a user-friendly and dynamic data exploration experience. The newly designed architecture is thoroughly validated through extensive testing in varied computational environments, assessing key performance metrics such as execution time, memory consumption, and annotation accuracy. The expected outcome is a robust, scalable, and high-performance system capable of addressing the growing demands of genomic research while enabling rapid and accessible analyses of genetic variants. 
Ultimately, this project aspires to make a substantial contribution to the advancement of personalized medicine, fostering deeper comprehension and practical utilization of complex genomic data in both clinical and research settings.
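As a rough illustration of the MapReduce paradigm mentioned in this abstract, the sketch below counts variants per consequence class across independent chunks. The record layout, the consequence labels, and all function names are hypothetical and are not taken from MASSA; this is a single-process toy, not the authors' distributed implementation.

```python
from collections import Counter
from functools import reduce

# Hypothetical variant records: (chromosome, position, consequence label).
VARIANTS = [
    ("1", 1000, "missense"),
    ("1", 2000, "synonymous"),
    ("2", 1500, "missense"),
    ("2", 3000, "intronic"),
    ("3", 500, "missense"),
]

def split_into_chunks(records, size):
    """Partition the input so each chunk could go to a separate worker."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def map_chunk(chunk):
    """Map step: produce a partial count table for one chunk."""
    return Counter(consequence for _, _, consequence in chunk)

def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right

def count_consequences(records, chunk_size=2):
    partials = map(map_chunk, split_into_chunks(records, chunk_size))
    return reduce(reduce_counts, partials, Counter())

totals = count_consequences(VARIANTS)
```

Because each chunk is mapped independently, the built-in `map` call could in principle be swapped for a parallel one such as `multiprocessing.Pool.map` without changing the reduce step.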
Acknowledgments: FAPEMIG, CNPq, CAPES
Palavras-chave: Annotation, variants, human genetics, MapReduce
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066187

Exploring the Prognostic Implications of the IRS1 NM_005544.3:c.158G>A Pathogenic Variant in Breast Cancer Patients as a Potential Biomarker

Autores: Kassyane Amanda Rodrigues Furtado,Thalia Rodrigues Zózimo,Rafaela Lopes Figueiredo de Andrade,Vasco Ariston de Carvalho Azevedo,Paulo Guilherme de Oliveira Salles,Débora Cristina De Freitas Batista,TELMA MARIA ROSSI DE FIGUEIREDO FRANCO,Carolina Pereira de Souza Melo,Leticia da Conceição Braga
Apresentador: Kassyane Amanda Rodrigues Furtado • kassyanefurtado@outlook.com
Resumo:
Breast cancer (BC) is the most prevalent cancer among women and represents a significant public health concern [1]. In 2020, BC was responsible for approximately 685,000 deaths globally [2]. In Brazil, an estimated 74,000 new BC cases are expected annually in the 2023-2025 period, with a mortality rate of 16.4% [3]. This type of cancer imposes high morbidity and mortality, exacerbated by late detection and socioeconomic barriers [1]. In this context, personalized medicine (PM) emerges as a promising approach to improve the management of these patients. The IRS1 gene (Insulin Receptor Substrate 1), identified in the nucleus of BC cells, acts as a transcriptional modulator and interacts with ERα and the progesterone receptor [4]. IRS1 is involved in cell proliferation and shows varying levels of expression in breast tumors, which may have prognostic and therapeutic implications [5]. This study aims to evaluate the presence of genomic variants in the IRS1 gene in breast cancer patients treated at Hospital Luxemburgo (HL) and their association with treatment response and prognosis. Next-generation sequencing of 35 BC patients was performed using the QIAseq Multimodal Pan-Cancer Panel (Qiagen). Variant calling was conducted in QIAGEN CLC Genomics Workbench 23, considering only non-synonymous variants different from the reference allele. The IRS1 gene was selected for further study, and its variants were classified for pathogenicity using the ClinVar database, retaining only those with an rs identifier to allow traceability in public databases. The association of genetic variants in IRS1 with the patients' clinical data was evaluated. The Kaplan-Meier Plotter (KMplot) tool was used to explore associations between variants and patient survival. A total of 418 variants were identified in the IRS1 gene. After filtering for non-synonymous variants with an rs (Reference SNP ID) identifier, 248 variants remained, including 3 benign, 1 pathogenic, and 5 of uncertain significance. 
Analysis using the Kaplan-Meier Plotter revealed an association between IRS1 variants and poor patient survival [HR 3.6 (1.14-11.33); p=0.019]. The presence of variants in IRS1, correlated with unfavorable outcomes, suggests its potential as a prognostic biomarker. These findings highlight the relevance of the IRS1 gene in BC patients, associated with cell proliferation and poor survival. Additional studies are needed to validate its clinical applicability on a large scale.
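For readers unfamiliar with the survival analysis referred to here, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. It is not the KMplot tool itself, and the follow-up times and event flags are invented for illustration, not data from the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed event.

    times:  follow-up time per patient
    events: 1 if the event (death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical follow-up (months) and event flags.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Comparing two such curves (for example carriers versus non-carriers of a variant) is what yields a hazard ratio and p-value like those reported above; that comparison step is not shown here.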
Palavras-chave: Breast Cancer, Prognostic Biomarker, IRS1 Gene (Insulin Receptor Substrate 1), Personalized Medicine.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066199

Gut Microbiota and Geography: Understanding Richness and Diversity Across Populations

Autores: Jéssica Lígia,Gabriel da Rocha Fernandes
Apresentador: Jéssica Lígia • jessicaligia.pm@gmail.com
Resumo:
The human gut microbiota constitutes a vast community of microorganisms, with more than 100 trillion organisms inhabiting each individual and significantly influencing human metabolism. Among the factors shaping this community's structure, studies have debated specific characteristics associated with different populations. In this context, the objective of this study was to understand variations in the microbiota among distinct populations. For this purpose, raw data from four countries—Brazil, Colombia, Nigeria, and Indonesia—were analyzed, totaling 888 samples. The data were processed using the Silva 1.32 database and analyzed with DADA2 and Tag.me tools in RStudio software. Alpha diversity (Shannon index) and beta diversity (Jensen-Shannon distance) analyses were performed to evaluate the richness and diversity of microbial communities in each country. The results of the alpha diversity analyses revealed significant variations in richness between countries. Regarding diversity, Colombia and Brazil formed a closer cluster, while Nigeria and Indonesia showed greater proximity to each other. Factors commonly influencing the microbiota, such as diet, lifestyle, disease prevalence, and degree of urbanization, as well as methodological differences, may explain this clustering. For instance, while Brazil and Indonesia sequenced the V3V4 region, Colombia and Nigeria focused exclusively on the V4 region. In light of contemporary discussions on the lack of reproducibility in microbiota studies, we propose that geographical specificities should be given greater consideration. This is justified by the identification of richness and diversity patterns that are sufficiently distinct to form unique clusters among the countries analyzed. We conclude that microbiota studies must incorporate a less generalized geographical context to better understand regional variations.
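The two diversity measures named in this abstract can be sketched as follows. This is a plain-Python illustration on hypothetical count vectors, not the DADA2 or Tag.me workflow; the base-2 logarithm in the Jensen-Shannon distance is an assumption chosen so that the distance lies between 0 and 1.

```python
import math

def shannon_index(counts):
    """Shannon alpha diversity H = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def jensen_shannon_distance(p_counts, q_counts):
    """Square root of the Jensen-Shannon divergence between two samples."""
    p = [c / sum(p_counts) for c in p_counts]
    q = [c / sum(q_counts) for c in q_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

# Hypothetical taxon counts for two samples.
sample_a = [10, 10, 10, 10]
sample_b = [37, 1, 1, 1]
h_a = shannon_index(sample_a)
h_b = shannon_index(sample_b)
jsd_ab = jensen_shannon_distance(sample_a, sample_b)
```

An even community (`sample_a`) scores higher on the Shannon index than one dominated by a single taxon (`sample_b`), and the pairwise distances between samples are what a clustering or ordination step would then operate on.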
Acknowledgements: The authors would like to thank the research funding agency FAPEMIG (Fundação de amparo à pesquisa do estado de Minas Gerais).
Palavras-chave: Gut microbiota, population, diversity, 16S.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066205

GENOMIC ANALYSIS OF THE BIOREMEDIATOR POTENTIAL OF THE SPECIES Klebsiella michiganensis IN THE RECOVERY OF SOIL CONTAMINATED BY HEAVY METALS

Autores: Marina Nascimento da Silva,Douglas Vinícius Dias Carneiro,Diego Lucas Neres Rodrigues,Marco Gama,Vasco A de C Azevedo,ANA MARIA BENKO ISEPPON,Bertram Brenig,Flávia Figueira Aburjaile
Apresentador: Marina Nascimento da Silva • marina.nbochiard@gmail.com
Resumo:
The genus Klebsiella comprises Gram-negative bacilli and facultative anaerobes found in a variety of environments. In soils, this genus is often predominant in the rhizosphere, where it plays a key role in nitrogen fixation and promotes plant growth. Some species, such as Klebsiella michiganensis, have demonstrated high efficiency in restoring degraded soils contaminated with heavy metals, such as Cd, Hg, As, Pb and Ni. This potential is related to the high resistance of the species to these metals, allowing their removal from the environment. These compounds accumulate in plants and soils, reducing food quality and rendering land unusable. The literature suggests the use of microorganisms, including K. michiganensis, as a promising approach for bioremediation and soil remediation. This study aimed to investigate the bioremediation potential of K. michiganensis by characterising its resistance and metal removal capabilities through the evaluation of resistance-associated genes in comparison to other species in the genus. A bacterial isolate, identified as CCRMKO592, was obtained from cabbage grown in commercial plantations in Gravatá, Pernambuco. For comparison, 80 genomes from the National Center for Biotechnology Information (NCBI) were analysed, including 39 from K. oxytoca and 41 from K. michiganensis. The isolate was previously extracted for analysis and then sequenced using the Illumina HiSeq 2500 platform with 2x150 bp reads. The quality of the sequencing was checked using the FastQC software (v0.12.1), and the genome assembly was carried out using the Unicycler tool (v0.4.8). The identity of the species was validated using the MUMmer alignment method combined with the pyANI tool (v3.0), considering a similarity criterion of ≥ 96%. OrthoFinder, FastTree and iTOL software were used to identify orthologous genes and build the phylogenetic tree. The heavy metal resistance genes were identified using the PanViTa tool (v.1.1.5) and extracted from the BacMet database. 
A total of 137 genes were associated with resistance to heavy metals and biocides, of which 116 belong to the core genome, 18 are accessory and 3 are exclusive. These genes confer resistance to a wide variety of compounds, particularly Zn, Cu, Cd, As and Fe. Unique genes, such as pcoE, fetA/ybbL and rcnR/yohL, have been shown to play a strategic role in resistance to specific metals - Co, Ag and Ni. These findings corroborate previous studies and confirm that K. michiganensis has high potential as a bioremediation agent. The species represents a promising alternative for mitigating the environmental impacts of heavy metal contamination, offering economic and sustainable benefits for agriculture.
Palavras-chave: bioinformatics, bioremediation, metal resistance, resistance genes, environmental contamination.
#1066213

Microbial Resilience and Space Exploration

Autores: Hugo Mauricio Peña Mercado,Aristóteles Góes-Neto
Apresentador: Hugo Mauricio Peña Mercado • hugop02@hotmail.com
Resumo:
Space exploration is crucial for ensuring the future of humanity and safeguarding life on Earth. A significant challenge in this endeavor is understanding the resilience of life under extraterrestrial conditions. The BIOMEX project investigates the survival and adaptability of organisms from the three domains of life, including some of the oldest species on Earth, to extreme environments resembling those on Mars. By exploring the limits of life, the project aims to unravel the mechanisms that enable organisms to withstand space-like conditions, contributing to the broader search for life beyond Earth.
Among the organisms studied is the Kombucha Microbial Community (KMC), a unique consortium renowned for its ability to produce versatile biofilms and survive under harsh conditions. Its resilience makes it an excellent model for exploring microbial survival in extraterrestrial environments. This study focuses on understanding how KMC adapts to stress conditions resembling those encountered on Mars. By examining its biological functions, including biofilm production and stress responses, the research sheds light on the potential role of microbial communities in supporting future space missions.
Insights from this work can inform the development of biotechnological applications for space exploration, offering new approaches to sustain life and harness microbial functions in extreme environments.
Palavras-chave: Astrobiology. Kombucha. Genomics. Komagataeibacter oboediens.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066233

EVALUATION OF POTENTIAL INHIBITORS OF CLASS I HISTONE DEACETYLASES USING MOLECULAR DOCKING

Autores: Alessandra Gomes Cioletti,Diego César Batista Mariano,ALESSANDRA LIMA DA SILVA,Luana Bastos,Raquel Minardi
Apresentador: Alessandra Gomes Cioletti • agcioletti@gmail.com
Resumo:
Histone deacetylases (HDACs) are enzymes that play an essential role in the regulation of gene expression, with recent studies linking their inhibition to autism spectrum disorders (ASD) and other diseases, such as cancer and Alzheimer's disease. As a result, there is a growing interest in understanding the effects of HDAC inhibition, in understanding the mechanisms and processes that cause these disorders, as well as in the search for possible inhibitors of these enzymes for use in treatment. In this work, we used molecular docking to investigate the binding between HDACs and small ligands, focusing on two class I HDAC enzymes involved in embryonic development: Histone deacetylase 1 (H1) and Histone deacetylase 2 (H2), with the aim of finding patterns that will allow us to find hits to be used in the selection of new possible inhibitors of histone deacetylases. However, docking HDACs, known as metalloenzymes, remains a challenge due to their interaction with the Zn2+ ion at the bottom of the enzyme's active binding site. In most cases, a molecule targeting these enzymes establishes an interaction that coordinates with the zinc ion. Thus, the prediction needs to consider the fact that the interaction of ligands with zinc is an important aspect of computational docking. Therefore, we chose to use an AutoDock Vina protocol that extends the force field to include force maps capable of describing the energetic and geometric components of these interactions. A redocking experiment was used to establish the docking protocol in the automated process. Then, docking was performed with ligands and decoys extracted from the MUBD-HDAC database. To evaluate the results, we manually checked some outputs of the ligand and decoy docking. In all results, the ligand and decoy molecules were allocated to a position close to zinc. 
It was observed that ligands are capable of chelating zinc due to the presence of two oxygens at the end of the molecule, which favors a higher binding affinity. The binding energy of all ligands and decoys was also verified. The docking results indicate that, on average, the ligands had a lower surface energy than the decoys in both case studies. A lower affinity energy represents a stronger binding. Finally, in a third step, HDAC1 docking was performed with toxins obtained from the T3DB database, to select possible substances capable of inhibiting the HDAC1 enzyme based on the affinity energy.

Acknowledgements: The authors would like to thank the research funding agency FAPEMIG (Fundação de amparo à pesquisa do estado de Minas Gerais).
Palavras-chave: Docking; Vina; Histone deacetylase; Metalloproteins; Structural bioinformatics.
#1066243

STUDY OF THE PNTX4 (6-1) TOXIN FROM THE SPIDER PHONEUTRIA NIGRIVENTER AND ITS POTENTIAL USE AS A TEMPLATE FOR INSECTICIDAL PEPTIDES AGAINST THE AEDES AEGYPTI MOSQUITO

Autores: João Victor Marques Da Silva,Rafaela Salgado Ferreira,Maurício Roberto Viana Sant'Anna,Maria Elena
Apresentador: João Victor Marques Da Silva • jmarques8284@ufmg.br
Resumo:
PnTx4(6-1) is a toxin with insecticidal activity from the P. nigriventer spider venom. This toxin prolongs the inactivation of sodium channels in insects without apparent toxicity in mice, making it a promising template for the development of insecticidal peptides for agricultural pest and vector insect control. Among these vectors, the mosquito A. aegypti stands out as the transmitter of arboviruses with significant medical importance, such as dengue, Zika, chikungunya, and yellow fever viruses. Given the lack of drugs and vaccines for most of these diseases, control of this vector becomes essential. However, the high level of resistance to pyrethroid insecticides in these mosquito populations has become an increasingly reported issue in various regions worldwide, making the development of new classes of insecticides crucial. In this context, the objective of this study is to investigate PnTx4(6-1) as a model for insecticidal peptides against A. aegypti. To this end, bioinformatics methodologies were employed, including amino acid sequence alignment, secondary structure prediction, tertiary structure modeling of the toxin and the sodium channel of interest, and docking to predict the interactions between both molecules. The analysis revealed that the toxin has paralogous and homologous proteins with conserved regions. Its secondary structure consists of two beta-strands and one alpha-helix between loops, while its tertiary structure features a cystine knot with five disulfide bonds. Three types of sodium channels were identified in A. aegypti, with several isoforms, one of which was selected for comparative modeling. The docking analysis suggests that the toxin interacts with site 3 of the mosquito sodium channel in domain IV. Additionally, in vivo tests were conducted to evaluate the effects of P. nigriventer spider venom on A. aegypti mosquitoes, including mortality curves and determination of LD50. In these preliminary tests, A. aegypti females microinjected with 69 nL of the venom died within the first hours after injection, with an LD50 of 0.09 ng. These results indicate that the PnTx4(6-1) toxin is a promising template for the development of peptides with activity against A. aegypti, with the potential to combat insect vectors resistant to conventional insecticides. Additionally, this study may contribute to understanding the interactions of spider toxins with insect sodium channels, as well as provide insights into the structure of the A. aegypti sodium channel.
Palavras-chave: Structure; Pntx4(6-1) toxin; Sodium channel; Aedes aegypti; Insecticidal.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066246

Impact of Periodontal Disease on the Subgingival Microbiome: Insights from Taxonomic and Functional Profiles

Autores: Wellington Francisco Rodrigues,Ferdinando Agostinho,Camila Botelho Miguel,Dra. Marlana Barbosa Carrijo de Carvalho,Luiz Fernando Veloso Favero,Karlla Kristinna Almeida Medeiros,Marcelo Henrique Napimoga,Carlo José Freire de Oliveira,Aristóteles Góes Neto,Siomar de Castro Soares
Apresentador: Wellington Francisco Rodrigues • wellington.frodrigues60001@gmail.com
Resumo:
Periodontal disease is a prevalent inflammatory condition driven by microbial dysbiosis within the subgingival biofilm, leading to the destruction of periodontal tissues. Understanding the taxonomic and functional profiles of the subgingival microbiome in individuals with periodontal disease compared to healthy controls is critical to elucidating its role in pathogenesis and potential biomarkers for diagnosis and treatment. This study aimed to compare the subgingival microbiome's taxonomic and functional profiles between individuals with periodontal disease and healthy controls, identifying biomarkers and pathways associated with periodontal dysbiosis. This was a stratified block case control study comparing two groups: individuals with periodontal disease (n = 10) and healthy controls (n = 9). Participants were aged =45 years, non-smokers, non-alcoholic, and had not used antibiotics in the previous 30 days. Periodontal disease was diagnosed following criteria from the American Academy of Periodontology and the European Federation of Periodontology. The research protocol was approved by the Ethics Committee in Research with Human Beings of the University of Rio Verde (CEP/UniRv - approval nº 2.304.394). Subgingival samples were collected using sterile paper cones, stored in liquid nitrogen, and processed for DNA extraction. 16S rRNA gene sequencing (V3-V4 regions) was performed on the MiSeq Illumina platform. Taxonomic profiles and functional pathways were analyzed using the EzBioCloud 16S pipeline, employing PICRUSt for predictive metagenomics. Alpha diversity (ACE, Chao1, and Shannon indices) and beta diversity (Bray-Curtis, UniFrac distances) were calculated, and LEfSe analysis identified taxonomic and functional biomarkers. Statistical significance was set at p < 0.05. The phyla Bacteroidetes (40.77%) and Fusobacteria (17.49%) dominated the periodontal disease group, while Firmicutes (29.54%) and Fusobacteria (23.90%) were most abundant in healthy controls. 
Porphyromonas gingivalis had the highest relative abundance (15.82%) in the periodontal group, while Fusobacterium nucleatum group dominated the healthy group (22.09%). Alpha diversity analysis revealed significantly lower species richness in the periodontal group compared to healthy controls (Jackknife, p = 0.05). However, beta diversity comparisons showed no statistically significant differences between groups. Functional analysis identified three significantly enriched pathways in the periodontal group: ko02040-Flagellar assembly (p = 0.049), ko02030-Bacterial chemotaxis (p = 0.049), and ko03010-Ribosome (p = 0.049). These pathways are associated with bacterial motility and protein synthesis, reflecting key mechanisms in periodontal pathogenesis. The subgingival microbiome in periodontal disease exhibits distinct taxonomic and functional profiles, characterized by increased relative abundance of pathogenic species such as Porphyromonas gingivalis and enrichment of pathways related to bacterial motility and chemotaxis. These findings highlight potential biomarkers and targets for therapeutic interventions in periodontal disease. The authors gratefully acknowledge the financial support provided by the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).
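Two of the diversity measures listed in this abstract can be sketched in a few lines. This is an illustration on hypothetical count vectors, not the EzBioCloud pipeline; the Chao1 variant shown is the bias-corrected form, chosen here as an assumption because it stays defined when no doubletons are present.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of taxa seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

# Hypothetical taxon counts for one diseased and one healthy sample.
diseased = [5, 1, 1, 2, 0]
healthy = [2, 3, 0, 4, 6]
richness = chao1(diseased)
dissimilarity = bray_curtis(diseased, healthy)
```

Chao1 adds an estimate of unseen taxa to the observed richness, while Bray-Curtis ranges from 0 (identical composition) to 1 (no shared taxa).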
Palavras-chave: Periodontal disease, subgingival microbiome, Porphyromonas gingivalis, taxonomic biomarkers, functional pathways
#1066248

FEASIBILITY STUDY OF PROTEIN MUTATIONS WITH ESSENTIAL AMINO ACIDS FOR USE IN THE “SINGLE-CELL PROTEIN” DIET

Autores: Juliana Rodrigues Pereira Silva,Raquel Minardi,José Miguel Ortega
Apresentador: Juliana Rodrigues Pereira Silva • sofos.jrp@gmail.com
Resumo:
The term “single-cell protein” (SCP) refers to a diet based on nutrients produced by a single cell. Microorganisms that can be used as SCP include a variety of bacteria, marine microalgae, yeasts, and filamentous fungi. What most inspired this proposal was a study in which tilapia were fed yeast expressing the vitellogenin storage protein to enrich the SCP. Thus, we identified a window of opportunity to attempt the enrichment of globular proteins, replacing amino acids in their composition with a single type of essential amino acid with a view to increasing the SCP, replacing vitellogenin with designed proteins. However, knowledge generation is needed to enable these exchanges. To date, stability studies have been performed in molecular dynamics using simulations with NAMD and GROMACS with times between 50 and 1000 ns. Two approaches for targeted mutation were used to enrich human myoglobin for essential amino acids: (1) replacement of residues with a favorable score in a BLOSUM62 substitution matrix, with determination of the maximum substitution amount and (2) replacement of residues with a favorable ΔΔG determined by the mCSM software. After exchanges, the protein sequence was modeled with I-TASSER and its structure was refined with ModRefiner and protonated at pH 7.00 by H++. Simulations were performed with NAMD and/or GROMACS, manually observing the dynamics, with RMSD and RMSF analysis, solvent exposure of the final poses and analysis of intra-protein atomic interactions by the COCaDA program. In the mCSM approach, it should be noted that ΔΔG is determined individually for each exchange; we ordered the values from the most stabilizing to the most destabilizing, replacing deciles or quartiles incrementally and analyzing their stability. We found that exchanges for M are always more favorable than for R and for H, in this order, except in protein loops where no differences were observed. 
In addition to human myoglobin (3RGK), this was repeated for two other molecules, albumin (4ZBQ) and myelin (6XU5). At the present time, we have achieved the following values of substitution percentage of myoglobin residues that generated simulations with stabilization, with RMSD lower than four angstroms, respectively: (a) for BLOSUM62-based substitution, I (39.9%), M (43.1%), R (35.9%) and H (37.3%); (b) for mCSM, I (53%), M (51%), R (31.5.9%) and H (26.5%). The numbers of H-bridges, attractive contacts and salt bridges similar to those observed in the wild-type were observed by the BLOSUM62 methodology for M, I and R. By the mCSM methodology, the deciles showed a much lower or zero number of attractive contacts and salt bridges. In the solvent exposure experiments, the results expressed the manual trajectory analyses, a higher exposure for arginine and histidine and a lower exposure for methionine and isoleucine. As perspectives, we should add more stability indicators and investigate more deeply the effect that substitution by M is always more favorable than for R and H, except in protein loops, now directing the study to sensitive positions in the structure.
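The incremental, decile-based substitution scheme described for the mCSM approach could be sketched as below. The sequence, the scores, and the function name are hypothetical; the sketch also assumes that a larger score means a more stabilizing substitution, so the sort order would need to be flipped for a tool that uses the opposite sign convention.

```python
def apply_deciles(sequence, scores, target, n_deciles):
    """Substitute `target` at the best-scoring positions.

    sequence:  one-letter amino acid string
    scores:    {position index: predicted stability score for residue -> target}
    n_deciles: how many tenths of the ranked positions to substitute
    """
    # Rank positions from most to least stabilizing (assumed sign convention).
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_sites = round(len(ranked) * n_deciles / 10)
    chosen = set(ranked[:n_sites])
    mutated = "".join(
        target if i in chosen else aa for i, aa in enumerate(sequence)
    )
    return mutated, sorted(chosen)

# Hypothetical 10-residue sequence and per-position scores.
sequence = "ACDEFGHIKL"
scores = {0: 0.5, 1: -1.0, 2: 0.9, 3: 0.1, 4: -0.3,
          5: 0.7, 6: -2.0, 7: 0.0, 8: 0.2, 9: -0.6}
mutant, sites = apply_deciles(sequence, scores, target="M", n_deciles=2)
```

Each resulting mutant would then be modeled and simulated separately; raising `n_deciles` step by step mirrors the incremental replacement described in the abstract.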
Palavras-chave: SCP, enrichment, essential amino acids, molecular dynamics.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066276

Molecular modeling and comprehensive analysis of T. cruzi cruzipain subtypes

Autores: Lucas Abreu Diniz,Ana Maria Valsani Leme Passos,Estela Mariana Guimarães Lourenço,Rafaela Salgado Ferreira
Apresentador: Ana Maria Valsani Leme Passos • anamariavalsani@gmail.com
Resumo:
Chagas disease is one of the most prevalent neglected tropical diseases in the world. However, its pharmaceutical treatment is still limited and marked by numerous adverse effects. Cruzipains (CZP) have been the most studied molecular targets in the discovery of new antichagasic drugs, with cruzipain 1 being the most explored subtype. In its recombinant form with the C-terminal domain truncated, cruzipain 1 is called cruzain, and crystals of this enzyme are reported in several studies. Recent works have shown significant modifications in amino acid residues between the different subtypes of cruzipains. Among the structural modifications, substitutions in the active site residues lead to the belief that cruzain inhibitors may interact differently between CZP subtypes, resulting in a possible change in their potency. Despite the described differences, there are still no crystallographic structures of the cruzipain subtypes. Therefore, the aim of this study is to investigate how changes in amino acid residues between the different cruzipain subtypes impact protein-ligand interactions. For this purpose, the three-dimensional structures of CZPs were predicted in silico through comparative modeling using different crystallographic protein-ligand complexes of cruzain as templates. The models were geometrically validated and the intermolecular interactions between ligand and active site were evaluated by visual inspection. Furthermore, the affinity of the compounds for the cruzipain subtypes was evaluated using the Fast Amber rescore program. The methodologies yielded geometrically favorable models, and the rescoring value of the complexes was similar for all subtypes of cruzipains. These initial data suggest that the change of amino acid residues between the different CZPs does not influence the activity of the compounds. The study will also be conducted using a larger number of compounds, and complementary tools such as molecular dynamics will be applied for more accurate results.
Palavras-chave: Chagas disease, cruzain, cruzipain, comparative modeling
#1066286

THE ORIGIN OF THE GENES THAT CONTROL PLACENTAL DEVELOPMENT: HOW LONG BEFORE THE APPEARANCE OF PLACENTAL ORGANISMS DID THEY ARISE?

Autores: Kadu Penuela Sanches Estevam,José Miguel Ortega
Apresentador: Kadu Penuela Sanches Estevam • kadupenuela@gmail.com
Resumo:
We developed a web application named GO-Genesis (http://biodados.icb.ufmg.br/gogenesis) to trace the origin of proteins annotated to Gene Ontology Terms. Using this application, we discovered many terms with recent Least Common Ancestors (LCA) of targeted proteins, all implicated in Mammalian reproductive structures such as uterus, vagina, and sperm head. Here we analyze proteins associated with "Embryonic Placenta Development" (GO:0001892): the embryonically driven progression of placenta formation to maturity. The placenta is an organ of metabolic interchange between fetus and mother, partly of embryonic origin and partly of maternal origin. GO-Genesis identified Theria as the clade of origin for proteins from 56 species, with 25 proteins in Homo sapiens and 41 in Mus musculus. We analyzed each human protein using TaxOnTree, which performs BLAST searches in the UniProt Complete Proteomes Database, creates multiple alignments with MUSCLE, and builds trees with FastTree software. TaxOnTree adds taxonomic information to tree branches, enabling manual LCA analysis for each gene. The most recent gene, CSF2 (Granulocyte-macrophage colony-stimulating factor), appeared in Eutheria. This cytokine, which stimulates hematopoietic precursor cell growth and differentiation, likely contributed significantly to the GO process due to its recent origin. Heat shock factor protein 1 (HSF1) emerged in Tetrapoda. Several genes originated in aquatic vertebrates: HIF1A (Hypoxia-inducible factor 1-alpha) and PDGFB (Platelet-derived growth factor subunit B) in Sarcopterygii; ESRRB, TFEB, GATA2, and SP3 in Euteleostomi; and SPINT1, EGFR, IGF2, CEBPB, ARNT, PKD2, and EPAS1 in Gnathostomata. Six additional genes appeared in earlier organisms, from Metazoa to Bilateria. STRING-DB analysis mapped the protein interactions to embryonic placenta development, in utero embryonic development, reproductive structure development, and embryonic organ development. 
In Reactome, cellular response to hypoxia emerged as the predominant pathway, forming the second largest protein-protein interaction cluster in STRING-DB. These findings indicate that embryonic placenta development is primarily controlled by genes that arose in aquatic vertebrates, particularly from sharks to coelacanths, with two more recent additions: one in our common ancestor with frogs and another in placental organisms. A comprehensive pathway incorporating both original and interacting proteins is under development.
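The LCA step underlying this kind of analysis can be illustrated with a short sketch: given the taxonomic lineages of the species in which homologs of a gene are found, the clade of origin is the deepest clade shared by all of them. The lineages below are simplified and written by hand for illustration; they are not output from GO-Genesis or TaxOnTree, and the function name is hypothetical.

```python
def last_common_ancestor(lineages):
    """Return the deepest clade shared by all lineages (each ordered root -> leaf)."""
    shared = None
    # zip stops at the shortest lineage, so no index checks are needed.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared = ranks[0]
        else:
            break
    return shared

# Simplified, hand-written lineages (many intermediate clades omitted).
BASE = ["Metazoa", "Bilateria", "Gnathostomata", "Euteleostomi",
        "Sarcopterygii", "Tetrapoda"]
human = BASE + ["Theria", "Eutheria", "Homo sapiens"]
mouse = BASE + ["Theria", "Eutheria", "Mus musculus"]
opossum = BASE + ["Theria", "Metatheria", "Monodelphis domestica"]
frog = BASE + ["Amphibia", "Xenopus tropicalis"]

lca_placentals = last_common_ancestor([human, mouse])
lca_therians = last_common_ancestor([human, mouse, opossum])
lca_tetrapods = last_common_ancestor([human, frog])
```

A gene with homologs only in human and mouse would be assigned to Eutheria, while adding a homolog in a marsupial or a frog pushes its origin back to Theria or Tetrapoda, respectively.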
Palavras-chave: gene evolution; ontology; phylogeny.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1066396

Identification of Sepsis Biomarkers Using Modified Logistic Regression Algorithms and mRNA Expression Analysis

Autores: ROSSANA DE OLIVEIRA SOUZA
Apresentador: ROSSANA DE OLIVEIRA SOUZA • rossanaoliveirasouza@gmail.com
Resumo:
Acknowledgements: Research Support Foundation of the State of Minas Gerais (FAPEMIG)

Sepsis is a critical medical condition marked by a dysregulated inflammatory response to
infection, often leading to organ failure and death. Responsible for 11 million deaths
annually, it represents a global health challenge compounded by the difficulty of early
diagnosis due to nonspecific clinical signs. Despite modern definitions focusing on organ
dysfunction as a core aspect of the disease, effective tools for detecting sepsis in its early
stages—when intervention is most effective—are still lacking.
Molecular biomarkers such as mRNA and microRNAs have emerged as promising
candidates for the early diagnosis of sepsis, as they capture the underlying inflammatory and
pathological processes. This study explores the application of modified logistic regression
algorithms to uncover molecular patterns in gene expression data. By integrating machine
learning to address the high dimensionality of these datasets, this approach aims to achieve
more accurate diagnoses, paving the way for cost-effective, personalized therapies that can
improve survival rates.
Traditional linear models, often applied through least squares techniques, face significant
challenges when dealing with high-dimensional data, such as that from gene expression
studies. In these scenarios, the number of variables frequently exceeds the number of
samples, resulting in underdetermined systems and unreliable solutions. Additionally, poor
matrix conditioning amplifies computational difficulties, necessitating advanced
methodologies capable of addressing these complexities.
Logistic regression, a widely used tool for binary classification, is particularly effective for
analyzing complex datasets due to its interpretability. However, in high-dimensional contexts,
the method struggles with issues like coefficient instability and overfitting, which undermine
its reliability. To mitigate these limitations, advanced techniques such as regularization and ad
hoc modifications have been introduced, striking a balance between computational efficiency
and analytical precision.
Modified logistic regression enhances traditional models by incorporating additional
regularization terms, such as the identity matrix, which stabilize computations and reduce
sensitivity to noise. This refinement allows for more reliable pattern recognition in large,
complex datasets. The use of sparse matrices further optimizes memory and computational
resources, facilitating the analysis of biological data with high dimensionality and enhancing
the practical applicability of the method.
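As an illustration of the regularization idea described above, the following is a minimal sketch of logistic regression with an L2 (ridge) penalty, fitted by plain gradient descent. The abstract does not specify the exact modification used, so the penalty form, the function names (`sigmoid`, `fit_ridge_logistic`), the hyperparameters, and the toy data are all assumptions made for illustration. The link to the identity matrix is that a penalty of (lam/2)·‖w‖² adds lam·I to the Hessian of the loss, which keeps the problem well conditioned even when features outnumber samples.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam=1.0, lr=0.5, n_iter=2000):
    """Fit logistic regression with an L2 penalty (lam/2)*||w||^2.

    Hypothetical helper for illustration; the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient of the mean log-loss plus the ridge term lam * w.
        for j in range(p):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data (made up): 4 samples, 6 features, so features outnumber samples.
X = [
    [2.0, 0.1, 0.0, 1.0, 0.3, 0.2],
    [1.8, 0.0, 0.2, 0.9, 0.1, 0.4],
    [0.2, 0.1, 0.1, 0.1, 0.2, 0.3],
    [0.1, 0.3, 0.0, 0.2, 0.4, 0.1],
]
y = [1, 1, 0, 0]

w_weak, b_weak = fit_ridge_logistic(X, y, lam=0.01)
w_strong, b_strong = fit_ridge_logistic(X, y, lam=1.0)
```

A stronger penalty shrinks the coefficients toward zero, trading some fit for stability. In a real gene expression setting the value of `lam` would normally be chosen by cross-validation.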
By leveraging modified logistic regression and machine learning, this study seeks to identify
molecular biomarkers of sepsis with high sensitivity and specificity. The proposed in silico
methodology offers a robust solution to the current diagnostic limitations, aiming to reduce
sepsis-related mortality. This innovative approach has the potential to transform the diagnosis
and management of sepsis, enabling earlier, more personalized interventions and improving
patient outcomes.
Palavras-chave: sepsis, molecular biomarkers, modified logistic regression, machine learning, early diagnosis
★ Running for the Qiagen Digital Insights Excellence Awards
#1066406

IDENTIFICATION AND CHARACTERIZATION OF POTENTIAL ALLOSTERIC SITES FOR DESIGNING SELECTIVE INHIBITORS OF SRPK1 AND SRPK2

Autores: Lucas Cecílio Vilar,Mauricio Costa,Rafaela Salgado Ferreira
Apresentador: Lucas Cecílio Vilar • lucas.vilar12@gmail.com
Resumo:
Tumor-related diseases are among the major groups of diseases requiring new therapies. SRPK1 and SRPK2 kinases play key roles in tumor development. SRPK1 acts in the VEGF regulatory pathway, associated with angiogenesis and tumor growth. SRPK2 plays an important role in cell cycle regulation and apoptosis. Overexpression of both kinases is associated with several types of cancer. Previous studies showed that, depending on the biochemical context, the inhibition of SRPK1 can promote or prevent tumor growth, highlighting the importance of selective inhibitors for each kinase. However, there are no selective inhibitors for SRPK1 or SRPK2. All inhibitors developed for SRPK1 retain residual activity against SRPK2, due to the similarity of the two orthosteric sites. Searching for allosteric sites in both kinases offers a route to selective inhibitors for SRPK1 and SRPK2. The main objective of this study is to identify and characterize potential allosteric sites in SRPK1 and SRPK2 that allow the screening and development of selective inhibitors for both kinases. Atomic coordinates of both kinases were obtained from the Protein Data Bank under IDs 5MY8 (SRPK1) and 2X7G (SRPK2). Regions without electron density were modelled with SwissModel, and the quality of both structures was checked with the ERRAT, PROCHECK and VERIFY3D tools on the SAVES6.1 server. The intrinsic movements of both kinases, in their apo and holo states, were analyzed using Normal Mode Analysis (NMA). Modes showing greater conformational variability and energetically favorable conformations were further examined through Principal Component Analysis (PCA) and energy minimization, with the most relevant modes selected for detailed analysis. Cavity detection and characterization were performed using the PocketMiner, FTMove and EPOCK tools. The NMA revealed that the 14 lowest-frequency modes accounted for approximately 80% of the motions in both kinases. 
Additionally, the modes for the apo and holo states were highly similar, suggesting that the presence of the ligand does not significantly restrict the motions of the kinases. Based on that, we selected Modes 7 to 20 in the apo states for further analysis. For these 14 modes, we created ensembles by evaluating the energetic favorability of the observed motions. We performed energy minimizations by displacing the original conformation in steps of 0.2 Å, ranging from -2.0 Å to 2.0 Å. A displaced conformation was considered energetically favorable, and added to the ensemble, if its potential energy was lower than that of the original conformation, with a threshold of 30 kcal/mol. From these 14 favorable ensembles, we selected six with higher conformational diversity through PCA to check for cavities. We found five cavities with allosteric potential in both kinases, which we named S1, S2, S3, S4, and S5. We are now characterizing these sites to select cavities for virtual screening, aiming to find selective inhibitors for both kinases.
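The displacement scan described above can be sketched as follows. This is only an illustration: the function names, the toy energy profile, and in particular the reading of the 30 kcal/mol threshold (a displaced conformation is kept when its energy does not exceed the reference energy by more than the threshold) are assumptions, since the abstract does not spell out the exact acceptance rule.

```python
def displacement_amplitudes(start=-2.0, stop=2.0, step=0.2):
    """Return the displacement amplitudes (in angstroms) along one mode."""
    n_steps = round((stop - start) / step)
    # Rounding removes floating-point drift such as 0.6000000000000001.
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


def select_favorable(energies, reference_energy, threshold=30.0):
    """Keep amplitudes whose energy is within `threshold` of the reference.

    ASSUMPTION: acceptance rule is E - E_ref <= threshold (kcal/mol).
    `energies` maps amplitude -> potential energy after minimization.
    """
    return [
        amp
        for amp, energy in sorted(energies.items())
        if energy - reference_energy <= threshold
    ]


amplitudes = displacement_amplitudes()

# Toy harmonic energy profile (made up), standing in for the minimized
# potential energies that a molecular mechanics package would provide.
reference = -1200.0
toy_energies = {amp: reference + 15.0 * amp ** 2 for amp in amplitudes}

ensemble = select_favorable(toy_energies, reference)
```

With this toy profile the scan produces 21 displaced conformations per mode, of which only those close to the starting structure pass the filter.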
Palavras-chave: Allosteric sites, Cancer, Normal Mode Analysis, SRPK1, SRPK2
#1066522

Using molecular evolution to understand and rewire protein function

Autores: Lucas Bleicher,Lucas Carrijo,Rafael Lemos,Mariana Quezado,Laila Alves Nahum,Julia Teixeira Rodrigues,Gabriel Portwood
Apresentador: Lucas Bleicher • lbleicher@gmail.com
Resumo:
Uric acid is a product of purine metabolism. Many organisms have a degradation pathway in which uric acid is converted to 5-HIU, then to OHCU and finally to S-allantoin. That is not the case for humans, who accumulate uric acid, sometimes to the point of developing conditions such as gout. This is caused by the loss of the uricase gene in some primates, followed by the loss of other genes in the pathway due to the lack of selective pressure for the remaining steps. Using biophysical and computational characterization of the proteins in this pathway and their homologs, we show how the second enzyme in this pathway catalyzed its reaction during the evolution of chordates and how a gene duplication in that period generated a thyroxine-binding protein, transthyretin. Then, we use this information to convert a transthyretin into a HIUase.
Palavras-chave: molecular evolution, transthyretins, HIUases
#1066544

TryGAATo: a customized bioinformatics pipeline for assessing the quality of trypanosomatid genomic assemblies

Autores: Samuel Alexandre Pimenta Carvalho,João Luis Reis Cunha,Daniella Castanheira Bartholomeu
Apresentador: Samuel Alexandre Pimenta Carvalho • samuel.apimenta@gmail.com
Resumo:
Until the early 2000s, sequencing and assembling genomic data were extremely expensive, time-consuming, and labor-intensive. The advent of high-throughput sequencing technologies, their rapid cost reduction, and the development of new bioinformatic software resulted in a significant increase in the number of assembled genomes in public repositories. However, these newer sequencing technologies generate shorter reads or exhibit higher error rates than Sanger sequencing, and the assemblers are inherently error-prone, requiring an assessment of assembly quality to ensure reliable downstream analysis.
Proper genome assembly evaluation requires assessing three key properties: contiguity, completeness and correctness. Most published assemblies, however, do not follow this principle and instead analyze only one or two of them. Additionally, the methodologies employed to assess assembly quality are often unreliable or not applicable to non-model organisms, such as trypanosomatids. These unicellular parasites, many of which are responsible for significant human, cattle and plant diseases, possess a unique and complex genomic structure, with highly repetitive genomic content, large and polymorphic multigene families, and high variability both among and within strains and populations, complicating genome assembly and subsequent evaluation. Moreover, the abundance of sequenced and assembled isolates and strains makes the objective selection of the most appropriate assembly for downstream genomic analysis challenging and time-consuming.
To address this issue, we developed TryGAATo (Trypanosomatid Genomic Assembly Assessment Tool), a Python pipeline specifically tailored to evaluate trypanosomatid genomic assemblies, considering their unique genomic features and the current state of genomic data. It uses the short and/or long genomic reads that produced the assembly itself to assess its contiguity, completeness, and correctness. It then calculates a quality score ranging from 0 to 100, indicating how close the assembly is to the “real genome”. Because it is objective and comparative, TryGAATo helps researchers conducting genomic analyses with their own data to select the most suitable assembly, and helps those assembling their own genomes by pinpointing areas for improvement. TryGAATo’s scripts are freely accessible at github.com/EmeraldSama94/trygaato. This work was funded by Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).
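To make the notion of contiguity concrete, here is a minimal sketch of N50, a widely used contiguity statistic. It is offered only as an example of this kind of metric; the abstract does not state which statistics TryGAATo computes or how its 0 to 100 score is derived, and the function name `n50` is a hypothetical choice.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("assembly has no contigs")


# Toy contig lengths (made up), in base pairs.
example_n50 = n50([80, 70, 50, 40, 30, 20])
```

N50 alone says nothing about completeness or correctness, which is why an evaluation that considers all three properties is needed.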
Palavras-chave: trypanosomatids, genome evaluation, contiguity, completeness, correctness
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1068136

Development of Chitin Synthase Inhibitors for Aedes aegypti Based on Plant Defensins: A Novel Approach to Vector Control

Autores: Damares Rodrigues Silva,Lilian Silva Santos,Glauciane Danusa Coelho,Bruno Medeiros Roldão de Araújo,Rafael Trindade Maia
Apresentador: Rafael Trindade Maia • rafael.rafatrin@gmail.com
Resumo:
The Aedes aegypti mosquito is one of the primary vectors of arboviruses, contributing significantly to public health issues in urban and peri-urban areas of tropical and subtropical countries over recent decades. Vector control strategies, primarily relying on chemical and environmental methods, have been crucial in reducing the transmission of these diseases. However, the increasing resistance of the vector to conventional insecticides, coupled with rapid urbanization, has highlighted the need for novel control tools and approaches.
Targeting key mosquito biomolecules, such as chitin synthase, presents a promising strategy. This enzyme is responsible for the polymerization of chitin, a biomolecule that forms the cuticles of the epidermis and trachea, as well as the peritrophic matrix lining the intestinal epithelium. In this context, the study aimed to rationally design chitin synthase inhibitors for Aedes aegypti using computational approaches based on plant defensins.
To achieve this, molecular modeling of both chitin synthase and defensins was conducted, along with molecular docking to identify complexes with the highest affinity. The AlphaFold Monomer v2.0 pipeline was applied to build model structures of both chitin synthase and the defensins. Molecular docking simulations between chitin synthase and defensins were performed on the ClusPro server (https://cluspro.org/login.php). In addition to the metrics provided by ClusPro, the Prodigy server (https://rascar.science.uu.nl/prodigy/) was also used for the analysis of the molecular interactions. The highest-ranked complex was selected as the reference for the rational design of peptides. These complexes serve as a foundation for the rational design of potential inhibitors. The identification of the interface within the chitin synthase-defensin complex was carried out using the SPPIDER server (https://sppider.cchmc.org). The amino acid residues relevant to the protein-protein interaction were used as inputs for the PepFold tool (https://mobyle.rpbs.univ-paris-diderot.fr/cgi-bin/portal.py#forms::PEP-FOLD3). The obtained peptides were subjected to molecular docking simulations with chitin synthase, and those showing promising results were further evaluated for carcinogenicity and toxicity using ToxinPred (crdd.osdd.net/raghava/toxinpred/). Peptides that passed this stage were subsequently optimized. All Ramachandran plots showed more than 90% of residues in the most favoured regions. Docking binding energies between chitin synthase and the defensins ranged from -25.8 to -17.7. The results suggest that the peptides generated exhibit strong, stable interactions with chitin synthase, indicating potential inhibitory activity.
The approach presented in this study offers a promising alternative to traditional synthetic insecticides, potentially reducing the Aedes aegypti population and, as a result, decreasing disease transmission.
Palavras-chave: Arboviruses, Enzyme, Molecular docking.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1068229

Modeling Bioinspired Peptide Mimics of Fungal Laccases: Promising Candidates for Bioremediation Applications.

Autores: Lilian Silva Santos,Damares Rodrigues Silva,Glauciane Danusa Coelho,Bruno Medeiros Roldão de Araújo,Rafael Trindade Maia
Apresentador: Rafael Trindade Maia • rafael.rafatrin@gmail.com
Resumo:
Laccase is a multicopper oxidase enzyme typically containing four copper atoms in its structure, responsible for the oxidation of various substrates. Due to its nonspecific nature and high redox potential, this enzyme has been widely investigated for its potential use in bioremediation processes. In recent years, pollution has intensified, primarily due to population growth, rapid industrialization, and the inadequate management of urban solid waste (USW). Leachate, a toxic liquid effluent, is mainly produced in open dumps and landfills from the decomposition of organic matter.
This study aimed to evaluate, through in silico simulations, the interaction between the theoretical model of laccase from Pleurotus ostreatus and several pollutants commonly found in landfill leachates. The AutoDock Vina software was used for molecular docking calculations. Docking simulations were run using the Lamarckian Genetic Algorithm (LGA). The grid points for AutoGrid calculations were set to 52 × 52 × 52 Å with the active site residues at the center of the grid box. The docking parameters were set to an LGA calculation of 10,000 runs. The number of energy evaluations was set to 1,500,000 and the number of generations to 27,000. The population size was set to 150, and the rates of gene mutation and gene crossover were set to 0.02 and 0.8, respectively. The obtained conformations were then summarized, collected and extracted using AutoDock Tools. The first and the last conformations from a 10-ranked set of each complex were analyzed using VMD (Visual Molecular Dynamics). Ten compounds were selected, including methoxyethyl acetate, 2-methyl-4,6-dinitrophenol, and quinoline-2-carboxylic acid. The simulation results revealed that laccase exhibits strong affinity for several of the tested compounds, with Gibbs free energy values ranging from -8.3 kcal/mol to -6.0 kcal/mol. The compound 3-methyl-1,2-dihydrobenzo[j]aceantrylen-1-ol demonstrated the highest affinity (-8.3 kcal/mol), followed by 9,9-dimethyl-10H-acridine (-6.5 kcal/mol) and quinoline-2-carboxylic acid (-6.2 kcal/mol).
These compounds formed stable interactions with key amino acid residues at the active sites of laccase, including ALA 106, GLY 227, ASN 229, and TYR 246. These residues are essential for maintaining the stability of the enzyme complex and for facilitating its catalytic activity.
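To give a sense of scale for the reported energies, the sketch below converts a binding free energy into an approximate dissociation constant using the standard relation ΔG = RT ln(Kd). This is a back-of-the-envelope illustration only: docking scores are estimates rather than measured free energies, and the temperature of 298.15 K and the function name `kd_from_delta_g` are assumptions not taken from the abstract.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Approximate dissociation constant (mol/L) from a binding free energy.

    Uses delta_G = R * T * ln(Kd), so Kd = exp(delta_G / (R * T)).
    A more negative delta_G gives a smaller Kd (tighter binding).
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Extremes of the energy range reported in the abstract.
kd_best = kd_from_delta_g(-8.3)
kd_weakest = kd_from_delta_g(-6.0)
```

Under these assumptions, the most favorable score would correspond to binding in the sub-micromolar range and the least favorable to binding in the tens-of-micromolar range.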
Palavras-chave: Laccase, docking, ligand.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1068599

RNA-Seq and Molecular Docking approaches reveal LINC01133 interactions in Hs578T Triple-Negative Breast Cancer cells

Autores: Leandro Teodoro,Milton Yutaka Nishiyama Junior,Mari Cleide Sogayar
Apresentador: Leandro Teodoro • teoolt.bio@gmail.com
Resumo:
Breast cancer constitutes a group of diseases characterized by significant molecular heterogeneity. Among the subtypes, triple-negative breast cancer (TNBC), which does not express the main therapeutic biomarkers, presents the worst prognosis and high lethality. In the search for new molecules that could help elucidate the mechanisms involved in TNBC tumor progression, the lncRNA LINC01133 has emerged as a molecule of great interest, with potential tumor-suppressive activity. Here, we conducted an integrated in silico analysis using RNA-Seq data obtained from both parental and genetically modified human TNBC Hs578T cell lines, in which LINC01133 was knocked out by CRISPR/Cas9, in order to enrich for differentially expressed genes (DEGs). The aim was to explore possible correlations between various DEGs and pathways involved in tumor progression, as well as to analyze the predicted interactions of the proteins encoded by two DEGs, namely ITIH5 and KAZN, with LINC01133 via molecular docking. In silico analyses highlighted several DEGs associated with TNBC tumor progression. Ontological pathway analyses revealed strong associations between LINC01133 knockout, positive modulation of cell migration processes, increased transcriptional expression of pro-inflammatory protein-coding genes, and a substantial upregulation of two genes linked to desmosomal and cell adhesion processes, KAZN and ITIH5, respectively. Molecular docking studies between predicted active sites of LINC01133/KAZN and LINC01133/ITIH5 demonstrated strong electrostatic and hydrophobic interactions between the lncRNA and the proteins, suggesting a scaffold function of LINC01133 for these proteins. These findings indicate that LINC01133 may influence TNBC progression through interaction with proteins such as KAZN and ITIH5. Its potential as a biomarker and/or therapeutic target warrants further investigation to better clarify its actual role and significance.
Palavras-chave: Triple-negative breast cancer, lncRNA LINC01133, RNA-Seq, Gene enrichment analysis, Molecular docking
#1068750

The “sweet thirsty spider”: Building a drought-related sugarcane gene network

Autores: Pedro Cristovão Carvalho,Danyel Contiliani,Renato Gustavo Hoffmann Bombardelli,Silvana A Creste Dias de Souza,Claudia Barros Monteiro Vitorello,Antonio Figueira
Apresentador: Pedro Cristovão Carvalho • carvalhopc@usp.br
Resumo:
The development of new crop cultivars represents an important step for food security under a scenario of climate change and exponential population growth. In this context, plant breeding requires a refined selection of candidate genes to ensure that subsequent years of development will produce significant results using biotechnological approaches. However, for sugarcane (Saccharum spp.), the selection of candidate genes is hindered by the lack of genome annotation, which is generally established by orthology and might not reflect the actual gene functions and related pathways. Systems biology may help to overcome the lack of annotation, allowing the integration of diverse omics data to develop more robust biological networks with solid information about candidate genes and their effects in related pathways. Here, we describe the development of a drought-related co-expression network based on 16 publicly available transcriptome datasets from sugarcane under contrasting water regimes. All data were standardized against the same assembled transcriptome, providing patterns of differential expression of sugarcane transcripts under water deficit, which were then analyzed for the presence of eigengenes and expression hubs by weighted correlation network analysis (WGCNA), establishing the edges between sugarcane nodes. The initial network comprises 36,265 nodes, representing the group of transcripts previously described within five clusters, and 1,108,458 edges, including possible interactions among those transcripts. All the nodes from the initial network were further annotated by BLAST, EggNOG, and Interpro, aggregating all the available information for function, related pathways, orthologs, and domains for each node. 
Further, results from BLAST, when cross-referenced with the STRING database, revealed protein-protein interactions, which can reinforce the original edges from the WGCNA analysis while providing novel information regarding the direct interaction between nodes, together with the metabolic interaction information from KEGG. Even though the network was developed based on specific transcriptomes, it is being future-proofed by enrichment with all the corresponding gene and transcript identifications from other available genomes and transcriptomes, allowing analysis with any current or future dataset.
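The edge weights in a weighted co-expression network can be illustrated with the soft-thresholding step used in WGCNA-style analyses, where the adjacency between two transcripts is the absolute Pearson correlation raised to a power beta. The sketch below is a simplified, unsigned version written from scratch, not a call into the WGCNA R package; the value beta = 6, the transcript names, and the expression values are assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expression, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for each pair.

    `expression` maps a transcript name to its profile across samples.
    Raising to a power suppresses weak correlations without a hard cutoff.
    """
    names = sorted(expression)
    adjacency = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(expression[first], expression[second])
            adjacency[(first, second)] = abs(r) ** beta
    return adjacency


# Toy expression profiles (made up) across four samples.
toy_expression = {
    "t1": [1.0, 2.0, 3.0, 4.0],
    "t2": [2.0, 4.0, 6.0, 8.5],
    "t3": [4.0, 1.0, 3.0, 2.0],
}
adjacency = soft_threshold_adjacency(toy_expression)
```

In this toy case the strongly correlated pair keeps a weight close to 1, while the weakly correlated pair is pushed close to 0, which is the behavior that lets hubs and modules stand out in the full network.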
Palavras-chave: climate change, Saccharum, sugarcane, systems biology, water deficit
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1068821

Enhancing Melanoma Detection with Deep Learning: A Systematic Review of Current Trends and Challenges

Autores: Graziela Sória Virgens,Danilo Trabuco do Amaral,Emmanuel Iarussi,João Alfredo Teodoro
Apresentador: Graziela Sória Virgens • graziela.soria@aluno.ufabc.edu.br
Resumo:
Melanoma, while not the most prevalent type of skin cancer, has one of the highest mortality rates if not detected early. Traditional diagnostic methods, such as biopsies, are invasive, but recent advancements in artificial intelligence (AI), particularly deep learning (DL) approaches, have shown significant promise in the non-invasive and accurate detection of melanoma using medical images. This systematic review explores the current landscape of DL applications for melanoma detection, highlighting trends, gaps, and challenges in model replicability and generalization. Our analysis reveals that models trained on image datasets from European populations exhibit high accuracy for those specific groups, but their performance declines when applied to populations with different skin phototypes across other continents. Despite the availability of public datasets like ISIC and HAM10000, many studies lack transparency in data usage, which compromises the reproducibility of results. We observed a prevalent use of image resolutions of 224x224 pixels for segmentation tasks and popular architectures like ResNet and Inception; however, these studies often omit critical methodological details. This review underscores the urgent need to incorporate more diverse and high-quality data to improve the global applicability of DL models in melanoma detection. We also discuss significant barriers, such as variability in image quality and the opacity of models, which impede clinical adoption. Finally, we advocate for the standardization of datasets and increased sharing of models to foster comparison and encourage future research.
Palavras-chave: melanoma, deep learning, artificial intelligence, phototypes, datasets.
#1068859

Computational investigation of heavy metal bioremediation potential in Cereus jamacaru D.C. proteins

Autores: João Alfredo Teodoro,Danilo Trabuco do Amaral
Apresentador: João Alfredo Teodoro • joao.alfredoteodoro@gmail.com
Resumo:
Bioremediation of heavy metals is an ecological and sustainable strategy for mitigating soil pollution, utilizing the natural capabilities of plants to restore degraded ecosystems. In silico bioprospecting studies play a pivotal role in identifying novel bioproducts for bioremediation and related applications. These studies facilitate the efficient exploration of molecular biodiversity at reduced costs and with minimal environmental impact. By leveraging the intrinsic abilities of organisms to neutralize pollutants, this approach accelerates the discovery of sustainable solutions for ecosystem decontamination. In this study, we investigated the potential of Cereus jamacaru (Mandacaru), a cactus native to the Brazilian Caatinga biome, for bioremediation of environments contaminated with heavy metals such as As, Cd, Cr, Cu, Hg, and Pb. Furthermore, we analyzed the genes and metabolic pathways influenced by these metals. To achieve this, bioinformatics techniques were applied to identify proteins with the potential to interact with heavy metals, utilizing transcriptomic data in combination with molecular docking techniques. The analysis of protein-metal interactions revealed promising patterns, suggesting the feasibility of simultaneous remediation in contaminated environments. Our findings highlight beta-amylase, UDP-glucose 4,6-dehydratase, and Chromophore Lyase CRL as the most promising candidates for binding arsenate, arsenite, and methylmercury. Meanwhile, Myosin 6 and Histone-Lysine N-Methyltransferase ATXR7 demonstrated strong potential for binding lead oxide and hexavalent chromium. Additionally, we identified key physiological mechanisms and metabolic pathways involved in metal detoxification, as well as the ability of certain proteins to interact with multiple heavy metals. These findings highlight potential strategies for the bioremediation of contaminated soil, water, and wastewater, where heavy metals accumulate in plant tissues. 
Our results underscore the identification of proteins with in silico interactions with heavy metals, though experimental validation remains necessary to confirm these findings.
Palavras-chave: Bioprospecting, Bioremediation, Heavy metals, Mandacaru, Molecular Docking.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1075834

POTENTIAL OF Pleurotus ostreatus LACCASE TO DEGRADE MICROPOLLUTANTS PRESENT IN LANDFILL LEACHATE, AN IN SILICO STUDY

Autores: Isac Antonio Alcantara Queiroz,Adriel Martins Borges,Rafael Trindade Maia,Glauciane Danusa Coelho
Apresentador: Adriel Martins Borges • adriel852456@gmail.com
Resumo:
Population growth, increased purchasing power, and unconscious consumption
have led to a greater generation of municipal solid waste (MSW) in recent years, which
is typically disposed of in landfills. In Paraíba (PB), more than 2,737 tons of MSW are
produced daily. The moisture present in landfills promotes the production of leachate,
which contains various compounds, including micropollutants, chemical substances
with high toxic potential that can cause harm to human health and the environment even
in small quantities. Oliveira (2019) identified 54 micropollutants in the leachate from
the Metropolitan Sanitary Landfill of João Pessoa (MSLJP), highlighting the urgency of
treating this wastewater. In general, biological treatments have lower costs than
chemical and physical treatments. Studies of anaerobic digestion show that the toxicity
of the micropollutants present in the leachate limits the efficiency of the process. In this
context, basidiomycete fungi emerge as an alternative due to the excretion of a
ligninolytic complex, which includes laccase, an enzyme recognized for its ability to
degrade xenobiotic compounds. This research takes an innovative approach by
evaluating the potential of laccase from the fungus Pleurotus ostreatus to degrade
components of MSLJP leachate through molecular docking studies. Three-dimensional
structures of the ligands (micropollutants) were obtained from databases (PubChem and
ChemSpider). A theoretical model of laccase constructed in Modeller was employed in
molecular docking using UCSF Chimera and AutoDock Vina. The results were
analyzed in Discovery Studio. A screening of the most concerning molecules was
conducted based on a literature review, aiming to select the molecules with the greatest
potential to pose risks to human, animal, and environmental health. During the research,
it was identified that 53.7% of the micropollutants studied originated from
pharmaceuticals. Thirty-five molecules were selected for the molecular docking study.
Twenty-five molecules were found to be degradable by laccase as they exhibited
negative binding free energy and hydrogen bonds with amino acids present in the
enzyme's active site. Thirteen molecules (Phthalylsulfathiazole, Trichlormethiazide,
Prednisone, Hydrochlorothiazide, Sulfasomizole, Tectochrysin, Monobutylphthalate,
3,3'-Diaminobenzidine, Benzoic acid-2-hydroxy-1-methylethyl-Ester, Phenicarbazide,
Pyrazinamide, and N-Nitrosopiperazine) formed hydrogen bond interactions with the
amino acids HIS 458 and ASP 208, which are electron and proton acceptors during the
enzymatic reaction, demonstrating the viability of the reaction. The compounds formed
bonds with other amino acids, including ASN 266, ASN 210, ARG 270, GLY 268,
GLY 394, and PRO 396. Among these, glycine 394 belongs to the active site
of the ligninolytic enzyme laccase. In silico analyses indicated that the majority of the most
concerning micropollutants present in MSLJP leachate could be degraded by P.
ostreatus laccase. This study encourages the performance of in vivo and in vitro studies
for MSLJP leachate treatment, contributing to the reduction of the toxicity of these
compounds.
Palavras-chave: Emerging micro pollutants; Bioremediation; Leachate; Molecular docking
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1075887

Gut-lung microbiota in piglets

Autores: Franciele Maboni Siqueira,Gabriela Breyer,Maria Eduarda Rocha,Ana Paula Muterle Varela,Michele B. Mann,Jeverson Frazzon,FABIANA QUOOS MAYER
Apresentador: Franciele Maboni Siqueira • franmaboni@gmail.com
Resumo:
Enterotoxigenic Escherichia coli (ETEC) is associated with diarrhea in piglets. The effect of ETEC on the gut-lung microbiota axis has not yet been described. Thus, using high-throughput sequencing approaches, we explored the changes in gut-lung microbiota in healthy piglets, asymptomatic ETEC carriers, and diarrheic piglets. Lungs and feces from the animals were subjected to V4 16S rDNA metabarcoding sequencing. QIIME2 v2020.2 with the DADA2 package was used to merge paired-end sequences, remove chimeras, and cluster reads into amplicon sequence variants (ASVs). ASV taxonomic assignment was performed with the Silva 138.1 database. Bacterial communities were analyzed with the phyloseq package in RStudio. Diversity and abundance indices, comparisons among groups and dysbiosis scores were analyzed. Taxa markers were searched using Linear Discriminant Analysis Effect Size (LEfSe). Biological function prediction in these sites was performed using PICRUSt2 based on the MetaCyc Metabolic Pathway Database. Principal component analysis (PCA) was performed to assess the distribution of the predicted pathways in the investigated groups. Bacterial communities were less diverse in the respiratory tract than in the gut, with a site-specific composition; however, diversity was similar among the ETEC-carrier states in both sites. A dysbiosis event was observed in one diarrheic piglet in both the gastrointestinal and respiratory tracts, but the ETEC-carrier state tended to shift the gut-lung microbiota away from normobiosis. The most abundant bacterial taxa included Prevotella, Clostridia vadin BB60 group, and Treponema in the gut, and Lactobacillus, Fusobacterium, and Oceanobacter in the lung. We identified exclusive core bacterial taxa and genus biomarkers for ETEC-carrier states in both sites. Microbiome functional prediction showed that most metabolic pathways harbored by the bacterial communities are shared regardless of ETEC-carrier state, and differences are limited to the gut.
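As an example of the kind of diversity index mentioned above, the following sketch computes the Shannon index from a vector of ASV counts. The abstract does not list which indices were used, so the choice of Shannon diversity, the function name, and the toy counts are assumptions for illustration; in practice such indices are typically obtained directly from phyloseq.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over observed ASVs.

    `counts` holds read counts per ASV for one sample; zero counts
    are skipped because they contribute nothing to the sum.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum(
        (count / total) * math.log(count / total)
        for count in counts
        if count > 0
    )


# Toy samples (made up): an even community versus a dominated one.
even_sample = shannon_index([10, 10, 10, 10])
dominated_sample = shannon_index([37, 1, 1, 1])
```

A community with evenly distributed ASVs scores higher than one dominated by a single ASV, which is how a less diverse site such as the respiratory tract would show up in this index.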
Palavras-chave: Post-weaning diarrhea, ETEC, 16S-rDNA sequencing, biomarker bacteria.
★ Running for the Qiagen Digital Insights Excellence Awards
#1078221

Deep Learning for Prostate Cancer Diagnosis: Predicting ISUP Grade and Gleason Score through Image Analysis.

Autores: Jailson Silva,Júlia Helena Ortiz
Apresentador: Jailson Silva • jailson.silva@estudante.ifb.edu.br
Resumo:
This project aims to create a deep learning model for the analysis of prostate cancer biopsy images, with the goal of predicting two crucial parameters: the ISUP Grade and the Gleason Score. Prostate cancer is one of the most prevalent and aggressive diseases among men, and early detection, along with appropriate treatment, is essential to improving survival rates and patients’ quality of life. The analysis of biopsy images, through advanced deep learning techniques, offers a promising path to optimize diagnosis and reduce human errors.
The technical part of the project involves the use of Convolutional Neural Networks (CNNs), a powerful architecture for image processing. The biopsy images, provided in microscopic view format, were resized and normalized to facilitate model training. The network was designed to perform two main tasks: classify the ISUP Grade, which has five levels of aggressiveness, and predict the Gleason Score, a continuous value that reflects the aggressiveness of cancer based on cell growth patterns. During training, the model was evaluated using precision metrics for the ISUP Grade and mean absolute error for the Gleason Score, using the TensorFlow and Keras frameworks.
The use of CNNs is essential, as these networks are highly efficient in identifying spatial patterns in images, such as edges and textures, which are critical for accurate classification and precise prediction. Based on this analysis, the model was trained and adjusted to improve the accuracy of predictions, comparing the model’s results with human evaluations. The comparison between the model’s predictions and medical experts’ evaluations generated a DataFrame, which allowed for an objective analysis of the results.
From a social perspective, this project has a significant impact, as it seeks to improve the diagnosis and treatment of prostate cancer, a disease that affects millions of men worldwide. The automation of biopsy image analysis can speed up diagnosis, reduce human errors, and provide more accurate early detection. This is crucial, as fast and precise diagnoses increase the chances of effective treatment and can save lives.
Additionally, the developed model can be implemented in hospitals and healthcare centers in remote areas or regions with limited resources, where the shortage of specialized doctors can delay diagnosis. The use of artificial intelligence allows for the democratization of access to high-quality diagnoses, even in places with infrastructure limitations. In this way, more patients could receive appropriate treatment, regardless of geographical location.
In summary, the integration of deep learning technologies with medical knowledge can bring a revolution in prostate cancer diagnosis, increasing efficiency and accuracy, while also providing greater accessibility and social justice in healthcare. The collaboration between artificial intelligence and human experts allows technology to assist doctors, improving clinical decisions and having a significant social impact by improving survival rates and patients’ quality of life.
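The two evaluation measures named in the abstract can be sketched in a few lines. The example below is only an illustration: it takes the ISUP Grade metric to be plain classification accuracy (an assumption, since the abstract does not define its "precision metrics"), uses hypothetical labels and predictions, and does not reproduce the authors' TensorFlow/Keras code.

```python
def accuracy(true_grades, predicted_grades):
    """Fraction of biopsies whose predicted ISUP grade matches the reference."""
    matches = sum(t == p for t, p in zip(true_grades, predicted_grades))
    return matches / len(true_grades)

def mean_absolute_error(true_scores, predicted_scores):
    """Average absolute difference between reference and predicted scores."""
    pairs = list(zip(true_scores, predicted_scores))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

# Hypothetical expert labels and model outputs (not data from the project).
isup_true = [1, 2, 3, 4, 5, 2]
isup_pred = [1, 2, 4, 4, 5, 1]
gleason_true = [6, 7, 7, 8, 9, 7]
gleason_pred = [6.2, 6.8, 7.9, 8.1, 8.7, 6.5]

print(f"ISUP accuracy: {accuracy(isup_true, isup_pred):.2f}")
print(f"Gleason MAE:   {mean_absolute_error(gleason_true, gleason_pred):.2f}")
```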
Palavras-chave: Deep Learning, Prostate Cancer, Medical Diagnosis, Convolutional Neural Networks, Image Analysis.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1078619

Integration of online resources and molecular dynamics simulations to identify a putative biological target for antitumoral quinoline derivatives

Autores: Raynan Veras de Freitas,Jessika de oliveira viana,Marília Cecília Da Silva,Tayná Rodrigues Olegário,Lidiane Gomes de Araújo,Yuri Virgilio dos Santos,Participante 78477,Desconhecido6208,Cláudio Gabriel Lima Júnior,Karen Cacilda Weber
Apresentador: Raynan Veras de Freitas • raynan.veras@academico.ufpb.br
Resumo:
Cancer is a complex and multifactorial disease characterized by the accumulation of mutations in an individual's genetic material, disrupting essential metabolic pathways. Treating cancer often involves high-cost medications that can be difficult to obtain. Quinoline derivatives are a promising framework for developing new drugs due to their synthetic advantages and wide-ranging pharmacological effects, which include antimicrobial, antiviral, antimalarial, and antitumoral properties. Additionally, they are relatively easy to synthesize and are cost-effective. To produce low-cost drugs, computational techniques are employed in drug design to optimize the search for new compounds. Among these techniques, classical physics-based methods, such as docking and molecular dynamics, are widely used. Additionally, in silico target fishing techniques are highly beneficial as they help identify biological targets for a molecule based on its chemical structure and the structure of biological receptors. Recently, the integration of machine learning algorithms with graph-based representations of chemical structures has gained attention, providing effective screening platforms for biologically active molecules. In this work, a new series of 19 quinoline derivatives has been synthesized and submitted to the target fishing online servers SwissTarget, ChemMAP, SEA, and TargetNet to identify putative targets. After identifying a promising target for antitumoral activity, ligand-receptor docking of the entire series was conducted in GOLD 2022.3.0 to evaluate the molecules based on their binding affinities to the target.
Molecular dynamics simulations of 200 ns were carried out in Gromacs 2021.2 for the three best-ranked molecules (QNL-2d, QNL-3e, and QNL-3f) in complex with the identified target. These molecules were also submitted to the online platform pdCSM-Cancer to predict cell-line-specific antitumoral activity. All four target fishing servers suggested Dihydroorotate Dehydrogenase (DHODH) as the highest-scored target for the query molecule (QNL-1g). DHODH is a validated target for cancer because its inhibition disrupts nucleotide synthesis, impairing DNA replication and RNA synthesis in rapidly proliferating cancer cells, thereby inducing anti-proliferative and pro-apoptotic effects. Our molecular dynamics simulations demonstrated that stable complexes can be formed with this enzyme. The pdCSM-Cancer server suggested that the three compounds can be active in the following cell lines: MCF7, MDA_MB_468, T_47D, SF_539, SNB_19, SNB_78, U251, HCT_15, HCT_116, SW_620, K562, MOLT_4, P388, SR, OVCAR_3, OVCAR_8, PC_3, CAKI_1, SN12K1 and UO_31. Experimental tests have confirmed that QNL-2d is active against the human leukemia cell line K562, with an IC50 of 16.6 µM. Our findings indicate that QNL-2d is a promising candidate for leukemia treatment, and its interaction with DHODH may serve as a possible mechanism of action. We are conducting additional experiments to further develop this molecule and advance it to the next stages of drug development.
Palavras-chave: Machine learning in cancer drug discovery, Quinoline derivatives, Dihydroorotate Dehydrogenase (DHODH), Molecular dynamics simulations, In silico screening
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1088834

EG-Net: A Gene Regulatory Network for Precision Genetic Engineering in Oil Palm (Elaeis guineensis Jacq.) breeding.

Autores: Thalliton Luiz Carvalho da Silva,Wellington Rangel dos Santos,Roberto Coiti Togawa,MANOEL TEIXEIRA SOUZA JUNIOR
Apresentador: MANOEL TEIXEIRA SOUZA JUNIOR • manoel.souza@embrapa.br
Resumo:
The African oil palm (Elaeis guineensis Jacq.) is the world's primary source of edible vegetable oil. Its long juvenile period (around 3 years to first harvest) and extended time to peak yield (8 to 10 years), despite a substantial lifespan of 25 to 30 years, significantly hinder variety improvement through traditional breeding. Plant genetic engineering and editing offer compelling alternatives by enabling targeted genetic modifications, unlike conventional breeding where desirable traits are often linked to undesirable ones. This precision is crucial for a long-term crop like oil palm, requiring careful gene selection to ensure intended outcomes without disrupting other vital functions. Gene Regulatory Networks (GRNs), which orchestrate gene expression through the interactions of diverse regulatory elements (transcription factors, long non-coding RNAs, miRNAs, and epigenetic regulators) and their target genes, provide a holistic understanding of these relationships. Leveraging expression data from 21,852 features (genes and proteins from the oil palm reference genome) across 428 in-house (Embrapa's) and public transcriptome datasets and 3,069 potential regulators, a GRN, named EG_Net, was constructed to predict the influence of each potential regulator on each target gene. This is the first comprehensive, genome-wide gene regulatory network for this economically vital oilseed crop and serves as a critical resource for prioritizing candidate genes in oil palm improvement programs. Functional validation through targeted case studies focused on informed gene selection to enhance tolerance or resistance to critical abiotic (drought and salinity) and biotic stresses (red ring and fatal yellowing diseases) revealed EG_Net's ability to illuminate gene-regulatory interactions and guide precise genetic engineering strategies, offering valuable insights into oil palm biology and breeding. 
EG_Net already serves as a powerful tool to de-risk the often arbitrary selection of genes for genetic engineering or editing in oil palm breeding programs, and its reliability is poised to increase with the availability of more transcriptome datasets for this species. Furthermore, the analytical framework presented is broadly applicable for deciphering gene regulatory networks in a wide spectrum of perennial plants.
Palavras-chave: Precision breeding, transcription factor, long non-coding RNAs, epigenetic regulators, red ring disease, fatal yellowing disease
★ Running for the Qiagen Digital Insights Excellence Awards
#1091941

SIDE EFFECTS OF CHEMOTHERAPY ON THE MENTAL HEALTH OF CANCER PATIENTS

Autores: ANA MAIARA MARTINS DE OLIVEIRA,Marcelo Gomes da Silva,Francisco Ronald da Silva Arruda,Geórgia Maria Melo Feijão,Camila Maria de Oliveira Ramos
Apresentador: ANA MAIARA MARTINS DE OLIVEIRA • maiaramartins.psi@gmail.com
Resumo:
INTRODUCTION
Cancer is widely recognized as a global public health problem, comprising a group of diseases characterized by the disordered growth of cells that can invade tissues or organs. According to the Instituto Nacional do Câncer (INCA) (2022), the term "cancer" covers more than 100 types of malignant diseases in which cells multiply uncontrollably, forming tumors that may be benign or malignant. Because of genetic alterations, these malignant cells can spread to other parts of the body, a process known as metastasis (Assunção, 2023).
According to the International Agency for Research on Cancer (IARC), an agency of the World Health Organization (WHO) (2024), the global estimate of new cancer cases for 2022 was 20 million, with 9.7 million deaths recorded. The projection is that 1 in 5 people worldwide will develop cancer during their lifetime, an alarming figure that underscores the magnitude of the problem. In Brazil, an annual average of 704 thousand new cancer cases is estimated for 2023 to 2025, with the South and Southeast regions showing the highest incidence rates. In the state of Ceará specifically, more than 94 thousand new cases are forecast by 2025 (INCA, 2022).
Beyond the physical impacts of cancer, the disease significantly compromises patients' quality of life in multiple dimensions. The most common emotional effects include anxiety, depression, and fear of death, factors that generate internal conflicts and destabilize the individual's emotional balance (Silva, 2022). In this context, chemotherapy, one of the most widely used therapeutic approaches in cancer treatment, acts directly on cancer cells but also affects healthy cells, causing a series of undesirable side effects. Among the most common are nausea, loss of appetite, extreme fatigue, hair loss, and changes in taste, in addition to significant impacts on patients' mental and emotional well-being (Silveira, 2021).
These side effects, both physical and emotional, can intensify feelings of anguish and uncertainty about the future, leading many patients to experience a state of psychological vulnerability during treatment. In this scenario, psychological support from the time of diagnosis becomes essential to mitigate the negative impacts of the disease and improve patients' quality of life. Psychological interventions can help patients cope with the emotional challenges that arise during treatment, providing support for dealing with stress, fear, and the bodily and social changes resulting from cancer (Gomes, 2019).
Within this context, psycho-oncology plays a central role in supporting patients with cancer. This specialty offers psychological support not only to patients but also to their families and to the health teams involved in treatment. By providing continuous emotional support, psycho-oncology helps promote emotional balance and healthy coping throughout all stages of cancer treatment, from diagnosis to rehabilitation or palliative care (Assunção, 2023).
Palavras-chave: Side Effects, Oncology, Mental Health
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1094449

Modeling food web structure and complexity in tropical and temperate macroinvertebrate communities.

Autores: Pedro Pontes Bueno Guerra,José L. Mello,Erika,Julia Maria Braga,Laura Joia Venuso,Gretel Bauer,Brian Bush,Bryan L. Brown,Christopher M. Swan,Kurt E. Anderson,Victor Satoru Saito
Apresentador: Pedro Pontes Bueno Guerra • ppbguerra@gmail.com
Resumo:
Biodiversity organization is affected by ecosystem temperature, which influences fundamental biological processes such as metabolic rates, growth patterns, and life cycle dynamics. These biological processes are critical for the functioning of ecosystems, as they influence the energy flow within food webs. Despite this, the impact of temperature variations on trophic interactions and food web dynamics remains insufficiently explored.
This study investigates differences in macroinvertebrate food web diversity, complexity, and structure across two regions with distinct climate characteristics. Monthly sampling was carried out over a one-year period in preserved first- to third-order tropical (Intervales State Park, São Paulo, Brazil) and temperate (Stoney Creek Watershed, Virginia, United States) streams. Individuals were identified and measured, and their biomass was estimated using allometric size-mass equations. A model was constructed to generate hypothetical food webs following two key criteria: (i) organisms were assigned to general trophic groups (detritivores, herbivores, omnivores, predators), and (ii) feeding interactions were determined based on size relationships, with predator-to-prey mass ratios constrained to between 10 and 50.
Preliminary results revealed comparable family-level diversity between tropical and temperate food webs; however, greater complexity was observed in tropical food webs across metrics such as connectivity, connectance, and linkage density. This higher complexity appears to be linked to the presence of larger predators, both in terms of abundance and average size, resulting in a greater number of interactions and an increase in network complexity. These findings underscore the importance of further investigations into the factors governing trophic network complexity and the influence of temperature on ecosystem function and structure. A deeper understanding of these processes is essential for predicting ecological responses to climate change and for conservation efforts.
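The size-based rule and the complexity metrics described above can be sketched as follows. This is a minimal illustration, not the authors' model: the taxa, trophic assignments, and body masses are hypothetical, and the sketch assumes that only predators and omnivores consume animal prey. Connectance is taken as L/S² and linkage density as L/S, where L is the number of links and S the number of taxa.

```python
from itertools import permutations

# Hypothetical taxa: (name, trophic group, mean body mass in mg).
taxa = [
    ("Perlidae", "predator", 50.0),
    ("Hydropsychidae", "omnivore", 4.0),
    ("Baetidae", "herbivore", 1.5),
    ("Chironomidae", "detritivore", 0.2),
]

# Assumption for this sketch: only these groups feed on other animals.
CONSUMERS_OF_ANIMALS = {"predator", "omnivore"}

def build_links(taxa, min_ratio=10.0, max_ratio=50.0):
    """Return (consumer, resource) pairs whose mass ratio lies in the window."""
    links = []
    for consumer, resource in permutations(taxa, 2):
        c_name, c_group, c_mass = consumer
        r_name, _, r_mass = resource
        ratio = c_mass / r_mass
        if c_group in CONSUMERS_OF_ANIMALS and min_ratio <= ratio <= max_ratio:
            links.append((c_name, r_name))
    return links

def connectance(n_links, n_taxa):
    """Realized links divided by all possible links (S squared)."""
    return n_links / n_taxa ** 2

def linkage_density(n_links, n_taxa):
    """Average number of links per taxon."""
    return n_links / n_taxa

links = build_links(taxa)
print(links)
print("connectance:", connectance(len(links), len(taxa)))
print("linkage density:", linkage_density(len(links), len(taxa)))
```

Under this rule, adding larger predators widens the set of prey that fall inside the 10 to 50 mass-ratio window, which is one way network complexity can increase.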
Palavras-chave: biodiversity, climate, complexity, food webs
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1096552

Methylome in Bovine Tissues: Comparison Between Individuals Generated by Assisted Reproductive Technologies (ART) and Natural Reproduction

Autores: Jéssica Macedo Rafael de Arruda,Letícia dos Santos Nascimento,Paulo Vítor Maravilha Braga,Letícia Romualdo Santos,Pethersen José Moraes Dos Reis Bueno Rocha,Bianca Moreira,MARCIO VALERIANO DA SILVA JUNIOR,Anna Beatriz Masiero Celidonio,PAULA MAGNELLI MANGIAVACCHI,Álvaro Fabrício Lopes
Apresentador: Jéssica Macedo Rafael de Arruda • jessica.crei@gmail.com
Resumo:
Assisted Reproductive Technologies (ART), such as in vitro fertilization (IVF) and somatic cell nuclear transfer (SCNT), have led to significant advances in livestock production, stem cell research, the reproduction of endangered species, and the creation of transgenic animals from genetically modified cells. In cattle, these methodologies offer substantial advantages due to their economic importance in food production, agricultural inputs, the biomedical industry, therapeutic cloning, and biopharmaceuticals. Additionally, they enable the production of genetically superior offspring in a shorter period compared to natural methods. However, these technologies, especially cloning, are often associated with epigenetic alterations that may compromise embryonic development, reduce pregnancy rates, and affect the health of animals in adulthood. During nuclear reprogramming, some epigenetic marks from the somatic cell may persist, partially maintaining the epigenetic memory of the donor cell and leading to gene deregulation. This can result in abnormal demethylation and remethylation patterns, as well as significant differences in blastocyst cell composition. Furthermore, artificial practices such as ovarian stimulation and in vitro maturation (IVM) may interfere with the acquisition of maternal imprinting in oocytes, particularly in prepubertal donors, contributing to an increased incidence of anomalies such as large offspring syndrome (LOS). The identification of new differentially methylated regions (DMRs) may provide evidence that individuals generated by ART exhibit distinct epigenetic patterns compared to naturally conceived animals. The research group has already conducted DMR analysis in bovine clones for the H19 (H19-DMR), KCNQ1OT1 (KvDMR1), and PEG1/MEST (PEG1-DMR) genes in liver and heart tissues, detecting specific alterations in DNA methylation at a multilocus level. 
Based on the group's results, new global DNA methylation analyses will be conducted on other tissues, particularly muscle and blood, using WGBS (Whole Genome Bisulfite Sequencing) samples. These tissues were selected because of the availability of samples from both ART-generated and naturally conceived cattle, allowing a direct comparison of epigenetic alterations between these groups. The DNA methylation analyses will be performed using bioinformatics pipelines from data obtained through WGBS. Tools such as Trim Galore, FastQC, and WGBStools will be used for data processing and analysis. The identification of new DMRs in imprinted genes is expected to provide insights into how ARTs negatively impact the bovine methylome, contributing to the understanding of syndromes associated with these reproductive technologies.
Palavras-chave: DNA methylation, bovine, Assisted Reproductive Technologies, ART, in vitro fertilization, IVF, somatic cell nuclear transfer, SCNT, epigenetics, epigenetic reprogramming, differentially methylated regions, DMR, Whole Genome Bisulfite Sequencing, WGBS, bioinformatics, Trim Galore, FastQC, WGBStools, bovine clones, reproductive technologies
★ This work is running for the Next Generation Bioinfo Award
#1098436

From Algorithm to Cure: Computational Strategies Against Emerging Pathogens

Autores: MONIQUE MENEGACI BARBOSA,Mayla Abrahim Costa,Beatriz de Castro Fialho,Eduardo Krempser da Silva
Apresentador: MONIQUE MENEGACI BARBOSA • niquemenegaci@gmail.com
Resumo:
Introduction: The growing threat of antimicrobial resistance (AMR) poses a critical challenge to global public health, necessitating the urgent development of new effective strategies for infection prevention and treatment. The integration of bioinformatics and machine learning offers a promising solution by enabling the rapid identification of vaccine targets and the development of vaccines against emerging pathogens, such as Mycobacterium tuberculosis, which exhibits high drug resistance. Objective: To develop an innovative computational tool for vaccine target prospection using machine learning and bioinformatics techniques. This tool will be applied to the in silico design of RNA-based vaccines optimized for pathogens prioritized by public health, with an initial focus on Mycobacterium tuberculosis. Methods: The research will utilize public databases like DrugBank to identify priority vaccine targets. Global alignments will be conducted using the MAFFT tool to construct consensus sequences of identified proteins. Subsequently, immunogenicity, allergenicity, and toxicity predictions will be performed on the obtained sequences. The final tool will integrate these data into a user-friendly interface for rapid and accurate analysis across multiple pathogens. Results: Preliminary results suggest that the developed tool has significant potential to accelerate vaccine target identification processes, enabling the creation of optimized sequences for use in messenger RNA vaccines. Initial tests indicate that integrating bioinformatics with machine learning not only improves the accuracy of vaccine target predictions but also reduces costs and time compared to traditional methods. The in silico prospection of new vaccine antigens against Mycobacterium tuberculosis could identify promising targets for validation in subsequent research phases. 
Conclusion: The developed tool represents an innovative approach to vaccine design, leveraging machine learning and bioinformatics to transform immunization strategies against emerging pathogens and address the pressing challenges of antimicrobial resistance (AMR). By streamlining the discovery and development of immunobiologicals, this methodology not only accelerates the vaccine research cycle but also enhances global preparedness for microbiological emergencies, driving progress in Science, Technology, and Innovation. Importantly, its application holds significant promise for low- and middle-income countries (LMICs), offering scalable and cost-effective solutions to combat AMR and improve public health outcomes worldwide.
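The consensus step mentioned in the Methods can be illustrated with a short sketch. It assumes the sequences have already been aligned (for example, with MAFFT) and applies a simple majority rule per column. The aligned fragments are hypothetical, and the abstract does not state which consensus rule the tool applies.

```python
from collections import Counter

def consensus(aligned_seqs, gap="-"):
    """Majority-rule consensus over equal-length aligned sequences.

    Columns containing only gaps are skipped; ties are resolved in favor
    of the residue seen first in the column.
    """
    lengths = {len(s) for s in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must share the same length")
    result = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != gap]
        if not residues:
            continue
        result.append(Counter(residues).most_common(1)[0][0])
    return "".join(result)

# Hypothetical aligned protein fragments (not data from the study).
alignment = [
    "MKT-LLVA",
    "MKTALLVA",
    "MRTALLIA",
]
print(consensus(alignment))
```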
Palavras-chave: Machine Learning, Antimicrobial Resistance (AMR), Vaccines, Technological Foresight, Emerging Pathogens
#1100736

RNAi-Mediated Immune Response in the Fungus Cultivated by Leafcutter Ants

Autores: Sabrina Ferreira De Santana,Participante 30099,Lucas Yago Melo Ferreira,Roenick Proveti Olmo,Paula Luize Camargos Fonseca,Jonathan Zvi Shik,Eric Roberto Guimarães Rocha Aguiar
Apresentador: Sabrina Ferreira De Santana • sabrinabiotec2@gmail.com
Resumo:
Leafcutter ants (Atta spp.) dominate Neotropical ecosystems as keystone species and agricultural pests, causing an estimated $8 billion annually in crop damage across the Americas. Their ecological and economic impact stems from an obligate mutualism with the fungus Leucoagaricus gongylophorus, which they cultivate as food in massive underground gardens. While this agricultural system is remarkably efficient, the ants must maintain healthy fungal crops, yet viral pathogens (mycoviruses) that could destabilize this mutualism remain poorly understood. Our study focused on two recently discovered mycoviruses (LgMV1 and LgTlV1), investigating the fungal RNA interference (RNAi) pathway as a potential defense mechanism. We hypothesized that analyzing small RNA (sRNA) profiles could reveal viral pathogenicity, as harmful viruses often trigger an RNAi-based immune response. Through integrated in silico analyses and multiple RNA deep sequencing strategies, we demonstrated that the fungus possesses a complete and functional RNAi system, with essential genes (Dicer, Argonaute, and RdRp) actively producing sRNAs. Our results revealed that the fungus's RNAi system responds differently to each virus: while LgMV1 triggered a strong immune response, with the production of characteristic vsiRNAs (viral small RNAs, 20-22 nt long and with a 5' uracil bias), LgTlV1 did not induce a significant reaction. This differential response suggests one of several scenarios: (i) a neutral or mutualistic relationship in which the virus provides beneficial functions; (ii) an evasion mechanism such as replication inside protected cellular compartments (mitochondria or vesicles); (iii) suppression of the RNAi pathway through virus-encoded suppressors; or (iv) extremely low viral replication rates that avoid detection by the host surveillance system. Phylogenetic analysis of RNAi genes confirmed their similarity to known pathways in other fungi.
Additionally, the detection of endogenous sRNAs with typical Dicer signatures reinforced the system's role in transcriptional regulation. The findings suggest that RNAi acts as an immune "filter", distinguishing pathogenic viruses (like LgMV1) from potential neutral or symbiotic ones (like LgTlV1). This selectivity may be crucial for maintaining the symbiosis between the fungus and the ants, preventing disruptions in fungal garden cultivation. In summary, the study highlights RNAi as an evolutionary tool for stabilizing mutualisms, potentially enabling hosts to tolerate neutral/beneficial microbes while combating pathogens. The approach used—integrating genomics, transcriptomics, and bioinformatics—provides a model for exploring host-microbiome interactions in other symbiotic systems.
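The vsiRNA signature described above (a 20-22 nt size peak with a 5' uracil bias) is typically assessed from two simple summaries of the virus-mapped reads. The sketch below is only an illustration with synthetic reads; it is not the authors' pipeline, and the function names are hypothetical.

```python
from collections import Counter

def size_profile(reads):
    """Count reads per length (the small RNA size distribution)."""
    return Counter(len(r) for r in reads)

def five_prime_bias(reads, min_len=20, max_len=22):
    """Fraction of each 5' nucleotide among reads inside the size window."""
    window = [r for r in reads if min_len <= len(r) <= max_len]
    if not window:
        return {}
    counts = Counter(r[0] for r in window)
    return {nt: n / len(window) for nt, n in counts.items()}

# Synthetic reads built to known lengths (not data from the study).
reads = [
    "U" + "AC" * 10,        # 21 nt, 5' U
    "U" + "GA" * 10 + "C",  # 22 nt, 5' U
    "U" + "CG" * 9 + "A",   # 20 nt, 5' U
    "A" + "GU" * 10,        # 21 nt, 5' A
    "G" + "AU" * 13,        # 27 nt, outside the 20-22 nt window
]

print(dict(size_profile(reads)))
print(five_prime_bias(reads))
```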
Palavras-chave: RNA interference, small RNAs, symbiosis, pathogen, mycovirus
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1100918

OrganPipe: An automated tool for the assembly, annotation, and curation of mitochondrial and chloroplast genomes

Autores: Renato Renison Moreira Oliveira,Bruno Marques Silva,Michele Molina,Marx Oliveira Lima,Tiago Ferreira Leão,Santelmo Vasconcelos,Gisele Lopes Nunes
Apresentador: Bruno Marques Silva • brunomsbqi@gmail.com
Resumo:
The analysis of organellar genomes is crucial for understanding evolutionary biology, cellular functionality, and genetic diversity. Existing tools for organellar genome assembly, such as NOVOPlasty, GetOrganelle, and MitoHiFi, often suffer from limitations like reliance on single k-mers or reference genomes, resulting in suboptimal assemblies and the need for manual parameter adjustments. To address these challenges, we developed OrganPipe, an automated pipeline that enables iterative assembly and annotation of mitochondrial and chloroplast genomes using multiple seed and k-mer combinations. OrganPipe simplifies genome assembly by leveraging Snakemake for efficient workflow management and a straightforward configuration file to set parameters. This design allows users to run the entire pipeline with a single command line, eliminating the need for multiple manual adjustments and ensuring reproducibility across different computational environments. The pipeline employs multiple bioinformatics tools, including NOVOPlasty for de novo assembly, Pilon for error correction, and BWA for read alignment. For annotation, it integrates MITOS2, MitoHiFi, and nHMMER, ensuring accurate identification and curation of protein-coding genes, transfer RNA (tRNA), and ribosomal RNA (rRNA). OrganPipe supports both short- and long-read sequencing data for mitochondrial genome assembly, while chloroplast genome assembly relies exclusively on short reads.
To demonstrate OrganPipe's efficacy, we assembled and annotated the first complete mitogenomes of two invertebrate species (Eulimnadia colombiensis and Pyrearinus pumilus) and the plastomes of two plant species (Furtadoa mixta and Melanoxylon brauna). The pipeline enabled the identification of seed and k-mer combinations that failed to produce circularized genomes, providing valuable insights into parameter selection. Furthermore, OrganPipe annotates all assemblies generated by NOVOPlasty and MitoHiFi, ensuring comprehensive analysis of each genome configuration. OrganPipe advances large-scale genomic studies by providing comprehensive parameter exploration and graphical outputs for rapid data interpretation. Additionally, the pipeline generates detailed reports in CSV format, aiding manual curation and facilitating genome submissions to GenBank. Its ability to iterate through multiple seeds and k-mer values in a single run eliminates the need for manual adjustments, making it accessible to both experts and non-bioinformaticians. This pipeline is a step forward in organelle genome analysis, promoting accurate, efficient, and accessible genomic research. OrganPipe is freely available at https://github.com/itvgenomics/OrganPipe.
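The idea of iterating over every seed and k-mer combination in a single run can be sketched generically as a parameter grid. The snippet below is not OrganPipe code and does not reflect its configuration format; the seed file names, k-mer sizes, and function name are hypothetical.

```python
from itertools import product

def parameter_grid(seeds, kmers):
    """Return every (seed, k-mer) pair to be tried, in a stable order."""
    return [{"seed": s, "kmer": k} for s, k in product(seeds, kmers)]

# Hypothetical seed files and k-mer sizes.
seeds = ["seed_cox1.fasta", "seed_rrnL.fasta"]
kmers = [23, 33, 39]

for run_id, params in enumerate(parameter_grid(seeds, kmers), start=1):
    print(f"run {run_id}: seed={params['seed']} kmer={params['kmer']}")
```

Each entry of the grid would correspond to one assembly attempt, so combinations that fail to circularize can be recorded alongside those that succeed.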
Palavras-chave: Genome Assembly, Mitogenome, Plastomes, Pipeline
#1103526

Metagenomics analysis of the seed-borne microbiome of Zea mays L.

Autores: Sarah Henaut Jacobs,Beatriz Elisa Barcelos Cyríaco,Francisnei Pedrosa da Silva,Fábio Lopes Olivares,Thiago M Venancio
Apresentador: Sarah Henaut Jacobs • henautjacobs@gmail.com
Resumo:
Population growth and the poor allocation of global resources are creating a scenario of increasing food insecurity in the coming decades. One way to mitigate this problem is by increasing the production of staple and economically relevant agricultural commodities, such as corn (Zea mays L.). A way to achieve this is by exploring beneficial bacteria in the plant's microbiome. The microbiome is defined as the collection of microorganisms present in a sample, and in plants, it can influence defense mechanisms and contribute to their development. The resident microbiome is the one vertically transferred from the mother plant to the seed, and is commonly associated with initial development and protection against pathogens. Metagenomics is the science that evaluates microbiomes. It consists of analyzing all the genetic material from the selected sample, identifying bacteria, fungi, and viruses present. This study aimed to identify and characterize the resident microbiome of two contrasting corn varieties (the hybrid SHS 5050 (SH) and the landrace Sol da Manhã (SOL)). We identified plant growth-promoting bacteria associated with the varieties and their physiological functions in plant development. We found a core microbiome shared by both varieties composed exclusively of members of the Burkholderiaceae family, known for their potential in promoting plant growth, as well as a microbiome exclusive to SOL. Overall, SH was shown to have a more homogeneous and less diverse microbiome. We also identified 18 MAGs (Metagenome-Assembled Genomes), 4 of which belong to uncultivable bacterial species that have not yet been characterized. We conducted functional analyses of the microbiome, aiming to relate gene presence to the physiological traits of the microbiome, where we also found a greater functional capacity in the SOL microbiome. 
The results obtained here open up the possibility of developing inoculants based on variety-specific traits associated with the microbiota, such as increased productivity or resistance to pathogens.
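The notions of a core microbiome shared by both varieties and a microbiome exclusive to one of them reduce to set operations on the taxa detected in each variety. The sketch below uses placeholder family names for everything except Burkholderiaceae, which the abstract reports as the shared core; it is an illustration, not the analysis performed in the study.

```python
def core_and_exclusive(taxa_by_variety):
    """Split taxa into the shared core and the sets exclusive to each variety."""
    sets = {name: set(taxa) for name, taxa in taxa_by_variety.items()}
    core = set.intersection(*sets.values())
    exclusive = {}
    for name, taxa in sets.items():
        # Union of the taxa found in every other variety.
        others = set().union(*(t for n, t in sets.items() if n != name))
        exclusive[name] = taxa - others
    return core, exclusive

# Placeholder detections ("Family_A" and "Family_B" are hypothetical names).
detected = {
    "SH": ["Burkholderiaceae"],
    "SOL": ["Burkholderiaceae", "Family_A", "Family_B"],
}

core, exclusive = core_and_exclusive(detected)
print("core:", sorted(core))
print("exclusive to SOL:", sorted(exclusive["SOL"]))
```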
Palavras-chave: Plant microbiome, agricultural biotechnology
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1106273

Characterization of Transposable Elements in Streptococcus agalactiae: A Comparative Analysis in Isolates from humans, fish, and cattle

Autores: Victor Aldair Quispe Baca,Lorena Maria Rudnik,Alexandre Rossi Paschoal,laurival antonio vilas Boas
Apresentador: Victor Aldair Quispe Baca • vitor_aldair@hotmail.com
Resumo:
Transposable elements (TEs) are mobile DNA sequences found in virtually all forms of life; prokaryotes mainly have insertion sequences (ISs) and transposons (Tns). In prokaryotic genomes, TEs play a role in evolution and plasticity, contributing to adaptation to various environmental conditions and enabling the development of new traits, such as antibiotic resistance.

This study aims to characterize TEs in genomes of Streptococcus agalactiae (group B Streptococcus, GBS) isolated from different hosts (humans, fish, and cattle); these isolates represent various geographical regions and serotypes (mainly Ia, Ib, III, and V). GBS is a pathogenic bacterium relevant to human and veterinary health and a cause of significant economic losses.

The analysis will be divided into five main parts. In the first, complete genomes (human, cattle, and fish) and incomplete genomes (cattle only) obtained from the NCBI repository, plus four genomes received from the Veterinary Mycology Laboratory at the State University of Londrina (UEL), will serve as input data for the RM2 and EDTA tools. Redundancy in the annotations generated by these two tools will be reduced using CD-HIT. Subsequently, RepeatMasker will be employed to identify and mask regions corresponding to TEs, using the non-redundant annotation generated in the previous step.

In the second part, the genomes will be re-analyzed with the ISEScan tool for the specific identification of ISs, generating annotation files. The third step will consist of validating the annotations produced by RepeatMasker and ISEScan using the TEsorter tool, which relies on sequence homology and conserved protein domains to refine TE annotations.

The fourth part will involve a phylogenetic analysis of the identified and validated TEs. The aim of this analysis is to infer the evolutionary relationships between the TEs present in the different GBS isolates and to evaluate whether there are clustering patterns related to the host or the distribution of TEs. Finally, the fifth part of the research will be dedicated to the analysis of the GBS Pan TE Genome, focusing on the identification and characterization of TEs within the core genome (elements present in all isolates), the accessory genome (elements present in some isolates), and elements unique to specific isolates. This analysis will help clarify the contribution of TEs to the genomic diversity of the species with respect to the different hosts of origin.
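The core, accessory, and unique partition described above can be illustrated with a minimal Python sketch. It assumes a presence table mapping each isolate to the set of TE families annotated in it; the isolate names and TE labels are hypothetical placeholders, and treating "accessory" as "present in at least two but not all isolates" is an assumption, not a definition taken from the abstract.

```python
from collections import Counter


def partition_pan_te(presence):
    """Split TE families into core, accessory, and unique sets.

    presence: dict mapping isolate name -> set of TE family labels.
    Assumption: accessory means present in 2..(n-1) isolates.
    """
    n_isolates = len(presence)
    # Count in how many isolates each TE family occurs.
    counts = Counter(te for families in presence.values() for te in families)
    core = {te for te, c in counts.items() if c == n_isolates}
    unique = {te for te, c in counts.items() if c == 1} - core
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical isolates and TE labels, for illustration only.
example = {
    "human_1": {"TE_A", "TE_B", "TE_C"},
    "fish_1": {"TE_A", "TE_C"},
    "cattle_1": {"TE_A", "TE_B", "TE_D"},
}
core, accessory, unique = partition_pan_te(example)
print(sorted(core), sorted(accessory), sorted(unique))
```

Grouping the same counts by host of origin instead of by isolate would be one way to look for host-associated TE patterns.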

The research aims to contribute to understanding the genomic diversity of S. agalactiae in relation to its different hosts and provide insights into the dynamics of the bacterium's mobilome, investigating the potential relationship between transposable elements and adaptation to different hosts and ecological niches, possibly revealing patterns associated with host specificity or pathogenicity. The motivation for this study lies in the scarcity of detailed information on transposable elements in S. agalactiae and their potential impact on health (such as public health risks) and socioeconomic factors (such as losses in aquaculture).
Keywords: Transposable Elements, Streptococcus agalactiae, Genomics, Bioinformatics, Insertion Sequences, Phylogenetic Analysis, Pan TE Genome.
★ Running for the Qiagen Digital Insights Excellence Awards
#1107969

PIMBA 3.0: orchestrated and faster metabarcoding analysis

Authors: Tiago Ferreira Leão, Luan Pinto Rabelo, Bruno M. Silva, Gisele Lopes Nunes, Renato Renison Moreira Oliveira
Presenter: Tiago Ferreira Leão • tiago.leao@pq.itv.org
Abstract:
DNA metabarcoding is a powerful technique for identifying and analyzing multiple species within an environmental sample using DNA sequences. By amplifying and sequencing specific genetic markers from mixed DNA samples, we can detect and characterize diverse organisms present in a given environment or host sample. The rapid expansion of Next-Generation Sequencing data has driven the development of user-friendly tools for DNA metabarcoding analysis, increasing accessibility. However, while existing tools provide robust solutions, many are limited in database customization, restricting their flexibility in taxonomy assignment. To address this, our group previously developed PIMBA, an accessible pipeline that integrates Qiime and BMP scripts for OTU clustering, supports ASV inference with Swarm, and includes optional OTU/ASV correction using the LULU algorithm. However, PIMBA was originally implemented in Bash, which posed limitations in structure and processing speed. To enhance usability, reproducibility, and scalability, we have developed a Snakemake-based version of PIMBA. Snakemake provides structured workflow management, automated parallelization, and seamless integration with containerization technologies, making it significantly faster and more efficient than traditional Bash scripting. This new Snakemake-based pipeline (PIMBA 3.0) optimizes metabarcoding analyses, offering a powerful tool for biodiversity research, ecological monitoring, and health sciences applications. Our benchmarking analysis (comparing version 3.0 with version 2.0) demonstrated a substantial reduction in execution time, particularly in the PIMBA Prepare mode, which is crucial for processing large-scale metabarcoding projects with numerous samples. We tested two public datasets: a medium-sized guano (bat feces) dataset for the COI marker, and a small dataset of phyllosphere fungi without shading, targeting the ITS marker.
Because the PIMBA Prepare mode in Snakemake has a higher degree of parallelization than the PIMBA Run mode, it showed the greatest reduction in execution time (up to 76.7%). The lowest gain in time was for the Run mode for OTUs using the ITS marker region from Fungi and the NCBI RefSeq NT database (only 4.8%). Among the Run mode analyses, the fastest execution was with the COI marker and the BOLD database (both using OTUs or ASVs). These fastest executions (COI and BOLD database for both OTUs and ASVs) also showed a significant speed improvement with the Snakemake implementation: 45.33% for OTU analysis and 45.65% for ASV analysis. Overall, we can confidently state that Snakemake not only provided a better structure for the pipeline (more modular, organized and error-proof) but also significantly reduced run times. Additionally, we introduced PIMBA Place, a new module designed for phylogenetic placement of OTUs/ASVs into a reference tree, providing evolutionary insights into unclassified sequences. In summary, PIMBA 3.0 is a significant enhancement of the original pipeline, making it a valuable addition to the metabarcoding toolbox. PIMBA 3.0 is freely available at https://github.com/itvgenomics/pimba_smk.
Keywords: PIMBA, DNA metabarcoding, Snakemake, PIMBA Place
#1108178

Integrating Hierarchical Clustering and SVM for Accurate Breast Cancer Subtype Classification

Authors: Ana Beatriz Miranda Valentin, Glaucia Maria Bressan, Elisangela Ap. da Silva Lizzi
Presenter: Ana Beatriz Miranda Valentin • anavalentin@alunos.utfpr.edu.br
Abstract:
Breast cancer is one of the most prevalent and life-threatening diseases worldwide, representing a major public health problem. The main difficulty in addressing breast cancer lies in its heterogeneity, as it comprises multiple molecular subtypes, each with distinct biological characteristics and treatment responses. Consequently, accurate classification of breast cancer subtypes is essential for guiding personalized treatment strategies and improving patient outcomes. In this study, we propose a hybrid computational approach that combines both unsupervised and supervised machine learning techniques to classify breast cancer subtypes (Luminal A, Luminal B, Basal, Her2 and Normal) using gene expression data from The Cancer Genome Atlas (TCGA). The first stage of the methodology involves the application of unsupervised learning through hierarchical clustering. Two different similarity metrics - Pearson correlation and Euclidean distance - are employed to assess clustering performance and identify natural groupings within the gene expression profiles. The clustering results provide information about the biological patterns and a foundation for subsequent supervised classification. The Mojena method, used to determine the optimal number of clusters, identified 43 clusters for the Euclidean distance and 38 clusters for the Pearson correlation. In the second stage, the Support Vector Machine (SVM), a supervised learning algorithm, is implemented to classify the breast cancer subtypes, owing to its promising results in the literature. In addition, the SVM was chosen for its strong theoretical foundation and effectiveness in high-dimensional data settings such as gene expression analysis. To optimize the performance of this classifier, hyperparameter tuning is conducted using Optuna, an automatic hyperparameter optimization framework based on Bayesian optimization.
This process ensures that the model is fine-tuned to achieve maximum accuracy and generalization. The experimental results demonstrate that integrating unsupervised clustering with supervised learning significantly improves the accuracy and interpretability of breast cancer subtype classification. Regarding the hierarchical clustering, although both metrics result in a predominantly adequate division, the clusters obtained using the Euclidean distance present a clearer delimitation between the subtypes than those obtained using Pearson correlation, suggesting a greater discriminative capacity. Since the Euclidean distance considers absolute differences between samples, it can be more sensitive to scale and amplitude variations in the data. The supervised SVM method achieved an accuracy of 77.97% using 10-fold cross-validation and hyperparameter optimization performed with Optuna. The model employed the Radial Basis Function (RBF) kernel, a popular choice in SVMs that maps input data into a higher-dimensional space, allowing the classifier to effectively separate non-linearly separable classes. The hybrid approach not only facilitates the identification of meaningful subgroups within the dataset but also enables the construction of high-performing predictive models. Notably, the SVM classifier, when combined with hyperparameter tuning, exhibits strong performance in distinguishing between subtypes. In conclusion, this study illustrates the potential of combining hierarchical clustering and supervised machine learning, particularly SVM, to address the complex problem of breast cancer subtype classification. The proposed methodology contributes to the development of more accurate diagnostic tools and to individualized treatment planning, supporting precision medicine initiatives.
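The remark that Euclidean distance is sensitive to scale and amplitude, whereas a correlation-based measure is not, can be illustrated with a minimal Python sketch. The two expression profiles are made-up numbers, and defining the Pearson distance as one minus the correlation coefficient is a common convention assumed here, not necessarily the one used by the authors.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation coefficient (assumed convention)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


# Hypothetical profiles: same shape, three-fold difference in amplitude.
profile = [1.0, 2.0, 4.0, 3.0]
scaled = [3.0 * value for value in profile]

print(euclidean_distance(profile, scaled))  # large: amplitudes differ
print(pearson_distance(profile, scaled))    # close to zero: shapes match
```

Two samples with the same expression pattern but different overall intensity are therefore far apart under the Euclidean distance and nearly identical under the correlation-based one, which is one reason the two metrics can yield different cluster counts.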
Keywords: Breast cancer, tumor classification, clustering, machine learning
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1108440

Prioritizing genes related to gestational COVID-19 and different phases of neurodevelopment

Authors: Maria Laura Gabriel Kuniyoshi, Sérgio Nery Simões, David Correa Martins-Jr, Helena Brentani
Presenter: Maria Laura Gabriel Kuniyoshi • marialaura.kuniyoshi@usp.br
Abstract:
COVID-19 during pregnancy is associated with a higher risk of delivery complications, and a hypothesis proposes that it can impact long-term neurodevelopmental outcomes in the offspring. The literature on gestational COVID-19 and neurodevelopment is still controversial: some researchers indicate increased risk of neurodevelopmental delay, while others report no effects. These discrepancies may stem from many factors, such as the timing of infection, since different stages of gestation relate to different neurodevelopmental steps. Despite its importance, the molecular basis of this potential link remains poorly understood. This study aims to identify genes associated with gestational COVID-19 that are also differentially expressed (DE) in early and late pregnancy. We intersected six RNA-sequencing studies comparing gene expression in the placenta between individuals with and without COVID-19. We identified the genes that were DE in at least four studies. There were 19532 coding and non-coding transcripts that were DE in at least one study; of these, 17432 were unique to a single study, and 27 were DE in four studies—no gene was shared by more than four. Then, we used the NERI algorithm taking as input: (i) the 27 genes DE in four studies as seeds mapped in the protein-protein interaction (PPI) network; (ii) the PPI network; (iii) transcriptomics data from the dorsolateral prefrontal cortex at early (≤16 post conceptional weeks) and late (≥17 post conceptional weeks) gestation. From this analysis, we prioritized 50 genes that were likely related to both gestational COVID-19 and neurodevelopment. Finally, we analysed these fifty genes with PAN-GO human functionome and assessed the enrichment of genes related to the neurodevelopmental disorders: autism spectrum disorder (ASD), attention deficit/hyperactivity disorder, intellectual disability (ID), and tic disorders. 
We found three genes that have been linked to both ASD and ID (BBS10, MAPK1, and SMAD4) and two linked to ID only (ACTB and CSNK2A1). The functionome analysis indicated a significant enrichment of the transforming growth factor beta (TGF-β) / SMAD signalling pathway, known to regulate apoptosis. In conclusion, this integrative approach prioritized genes that may influence how gestational COVID-19 affects different neurodevelopmental stages. Future research may expand to include more COVID-19 datasets and other brain areas.
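The study-intersection step (keeping genes that are DE in at least four of the six studies) can be sketched in Python as follows. The gene labels and study contents are hypothetical placeholders, and the function name is introduced here only for illustration; it is not part of NERI or of the authors' code.

```python
from collections import Counter


def genes_in_at_least(studies, min_studies):
    """Return genes reported as DE in at least `min_studies` studies.

    studies: iterable of collections of gene identifiers, one per study.
    """
    # set() guards against a gene being listed twice within one study.
    counts = Counter(gene for de_genes in studies for gene in set(de_genes))
    return {gene for gene, c in counts.items() if c >= min_studies}


# Six hypothetical studies with placeholder gene labels.
studies = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_B", "GENE_D"},
    {"GENE_B"},
    {"GENE_E"},
]
seeds = genes_in_at_least(studies, 4)
print(sorted(seeds))
```

In the workflow described above, a set like `seeds` would correspond to the genes mapped onto the PPI network as seeds.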
Keywords: COVID-19, Neurodevelopment, Pregnancy, Transcriptomics, Protein-Protein Interaction Networks
#1108760

De novo protein design guided by structure and phylogeny for Yellow Fever virus diagnostics

Authors: Junior Olimpio Martins, Patrícia da Silva Antunes, Luiz Mário Ramos Janini, Carlos Henrique Bezerra da Cruz, Ricardo Durães-Carvalho
Presenter: Junior Olimpio Martins • jr.om@hotmail.com
Abstract:
Yellow fever virus (YFV) is an arthropod-borne flavivirus with a case-fatality rate reaching 60% in severe forms. Clinical symptoms range from mild febrile illness to life-threatening hepatitis and hemorrhage, often overlapping with diseases such as malaria, viral hepatitis, and other hemorrhagic fevers. Clinical diagnosis is hindered by a short viremic phase (3–4 days post-infection) and significant serological cross-reactivity with evolutionarily related flaviviruses like dengue and West Nile. In endemic rural regions, where access to the required infrastructure and personnel for molecular diagnostics is limited, clinicians must rely on symptomatology and travel history, underscoring the need for more specific and accessible diagnostic tools.
This work aimed to design protein antigens containing conserved regions of YFV for potential use in diagnostics. YFV genome sequences from human hosts were retrieved from the Virus Pathogen Resource (ViPR), filtered for quality, and aligned using MAFFT. Evolutionary analyses were conducted with BEAST 1.10 under a Skygrid coalescent model and a log-normal relaxed molecular clock, yielding 10,000 phylogenetic trees. A maximum clade credibility tree was obtained with TreeAnnotator to identify regions conserved over time within the non-structural 1 (NS1) protein.
Candidate regions were selected based on inter-genus sequence divergence and structural criteria including solvent accessibility (calculated with PyMOL), proximity to glycosylation sites (predicted via NetNGlyc), and steric hindrance from glycan shielding (assessed with GlycoSHIELD). Using these regions as motifs, 80,000 protein backbones were generated via RoseTTAFold Diffusion. The most compact structures were then processed with ProteinMPNN for sequence design and AlphaFold3 for structure prediction. Final ranking considered AlphaFold3 confidence scores, backbone RMSD, and similarity to the native NS1 conformation.
Three top candidates were identified, and one was selected for expression based on its structural integrity and epitope accessibility. Future work will focus on characterizing this protein’s performance in ELISA assays, evaluating its sensitivity and specificity against anti-YFV antibodies as well as antibodies against other flaviviruses. Parallel efforts will explore computationally designed aptamers targeting the same epitope for potential use in point-of-care diagnostic assays.
Keywords: Protein design, phylodynamics, Yellow Fever virus
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1111282

ResistoHub: A Web-Based Platform for Advancing In Silico Antimicrobial Resistance Analysis

Authors: Rafaella Santana Bueno, Camila Gazolla Volpiano, Adriana Seixas
Presenter: Rafaella Santana Bueno • rafaellab@ufcspa.edu.br
Abstract:
Despite the proliferation of bioinformatics tools for antimicrobial resistance (AMR) gene analysis, challenges related to precision, usability, accessibility, and the need for technical expertise persist. More critically, the lack of robust validation in real-world clinical settings, regulatory approvals, and universally accepted gold standards further limits their adoption and practical application in guiding antibiotic choices. To address these issues, this study aims to develop an integrated web platform that centralizes and disseminates information on bacterial resistance and AMR bioinformatics tools, promoting collaboration among researchers, healthcare professionals and institutions. Unlike existing resources, which often focus solely on listing bioinformatics tools without fostering engagement, this platform will not only aggregate information about available tools, but also provide an interactive space for users to exchange insights, discuss challenges, and share experiences regarding their use. By creating an active network for knowledge sharing, the platform aims to enhance the practical application of bioinformatics, encourage widespread adoption of these tools, and help overcome technical and practical barriers in AMR research and clinical settings. A literature review is being conducted to identify the tools and databases utilized in AMR research over the past 18 years, drawing from sources such as PubMed, GitHub, and Bio.tools. This review will provide a structured catalog of available tools, detailing methodologies, associated databases, performance metrics, input data requirements, and computational frameworks. The repository will compile these details in a user-friendly format, allowing researchers to find the most suitable tools for their studies, while also providing a space for discussion, troubleshooting, and knowledge exchange to ensure effective adoption and continuous learning.
The platform will be developed using modern web technologies to ensure scalability, responsiveness, and accessibility among diverse user groups. In addition to the tool repository, the platform will feature a community forum to encourage interdisciplinary discussions and problem-solving, as well as an educational section with tutorials, protocols, and research methodologies tailored to different levels of expertise. The expected results include the establishment of a centralized and interactive repository of bioinformatics tools for AMR analysis, improving accessibility, usability, and informed decision-making among researchers. By offering structured methodological resources and fostering knowledge exchange, the platform will contribute to improving the quality and reproducibility of AMR gene detection. Furthermore, given that translating AMR bioinformatic findings into clinical recommendations, such as antibiotic selection, requires rigorous validation and standardization, the platform will highlight these limitations and promote discussions of best practices for clinical implementation. To evaluate and promote the platform, surveys will be conducted among research groups and graduate programs in bioinformatics and antimicrobial resistance to evaluate its perceived utility and adoption potential. In addition, the project will seek to establish a collaborative network to support the continuous development and maintenance of the platform.
Keywords: Bioinformatics, antimicrobial resistance, genomic analysis, web platform, epidemiological surveillance, educational resources.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1112366

Deep Learning for Blood Cell Classification: An Explainable Artificial Intelligence Approach for Diagnostics in Hematology

Authors: Alison Henrique Marcelino, Claudia Stoeglehner Sahd, Heron dos Santos Lima, Elisangela Ap. da Silva Lizzi, Cristhiane Gonçalves, Marcella Scoczynski Ribeiro Martins
Presenter: Marcella Scoczynski Ribeiro Martins • marcella@utfpr.edu.br
Abstract:
The microscopic analysis of peripheral blood smears is considered the gold standard for detecting various hematological disorders. However, it is a time-consuming, repetitive process, prone to subjective interpretation, as different operators may reach different conclusions from the same sample. This study proposes a deep learning-based approach to enhance the accuracy of blood cell classification and segmentation, thereby reducing the workload of pathologists and improving diagnostic efficiency. We employed the deep neural network model ResNet50 to classify images into six cell types—basophils, eosinophils, erythroblasts, lymphocytes, monocytes, and platelets—complemented by SHAP (SHapley Additive exPlanations) to provide explainability to the model's decisions. The "Blood Cells Image Dataset," comprising 17,092 expert-annotated images from the Hospital of Barcelona using the CellaVision platform, was used in this study. The model achieved a test accuracy of 97.33% (98.93% in training and 96.13% in validation), with minimal overfitting, as evidenced by the small gap in loss values (0.55 in training vs. 0.60 in testing). Class-wise performance was notable, with eosinophils reaching an F1-score of 0.98 and platelets 0.99. The model also showed strong discriminative capability for challenging morphologies such as lymphocytes and monocytes, with F1-scores of 0.95 and 0.96, respectively. SHAP analysis revealed clinically meaningful patterns: lymphocytes were characterized by high nuclear density and an elevated nucleus-to-cytoplasm ratio, while platelet recognition required fewer features due to their distinctive morphology. Furthermore, the model exhibited high predictive confidence, assigning 94.52% probability to lymphocyte classifications. These findings support the model's potential as a reliable tool for augmenting diagnostic accuracy in hematological assessments.
Keywords: Blood cell classification, Deep learning, SHapley Additive exPlanations, Explainability, Neural network
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1112523

Investigation of Explainable AI Techniques for Assessing Behavioral Patterns in Child Players during Gaming Sessions

Authors: Maria Fernanda Oliveira de Figueiredo, Ana Carolina Ataya, João Vitor Bezerra, Myriam Regattieri De Biase da Silva Delgado, Marcella Scoczynski Ribeiro Martins, Rita Silva Julia, Cesar Augusto Tacla, Adriana Aparecida Guimarães, Gabrielly de Queiroz Pereira
Presenter: Marcella Scoczynski Ribeiro Martins • marcella@utfpr.edu.br
Abstract:
The proposal presents the implementation of machine learning-based language processing algorithms to evaluate gamers’ performance during gaming sessions, with a specific focus on neurodivergent children, since the role of digital games in stimulating the development of cognitive and motor skills in such children represents an emerging and promising field of research. The study uses transcriptions from publicly available gameplay videos aimed at children and employs Transformer and Long Short-Term Memory (LSTM) networks to identify behavioral patterns, offering insights through explainable AI techniques. Experimental data is being collected from public elementary and high schools and includes both neurotypical students and individuals diagnosed with Down Syndrome or Autism Spectrum Disorder. In the meantime, while the experimental data is still unavailable, an original language dataset is generated for algorithm testing purposes. The dataset is composed of transcriptions from publicly available gameplay videos, totalling 5880 phrases that are used to train and fine-tune sentiment analysis models, validate the robustness of the language processing pipeline, and enhance the explainability techniques applied. This step ensures the models are trained and validated before analyzing real-world data from schools. While the gaming content used to generate the dataset does not explicitly address neurodivergent contexts, forms of entertainment targeted at children are considered suitable for both neurotypical and neurodivergent audiences. This ensures the dataset aligns with the future study’s inclusive objectives. Each phrase is initially labeled by the pre-trained language model as (0) negative, (1) neutral, or (2) positive.
Subsequently, manual verification is conducted to correct occasional misclassifications, based on defined criteria for each class: Positive statements indicate player progress, happiness, or satisfaction with the game; Negative statements reflect difficulty, frustration, or dissatisfaction; and Neutral statements lack indicators of any sentiment. After classification, the corpus is used to fine-tune the pre-trained algorithm. For the LSTM model, which lacks pre-training, we also include three publicly available Portuguese datasets, enabling the model to grasp core aspects of Portuguese before being fine-tuned on the gameplay dataset. After training and testing the language models, results indicate that while the Transformer-based model and LSTM achieved comparable training metrics, with a validation accuracy of 0.82 for both models, the LSTM outperformed the Transformer in testing - contrary to expectations. In the obtained results, the Transformer architecture shows a strong bias towards classifying sentences as neutral, despite the presence of negative or positive words. It is hypothesized that this behavior stems from an unequal distribution of examples, since the majority of gaming phrases represent neutral situations. Additionally, since the model is pre-trained on a multilingual corpus, it may also not be adequately adapted to the Portuguese-only context. Finally, the explainability techniques provided transparent visual tools that highlight the words that influenced sentiment predictions, offering valuable support for professionals analyzing behavioral patterns in children and demonstrating the potential for machine learning and explainable AI to enhance understanding of behavioral patterns in neurodivergent children. Future research steps include further augmenting the gameplay dataset in order to widen the contexts of example phrases, attempting to address the observed limitations in model performance.
Keywords: Explainable AI, player behavior, Transformer, LSTM
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1112541

In silico analysis of the pharmacokinetic properties and toxicity of chitosan, graphene oxide, and cannabidiol.

Authors: Bruna de França Nário Oliveira, Milena Costa da Silva, Suédina Maria de Lima Silva, Marcus Vinicius Lia Fook, Rafael Trindade Maia
Presenter: Bruna de França Nário Oliveira • bruna.nario@estudante.ufcg.edu.br
Abstract:
In silico analysis of the pharmacokinetic properties and toxicity of biomaterial compounds has proven to be a powerful tool in the development of new therapies and technologies. In this study, a computational evaluation of the pharmacokinetic properties and toxicity of chitosan, graphene oxide (GO), and cannabidiol (CBD) was carried out, aiming to provide preliminary insights into their behavior in the body and potential risks associated with their therapeutic use. Chitosan, a polysaccharide derived from chitin, has been extensively studied for its biocompatible properties and its potential in various biomedical applications, such as drug delivery and controlled release systems. Graphene oxide, in turn, is a highly promising material in nanotechnology, with several applications in areas such as biomedicine, sensors, and nanomaterial-based therapies. CBD, a non-psychoactive cannabinoid, has attracted attention due to its potential therapeutic properties in various conditions, including chronic pain, epilepsy, and anxiety disorders.
The in silico analysis was conducted using the “ADMETlab 3.0” molecular analysis and prediction platform for pharmacokinetic properties, such as absorption, distribution, metabolism, and excretion (ADME), as well as the evaluation of the toxicity potential of each substance. For chitosan, good absorption and distribution properties were observed, suggesting a favorable profile for therapeutic use. Graphene oxide showed a promising pharmacokinetic profile, with good absorption and favorable metabolism, in addition to low toxicity, being considered a relatively safe compound for therapeutic applications.
The results obtained highlight the potential of these compounds for the development of new therapies and biomedical devices, while also emphasizing the importance of further investigations, especially regarding the long-term biocompatibility of graphene oxide and the possible interaction of CBD with other drugs. This study provides an initial foundation for future research on these materials and their effects on the human body, contributing to the advancement of personalized medicine and nanotechnology-based therapies.
Keywords: ADME, Toxicity, Cannabinoids, Biopolymers, Pharmacodynamics.
★ This work is running for the Next Generation Bioinfo Award
#1113396

Development of an algorithm for extracting and structuring free-text fields from electronic health records

Authors: Arthur Shuzo Owtake Cardoso, Luciana Rodrigues Carvalho Barros
Presenter: Arthur Shuzo Owtake Cardoso • shuzoarthur@usp.br
Abstract:
The study focuses on the development of an algorithm to extract and structure free-text fields from electronic health records (EHRs) to improve the retrieval of critical clinical data, specifically for breast cancer patients. Breast cancer, a highly prevalent disease globally, requires precise staging (TNM system) and treatment monitoring, but much of this information is buried in unstructured EHR text fields, making systematic retrieval difficult and posing challenges for research and clinical use. The research, conducted at the São Paulo State Cancer Institute (ICESP), analyzed EHRs from 2008 to 2022, using Python-based natural language processing (NLP) tools, particularly regular expressions (RegEx), to mine and structure data such as TNM staging and treatment modalities (adjuvant, neoadjuvant, palliative). Two extraction strategies were implemented: (1) searching for complete TNM staging notations (e.g., "T2N1M0") and (2) isolating individual T, N, and M components when full staging was absent. The algorithm processed over 400,000 records, removing duplicates and empty entries, and successfully extracted TNM staging for 92.7% of patients (over 13,000 cases) and treatment modality data for 84.7%. To validate the algorithm’s accuracy, results were compared against a manually curated gold standard of 705 patient records reviewed by a medical team. For the T component, the algorithm achieved an 85.5% match rate with the gold standard, with discrepancies including 7% missed entries (present in gold standard but not retrieved), 5% incorrect matches (unmatched staging), and 2.5% over-retrieval (staging found by the algorithm but absent in the gold standard). For the N component, accuracy improved to 90%, with 3% missed entries, 5% incorrect matches, and 1.5% over-retrieval. The M component showed the highest precision at 97%, with only 1.5% missed entries and 1% incorrect matches. 
These results demonstrate the algorithm’s robustness, particularly for metastatic status (M), while highlighting areas for refinement, such as improving tumor (T) and nodal (N) staging retrieval. The structured outputs were integrated into the HCFMUSP REDCap database, enabling standardized research use. This work underscores the potential of automated tools to transform unstructured clinical text into actionable data, bridging gaps in large-scale oncology research and EHR utility.
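The two extraction strategies described above (complete notations first, then isolated T, N, and M components) can be sketched with Python's `re` module. The patterns below are simplified assumptions written for this illustration; they are not the expressions used in the study and would need extension for the notations found in real clinical notes.

```python
import re

# Strategy 1 (assumed pattern): a complete notation such as "T2N1M0" or
# "pT1cN0M0", with an optional c/p/y prefix and optional sub-stage letters.
FULL_TNM = re.compile(
    r"\b[cpy]?T(?P<T>[0-4]|is|[Xx])[a-d]?\s*"
    r"[cp]?N(?P<N>[0-3]|[Xx])[a-c]?\s*"
    r"[cp]?M(?P<M>[01]|[Xx])\b"
)

# Strategy 2 (assumed patterns): isolated components.
SINGLE = {
    "T": re.compile(r"\b[cpy]?T([0-4]|is|[Xx])[a-d]?\b"),
    "N": re.compile(r"\b[cp]?N([0-3]|[Xx])[a-c]?\b"),
    "M": re.compile(r"\b[cp]?M([01]|[Xx])\b"),
}


def extract_tnm(text):
    """Return a dict with keys T, N, M; a value is None when not found."""
    full = FULL_TNM.search(text)
    if full:
        return full.groupdict()
    # Fall back to searching for each component separately.
    result = {}
    for component, pattern in SINGLE.items():
        match = pattern.search(text)
        result[component] = match.group(1) if match else None
    return result


print(extract_tnm("Staging: pT2N1M0 after surgery"))
print(extract_tnm("tumor T1c, N0, no distant metastasis"))
```

Trying the complete notation first keeps the three components consistent with one another; the fallback recovers partial staging at the cost of possibly combining values written in different parts of a note.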
Palavras-chave: Electronic Health Records (EHRs); Text Mining; TNM Staging
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1114033

Identification of potentially sporulating MAGs from vertebrate fecal microbiomes using SpoMAG, an R-based Machine Learning tool

Autores: Douglas Terra Machado,Otávio José Bernardes Brustolini,Ana Tereza Ribeiro de Vasconcelos
Apresentador: Douglas Terra Machado • dougterra@gmail.com
Resumo:
Sporulation, a survival strategy in Firmicutes, enables bacteria to endure harsh conditions through metabolically dormant spores. Predicting sporulation potential in uncultured bacteria, such as those recovered from metagenome-assembled genomes (MAGs), remains challenging due to the lack of phenotypic data. To address this, we developed SpoMAG (Sporulation potential in MAG), an R-based Machine Learning (ML) tool that predicts a genome’s sporulation capability through gene annotation information. We evaluated four supervised ML algorithms: Random Forest (RF), Support Vector Machine (SVM), XGBoost, and Neural Networks, by training them on a dataset of 136 bacterial genomes with experimentally confirmed sporulation phenotypes. Based on standard evaluation metrics (accuracy, sensitivity, specificity, precision, and F1-score), the RF and SVM models were selected for integration into SpoMAG, forming a stacked meta-model that combines their probability outputs. SpoMAG processes input files containing gene names and KEGG Orthology annotations, filters sporulation-associated genes, and predicts the sporulation capability of a given genome. To interpret feature importance, we applied SHapley Additive exPlanations (SHAP), identifying key genes that consistently influenced predictions. SpoMAG was applied to 809 high-quality MAGs from fecal microbiomes of cattle (n=199), poultry (n=199), swine (n=167), and humans (n=244) collected across five Brazilian states. Given the taxonomic novelty of many MAGs, we used FastANI to assess strain-level similarity and potential sharing across hosts. Principal Component Analysis (PCA) and beta-dispersion analysis (betadisper) were employed to explore sporulation gene patterns and intergroup variability. SpoMAG was applied in three scenarios: (i) non-Firmicutes MAGs, (ii) Bacilli MAGs, and (iii) Clostridia MAGs. The 496 non-Firmicutes were correctly predicted as non-spore-forming. 
Among Bacilli, nine putative novel spore-formers (orders Paenibacillales and Bacillales) exhibited higher sporulation gene counts (betadisper: F=12.62, p=0.0007). In Clostridia, 54 potential novel sporulators were identified, particularly within Acetivibrionales (human/poultry samples). While all Clostridiales MAGs were predicted as spore-forming, variability was observed among Christensenellales and Oscillospirales. Novel candidates were also detected in understudied genera (Herbinix, UBA3818, and Alkaliphilus), underscoring SpoMAG's utility in discovering novel spore-formers. Notably, despite similar sporulation gene counts in some Clostridia, gene composition varied significantly (betadisper: F = 42.85, p < 0.001), highlighting the need for ML-based approaches over simple gene-count thresholds. Most predicted spore-formers were host-specific (ANI < 95%), but seven strains were shared across hosts (ANI > 95%), including an Acetivibrionaceae strain (~99% ANI) detected in poultry, swine, and humans. PCA revealed distinct clustering between Bacilli and Clostridia, followed by further divergence within Clostridia. Fifteen genes were identified as key model predictors, while nine genes were present in all 63 MAGs predicted to be spore-formers, suggesting potentially conserved core functions for future investigations. SpoMAG offers an annotation-based approach for predicting sporulation in MAGs, contributing to the discovery of novel spore-formers and advancing research in microbial ecology, probiotics, and synthetic biology.
Palavras-chave: Sporulation, SpoMAG, Firmicutes, Metagenome-Assembled Genomes, Machine Learning, Sporulation Prediction, Spore-forming Bacteria
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1114044

Detection of novel Clostridia and Bacilli species from multiple vertebrate hosts using phylogenomics and sporulation gene profiling

Autores: Douglas Terra Machado,Beatriz do Carmo Dias,Rodrigo Cayô da Silva,Ana Cristina Gales,Fabíola Marques de Carvalho,Ana Tereza Ribeiro de Vasconcelos
Apresentador: Douglas Terra Machado • dougterra@gmail.com
Resumo:
The analysis of Metagenome-Assembled Genomes (MAGs) has greatly enhanced our understanding of microbial dark matter, particularly within the vertebrate gut microbiome. Among these microorganisms, spore-forming bacteria, notably from the Firmicutes phylum, are of special interest due to their stress-resistant capabilities and potential impacts on host health. However, taxonomic resolution of many putative spore-formers remains incomplete, primarily due to cultivation challenges. MAGs provide a cultivation-independent approach to investigate sporulation-associated genes and identify novel bacterial taxa across diverse gut microbiomes. In this study, we analyzed 146 MAGs obtained from poultry, swine, cattle, and humans across five Brazilian states, to identify novel Firmicutes species and assess their sporulation potential. Using a curated set of 160 sporulation-associated genes, we evaluated gene presence across MAGs based on gene annotation datasets. To improve taxonomic classification, we applied a phylogenomic pipeline utilizing 94 ribosomal proteins and compared the 146 MAGs without taxonomic attribution at the family level with 359 reference genomes, 499 family-known MAGs, and four outgroup genomes. This effort refined 22 MAGs to the family level, with 13 from Clostridia and 9 from Bacilli. Average Nucleotide Identity (ANI) between genomes was calculated using FastANI, and functional profiling was performed using KEGG Orthologies (KOs) from the EggNOG database, focusing on metabolic pathways related to sporulation and germination. Phylogenetic reconstruction revealed potential novel species within Clostridia families, such as Borkfalkiaceae, Lachnospiraceae, Monoglobaceae, and Oscillospiraceae, as well as within Bacilli families, including Bacillaceae and Erysipelotrichaceae, suggesting the presence of novel or underexplored taxa. 
Among the analyzed MAGs, 124 remained unclassified, emphasizing the need for further investigation into uncultivated spore-formers with potential host-health implications. PERMANOVA analysis indicated that MAGs belonging to the Bacilli class exhibit a distinct sporulation gene composition compared to those in Clostridia (p=0.001). This difference was not observed among hosts (p=0.125). Additionally, a positive correlation was observed between genome size and sporulation gene count (r=0.58, p<0.0001), suggesting that larger known-taxa sporulating genomes can have a greater capacity to encode the genetic machinery necessary for sporulation. We also identified carbohydrate and amino acid metabolic pathways potentially supporting sporulation processes in the investigated MAGs. Altogether, this study contributes to understanding spore-forming bacteria from different hosts with further implications for microbiome-based therapies and infectious disease management. These findings provide evidence for novel species diversity across multiple hosts, their adaptive strategies, and potential applications for host health.
Palavras-chave: Sporulation, Firmicutes, Metagenome-Assembled Genomes, Phylogeny, Sporulation Prediction, Spore-forming Bacteria
★ Running for the Qiagen Digital Insights Excellence Awards
#1115100

Molecular and Bioinformatics Analyses of Olfactomedins-like Suggest Their Potential Roles in Regulating Fish Skeletal Muscle Growth

Autores: Lucas Camilo Moraes Alves,Eduardo Rosa da Silva,Érika Stefani Perez,Daniel Garcia de la Serrana,Bruna Tereza Thomazini Zanella,Maeli Dal Pai,Bruno Oliveira Silva Duran
Apresentador: Lucas Camilo Moraes Alves • lucascamilo2000@gmail.com
Resumo:
Skeletal muscle is very abundant in fish, being the main component of the meat and a source of nutrients in the human diet. Muscle growth is regulated by internal and external factors, such as amino acids, IGF1, hormones, temperature, photoperiod, water quality, and feeding, impacting fish farming production. However, many genes and proteins regulating this process remain poorly studied, such as the olfactomedin-like (olfml) family. This study characterized olfmls and indicated some of their functions in fish skeletal muscle using omics data and bioinformatics approaches. Public single-cell RNA and transcriptomic data from pacu (Piaractus mesopotamicus) and seabream (Sparus aurata) muscle cells were analyzed to assess gene expression. Additionally, phylogenetic analysis, molecular network assembly, and gene ontology enrichment were performed. Gene expression of olfmls within muscle tissue showed variations: olfml1 was higher in fibroblasts (61.1 nTPM) and smooth muscle myocytes (35.5 nTPM); olfml2a was higher in endothelial (24.9 nTPM) and smooth muscle cells (21.1 nTPM); olfml2b was higher in smooth muscle cells (47.8 nTPM) and fibroblasts (35.7 nTPM); and olfml3 was higher in fibroblasts (140.5 nTPM). Interestingly, the expression levels were lower in skeletal myocytes: olfml1 (5.9 nTPM), olfml2a (6.8 nTPM), olfml2b (5.9 nTPM), and olfml3 (9.8 nTPM). Phylogenetic analysis between teleost superorders Ostariophysi, Acanthopterygii, and Protacanthopterygii revealed the existence of genes olfml1, olfml2a, olfml2ba and olfml2bb (paralogues), olfml3a and olfml3b (paralogues) in fish genomes. In pacu muscle cells, the expression of all olfml genes increased after amino acid treatment, while IGF1 treatment increased the expression of olfml1 and olfml3b (p < 0.05). In seabream muscle cells, amino acid treatment only increased expression of olfml1 and olfml2ba (p < 0.05), although all olfml genes exhibited high read counts. 
Gene network analysis revealed that olfmls connected to multiple genes related to muscle development, metabolism, protein synthesis, extracellular matrix, and cell signaling, mainly through olfml3b and myocilin (from the same family). Protein network analysis showed that all Olfml proteins interacted with proteins involved in extracellular matrix and cell signaling, especially Olfml2bb, Olfml3a, and Olfml3b. In addition, Olfml2a is associated with transcriptional regulation, Olfml2ba is linked to metabolic processes, and Olfml1 showed involvement in muscle development and maturation. We identified the enrichment of several relevant biological processes, cellular components, and molecular functions for skeletal muscle. Respectively, we highlight "myoblast differentiation, sarcomere organization, and extracellular matrix", "sarcoplasm and cytoskeleton", and "calcium ion binding and actin filament binding". Our findings contribute to the understanding of molecular pathways modulated by olfmls and their influence on the regulation of fish muscle growth, opening perspectives for applications in aquaculture.
Palavras-chave: Fish. Skeletal Muscle. Muscle Growth. Bioinformatics. Transcriptome.
★ Running for the Qiagen Digital Insights Excellence Awards
#1115274

Gene expression analysis in Alzheimer's disease via PCA: a study of the trade-off between explained variance and the complexity of the number of components.

Autores: Kaylaine Beatriz Gomes de Barros,Victor Ehiti Itimura Tamay,Glaucia Maria Bressan,Elisangela Ap. da Silva Lizzi
Apresentador: Kaylaine Beatriz Gomes de Barros • kaylaine@alunos.utfpr.edu.br
Resumo:
Alzheimer's disease is a complex neurodegenerative condition that presents significant challenges to the field of neurology. The identification of reliable biomarkers for early diagnosis remains a pressing need, given the disease's clinical complexity and variability. This study aims to explore approaches for analyzing high-dimensional gene expression data related to Alzheimer's disease, employing dimensionality reduction techniques.
Principal Component Analysis (PCA) was used to reduce the complexity of large datasets while preserving meaningful information within latent variables—principal components. This technique involves linear algebra and matrix operations to transform the original high-dimensional dataset into a new coordinate system structured by the principal components.
Data acquisition was conducted using the open-access AlzData repository, which contains genomic and clinical information from individuals diagnosed with Alzheimer's disease. The original dataset consisted of 484 genes (rows) and 7,713 samples (columns), where each cell value represents the gene expression level of a specific gene in a given sample. During the preprocessing step, the data were organized into a matrix structure. The analysis was conducted using the R programming environment, and the significance level was set at 5%.
The results of the PCA application showed that 26 principal components were required to explain 80% of the data variance. With 40 components, 85% of the variance was explained, and 71 components were needed to reach 90%. This indicates a steep increase in the number of required components for relatively small gains in explained variance. Specifically, moving from 80% to 85% explained variance required an increase of approximately 54% in the number of components (from 26 to 40), and achieving 90% required a further 77.5% increase (from 40 to 71). Overall, to raise the explained variance from 80% to 90%, the number of components more than doubled—a 173% increase—highlighting a clear trade-off between accuracy and complexity.
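The arithmetic behind these percentages can be checked with a short Python sketch (the study itself was carried out in R). The helper names and the toy variance ratios are illustrative assumptions; only the component counts 26, 40 and 71 come from the abstract.

```python
from itertools import accumulate


def components_needed(variance_ratios, threshold):
    """Smallest number of leading components whose cumulative explained
    variance reaches the threshold (ratios must be sorted in PCA order)."""
    for k, total in enumerate(accumulate(variance_ratios), start=1):
        if total >= threshold - 1e-12:  # small tolerance for float rounding
            return k
    raise ValueError("threshold is never reached")


def relative_increase(k_from, k_to):
    """Percentage increase in the number of components."""
    return 100.0 * (k_to - k_from) / k_from


# Toy explained-variance ratios (assumption, not the AlzData values).
toy_ratios = [0.60, 0.25, 0.10, 0.05]
print(components_needed(toy_ratios, 0.80))  # components for 80% in the toy case

# Component counts reported in the abstract for 80%, 85% and 90% variance.
print(round(relative_increase(26, 40), 1))  # 80% -> 85%
print(round(relative_increase(40, 71), 1))  # 85% -> 90%
print(round(relative_increase(26, 71), 1))  # 80% -> 90%
```

With a fitted PCA, the list passed to `components_needed` would be the per-component proportion of variance, in decreasing order.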
This non-linear behavior indicates that the final increments in explained variance incur a high computational cost. While a few components capture most of the data's structure (following the law of diminishing returns), explaining residual variance demands an increasingly large number of dimensions. Thus, balancing the number of components with the proportion of variance explained is fundamental to avoid overburdening model performance without meaningful gains in interpretability. These findings indicate that despite the original dataset’s high dimensionality, substantial dimensionality reduction can still retain the majority of relevant information.
In conclusion, PCA demonstrates effectiveness for condensing Alzheimer's genomic data, yet requires careful consideration of the trade-off between retained variance and interpretability. While the technique facilitates pattern recognition and the identification of potential biomarkers, its application must be judicious to avoid computational overload without practical benefit.
Palavras-chave: Alzheimer's disease, dimensionality reduction, gene expression, multivariate statistic.
★ Running for the Qiagen Digital Insights Excellence Awards
#1115430

EasyOmics LampAID: tool to speedup screening of LAMP PCR primersets

Autores: David Aciole Barbosa,Fabiano Menegidio,Daniela Leite Jabes,Regina Costa de Oliveira,Luiz Roberto Nunes
Apresentador: David Aciole Barbosa • aciole.d@gmail.com
Resumo:
Since the creation of the polymerase chain reaction (PCR) by Kary B. Mullis, in 1983, several adaptations of this technique have been proposed. PCR is, undoubtedly, one of the most important discoveries of the 20th century and contributed to the development of diverse fields such as forensics, diagnostics and molecular sciences in general, also leading to many alternative PCR versions, such as qPCR, rtPCR, Multiplex PCR, and so on. Nonetheless, in spite of its overall simplicity and widespread use around the world, PCR depends on the availability of relatively expensive thermocyclers, capable of alternating temperatures during reaction incubation. An outstanding alternative to PCR was developed in the late 1990s when Notomi et al. proposed to amplify and detect specific DNA sequences with the aid of an isothermal reaction known as LAMP (Loop-mediated Isothermal Amplification). LAMP displays a series of advantages over PCR, as it can be carried out at a constant temperature (thus dispensing with the need for expensive thermocyclers) and delivers more amplified DNA in a shorter period of time. Moreover, due to the large amount of amplicons delivered in LAMP reactions, amplification can be readily detected by fluorometric/colorimetric dyes and/or by turbidity, due to pyrophosphate precipitation. LAMP employs a specific DNA polymerase with strand displacement activity (Bst Pol), along with a set of four to six primers that recognize six to eight distinct regions of the target. Thus, a LAMP primer set is significantly more complex than the pair of primers traditionally used in PCR, especially considering that the primers must also comply with a series of thermodynamic and positional constraints. As a result, the design of primer sets for LAMP reactions is not a trivial task, especially when performing diagnostic reactions, aiming at amplifying genomic regions from a specific organism, without amplifying non-target taxa. 
A few bioinformatics tools, coupled with omics databases, can be used for in silico validation of primer specificity, but this strategy often involves laborious and time-consuming approaches. In this sense, we present LampAID, a software tool capable of screening multiple LAMP primer sets against genomic databases, thus assisting in the identification of primer sets capable of recognizing specific targets in LAMP reactions. LampAID can work on thousands of genomes and detect, in minutes, which organisms can have their DNA potentially amplified in a LAMP reaction. LampAID’s user-friendly, folder-wise pipeline execution provides alignment-like visualization, making it much easier to detect when a primer set is free from unwanted amplification.
Palavras-chave: Isothermal PCR, in silico PCR, primer analysis, genome comparison, lamp software
#1115454

Reconstruction of Gene Regulatory Networks with Multiple Dependence and Correlation Metrics in Gene Expression Data

Autores: Lucas Otávio Leme Silva,Glaucia Maria Bressan,Alexandre Paschoal,Fabricio Martins Lopes
Apresentador: Lucas Otávio Leme Silva • lucasotavio750@gmail.com
Resumo:
Understanding fundamental biological processes, especially gene interactions, led to the concept of gene regulatory networks (GRNs), applied to cell differentiation, development, and disease progression. Analyzing these GRNs can shed light on the fundamental mechanisms of gene interactions, helping to unravel cellular functioning and the mechanisms underlying complex diseases. Assessing gene regulatory networks remains challenging due to the high dimensionality of gene expression data and limited samples. For example, the DREAM5 benchmark for Escherichia coli contains 4,511 genes and transcription factors, but only 805 samples, a scenario that makes it difficult to effectively apply deep learning-based models. Recent research has focused on developing sophisticated models to infer GRNs, often relying on correlation or dependence metrics for reconstruction. However, an in-depth analysis of the different metrics has not been addressed. Therefore, the objective of this work is to perform a comprehensive analysis of different dependency metrics and their respective behaviors, and also to propose a new metric that adapts mutual information to entropies other than Shannon's, such as Rényi and Tsallis. In total, 62 analyses were evaluated, covering linear, monotonic, nonlinear and nonmonotonic dependencies. The analyses were applied to pairs of genes and transcription factors present in the DREAM5 dataset, which contains 2,066 known interactions. The ability of each numerical metric to assign high scores to true interactions compared to false ones was evaluated, while statistical tests were only considered as significant or not (95% confidence level). Preliminary bivariate results indicated that 486 of the 2,066 interactions were not identified by any numerical metric, and 572 were identified by only one. To find an ideal subset of numerical analyses, the number of correct relations identified had to be weighted and penalized by the number of metrics chosen. 
A procedure for selecting subsets of metrics was formulated based on a scoring function, defined by: score(S) = alpha*Coverage(S) - beta*|S|, in which S is the subset of analyses selected and |S| is its size. For preliminary results, alpha = 1 and beta = relations/metrics = 2066/34 ≈ 61 were used. The optimization of this function led to the selection of nine metrics that maximize the score. This subset was able to identify 1,152 of the 2,066 interactions. Despite finding about 55% of the relationships, the results still reveal important limitations in the detection of gene interactions based on dependency metrics. Among the main challenges are: the scarcity of samples compared to the high dimensionality of the data, the possibility of spurious correlations between unrelated variables, and the presence of local dependencies, observable only in specific subsets of experimental conditions. In contrast, all relationships were identified using statistical tests such as the Hilbert-Schmidt Independence Criterion and Heller-Heller-Gorfine, among others. Although their number of false positives is extremely high, these tests can still be useful for reducing the search space. To advance the reconstruction of the genetic network, a new numerical metric that satisfies the following criteria is necessary: detection of arbitrary relationships (linear, non-linear, monotonic and non-monotonic), defined scale, robustness against false positives and ability to identify local dependencies.
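The scoring function can be made concrete with a small Python sketch. Two points are assumptions rather than statements from the abstract: Coverage(S) is taken to be the number of known interactions detected by at least one metric in S, and the subset is built with a simple greedy heuristic, since the abstract does not say which optimization method was used. The metric names and toy detections are hypothetical.

```python
def score(selected, detections, alpha=1.0, beta=1.0):
    """score(S) = alpha * Coverage(S) - beta * |S|.

    detections maps a metric name to the set of true interactions it detects.
    Coverage(S) is assumed to be the size of the union of those sets.
    """
    covered = set()
    for metric in selected:
        covered |= detections[metric]
    return alpha * len(covered) - beta * len(selected)


def greedy_select(detections, alpha=1.0, beta=1.0):
    """Greedy heuristic (assumption): repeatedly add the metric with the
    largest positive marginal gain, stopping when no metric improves the score."""
    selected, covered = [], set()
    while True:
        best, best_gain = None, 0.0
        for metric in sorted(detections):
            if metric in selected:
                continue
            gain = alpha * len(detections[metric] - covered) - beta
            if gain > best_gain:
                best, best_gain = metric, gain
        if best is None:
            return selected
        selected.append(best)
        covered |= detections[best]


# Toy example: interaction ids detected by four hypothetical metrics.
toy = {
    "pearson": {1, 2, 3},
    "spearman": {2, 3},
    "mutual_info": {4, 5},
    "dist_corr": {5},
}
subset = greedy_select(toy, alpha=1.0, beta=1.5)
print(subset, score(subset, toy, alpha=1.0, beta=1.5))
```

With alpha = 1, a metric is added only if it recovers more than beta new interactions, which is how the penalty term discourages redundant metrics.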
Palavras-chave: Gene Regulatory Networks, Inference, Correlation, Metrics, Entropy
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1115638

Longitudinal Metagenomic in Wastewater Reveals Antimicrobial Resistance and Virulence Factors Genes Dynamics.

Autores: Júlia Firme Freitas,Thaís Teixeira Oliveira,Lucymara Fassarella Agnez Lima
Apresentador: Júlia Firme Freitas • juliaffreitas@hotmail.com
Resumo:
Wastewater represents a microbial reservoir that mirrors the interplay between human activity, environmental conditions, and public health dynamics. Wastewater-based epidemiology has become a critical sentinel for monitoring antimicrobial resistance genes (ARGs) and virulence factor genes (VFGs). This study presents the first year-long longitudinal analysis of wastewater microbiomes in Natal, Brazil. We conducted weekly wastewater sampling in three wastewater treatment plants (Baldo, Ponta Negra, and Beira Rio) over 12 months, processing samples as monthly composites. Metagenomic DNA was sequenced using Illumina platforms, followed by comprehensive bioinformatics analysis, including taxonomic profiling, ARG/VFG annotation (CARD, BV-BRC, and CZID databases), and viral identification. Metagenome-assembled genomes (MAGs) were reconstructed to characterize resistant strains. In addition, co-occurrence networks revealed relationships between viruses, ARGs, and VFGs. Statistical analyses incorporated climatic and tourism data. Our study revealed distinct seasonal microbial shifts corresponding to tourism peaks, with multidrug resistance genes disproportionately prevalent in low-abundance taxa. Viral populations showed stronger associations with ARGs than with VFGs, suggesting phage-mediated resistance transfer. From 95 MAGs, we identified 33 multidrug-resistant strains, mainly Tolumonas, and novel ARG carriers in Aquaspirillaceae, a previously unrecognized resistance reservoir. Notably, five high-abundance ARGs (msrE, mphE, sul1, tetC, ges-1) consistently co-occurred with VFGs, potentially representing high-risk strains combining resistance and virulence traits. The limited virus-VFG connections (except crAssphage) suggest different evolutionary pressures shape resistance and virulence dynamics. The observed tourism-linked resistance peaks highlight human activity as a key driver of ARG occurrence in this tropical ecosystem. 
While ESKAPE pathogens were abundant, they contributed minimally to multidrug-resistant MAGs, challenging conventional surveillance approaches focused on dominant pathogens. Our study establishes tropical wastewater as a sensitive ARG surveillance tool, particularly valuable in resource-limited settings. Identifying novel resistance carriers and seasonal resistance patterns provides specific targets for public health intervention. We demonstrate that minor microbial populations and viral communities play underappreciated roles in resistance dissemination. These findings advocate for (1) expanded wastewater monitoring in developing regions, (2) tourism-focused ARG containment strategies, and (3) integration of viral ecology into resistance surveillance frameworks. The study provides a model for longitudinal ARG tracking in underrepresented tropical cities while highlighting the urgent need for global wastewater surveillance networks to combat the growing antimicrobial resistance crisis.
Palavras-chave: Metagenome, ARGs, VFGs, viral, bacteria
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1115790

Exploring the Role of Linker Length in Brontictuzumab scFv Interaction and Stability

Autores: Aline de Oliveira Albuquerque,Andrielly Henriques dos Santos Costa,Luca Milério Andrade,Diego da Silva Almeida,João Herminio Martins Da Silva
Apresentador: Aline de Oliveira Albuquerque • albuquerque.aline31@gmail.com
Resumo:
Brontictuzumab (BRON) is a humanized IgG2 antibody that targets the Negative Regulatory Region (NRR) of Notch1. BRON has shown clinical efficacy against solid tumors and hematological malignancies, including T-cell acute lymphoblastic leukemia, which often harbors activating mutations in the NRR. While full-length antibodies remain the dominant therapeutic format, antibody engineering enables the design of smaller fragments to optimize pharmacokinetics and overcome production challenges. Among these, single-chain variable fragments (scFvs) offer notable advantages, including enhanced tissue penetration, rapid clearance, and compatibility with bacterial expression systems. scFvs are formed by covalently linking an antibody's VH and VL domains with a flexible peptide linker, and their modularity and potential for multivalent engineering make them attractive for immunotherapy and molecular targeting. Critically, their performance is influenced by the length and composition of the linker, which can affect folding, binding affinity, and overall stability. In this study, we evaluated scFv derivatives of BRON incorporating glycine-serine linkers of varying lengths (9, 12, and 15 residues) to assess the impact of linker size on antigen interaction and stability. scFv models were generated using RoseTTAFold and AlphaFold2, refined with locPREFMD, and validated via MolProbity, QMEANDisCo, and VoroMQA. Protein-protein docking with ClusPro antibody mode was applied to produce scFv-NRR complexes. Results were filtered based on energy, interface similarity, and interaction with key epitope residues (e.g., Leu1710 at the S2 cleavage site). Selected complexes were analyzed for interface composition via PDBSum, validated through heated Molecular Dynamics (MD) and submitted to 400-ns MD simulations in duplicate. Our results indicated that the models produced by RoseTTAFold without locPREFMD refinement showed superior stereochemical, energy-based, and contact-based validation scores. 
The filtered scFv-NRR complexes had interface RMSD under 5 Å during heated and extended MD simulations, indicating structural stability. Notably, scFv12 underwent early rearrangement followed by stabilization, resulting in a compact and stable interface with limited S2 site accessibility. In contrast, scFv15 presented a conformation characterized by higher solvent exposure at S2, likely due to increased VH–VL separation, as evidenced by radius of gyration analysis. Interface analysis revealed that CDR contribution shifted with linker length: CDR-L1 dominated in scFv9, whereas scFv12 and scFv15 displayed a balanced CDR contribution and matched the Fv chemical profile of residues composing the interface. Principal Component Analysis (PCA) indicated that while scFv9 and scFv12 partially overlapped with Fv-induced states, scFv15 explored distinct regions associated with increased S2 site accessibility. Hydrogen bond analysis showed scFv9 and scFv12 formed stable interactions with Leu1710, a residue critical for NRR autoinhibition, suggesting potential for S2 site blockade. Despite having the highest number of total hydrogen bonds, scFv15 lacked stable contacts with Leu1710, correlating with its elevated S2 exposure. Additionally, MM/GBSA analysis confirmed that scFv12 had the binding free energy most closely matching the Fv-NRR complex average, further supporting its favorable interaction profile. These results underscore the critical influence of linker length on scFv-NRR interaction dynamics and structural behavior. The scFv12 construct enabled stable interaction with key regulatory residues, making it a promising candidate for further in vitro validation and therapeutic development.
Palavras-chave: scFv, linker length, Notch
#1116583

INVESTIGATION OF NEW CRUZAIN INHIBITORS THROUGH ULTRA-LARGE SCALE VIRTUAL SCREENING

Autores: Estela Mariana Guimarães Lourenço,Beatriz Murta Rezende Moraes Ribeiro,Lucas Abreu Diniz,Tetsu Sakamoto,J. Miguel Ortega,Rafaela Salgado Ferreira
Apresentador: Estela Mariana Guimarães Lourenço • estela.mariana@hotmail.com
Resumo:
Chagas disease remains a public health concern, characterized by high mortality rates and a lack of effective treatments; the available drugs are frequently associated with limited efficacy and considerable adverse effects. More than a century after its discovery, the development of new drugs for Chagas disease remains a complex and challenging task. Screening large libraries of commercially available compounds represents a promising strategy for identifying novel candidates with improved potency, selectivity, and chemical diversity. However, the rapid and continuous growth of compound libraries makes computational approaches essential for optimizing screening workflows and guiding the prioritization of bioactive compounds. In this work, we developed an ultra-large scale virtual screening workflow to identify novel compounds with potential antichagasic activity. A molecular docking protocol was developed using the DOCK6.12 software. For its validation, 96 cruzain inhibitors, compiled from the literature, and 2,250 decoys generated using DUDE-Z, were used. Molecular docking results were employed to construct a ROC curve and perform enrichment analyses (AUC = 0.87; EF 1% = 16.1; EF 5% = 7.2; EF 10% = 5.8). Using the established molecular docking protocol, score values obtained from a dataset of one million molecules from the ZINC20 database were used to develop a machine learning model. External validation (30% test set) demonstrated high predictive performance (R² = 0.95). Subsequently, a commercially available subset of ZINC20, comprising around 647 million compounds, was screened using the predictive model. The top 6.47 million compounds (the top 1%), based on predicted scores, were selected for molecular docking simulations. The results were analyzed through visual inspection, applying criteria such as the number of chiral centers, synthetic accessibility, and protein–ligand interactions. The most promising candidates will be further evaluated in vitro against the cruzain enzyme.
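The enrichment factors quoted above follow a standard definition: the hit rate among the top-ranked fraction of the library divided by the hit rate in the whole library. A minimal Python sketch of that calculation is given below; the function name, the toy scores, and the assumption that lower (more negative) docking scores rank better are illustrative and not taken from the abstract.

```python
def enrichment_factor(scores, labels, fraction, lower_is_better=True):
    """EF at a given fraction of the ranked library.

    scores: docking scores, one per compound.
    labels: 1 for a known active, 0 for a decoy.
    lower_is_better: assumed True, i.e. more negative scores rank first.
    """
    ranked = sorted(
        zip(scores, labels),
        key=lambda pair: pair[0],
        reverse=not lower_is_better,
    )
    n_top = max(1, round(fraction * len(ranked)))
    hits_top = sum(label for _, label in ranked[:n_top])
    hit_rate_top = hits_top / n_top
    hit_rate_all = sum(labels) / len(ranked)
    return hit_rate_top / hit_rate_all


# Toy library of 10 compounds with 2 actives (assumption, not study data).
toy_scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
toy_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(enrichment_factor(toy_scores, toy_labels, fraction=0.2))
```

An EF of 1 corresponds to random ranking, so the reported EF 1% = 16.1 means actives were about sixteen times more frequent in the top 1% than in the validation set as a whole.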
Palavras-chave: Cruzain, ultra-large scale virtual screening, molecular docking, machine learning
★ Running for the Qiagen Digital Insights Excellence Awards
#1116601

Conformational Plasticity of Toxoplasma gondii CDPK1 and Its Implications for Selective Inhibitor Design

Autores: João Pedro Bezerra Carvalho,Deborah Antunes,Daniel Adesse,Ana Carolina Ramos Guimarães
Apresentador: João Pedro Bezerra Carvalho • joao.bezerra.ismart@gmail.com
Resumo:
Toxoplasma gondii is the protozoan parasite responsible for toxoplasmosis, a globally prevalent disease affecting nearly two-thirds of the world’s population. Despite its widespread impact, therapeutic options remain limited. TgCDPK1, a calcium-dependent protein kinase from T. gondii, has emerged as a promising molecular target for drug development due to its essential roles in key parasite processes such as motility, host cell invasion, and egress. This study employed computational approaches to explore the conformational plasticity of TgCDPK1 and its impact on inhibitor binding. Building on previous work from our group, which conducted structural comparisons between TgCDPK1 and the human kinase BUB1—both featuring a glycine gatekeeper residue—we performed molecular dynamics (MD) simulations of TgCDPK1 complexed with two different ligands (ANP and UW2). These simulations revealed distinct binding site volume profiles throughout the trajectories. Subsequently, a virtual screening campaign was conducted using six known TgCDPK1 inhibitors and ATP against multiple MD-derived frames. The results demonstrated diverse interaction patterns across frames, highlighting the role of conformational variability in modulating ligand binding modes. Additionally, 36 TgCDPK1–ligand complex structures were retrieved from the Protein Data Bank and analyzed for B-factor distribution to identify flexible regions within the kinase domain. Regions with elevated B-factors suggested local flexibility that may contribute to the protein's binding adaptability. A comparative analysis of binding site volumes, performed using DoGSite3 via the ProteinPlus platform, revealed differences of up to 300 ų among TgCDPK1 structures, further supporting the hypothesis of a malleable binding pocket. 
Altogether, these findings reinforce the notion that TgCDPK1 exhibits a flexible binding site capable of accommodating diverse ligands through conformational adaptation—an important feature for the rational design of selective inhibitors.
Palavras-chave: Toxoplasma gondii, TgCDPK1, Calcium-Dependent Protein Kinase, Conformational Plasticity
#1116766

BRCA1-mutated triple-negative breast cancer cells treated with Olaparib reveal immune response and structural variation driven by transposable elements

Autores: Daniela Moreira Mombach,Jaqueline Carvalho de Oliveira,Elgion Lucio da Silva Loreto,Pedro A F Galante
Apresentador: Daniela Moreira Mombach • danielamombach@gmail.com
Resumo:
Breast cancer is the most common malignancy in women worldwide, with triple-negative breast cancer (TNBC) posing a significant challenge due to its lack of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 expression. TNBC exhibits high invasiveness, metastatic potential, frequent relapse, and poor prognosis, often harboring TP53 and BRCA1/2 mutations. PARP inhibitors like Olaparib (Ola) leverage synthetic lethality in BRCA1/2-mutated TNBC: in cells that already lack homologous recombination repair, they additionally impair base-excision repair pathways, causing accumulation of unrepaired DNA damage and apoptosis. While Ola enhances progression-free survival in germline BRCA-mutated metastatic TNBC, a deeper understanding of the factors driving Ola treatment efficacy and its impact on tumor cells is critical to optimizing therapeutic strategies. Transposable elements (TEs) further complicate this dynamic. Though only a small fraction (<0.05%) of TEs remain active, particularly LINE-1 (L1), they drive genetic instability and diversity, influencing tumorigenesis via cis- and trans-regulatory mechanisms. In cancer, TE dysregulation, fueled by genomic instability and L1 promoter hypomethylation, enables co-evolution with tumor cells, with TEs acting as both oncogenic drivers and therapeutic vulnerabilities. The relationship between cell repair mechanisms and TE expression and activity is not fully understood, although a link between them is plausible. Therefore, we investigated this interplay when BRCA1 and TP53 are depleted together with PARP inhibition. Our study examines Ola’s effects on TE regulation and treatment response across four TNBC cell lines, all TP53-depleted: two BRCA1-mutated (SUM1315, MDA-MB-436) and two wild-type (MDA-MB-468, BT549). Our findings reveal that Ola enhances immune responses exclusively in BRCA1-mutated cell lines.
Further analysis revealed that Ola activates TE-TE chimeras capable of forming hybrid double-stranded RNA (dsRNA) and self-dsRNA structures. Additionally, tumorigenesis-associated gene-TE chimeras, such as GINS1 and BRCC3, were up-regulated in BRCA1-mutated cell lines, highlighting TE-driven immune activation and oncogenic impacts. Comparison of control and Ola-treated long-read whole-genome sequencing of MDA-MB-436 cells identified an L1 somatic insertion and a non-tandem L1 duplication with template switching, the latter marked by a 15 bp deletion and 14 bp microhomology following Ola treatment. This signature implicates microhomology-mediated break-induced replication (MMBIR) as a key repair mechanism under Ola treatment. In the absence of BRCA1-mediated homologous recombination and with PARP trapping, Ola-induced replication fork collapse likely triggers MMBIR, resolving double-strand breaks via short homologous sequences and contributing to structural variation. Structural variations in repetitive DNA regions may drive complex genomic rearrangements, potentially generating neoantigens and sensitizing cells to immunotherapy. Altogether, our findings highlight Ola’s impact in TNBC, linking BRCA1 depletion, TE activation, and immune response, while suggesting MMBIR-mediated TE activity could both increase instability and offer therapeutic opportunities.
Palavras-chave: triple-negative breast cancer, transposable element, parp inhibitor, BRCA1, structural variation
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1116781

PREDICTOR OF PLATINUM RESISTANCE IN PATIENTS WITH HIGH-GRADE SEROUS EPITHELIAL OVARIAN CANCER

Autores: Cristiane Esteves Teixeira,Nayara Gusmão Tessarollo,Glenerson Baptista,Alexandre Chiavegatto Filho,Mariana Boroni
Apresentador: Cristiane Esteves Teixeira • tncristiane@gmail.com
Resumo:
Ovarian cancer is the most lethal gynecological malignancy, with an estimated 446,000 new cases and 314,000 deaths projected by 2040. High-grade serous ovarian cancer (HGSOC), its most common and aggressive subtype, is often diagnosed at advanced stages, resulting in a poor prognosis. Although standard treatment includes cytoreductive surgery and platinum-based chemotherapy with paclitaxel, survival rates remain low. Machine learning (ML) and bioinformatics approaches present promising tools for predicting treatment response and identifying novel biomarkers, thereby supporting personalized oncology. Clinical, pathological, and transcriptomic data from HGSOC patients in the Cancer Genome Atlas (TCGA) were utilized to build a predictive model for response to platinum-based chemotherapy. Patients were classified as platinum-resistant (progression-free interval, PFI ≤ 6 months) or platinum-sensitive (PFI > 6 months), with further subdivision of sensitive patients into potentially sensitive (PFI > 6 and ≤ 12 months), sensitive (PFI > 12 and ≤ 24 months), and very sensitive (PFI > 24 months). Differential gene expression analysis using DESeq2 identified 1,300 differentially expressed genes (DEGs) between resistant and sensitive groups, including VEGFD and FGF17. The data were divided into training (70%) and testing (20%) sets. Recursive Feature Elimination (RFE) was applied to select relevant variables, and 400 genes and clinical variables (age, CA-125 levels, and tumor stage) were retained in the model. A LightGBM model was trained with 10-fold cross-validation. Model performance was evaluated using the area under the ROC curve (AUC) and sensitivity. LightGBM achieved an AUC of 0.655 and a sensitivity of 0.733 in the test set. Neuroligin 1 (NLGN1) and Carnitine Palmitoyltransferase 1C (CPT1C) were highlighted among the top predictors according to explainable AI tools. NLGN1 has been previously linked to platinum sensitivity and overall survival in HGSOC.
CPT1C is known to promote tumor cell migration and metastasis, particularly to the ovaries. This LightGBM-based model successfully identified promising biomarkers for predicting chemotherapy response in HGSOC. Validation using RNA-seq data from the INCA cohort is underway. These findings may enhance patient stratification and support the development of personalized treatment strategies in ovarian cancer.
Palavras-chave: machine learning, ovarian cancer, RNA
★ Running for the Qiagen Digital Insights Excellence Awards
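The progression-free interval (PFI) thresholds stated in the abstract above translate directly into a labeling rule. The sketch below is a plain restatement of those thresholds. The function name and the returned label strings are hypothetical and say nothing about how the authors encoded the classes.

```python
def classify_platinum_response(pfi_months):
    """Map a progression-free interval (in months) to a response class.

    Thresholds follow the abstract: resistant (PFI <= 6), potentially
    sensitive (6 < PFI <= 12), sensitive (12 < PFI <= 24), and very
    sensitive (PFI > 24).
    """
    if pfi_months < 0:
        raise ValueError("PFI cannot be negative")
    if pfi_months <= 6:
        return "resistant"
    if pfi_months <= 12:
        return "potentially sensitive"
    if pfi_months <= 24:
        return "sensitive"
    return "very sensitive"


def is_platinum_sensitive(pfi_months):
    """Binary view of the same rule: any PFI above 6 months counts as sensitive."""
    return classify_platinum_response(pfi_months) != "resistant"
```

The boundary values (6, 12, and 24 months) fall into the lower class, matching the inclusive upper limits in the text.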
#1116782

Leveraging Sample-Specific Strings to Enhance Gene Fusion Detection

Autores: Luisa De Melo Barros Penze,Lucas Peres,Joao Meidanis
Apresentador: Luisa De Melo Barros Penze • l238001@dac.unicamp.br
Resumo:
The advent of long-read sequencing technologies, a major advancement in next-generation sequencing (NGS), has significantly improved the study of transcripts and RNA-level biological processes. One critical process under study is gene fusion, a genomic event that can lead to cancer and other diseases. Several computational tools have been developed to detect gene fusions, most of which rely on alignment-based methods that compare sequencing data to a reference transcriptome. While effective, these methods are computationally intensive and may struggle with noisy or highly rearranged datasets.
This study investigates a potential enhancement in gene fusion detection through the use of an alignment-free strategy. Specifically, the research leverages a recently developed algorithm capable of identifying sample-specific strings (SFSs), that is, strings that are absent in a reference transcriptome but present in a sample. The objective is to evaluate whether this strategy can enhance fusion detection pipelines by filtering reads that contain SFSs prior to the search of gene fusions. By analyzing only a subset of the original data, this method can reduce the computational burden on alignment-based tools. Furthermore, it may decrease the number of false positive fusion events because SFSs are expected to be found in reads that carry relevant information, serving as evidence of genomic variation.
A custom pipeline was implemented in bash to automate the execution of three existing gene fusion detection tools: LongGF, JAFFAL and CTAT-LR. These tools were tested using simulated RNA sequencing data with varying error rates to mimic real sequencing conditions. Two scenarios were analyzed: one using the original FASTQ files and another using filtered reads obtained by the SFS detection tool. A comparative graph was generated to visualize the impact of filtering on fusion detection performance. The initial findings indicate that SFS-based filtering has the potential to impact the performance of gene fusion detection tools. However, the current implementation requires further refinement to achieve more reliable results. As a method of comparison, the symmetric difference was computed between the fusion events detected in both scenarios and the simulated ground truth: those identified by the tools but absent from the ground truth, and those present in the simulation but not detected. The results show that SFS-based filtering helped reduce the number of false positives, that is, fusions detected by the tools but not present in the ground truth. On the other hand, it led to an increase in false negatives, that is, fusion events present in the ground truth but not detected by the tools after filtering.
The next phases of the project will apply the same testing approach with real biological datasets to evaluate whether SFS-based filtering can effectively reduce noise and computational complexity by removing reads that do not contain gene fusions. In addition, a planned technical improvement involves modifying the data structure used by the SFS detection tool. The current implementation is designed for double-stranded sequences, appropriate for DNA, whereas RNA transcript analysis requires adjustments to support its single-stranded nature.
Palavras-chave: transcriptome, RNA, alignment, cancer, long-read, sample-specific strings, gene fusion, sequencing
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
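The symmetric-difference comparison described in the abstract above can be sketched with plain set operations. The sketch assumes each fusion event is represented as a string label. The gene pairs, the label format, and the function name are hypothetical and only illustrate the bookkeeping, not the output of LongGF, JAFFAL, or CTAT-LR.

```python
def compare_to_ground_truth(detected, truth):
    """Split the symmetric difference into false positives and false negatives.

    detected: fusion labels reported by a tool.
    truth: fusion labels present in the simulated ground truth.
    """
    detected, truth = set(detected), set(truth)
    false_positives = detected - truth   # reported but not simulated
    false_negatives = truth - detected   # simulated but not reported
    symmetric_difference = detected ^ truth
    return false_positives, false_negatives, symmetric_difference


# Hypothetical labels for illustration only.
ground_truth = {"BCR--ABL1", "EML4--ALK", "TMPRSS2--ERG"}
unfiltered_calls = {"BCR--ABL1", "EML4--ALK", "TMPRSS2--ERG", "GENE1--GENE2"}
filtered_calls = {"BCR--ABL1", "EML4--ALK"}

fp_raw, fn_raw, sd_raw = compare_to_ground_truth(unfiltered_calls, ground_truth)
fp_flt, fn_flt, sd_flt = compare_to_ground_truth(filtered_calls, ground_truth)
```

In this toy case, filtering removes the spurious call (fewer false positives) but also loses one true fusion (more false negatives), which mirrors the trade-off reported in the text.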
#1116794

Stage-Specific Metabolic Network Modeling of Mitochondria and Glycosomes in Trypanosoma brucei: A Flux Balance Analysis Framework for Differential Bioenergetic Topologies

Autores: Mayke Bezerra Alencar,Bruno Ribeiro Pinto,Gabriela Torres Montanaro,Ariel Mariano Silber
Apresentador: Mayke Bezerra Alencar • mayke@usp.br
Resumo:
Trypanosoma brucei, the etiological agent of human African trypanosomiasis (HAT) and nagana in livestock, exhibits unique metabolic adaptations associated with its parasitic lifestyle. Deciphering its metabolism is critical for identifying therapeutic vulnerabilities and understanding its organellar specialization, particularly in mitochondria and glycosomes, organelles central to energy metabolism. Here, we present iTbMIT, the first genome-scale metabolic model (GEM) tailored to the mitochondria and glycosomes of two T. brucei life-cycle stages with starkly divergent organellar functionalities: the bloodstream form (BSF) and procyclic form (PCF). iTbMIT integrates genomic, proteomic, and biochemical data from literature and resources including TriTrypDB, KEGG, BRENDA, and BiGG, encompassing 107 genes, 239 reactions, and 233 metabolites for the PCF model, and 107 genes, 244 reactions, and 239 metabolites for the BSF model. The model employs flux balance analysis, flux variability analysis, and sampling methods to simulate metabolic fluxes, prioritizing ATP production as the objective function. This model predicts stage-specific metabolic hallmarks and vulnerabilities, and identifies potential drug targets that could disrupt the parasite's ATP budget. This work advances our understanding of T. brucei metabolism and provides a valuable computational framework for guiding experimental validation and developing novel therapeutic strategies against this pathogen.
Palavras-chave: Genome-Scale Metabolic Model, Metabolic Network, Trypanosoma brucei, Bioenergetics, Metabolism
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
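For readers less familiar with the method named in the abstract above, flux balance analysis in its standard textbook form is the linear program below. This is the generic formulation, not the specific iTbMIT setup: the stoichiometric matrix S, the flux vector v, the bounds, and the choice of c as the vector selecting the ATP-producing reaction are notation assumed here for illustration.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j .
\end{aligned}
```

The constraint S v = 0 encodes the steady-state assumption (no net accumulation of any metabolite). Stage-specific behavior is typically expressed through the reaction set and the flux bounds rather than through the objective itself.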
#1116874

Integrative scRNA-seq Analysis Across Multiple Animal Models Highlights Widespread Excitatory-Inhibitory Network Dysregulation and Conserved Cellular Signatures in Autism Spectrum Disorder.

Autores: João Victor Silva Nani,Victor Jardim Duque,André de Souza Mecawi
Apresentador: João Victor Silva Nani • joao.nani@unifesp.br
Resumo:
Autism Spectrum Disorder (ASD) presents significant heterogeneity, complicating the identification of core pathophysiological mechanisms. While single-cell RNA sequencing (scRNA-seq) offers cellular-level insights, integrating data across diverse studies is crucial to overcome technical variability and leverage the strengths of multiple genetic and environmental animal models. We addressed this by constructing a unified single-cell reference framework, integrating scRNA-seq data from 11 distinct ASD animal models, encompassing 155,132 ASD and 158,075 control cells across various brain regions and developmental stages. Bioinformatic harmonization using reciprocal principal component analysis, followed by high-resolution clustering, differential gene expression (DEG) analysis, and cell communication modeling, enabled robust comparisons. Our comparative analyses identified conserved transcriptomic alterations across numerous neuronal and non-neuronal cell types. Notably, transthyretin (Ttr), crucial for thyroid hormone and retinol transport, was significantly upregulated in multiple cell types, while members of the Olig family of transcription factors, essential for glial development, also showed consistent dysregulation. Furthermore, we observed cell-type-specific DEG burdens; for instance, mature oligodendrocytes showed a high overlap between DEGs and functionally enriched genes, including those involved in metabolic regulation (Pcsk1), inflammation (Ppia), and oxidative phosphorylation (Jund), corroborating pathway analyses highlighting mitochondrial dysfunction. In-depth cell-cell communication analyses using CellChat revealed a striking increase in predicted signaling interactions between inhibitory and excitatory neurons in ASD models, strongly supporting the excitatory-inhibitory (E/I) imbalance hypothesis. This was linked to specific dysregulation in pathways including PTN, SLIT, and PTPR signaling.
Subsequent NicheNet analysis predicted key ligand-receptor pairs mediating this altered crosstalk, such as inhibitory neuron-derived ligands Pdgfa, Adamts3, and Reln potentially impacting target gene expression (e.g., Arl4a, Egr1) and receptor activity (e.g., Sort1, Nrp1/2) in excitatory neurons. Functional enrichment analyses consistently pointed towards disruptions in synaptic organization, vesicle cycling, extracellular matrix dynamics, and neuroinflammatory pathways across various cell types. Crucially, cross-validation confirmed the clinical relevance of these findings. Many identified DEGs matched high-confidence ASD risk genes in the SFARI database, including cell-type specific dysregulation of Ermn in non-neuronal cells, Atp2b2 (calcium transport) across multiple neuronal subtypes, and key neurodevelopmental transcription factors like Foxg1 (in L5/6 NP neurons) and Mef2c (in MEIS2-like interneurons). Comparison with human postmortem cortical scRNA-seq data revealed significant concordance, particularly in the parietal cortex, with shared and consistent DEGs in excitatory neurons, inhibitory neurons, and mature oligodendrocytes enriching for pathways related to synaptic translation, ribosome function, and neurodevelopmental/neuropsychiatric disorders, further implicating oligodendrocyte dysfunction (DLGAP/NRXN network genes). Ultimately, by harmonizing complex data from diverse ASD models, this integrative approach overcomes model-specific variations to find fundamental, conserved molecular disruptions underlying the disorder. Our unified framework pinpoints robust transcriptomic signatures, particularly the pervasive excitatory-inhibitory network imbalance driven by specific signaling pathways and significant glial contributions, strongly validated against clinical data. 
This work not only provides a high-resolution database of convergent cellular pathology in ASD but also illuminates critical, shared mechanisms, offering a powerful roadmap for future functional studies and the rational design of targeted therapeutic strategies.
Palavras-chave: Autism Spectrum Disorder, Single-cell RNA Sequencing, Integrative Transcriptomics, Animal Models
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1117003

Evaluation of Sequencing Strategies for the Synchronous Detection of Pathogens in Environmental Samples

Autores: Letícia Lima,Ana Lídia Pires de Assis Pinto,Daniel Andrade Moreira,Maithê Magalhães,Thiago Estevam Parente
Apresentador: Letícia Lima • limaf.leticia@gmail.com
Resumo:
Next-generation sequencing (NGS) has become a fundamental tool in modern pathogen surveillance, enabling the early detection and characterization of infectious agents. The generation of high-resolution genomic data allows for the rapid development of public health responses, including the design of diagnostic assays, therapeutic interventions, and vaccines. Beyond its clinical applications, the use of NGS at the human-animal-environment interface plays a crucial role in identifying potential outbreak threats and monitoring pathogen circulation in urban or natural reservoirs. In this study, aiming to optimize the simultaneous detection of eukaryotic, prokaryotic, and viral pathogens in environmental samples, we used pigeon excreta to test the effectiveness of three high-throughput sequencing strategies: DNA sequencing, mRNA sequencing, and ribosomal RNA-depleted RNA sequencing. All generated libraries underwent quality control and trimming using FastQC and Trimmomatic. Cleaned reads were subjected to taxonomic classification with Kraken2 and Bracken. To interpret the results and reduce the impact of potential false positives, reads per million (RPM) were calculated for each taxon. Based on abundance patterns across the four domains of biodiversity, we established confidence thresholds for pathogen detection. Pathogens exceeding these thresholds were flagged for further validation, and reference-based read mapping is being conducted to confirm their presence and assess classification accuracy. Our findings contribute to the improvement of surveillance strategies by providing a robust approach for broad and sensitive pathogen detection with potential public health impact.
Palavras-chave: Next-generation sequencing, Metagenomics, Pathogen surveillance
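The reads-per-million normalization and threshold-based flagging mentioned in the abstract above can be sketched as follows. The taxon names, read counts, and threshold value are hypothetical placeholders. The abstract does not state the thresholds the authors established, and the input here is a plain dictionary rather than actual Kraken2 or Bracken output.

```python
def reads_per_million(taxon_counts, total_reads):
    """Normalize per-taxon read counts to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        taxon: count * 1_000_000 / total_reads
        for taxon, count in taxon_counts.items()
    }


def flag_taxa(rpm_by_taxon, threshold):
    """Return the taxa whose RPM exceeds a chosen confidence threshold."""
    return sorted(taxon for taxon, rpm in rpm_by_taxon.items() if rpm > threshold)


# Hypothetical counts for one library with two million classified reads.
counts = {"Taxon A": 500, "Taxon B": 5}
rpm = reads_per_million(counts, total_reads=2_000_000)
flagged = flag_taxa(rpm, threshold=10.0)  # threshold chosen for illustration only
```

Normalizing by library size makes abundances comparable across the DNA, mRNA, and rRNA-depleted libraries, which generally differ in sequencing depth.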
#1117004

Analysis of COVID-19 case notification in Paraná: an approach based on CRISP-DM and multinomial logistic regression models.

Autores: Matheus Da Silva Lizzi,Glaucia Maria Bressan,Elisangela Ap. da Silva Lizzi
Apresentador: Matheus Da Silva Lizzi • lizzimatheus08@gmail.com
Resumo:
COVID-19 is monitored in Brazil by the Ministry of Health and, in the states, by the respective State Health Departments. Paraná reported 3 million cases. Furthermore, due to the broad clinical spectrum, there is underreporting, which demands robust analytical approaches. The goal of this study is to implement a complete data analysis flow, from collection to visualization, to evaluate factors associated with the final classification of cases in Paraná, Brazil, through multiple logistic regression models using the CRISP-DM methodology, with computational support from the R software.
Using CRISP-DM, the study begins with an epidemiological understanding of the spread of COVID-19 in Brazil, with a focus on Paraná. We then explore the data, addressing the compulsory notification flow, from basic health units to insertion into the system via the influenza form (suspected COVID-19), according to the e-SUS NOTIFICA manual, including contact tracing. In the collection and preparation stage, the data were extracted from Open DataSUS, selecting annual notifications (2020–2023) by state, with 64 variables from the complete form. Columns such as sex, race/color, age, case evolution (cure, home treatment, hospitalization, and death), symptoms, and final classification (confirmed, discarded, etc.) were selected. In the preprocessing, incomplete or invalid records were excluded, resulting in an analytical database with 9,286 entries. For statistical modeling, multiple logistic regression models with a multinomial probability distribution were applied to understand the final classification based on the selected variables. The evaluation was based on the Akaike information criterion (AIC), residual analysis (Hosmer-Lemeshow), simulated envelope plots, and accuracy. The implementation generated an informative dashboard, built with the flexdashboard package, that presents the CRISP-DM flow, descriptive graphs, and model results and is hosted as HTML. All processing was performed in R, with specific packages and a significance level of 5%.
Simple logistic regression models were first fitted to obtain the crude odds ratio of each predictor, and the multiple logistic regression model was then fitted to obtain the adjusted odds ratios. The race/color and sex variables were associated with the final classification of cases, with black and brown people having a greater chance of being classified as discarded cases compared to white people, and men having a lower chance of diagnosis confirmation compared to women. The age variable showed no statistically significant association with the final classification. The evolution of the case is informative, since cases with death or hospitalization are strongly associated with confirmed cases; this points to efficient secondary health care in the health system, since mild cases (home treatment) or influenza do not usually evolve to death and are monitored by primary health care. Model diagnostics were satisfactory according to the fit metrics, with residuals within the assumptions and a low AIC.
It is concluded that the CRISP-DM methodology proved to be a useful resource in public health, since it encompasses everything from understanding the problem to visualizing and disseminating the findings to the general public, managers and health professionals. Furthermore, this work was developed in a high school scientific initiation program, enabling initial and intensive computational training in the area of applied data analysis.
Palavras-chave: COVID-19, Notification System, CRISP-DM, Statistical Modeling, Public Health
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
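The crude odds ratios mentioned in the abstract above can be illustrated with a short sketch. The authors worked in R with multinomial models. The Python functions and the 2x2 counts below are hypothetical and cover only the simplest case, a binary predictor and a binary outcome, where the odds ratio from a simple logistic regression equals the cross-product ratio of the table.

```python
import math


def odds_ratio_2x2(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Crude odds ratio from a 2x2 table (cross-product ratio)."""
    if min(exposed_noncases, unexposed_cases) <= 0:
        raise ValueError("cells in the denominator must be positive")
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


def odds_ratio_from_coefficient(beta):
    """Odds ratio implied by a logistic regression coefficient: exp(beta)."""
    return math.exp(beta)


# Hypothetical counts, for illustration only.
crude_or = odds_ratio_2x2(30, 70, 20, 80)
# For a single binary predictor, the fitted coefficient is the log of this ratio.
beta = math.log(crude_or)
recovered_or = odds_ratio_from_coefficient(beta)
```

An adjusted odds ratio is obtained the same way, by exponentiating the coefficient, but from a model that also includes the other predictors. In a multinomial model each ratio is interpreted relative to the reference outcome category.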
#1117130

COMPUTATIONAL ASSESSMENT OF EIF4E ISOFORMS IN COWPEA (VIGNA UNGUICULATA (L.) WALP): FROM GENOME MINING TO STRUCTURAL PROFILING

Autores: Madson Allan de Luna Aragão,Saulo Rafael Mendes Penna,Fernanda Alves de Andrade,Carlos André dos Santos Silva,José Diogo Cavalcanti Ferreira,VALESCA PANDOLFI,Ana Maria Benko Iseppon
Apresentador: Madson Allan de Luna Aragão • madsondeluna@gmail.com
Resumo:
Cowpea (Vigna unguiculata (L.) Walp) is an important crop in tropical and subtropical regions, due to its nutritional properties and adaptability. The eukaryotic translation initiation factor 4E (eIF4E) is a key protein involved in cap-dependent translation. Its interaction with viral proteins has been associated with host susceptibility/resistance to viruses of the Potyviridae family, such as the Cowpea aphid-borne mosaic virus (CABMV). This study provides a detailed in silico structural analysis of eIF4E isoforms in V. unguiculata, aiming to identify structural determinants that may influence viral resistance or susceptibility. Gene mining was performed using annotated genomes of common bean (Phaseolus vulgaris) and V. unguiculata, along with pre-release genomic assemblies from five Cowpea cultivars (BR14 Mulato, IT85F-2687, Santo Inácio and Pingo de Ouro). Conserved sequence domains were identified using NCBI’s CD-Search, and sequence alignments were subsequently performed with Clustal Omega and visualized in Jalview. Theoretical three-dimensional models of the isoforms were generated using AlphaFold3, with model accuracy evaluated through pLDDT and PAE scores and validated using ProSA-web, PROCHECK and QMEANDisCo metrics. Molecular Dynamics (MD) simulations were performed using GROMACS 2022.4 with the GROMOS 53A6 force field in a solvated system under physiological conditions (0.15 M NaCl) for 100 ns. Structural stability and conformational flexibility were assessed via RMSD, RMSF, radius of gyration and hydrogen bond analysis. Electrostatic surface potential maps were generated using the APBS server to analyze the charge distribution across the molecular surfaces. Genomic mining of Cowpea cultivars revealed the presence of three genes encoding eIF4E isoforms, located on chromosomes 4, 6 and 7.
The protein sequences corresponding to the isoforms from chromosomes 4 and 7 exhibited high sequence conservation, whereas the isoform encoded on chromosome 6 showed lower conservation, suggesting potential functional divergence, altered affinity for the 5' cap region, or distinct involvement in translation hijacking mechanisms during phytopathogen infections, such as those caused by CABMV. The three-dimensional modeling revealed that eIF4E isoforms in V. unguiculata maintain a conserved "cupped hand" fold, typical of cap-binding proteins. Notably, isoforms encoded by chromosomes 4 and 7 showed significant structural similarity, particularly due to the presence of positively charged regions and flexible loops in the region involved in interacting with the mRNA 5' cap and eIF4G. In contrast, chromosome 6 isoforms displayed a distinct electrostatic potential and enhanced rigidity relative to the others, potentially impairing their interaction with the translation machinery. These findings suggest that the structural diversity among eIF4E isoforms may underlie differential responses to translation regulation and virus susceptibility in Cowpea cultivars. Altogether, these insights support future efforts in breeding and biotechnological approaches aimed at enhancing viral resistance in V. unguiculata.
Palavras-chave: Translation initiation factor, Potyvirus, Pathogen Resistance, Structural characterization, Molecular modeling, Molecular dynamics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1117324

Acetohydroxamic acid as a potential inhibitor of the urease enzyme in Acinetobacter spp. clinical isolates

Autores: Ana Lídia Pires,Fabio Mota
Apresentador: Ana Lídia Pires • anaassis.10@gmail.com
Resumo:
Urease is a nickel-containing metalloenzyme that hydrolyzes urea into ammonia and carbon dioxide, facilitating colonization of mammalian host environments. Bacterial urease is recognized as an important virulence factor in pathogens such as Helicobacter pylori, Klebsiella pneumoniae, and Proteus mirabilis. Acetohydroxamic acid (AHA) is a well-characterized urease inhibitor and is approved in some countries for the treatment of urinary tract infections (UTIs) caused by urease-producing bacteria, including Proteus mirabilis. AHA acts by reversibly inhibiting the bacterial enzyme urease, thereby reducing the hydrolysis of urea and the production of ammonia, which leads to a decrease in urinary pH and ammonia levels. This mechanism helps prevent the formation of struvite stones and supports the effectiveness of antibiotics in treating chronic infections. Recent studies have demonstrated that AHA can also inhibit urease from other bacteria, such as K. pneumoniae. The aim of this study was to use structural modeling, molecular docking, and molecular dynamics simulations to evaluate whether AHA could inhibit urease from clinical isolates of Acinetobacter baumannii, another major pathogen associated with multidrug-resistant nosocomial infections and frequent hospital outbreaks. Gene sequences encoding the urease subunits from clinical isolates of Acinetobacter were retrieved from the NCBI RefSeq database and translated into predicted protein sequences. The most prevalent urease variant was structurally modeled using AlphaFold3, and nickel ions were positioned in the active site with AlphaFill. Molecular docking with AutoDock Vina was performed to assess the interaction with AHA, revealing an affinity of -4.3 kcal/mol in the most favorable binding mode. Structural superposition of the modeled Acinetobacter urease with the Klebsiella aerogenes urease (PDB ID: 1FWE) complexed with AHA yielded an RMSD of 0.310 Å, indicating high structural similarity. 
The residues involved in inhibitor binding occupied equivalent positions to those in the experimental structure, suggesting conservation of key interactions between the two genera. These findings indicate that urease may also be a relevant therapeutic target in the treatment of clinical Acinetobacter isolates and support the potential repositioning of acetohydroxamic acid and other urease inhibitors as therapeutic strategies against pathogenic strains of Acinetobacter.
Palavras-chave: Acinetobacter; Urease; Acetohydroxamic acid; Molecular docking
#1117378

Altered microglial communication in Alzheimer’s dementia

Authors: Loren dos Santos,Ricardo A. Vialle,Yanling Wang,Shinya Tasaki,David A. Bennett,Roberto Tadeu Raittz,Katia de Paiva Lopes
Presenter: Loren dos Santos • litssantos60@gmail.com
Abstract:
Microglia are the resident immune cells of the central nervous system, playing a crucial role in maintaining neural homeostasis and responding to pathological events. In their resting state, they continuously survey the microenvironment with highly motile processes, detecting subtle changes in neuronal activity and synaptic integrity. Upon activation, triggered by injury, infection, or neurodegeneration, microglia undergo morphological and functional changes, adopting phenotypes that can be neuroprotective or neurotoxic, depending on the context. Emerging research has highlighted their involvement not only in classical immune responses but also in synaptic pruning, neurodevelopment, and the modulation of neuronal networks. Here, we leveraged single-nucleus RNASeq (snRNASeq) data previously generated from postmortem human dorsolateral prefrontal cortex (DLPFC) of 449 participants from the Religious Orders Study (ROS) and the Rush Memory and Aging Project (MAP). These studies include the full spectrum of brain states among older individuals, including those with Alzheimer’s dementia (AD), mild cognitive impairment (MCI), and non-AD controls. Released data included clusterization and annotation of 86,612 cells into 16 microglial populations. Cell–cell communication analysis was performed in two ways: (1) by including cells from all participants to generate a comprehensive map of ligand–receptor interactions across the entire microglial repertoire of the DLPFC; and (2) by performing analyses separately between AD and non-AD cases in order to highlight differences in cell communication in the context of the disease. We identified 17,668 pairs of ligand–receptor interactions, of which 240 (approximately 9.6%) were prioritized for downstream analysis. These interactions originated from 73 signaling pathways (e.g., SPP1–ITGAV–ITGB1, APP–TREM2–TYROBP). 
Comparative analysis revealed that interaction strength varied depending on diagnosis and microglial population. The highest number of outgoing interactions was observed in Mic.15, a subgroup of inflammatory cells enriched for immune response, cell chemotaxis, and phagocytosis (CD83⁺CCL3/4⁺), whereas Mic.11, a stress-response population, displayed the highest number of incoming communications. Overall, participants with AD showed 17.2% fewer interactions than the non-AD group (13,527 interactions in AD versus 16,337 in non-AD), and 218 unique ligand–receptor pairs were differentially expressed in AD. Upregulated interactions included glutamatergic signaling (involving NMDA, AMPA, kainate, and metabotropic receptors), lipid metabolism pathways such as Cholesterol–LIPA_RORA, and interactions regulated by disease-associated microglial genes (e.g., APOE–TREM2–TYROBP), with ICAM1–ITGAX–ITGB2 occurring exclusively in AD. Inflammatory cytokine interactions, such as those involving SPP1, were also upregulated. Downregulated interactions included chemokine pairs such as CCL3–CCR1 and CXCL12–CXCR4. These findings underscore the complexity of microglial communication in the aging human brain and highlight distinct ligand–receptor interaction patterns associated with AD. By mapping these cell–cell signaling networks, our study aims to provide new insights into microglial contributions to neurodegeneration.
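The relative reduction quoted above follows directly from the two interaction counts. The short sketch below reproduces the arithmetic; the helper name `percent_fewer` is hypothetical.

```python
def percent_fewer(reference, observed):
    """Percentage by which `observed` falls below `reference`."""
    return 100.0 * (reference - observed) / reference

non_ad_interactions = 16337  # count reported for the non-AD group
ad_interactions = 13527      # count reported for the AD group

print(f"{percent_fewer(non_ad_interactions, ad_interactions):.1f}% fewer interactions in AD")
```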
Keywords: Microglia, Cell-cell communication, Alzheimer's disease, Ligand-receptor, Single-nucleus RNA-seq
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1117418

Search and characterization of ion channels involved in TMEM176B under different conditions to control immune systems

Authors: SIMONE QUEIROZ PANTALEAO,YOLANDA MARIA BARROS MARCELLO,Eric Allison Philot,Roberto C. N. Quiroz,Angelo J. Magro,Ignacio J. General,Marcelo Hill,KÁTHIA MARIA HONÓRIO,Ana Ligia Scott
Presenter: SIMONE QUEIROZ PANTALEAO • simone.qrzp@gmail.com
Abstract:
The TMEM176B protein is a biological target related to oncogenesis, tumor suppression, and biomarkers. It is an ion channel protein expressed in cells of the immune system whose mechanism of action has not yet been fully elucidated, which makes its study essential. In this work, we analyzed the physicochemical profile of regions with key channel characteristics, considering the dimer, trimer, and tetramer forms, under pH variations (4.0, 4.5, and 5.0), using the MOLE 2.5 online server [1,2], to evaluate the charge distribution and polarity of these regions. The protocol comprised the following steps: (I) calculation of the Delaunay triangulation and the Voronoi diagram; (II) construction of the protein's molecular surface; (III) identification of cavities; (IV) determination of the entrance to the possible channels; (V) determination of the exit from the channels; (VI) location of the most suitable channels; and (VII) filtering. The location of the channels considered the refinement of the weight functions, weighting length, radius, and geometric differences using functions such as the Voronoi scale weight function, prioritizing channels located towards the interior of the protein, adjusting tolerances to bottlenecks, and discarding regions not suitable for ion passage. To analyze the probable channels, we identified transmembrane pores and profiles in terms of layers, geometry, length, and radius, determining the composition of the residues lining these regions. We evaluated each channel detected by the tool, observing the cavities and molecular surfaces of TMEM176B, and obtained the channel length, bottleneck radius, hydropathy, charge, polarity, mutability, lipophilicity in terms of LogP (octanol/water partition coefficient of the fragments surrounding the channel) and LogD (lipophilicity of ionizable compounds), LogS (water solubility of the fragments surrounding the channel), and ionizable residues.
According to the parameters tested, the tetrameric form of the TMEM176B protein at pH 4.0 showed a behavior more compatible with the biological function of a channel protein than the other systems tested. Once we understood the behavior of the channel, we applied virtual screening with the AutoDock Vina program to identify bioactive molecules in the database of Brazilian natural products (NuBBEDB [3,4]). The primary filters used were binding energy; ligand efficiency; molecular interactions detected by the BINANA algorithm; and ADME-Tox properties (absorption, distribution, metabolism, excretion, and toxicity), evaluated using the SwissADME, pkCSM, ADMETLab, EmolTox, and PHARMIT tools. In total, molecules from the following classes were checked: alkaloids, amino acids and peptides, aromatic derivatives, carbohydrates, flavonoids, chalcones, lignoids, lipids, polyketides, phenylpropanoids, tannins, and terpenes, allowing the identification of candidate molecules for TMEM176B inhibition with satisfactory physicochemical profiles.
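The ligand efficiency filter mentioned above can be illustrated with a minimal sketch. It assumes the commonly used definition (negative binding energy divided by the number of non-hydrogen atoms); the abstract does not state which definition or cutoff was applied, and the ligand names and values below are hypothetical.

```python
def ligand_efficiency(binding_energy_kcal_mol, heavy_atom_count):
    """Ligand efficiency as -dG per heavy (non-hydrogen) atom, in kcal/mol per atom."""
    if heavy_atom_count <= 0:
        raise ValueError("heavy_atom_count must be positive")
    return -binding_energy_kcal_mol / heavy_atom_count

# Hypothetical docking results: (name, binding energy in kcal/mol, heavy atoms).
candidates = [
    ("ligand_A", -9.0, 30),
    ("ligand_B", -7.5, 20),
    ("ligand_C", -6.0, 40),
]

# Rank from most to least efficient binder.
ranked = sorted(candidates, key=lambda c: ligand_efficiency(c[1], c[2]), reverse=True)
for name, energy, atoms in ranked:
    print(name, round(ligand_efficiency(energy, atoms), 3))
```

Normalizing by size in this way keeps large molecules from dominating a ranking based on raw binding energy alone.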
Keywords: TMEM176B, channels, physicochemical characteristics, pH, natural products.
#1117462

The role of macrophage cell-states in triple-negative breast cancer therapy response

Authors: Lucas Aleixo Leal Pedroza,Gabriela Rapozo Guimarães,Fabricio Souto,Leandro de Oliveira Santos,Mariana Boroni
Presenter: Lucas Aleixo Leal Pedroza • lucas.aleixoleal17@gmail.com
Abstract:
Breast cancer (BRCA) is the most common cancer and the leading cause of cancer-related death among women. The triple-negative molecular subtype (TNBC) is particularly challenging because of its low responsiveness to treatment and high recurrence. This is largely attributed to the complexity of its tumor microenvironment (TME) and the role of immune cells, particularly macrophages, which possess high plasticity and the capacity to interact with diverse cell types, activating signaling pathways that contribute to resistance to therapies, including immunotherapies. This work aims to investigate the treatment-resistance mechanisms associated with macrophage subpopulations in TNBC using single-cell RNA sequencing (scRNA-seq) data. Six public scRNA-seq datasets from 414 human biopsies were downloaded: 191 normal, 131 treatment-naive, 32 anti-PD-1, 11 neoadjuvant chemotherapy + anti-PD-1, 33 radiotherapy + anti-PD-1, and 15 neoadjuvant chemotherapy. The data underwent rigorous quality control, integration, and annotation using the scVI and scANVI tools. Compositional analysis, differential gene expression, and pathway enrichment were performed using scCODA, MAST, and ClusterProfiler, respectively. After quality control, 1.4 million cells were integrated and annotated into 13 cell types, including mononuclear phagocytes, which were re-clustered and annotated into monocytes, dendritic cells, and macrophages. Macrophages were annotated according to their ontogeny as Resident Tissue Macrophages (RTM), marked by FOLR2 and PLTP, and monocyte-derived macrophages (Mac), marked by VCAN and S100A8. The macrophage subpopulations were annotated based on their enriched gene expression programs, revealing distinct transcriptional and functional profiles.
These included: (i) interferon-primed macrophages (Mac-IFN and RTM-IFN), characterized by high expression of IFIT1 and IFIT2; (ii) antigen presentation-associated macrophages (Mac-AgPress), characterized by upregulation of genes involved in the MHC class II antigen presentation pathway, including C3; and (iii) lipid-associated macrophages (Mac-LA), defined by expression of lipid metabolism–related genes such as LPL, FABP4, and TREM2. Additionally, we identified (iv) interstitial tissue-resident macrophages (RTM-INT), which were annotated based on their anatomical localization rather than a functional gene program, and were marked by LYVE1, associated with vascular and interstitial compartments. Analysis of the relative distribution of cell states in TNBC by treatment using scCODA showed a significant abundance of Mac-AgPress (p ≤ 0.0001) and RTM-INT (p ≤ 0.001) in normal samples, whereas both were virtually absent in tumor samples, whether treated or treatment-naive. These findings are consistent with literature showing that antigen presentation is impaired in BRCA as a form of immune evasion. On the other hand, Mac-LA (p ≤ 0.001), Mac-IFN (p ≤ 0.01), and RTM-IFN (p ≤ 0.0001) were significantly present in samples treated with PD-1 inhibitors, either alone or combined with radiotherapy, were scarce in treatment-naive patients and in those who received only neoadjuvant chemotherapy, and were almost undetectable in normal samples. Examination of exhaustion marker expression across cell states showed that Mac-LA had high expression of PDCD1, which encodes PD-1, in samples treated with all approaches involving anti-PD-1. This suggests a potential role of Mac-LA in reducing the effectiveness of immune checkpoint inhibitors such as anti-PD-1 through pharmacological interaction, inducing an immunosuppressive profile in the TME and exhaustion of CD8 T lymphocytes.
Keywords: Breast Cancer, scRNA-seq, Macrophage, Immunotherapy
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1117591

Uncovering the Heterogeneity in Ovarian Cancer through Spatial Transcriptomics

Authors: Laís Costa Soares,Cristóvão Lanna,Nayara Gusmão Tessarollo,Pedro Videira Pinho,Alessandra Serain,Diego Gomes,Luciana de Castro Moreeuw,Fabiane Macedo,Cláudia Bessa,Andreia Melo,João Viola,Mariana Boroni
Presenter: Laís Costa Soares • laiscosta.biomed@gmail.com
Abstract:
Ovarian cancer is the 7th most common malignancy in women worldwide and the 8th in Brazil. High-grade serous ovarian carcinoma (HGSOC), which accounts for 70% to 80% of ovarian cancer cases, is aggressive and invasive, with a 70% to 85% recurrence rate within 18 to 24 months after treatment. Patients are usually diagnosed at advanced stages due to both nonspecific symptoms and the anatomical location of the ovary, which favors metastatic progression. This study aims to investigate the cellular and molecular heterogeneity associated with spatial patterns of gene expression in the tumor microenvironment, focusing on different responses to chemotherapy and its impact on tumor resistance. This project was approved by the Research Ethics Committee under protocol number 12681819.2.0000.5274. Fresh frozen samples from four patients, classified as good responders (GR, n = 2) or poor responders (PR, n = 2) according to their overall survival, were provided by the National Tumor Bank. The samples were stained with H&E and imaged using the Aperio ImageScope software. The mRNA was captured using the Visium Spatial Gene Expression kit (10x Genomics) and sequenced on the NovaSeq 6000 platform (Illumina). Initially, the data underwent quality control, including the removal of off-tissue spots and of spots with high mitochondrial RNA content, low gene counts, and/or low overall counts. Outliers were also excluded based on the median absolute deviation. Subsequently, the samples were integrated using the scvi-tools framework and clustered using the Leiden algorithm at a resolution of 0.5. Marker genes for each cluster were identified via differential expression analysis using the Wilcoxon Rank Sum test. Using DESeq2 to analyze cluster counts per sample, we identified clusters 4 and 6 as significantly enriched in specific response groups. Cluster 4, enriched in the PR group, exhibited marker genes such as SEPHS2 and CYP2S1, associated with selenoamino acid and drug metabolism, respectively.
Gene Set Enrichment Analysis (GSEA) using ClusterProfiler and the REACTOME database indicated that this cluster was significantly enriched in pathways related to selenoamino acid metabolism and biological oxidation - processes involved in cellular detoxification and potentially linked to chemotherapy resistance. Conversely, cluster 6, predominant in the GR group, showed downregulation of these pathways and was enriched in p53 stabilization and cell cycle checkpoint pathways, with key markers including CDKN2A and AURKB, supporting this profile. To assess the relevance of these clusters in a broader context, we conducted deconvolution analysis using BayesPrism on a TCGA cohort (n = 429). In samples with a higher proportion of the cluster 1 signature - characterized by elevated epithelial markers and presumed tumor identity - we observed a lower representation of cluster 6. Our findings suggest that the pathways enriched in cluster 4 may contribute to tumor progression, whereas those associated with cluster 6 may be linked to the restriction of tumor cell proliferation. Therefore, ongoing analysis will help validate the outcomes and determine their relevance.
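The outlier exclusion based on the median absolute deviation (MAD), mentioned in the quality-control step, can be sketched as follows. The threshold of 5 MADs and the per-spot counts are assumptions for illustration only; the abstract does not report the cutoff or the metrics it was applied to, and the function name `mad_outlier_mask` is hypothetical.

```python
import statistics

def mad_outlier_mask(values, n_mads=5.0):
    """Return a list of booleans marking values farther than n_mads MADs from the median."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # Degenerate case: no spread around the median, so flag nothing.
        return [False] * len(values)
    return [abs(v - median) > n_mads * mad for v in values]

# Hypothetical total counts per spot (not real data).
counts_per_spot = [1200, 1350, 1280, 1310, 1250, 90, 1290]
mask = mad_outlier_mask(counts_per_spot)
kept = [c for c, is_outlier in zip(counts_per_spot, mask) if not is_outlier]
print(kept)
```

Because the median and MAD are robust statistics, a few extreme spots do not shift the threshold the way they would with a mean and standard deviation.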
Keywords: Spatial Transcriptomics, Ovarian Cancer, Tumor Microenvironment
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1118140

Decoding Cenostigma pyramidale: First High-Quality Nuclear Genome of a Tree Native to the Caatinga

Authors: Ana Luíza Trajano Mangueira de Melo,Flávia Layse Belém Medeiros,Manassés Daniel da Silva,MARIA LUIZA CARVALHO FARIAS,Eliseu Binneck,José Ribamar Costa Ferreira Neto,Ana Maria Benko-Iseppon
Presenter: Ana Luíza Trajano Mangueira de Melo • analuizat163@gmail.com
Abstract:
Cenostigma pyramidale (commonly referred to as Catingueira) is a native extremophile leguminous species known for its remarkable drought tolerance and ability to thrive under salinity. Despite its considerable biotechnological potential, C. pyramidale remains largely unexplored from an omics perspective. To address this gap, the present study aimed to sequence and characterize the complete nuclear genome of this species. Plant material was germinated in a greenhouse at the Universidade Federal de Pernambuco (UFPE), followed by leaf tissue collection and genomic DNA extraction, quantification, and purification. Libraries were prepared using the Illumina Nextera™ DNA Flex library preparation kit. Paired-end sequencing was performed on the Illumina NovaSeq 6000 platform, and genome size was estimated using the Partec CyFlow Space instrument. Raw reads were subjected to quality assessment using FastQC (v. 0.11.9) and MultiQC (v. 1.10.1), followed by adapter and primer trimming using Trimmomatic (v. 0.32). De novo assembly was executed with Velvet (v. 1.2), and genome completeness was evaluated using gVolante with the BUSCO pipeline (Benchmarking Universal Single-Copy Orthologs v. 5; Fabaceae ortholog set). For genome annotation, protein-coding gene models were predicted via GeneMark-ET (v. 4.6) and AUGUSTUS (v. 3.3.3). Putative gene functions were inferred through pattern matching against Pfam, UniProtKB/SwissProt, and ten additional databases. General gene features were determined using the Genestats script. The estimated genome size of C. pyramidale was ~0.88 Gb. The assembly yielded 378,944 scaffolds spanning 731.77 Mbp, with a GC content of 35.52% and ~85.30× genomic coverage. Regarding general gene features, 148,285 loci were identified, more than in other legume genomes such as soybean (Glycine max; 46,430 loci), Stylosanthes scabra (60,220 loci), and peanut (Arachis hypogaea; 66,469 loci).
The analyzed genes averaged 948.43 bp in length, with a mean of 4 exons per gene—a value significantly higher than that reported for Trifolium pratense (~3.6 exons), another Fabaceae species. The genome assembly exhibited an average gene density of one locus per 5,934 bp. In addition, 148,593 mRNA transcripts were identified, accounting for alternative splicing events, along with 1,044 transfer RNA (tRNA) genes. Among the predicted putative proteins, 2,065 were classified as peptidases and 5,740 as secreted proteins. The assembly validation revealed that 68.32% of core eukaryotic genes were completely assembled in C. pyramidale, with completeness increasing to 94.72% when partial gene models were incorporated. This genome completeness profile closely parallels that of S. scabra (94.4%), though notably lower than the near-complete BUSCO score observed in Phaseolus vulgaris (99.2%). These findings provide unprecedented insights into the structural composition of the C. pyramidale genome, addressing a critical knowledge gap in extremophilic tree species endemic to the Caatinga biome.
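As a rough check of the gene density figure, the sketch below divides a genome length by the number of loci. Which denominator the authors used is not stated; the figure of one locus per 5,934 bp appears consistent with the estimated genome size (~0.88 Gb, taken here as exactly 880,000,000 bp, which is an assumption) rather than with the 731.77 Mbp assembly span.

```python
loci = 148_285

# Assumption: the "~0.88 Gb" estimate is treated as exactly 880,000,000 bp.
estimated_genome_size_bp = 880_000_000
assembly_span_bp = 731_770_000  # 731.77 Mbp reported for the assembly

bp_per_locus_estimated = estimated_genome_size_bp / loci
bp_per_locus_assembly = assembly_span_bp / loci

print(f"per estimated genome size: one locus per {bp_per_locus_estimated:.0f} bp")
print(f"per assembly span:         one locus per {bp_per_locus_assembly:.0f} bp")
```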
Keywords: Omics, Fabaceae, drought tolerance, extremophile.
#1118157

Genome-Wide Identification and Analysis of Transposable Elements in Cenostigma pyramidale

Authors: Ana Luíza Trajano Mangueira de Melo,Flávia Layse Belém Medeiros,Manassés Daniel da Silva,MARIA LUIZA CARVALHO FARIAS,Eliseu Binneck,José Ribamar Costa Ferreira Neto,Ana Maria Benko-Iseppon
Presenter: Ana Luíza Trajano Mangueira de Melo • analuizat163@gmail.com
Abstract:
Transposable elements (TEs) are repetitive DNA sequences that are capable of mobilizing and replicating throughout the genome. Once dismissed as “junk DNA”, these elements are now recognized as key drivers of mutations, gene duplications, and alterations in gene expression. Cenostigma pyramidale, commonly known as Catingueira, is a leguminous species native to Brazil that is notable for its remarkable tolerance to drought and high soil salinity. Given this physiological resilience, this study aimed to identify, characterize, and annotate TEs within its nuclear genome. The nuclear genome studied was provided by the Laboratório de Genética e Biotecnologia Vegetal at the Universidade Federal de Pernambuco (LGBV | UFPE). To search for and identify TEs, a repeat sequence database was constructed using a combination of de novo and homology-based prediction approaches, employing the software RepeatModeler2 and RepeatMasker (https://www.repeatmasker.org/), respectively. The latter tool was used to predict and quantify the percentage of repetitive sequences within the C. pyramidale genome. The analyzed genome was predominantly composed of repetitive sequences, accounting for 61.6% (487,820,523 bp) of the sequenced nucleotides. The largest proportion consisted of interspersed repeats (51.13%), with TEs representing the most significant fraction. These were further classified into three main categories: unclassified transposons (32.48%), retrotransposons (15.23%), and DNA transposons (1.69%). The proportion of TEs found in C. pyramidale (61%) was lower than that reported in the literature for wheat and barley (approximately 85%) but similar to that of Stylosanthes scabra (~60%), an important extremophilic legume. The abundance of mobile elements in plants therefore varies considerably.
Among retrotransposons, long terminal repeat (LTR) elements emerged as the predominant subclass (10.72% of genomic content), with the Gypsy/DIRS1 (5.50%) and Ty1/Copia (5.21%) superfamilies constituting the major representatives. Notably, LTR retrotransposons exhibit extraordinary proliferation in plant genomes, comprising as much as 75% of nuclear DNA content, a genomic phenomenon that underscores their evolutionary significance in shaping plant genome architecture. In addition, among non-LTR retrotransposons, the L1/CIN4 element (2.85%), belonging to the LINE superfamily, was the most abundant. Genomic analysis revealed hobo-Activator elements as the predominant DNA transposon class (0.22% genome coverage). The assembly comprised 48.87% non-repetitive sequences encoding 148,285 putative protein-coding genes. Remarkably, retrotransposons dominated the mobile genetic element repertoire (15.23% genomic abundance), a pattern consistent with eukaryotic genome evolution trends. The results of this study provide initial insights into the structural composition of the C. pyramidale genome, particularly the substantial contribution of TEs. Furthermore, they establish a foundation for a deeper understanding of the potential role of these elements in promoting the genetic diversity associated with C. pyramidale.
Keywords: mobilome, repetitive sequences, Fabaceae, transposons, extremophile.
#1118176

Beyond the Glycan Shield: In Silico Antibody Engineering for Selective Targeting of Gal-3BP

Authors: Andrielly Henriques dos Santos Costa,Jean Vieira Sampaio,Aline de Oliveira Albuquerque,Eduardo Menezes Gaieta,Diego da Silva Almeida,Patrick England,Geraldo Rodrigues Sartori,João Herminio Martins Da Silva
Presenter: Andrielly Henriques dos Santos Costa • andrielly.costa@fiocruz.br
Abstract:
Cancer represents a complex group of diseases characterized by uncontrolled cellular growth and proliferation, often accompanied by the evasion of immune surveillance and resistance to conventional therapies. Among the emerging therapeutic strategies, antibody-drug conjugates (ADCs) have gained significant attention due to their ability to selectively deliver cytotoxic agents to tumor cells, thereby enhancing treatment specificity while reducing systemic toxicity. ADCs rely on high-affinity antibodies to guide cytotoxic payloads to specific tumor-associated antigens. However, variability in the target antigen structure and post-translational modifications can interfere with antibody binding and compromise the therapeutic efficacy. Galectin-3 Binding Protein (Gal-3BP) has emerged as a promising target in oncology owing to its involvement in various cancer-related processes, including immune modulation, angiogenesis, metastasis, and tumor progression. Notably, Gal-3BP is a highly glycosylated protein, and its glycan composition can vary significantly depending on cellular and microenvironmental context. This variability can affect the interactions between antibodies that recognize glycosylated epitopes, leading to inconsistent therapeutic responses. Previous studies have demonstrated that antibodies such as SP-2, which interact through carbohydrate-dependent recognition, can be employed as carriers in ADC platforms targeting Gal-3BP, showing efficacy in in vitro settings. Avoiding carbohydrate-dependent recognition may improve specificity, and contribute to predictable therapeutic outcomes. In this study, we aimed to design antibody mutants capable of binding to the non-glycosylated epitopes of Gal-3BP using a comprehensive in silico pipeline. The 3D structure of Gal-3BP was modeled using AlphaFold2 and validated using VoroMQA and QMEANDisCo3. 
Based on published experimental data and predictors, three surface-accessible epitopes were selected as the candidate binding regions. An in-house naïve antibody library comprising 800 structures was screened using HADDOCK3 for docking against the dimeric form of Gal-3BP. The use of the dimeric conformation was supported by modeling results, which indicated that multimerization is necessary to stabilize flexible loop regions. From this workflow, five antibody candidates with favorable HADDOCK and REF-15 scores were identified for further development. Following the docking step, the antibody-antigen complexes were analyzed using heated molecular dynamics (MD) simulations to assess flexibility and complex stability. Despite observing elevated interface RMSD values (>5 Å) during the MD simulations, suggesting initial instability, these findings provide a basis for affinity maturation through rational CDR engineering. Selected antibody candidates were then subjected to a CDR-swapping and mutagenesis integration protocol (Ab-SELDON) to optimize their interactions with the target epitopes. This pipeline led to a significant reduction in the interaction energy (from -42.14 to -84.88 REU and from -37.50 to -79.14 REU). The mutant CDR loops exhibit increased extension, improved surface complementarity, and enhanced antigen engagement. Future steps will also include validation of the complexes through extensive molecular dynamics simulations. These in silico findings offer a valuable framework for designing mutant antibodies capable of targeting the non-glycosylated regions of Gal-3BP, potentially enhancing the consistency and therapeutic potential of ADCs. The selected candidates will undergo experimental validation to confirm their binding specificity and functional efficacy in relevant cancer models.
Keywords: Galectin-3 Binding Protein, glycosylation, Antibody-drug conjugates, cancer immunotherapy
#1118415

Transcriptomic profiling of exosomal biomarkers coupled with gold Nanoparticle Immunosensing for early gastrointestinal cancer detection

Authors: Douglas Felipe de Lima Silva,Raimundo Fernandes de Araújo Júnior,George Alexandre Lira,Norma Lucena Cavalcanti Licinio da Silva,Elizabeth Costa Sobral De Albuquerque,Rafaela Torres Dantas Da Silva,Emily Lima Oliveira,Ryan Carlos
Presenter: Douglas Felipe de Lima Silva • dougbti2022@gmail.com
Abstract:
Colorectal and pancreatic cancers continue to rank among the leading causes of cancer-related mortality globally, primarily due to their asymptomatic nature at early stages and the limited diagnostic accuracy of current biomarkers. In this study, we explored an integrative diagnostic strategy combining analysis of RNA-Seq datasets with the experimental development of a lateral flow immunoassay employing gold nanoparticles (AuNPs) for the detection of tumor-derived and bacterial exosomes. We conducted differential expression analysis using publicly available RNA-Seq data from the NCBI repository, focusing on identifying transcripts significantly upregulated in colorectal and pancreatic tumor tissues. Biomarkers including CD63, HSP70, CD44v6, CD133, EpCAM, and mutant Kras were selected based on their overexpression profiles and established roles in tumor progression and prognosis. Additionally, we employed SignalP and SecretomeP prediction tools to identify candidate peptides secreted via classical and non-classical pathways, respectively, highlighting their potential utility as circulating tumor antigens detectable in blood and fecal exosomal fractions. To experimentally validate these findings, we engineered a prototype diagnostic platform incorporating antibody-functionalized AuNPs to enable rapid, non-invasive detection of exosomes in plasma and stool samples. This device targets both tumor-derived and bacterial exosomes, with the latter being analyzed through 16S rRNA gene sequencing to explore microbiome-exosome interactions and their relevance in oncogenesis. Preliminary validation assays, including ELISA, Western blotting, and optical refractive index measurements, were conducted to assess biomarker expression and optimize sensor sensitivity and specificity. This multidimensional approach, integrating transcriptomics, proteomics, and microbial ecology, provides a robust framework for the identification and translation of novel cancer biomarkers. 
Our findings suggest that this platform holds promise for improving early detection of gastrointestinal malignancies and could support future precision oncology applications by facilitating the stratification of high-risk individuals through minimally invasive screening.
Keywords: Exosomes, RNA-Seq, Gold Nanoparticles, Gastrointestinal Cancer, Immunoassay
★ Running for the Qiagen Digital Insights Excellence Awards
#1119029

Explainable Deep Learning for Malaria Diagnosis in Low-Resource Settings

Authors: Sthefanie Monica Premebida,Paulo Victor dos Santos,Ana Carolina Ataya,Heron dos Santos Lima,MARCELLA SCOCZYNSKI RIBEIRO MARTINS
Presenter: Ana Carolina Ataya • 24009623@uepg.br
Abstract:
Malaria remains a major public health concern in tropical and subtropical regions worldwide, particularly in Brazil. Caused by protozoa of the Plasmodium genus and transmitted by the bite of infected Anopheles mosquitoes, the disease is especially prevalent in the Amazon region, which accounts for over 99% of all reported cases in the country. The most common species found in Brazil are Plasmodium vivax and Plasmodium falciparum, with the former responsible for the majority of infections. Despite significant advances in control strategies, malaria continues to pose challenges for early diagnosis, treatment, and epidemiological surveillance, especially in remote or underserved areas. Microscopic examination of stained blood smears remains the gold standard for diagnosis; however, it is highly dependent on the availability of skilled personnel and adequate laboratory infrastructure. In this context, the application of artificial intelligence (AI) to automate and enhance malaria diagnostics has emerged as a promising alternative to improve both efficiency and accessibility. This study explores the use of machine learning models for the automated classification of malaria-infected blood cells in microscopic images. Two deep learning architectures are evaluated: ResNet50, a classical convolutional neural network, and Swin Transformer, a recent attention-based model. Employing 5-fold cross-validation and hyperparameter optimization using Optuna, the models were compared in terms of recall, loss, and stability. Results show that, after optimization, the Swin Transformer outperformed ResNet50 in both recall and consistency, achieving an average recall of approximately 0.991. Future work includes the use of a new dataset acquired from a low-cost microscope without Giemsa staining, introducing further challenges for model generalization. 
Additionally, the study intends to incorporate explainable AI (XAI) techniques to support clinical decision-making and enhance trust in automated diagnostic tools.
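The evaluation scheme described above (5-fold cross-validation scored by recall) can be sketched in plain Python. This is an illustration under stated assumptions, not the study's pipeline: the toy one-dimensional features, the threshold "model", and the helper names `k_fold_indices` and `recall` are all hypothetical stand-ins for the image data and the ResNet50 and Swin Transformer models.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def recall(y_true, y_pred):
    """Fraction of positive samples that were predicted positive (TP / (TP + FN))."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy data: one feature per sample, label 1 = infected (hypothetical values).
features = [i / 20 for i in range(20)]
labels = [1 if x >= 0.5 else 0 for x in features]

scores = []
for train_idx, val_idx in k_fold_indices(len(features), k=5):
    # Stand-in "model": threshold halfway between the class means of the training fold.
    pos = [features[i] for i in train_idx if labels[i] == 1]
    neg = [features[i] for i in train_idx if labels[i] == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    preds = [1 if features[i] >= threshold else 0 for i in val_idx]
    scores.append(recall([labels[i] for i in val_idx], preds))

print("mean recall across folds:", sum(scores) / len(scores))
```

Recall is a natural headline metric for this task because a missed infection (false negative) is usually costlier than a false alarm.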
Keywords: malaria diagnosis, explainable AI, deep learning
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1119098

Where does bioinformatics fit in Brazil's National Learning Standards? Mapping its pedagogical potential in high school Natural Sciences and Computing education

Authors: Carolina Corado,Dr. João Paulo Matos Santos Lima,Beatriz Stransky
Presenter: Carolina Corado • carolbio@gmail.com
Abstract:
Integrating bioinformatics into secondary education remains a global challenge, particularly in alignment with national curricula and educational goals. This study addresses this gap by investigating the pedagogical potential of bioinformatics within Brazil’s national learning standards, Base Nacional Comum Curricular (BNCC). The BNCC is a normative and mandatory document that establishes the essential learning outcomes and competencies to be developed by all students throughout Basic Education (ages 4-17), encompassing early childhood, elementary and high school. It guides the development or adaptation of school curricula and pedagogical approaches by all public and private educational institutions across the country. At the high school level, BNCC is structured into four broad knowledge areas: Languages and their Technologies, Mathematics and their Technologies, Natural Sciences and their Technologies, and Human and Social Sciences Applied, and is complemented by specific guidelines for Computing Education within a separate national document. Despite these national standards addressing both foundational scientific and computational skills, the explicit integration of bioinformatics within it has yet to be fully explored or pedagogically articulated. To guide this investigation, we designed an analytical and interpretative framework consisting of five guiding questions informed by recent literature in curriculum studies, scientific literacy, ethics, and digital education, with a focus on high school skills in Natural Sciences and Computing. The framework evaluates each skill in terms of thematic relevance, technical feasibility, potential for student-centered learning with real-world data, socio-environmental significance, and alignment with BNCC’s general competencies. 
Using this approach, we analyzed 51 BNCC skills across three pedagogical dimensions: (1) content (alignment with bioinformatics subject matter); (2) methodology (use of computational tools and data-driven practices), and (3) context (connection to real-world challenges). The analysis showed stronger alignment in Natural Sciences for bioinformatics as content and interdisciplinary context, while Computing Education emphasized methodological practices, highlighting the complementarity between scientific and computational reasoning. These patterns indicate that bioinformatics is best introduced not as an isolated subject, but as a resource to enhance existing curriculum areas, particularly by connecting scientific concepts with computational methods. Rather than proposing new content, this approach supports the use of bioinformatics to deepen student understanding of current topics through data analysis, real-world applications, and ethical discussions. Because many skills in the BNCC already align with this potential, bioinformatics can be integrated in practical and meaningful ways, without major changes to the curriculum. This framework lays an evidence-based foundation for integrating bioinformatics into secondary education in alignment with the BNCC. It offers concrete guidance for educators and researchers to collaboratively design learning experiences that promote inquiry, ethical engagement, and the development of essential digital and analytical competencies.
Palavras-chave: Bioinformatics Education; BNCC (Base Nacional Comum Curricular); Pedagogical Framework; Interdisciplinary Teaching
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1119247

BioRemPP: A Genomic Platform for Bioremediation Potential Analysis of Environmental Pollutants

Autores: Douglas Felipe de Lima Silva,Lucymara Fassarella Agnez-Lima
Apresentador: Douglas Felipe de Lima Silva • dougbti2022@gmail.com
Resumo:
Bioremediation is a sustainable strategy for mitigating environmental contamination caused by industrial, agricultural, and urban activities. In response to the growing need for accessible tools that can support genomic approaches to pollutant degradation, we present BioRemPP (Bioremediation Potential Prediction Profile), a comprehensive, open-access platform for analyzing the genomic, metabolic, and toxicological potential of bacteria, fungi, and plants in the breakdown of high-priority pollutants.
BioRemPP processes the input data into seven modular result sections, providing a robust analytical environment that supports in-depth biotechnological assessment. The platform generates merged databases enriched with annotations from BioRemPP itself, as well as from HADEG and ToxCSM, offering detailed insight into the genetic and enzymatic arsenal for pollutant degradation. Users can explore the data through interactive visualizations, which include more than 20 types of charts, such as heatmaps, dendrograms, ranking plots, and pathway enrichment graphs, facilitating multidimensional exploration of complex biological relationships. Additionally, BioRemPP delivers enzyme activity profiles, maps compound-sample interaction networks, and performs KO clustering to highlight patterns of functional similarity across samples. Importantly, the platform enables cross-species comparisons, allowing users to rank organisms by their predicted bioremediation relevance and to suggest synergistic microbial or plant combinations based on shared KEGG Orthologs and complementary pathway coverage.
Developed in Python using the Dash framework, BioRemPP integrates curated datasets from KEGG, PubChem, ChEBI, HADEG, and ToxCSM, alongside pollutant classifications from global regulatory frameworks such as ATSDR, EPA, CONAMA, IARC, WFD, PSL1/2, and the European Parliament. User input consists of a simple FASTA-like .txt file with sample IDs and associated KEGG Orthologs (KOs), allowing intuitive use even by non-specialists.
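To make the input format concrete, the following is a minimal sketch of how such a file could be read. The exact layout is an assumption (a `>` header line naming the sample, followed by one KO identifier per line), and the function name `parse_ko_text` is hypothetical; it is not part of BioRemPP.

```python
import re
from collections import defaultdict

# KEGG Orthology identifiers have the form K followed by five digits.
KO_PATTERN = re.compile(r"^K\d{5}$")


def parse_ko_text(text):
    """Parse FASTA-like text into {sample_id: [KO, ...]}.

    Assumed layout (hypothetical): a line starting with '>' names a sample,
    and each following non-empty line holds one KO identifier.
    """
    samples = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            samples[current]  # register the sample even if it has no KOs
        elif current is None:
            raise ValueError("KO line found before any sample header")
        elif KO_PATTERN.match(line):
            samples[current].append(line)
        else:
            raise ValueError(f"not a KO identifier: {line!r}")
    return dict(samples)


example = """>sample_A
K00001
K00128
>sample_B
K00001
"""
parsed = parse_ko_text(example)
# KOs present in both samples, the kind of overlap a cross-sample comparison starts from.
shared = set(parsed["sample_A"]) & set(parsed["sample_B"])
```

Keeping the KOs per sample in a plain dictionary makes set operations between samples straightforward.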
The platform employs Python libraries including Pandas, NumPy, scikit-learn, Matplotlib, and Plotly to enable dynamic data manipulation, pattern recognition, and rich visual exploration. BioRemPP currently maps 324 priority environmental pollutants to 986 genes, linking them through enzymes, metabolic degradation pathways, and toxicity prediction profiles.
Preliminary analyses using publicly available metagenomic datasets and genome annotations revealed clear patterns of functional enrichment in microbial communities from polluted sites. These include a higher prevalence of hydrocarbon degradation pathways and resistance-associated gene clusters, demonstrating the tool’s capacity to generate biologically and environmentally meaningful comparisons and screenings.
BioRemPP supports applications in microbial screening, environmental diagnostics, and the design of microbial consortia for pollution remediation. It aligns with key UN Sustainable Development Goals, particularly SDG 6 (Clean Water and Sanitation), SDG 14 (Life Below Water), and SDG 15 (Life on Land), by facilitating evidence-based decisions in bioremediation and promoting ecosystem recovery.
In conclusion, BioRemPP stands out as a scalable and scientifically robust platform, bridging the gap between complex genomic data and real-world environmental applications in biotechnology and ecological restoration.
Palavras-chave: Bioremediation; Genomic Analysis; Environmental Pollutants; KEGG; Microbiology
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1120408

Comparative analysis of time series modeling in the prediction of mortality due to Acute Myocardial Infarction in Brazil (2010-2022)

Autores: Elisangela Ap. da Silva Lizzi,Vitor Hugo Amadeu Da Silva
Apresentador: Elisangela Ap. da Silva Lizzi • elisangelalizzi@gmail.com
Resumo:
Acute Myocardial Infarction (AMI) is the leading cause of death in Brazil. It is estimated that there are 300,000 to 400,000 cases of heart attack each year, and that one death occurs in every 5 to 7 cases. AMI results from cellular necrosis in the heart muscle due to the sudden obstruction of blood flow by clots, and can affect different regions of the heart depending on the location of the occlusion.

The goal of this study is to compare the predictive performance of three time series modeling approaches: Holt-Winters exponential smoothing, SARIMA, and LSTM neural networks, in estimating mortality from AMI in Brazil, aiming to identify the most appropriate method to support public health management and health surveillance.

The methods follow the stages of CRISP-DM (CRoss Industry Standard Process for Data Mining). Data were acquired from the open-access repository of the Mortality Information System (SIM) of the Ministry of Health and cover the years 2010 to 2022. In the pre-processing stage, the data were organized into a structured form, grouped by month, and converted to standardized rates (number of deaths per 100,000 inhabitants) for the whole Brazilian territory. Three time series approaches were fitted: Holt-Winters smoothing, SARIMA models identified via autocorrelation functions (ACF/PACF), and LSTM-type neural networks trained with the Adam optimizer. All procedures were implemented in Python. The models were diagnosed with goodness-of-fit metrics and white noise analysis, and the out-of-sample fit was assessed with accuracy criteria, including the mean absolute percentage error (MAPE).
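The two quantities named in this paragraph, the standardized rate and the MAPE, can be sketched in a few lines of Python. This is only an illustration of the standard formulas; the function names are hypothetical and the numbers are made up, not taken from the study.

```python
def deaths_per_100k(deaths, population):
    """Standardized rate: deaths per 100,000 inhabitants."""
    return deaths / population * 100_000


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any observed value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero observed values")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative held-out months (invented values): observed vs. forecast rates.
observed = [10.0, 20.0, 40.0]
forecast = [9.0, 22.0, 40.0]
error_pct = mape(observed, forecast)
```

Because MAPE is scale-free, it allows forecasts from models as different as Holt-Winters, SARIMA, and LSTM to be compared on the same held-out months.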

The results show that Holt-Winters exponential smoothing (MAPE=8.5%), which considers trend and seasonality components, proved practical and intuitive, giving an approximation of the temporal dynamics. The SARIMA(1,1,1)(1,1,1)12 model demonstrated a good fit to the historical series (MAPE=5.7%), being especially effective in capturing trends and seasonality, as long as the parameters are correctly identified through the autocorrelation (ACF) and partial autocorrelation (PACF) functions. It is a dynamic model that captured short- and medium-term oscillations and fit the data well. The LSTM (Long Short-Term Memory) neural networks, implemented with the Adam optimizer and four optimized layers, achieved a competitive predictive performance (MAPE=6.0%), surpassing the Holt-Winters model and approaching SARIMA. This architecture demonstrated the ability to model non-linear relationships and long-term temporal dependencies in the series, benefits absent in traditional methods. However, its 'black box' nature limits the interpretability of the results, which is a critical disadvantage in public health applications, where model transparency is essential for decision-making.

We conclude that the SARIMA(1,1,1)(1,1,1)12 model stands out for its robustness, the interpretability of its parameters within the health context, and its better out-of-sample prediction. The Holt-Winters method is simple and fast, but it proved sensitive to abrupt oscillations, while LSTM offers good prediction in series with complex patterns, but without interpretable parameters. In terms of health surveillance, these methods support decision-making at decentralized levels of management and governance.
Palavras-chave: Acute Myocardial Infarction; Time series analysis; SARIMA; LSTM; Public health
★ Running for the Qiagen Digital Insights Excellence Awards
#1120429

LORFISHTEDB: A comprehensive database of transposable elements in fish genomes spanning 259 species

Autores: Lorena Maria Rudnik,Liliane Santana,Simon Orozco Arias,Diego Araujo Souza,Anderson Fernandes,Roberto Ferreira Artoni,Laurival Antonio Vilas Boas,Alexandre Paschoal
Apresentador: Lorena Maria Rudnik • lorenarudnik43@gmail.com
Resumo:
Transposable elements (TEs) are dynamic components of genomes, influencing genomic architecture, evolution, and genetic diversity. Despite their pivotal role, the characterization of TEs in fish remains limited, with existing databases encompassing only a small number of species. To address this gap, we present LORFISHTEDB, a comprehensive database designed to advance the annotation of TEs in fish by integrating genomic data from 259 species, significantly surpassing the coverage of FishTEDB (63 species). Our methodology employs a robust bioinformatics pipeline combining complementary tools for de novo detection, curation, and classification of TEs. Key steps include: (1) de novo TE detection using RepeatModeler2 and EDTA (Extensive de novo TE Annotator); (2) redundancy reduction via CD-HIT and Bedtools to optimize database efficiency; (3) categorization of previously unclassified TEs using TEsorter, based on conserved protein domains; and (4) genome-wide TE mapping and localization via RepeatMasker. In model species such as Danio rerio (zebrafish), our pipeline refined existing annotations, identifying 15% more TEs and resolving ambiguous classifications. For the Amazonian species Colossoma macropomum (tambaqui), we improved upon prior annotations from Hilsdorf et al. By integrating advanced computational approaches with updated genomic data, LORFISHTEDB establishes a new benchmark for TE studies in fish. This resource serves as a centralized platform for investigating TE diversity, with implications for: understanding TE roles in speciation, adaptation, and genomic plasticity; identifying genomic stress markers in threatened species; and applications in genetic editing and aquaculture. LORFISHTEDB not only accelerates comparative genomics research but also strengthens bioinformatics infrastructure for the conservation and sustainable management of fisheries.
Palavras-chave: Transposable elements, databases, fish, bioinformatics, genomic, evolution.
#1120703

In silico modeling of miRNA-mediated control of cell fate in Ewing sarcoma cells

Autores: Pedro Ravalha Lorenzoni,Marialva Sinigaglia,Daner Acunha Silveira,José Carlos Merino Mombach,Shantanu Gupta
Apresentador: Pedro Ravalha Lorenzoni • pedro.lorenzoni@acad.ufsm.br
Resumo:
This study investigates the intricate molecular mechanisms underlying the pathogenesis of Ewing's sarcoma, a highly aggressive malignant tumor predominantly affecting the pediatric and young adult population. The primary etiology of this neoplasm is intrinsically linked to a chromosomal translocation resulting in the formation of the chimeric oncogene EWS-FLI1, derived from the fusion of the EWS and FLI1 genes. This aberrant protein promotes uncontrolled cell proliferation, a fundamental characteristic of the disease. Furthermore, the study highlights the crucial role of a specific set of microRNAs – miR-34a, miR-16, miR-29, and miR-145 – in modulating tumor aggressiveness. In particular, microRNAs miR-16 and miR-29 exert a negative regulation on the expression of the phosphatase WIP1, which plays an inhibitory role in the activation of the tumor suppressor protein p53. The consequent suppression of p53 function compromises cellular mechanisms of DNA damage response, namely senescence and apoptosis, conferring upon neoplastic cells the ability to evade intrinsic cellular control mechanisms. MicroRNA miR-34a, frequently downregulated in Ewing's sarcoma cells, also contributes to the inhibition of senescence and apoptosis, processes mediated by regulators such as the Myc oncogene, thereby favoring deregulated proliferative progression. Moreover, miR-145 establishes a positive feedback loop with the EWS-FLI1 oncogene, amplifying oncogenic signaling and consequently exacerbating tumor aggressiveness. In this context, the present investigation aims to elucidate the synergistic interaction of microRNAs miR-34a, miR-16, miR-29, and miR-145, in conjunction with the EWS-FLI1 protein, in the dysfunction of the G1/S cell cycle checkpoint and the dysregulation of the DNA damage signaling pathway, culminating in the abnormal cell proliferation observed in Ewing's sarcoma.
To achieve this objective, a Boolean genetic interaction network was developed, encompassing the main molecular entities involved, using logical rules derived from the relevant scientific literature. Logical modeling was implemented to describe the complex interactions between these molecules, allowing the identification of the fixed points of the dynamics using the GINsim software. These fixed points represent the stable states at the end of the cell cycle checkpoint, corresponding to the cellular fates of senescence, apoptosis, or proliferation. The temporal evolution of the model was simulated until convergence to these fixed points, employing logical-stochastic dynamics implemented in the MaBoSS tool. Additionally, the stochastic modeling of biochemical reactions, through the kinetic Monte Carlo methodology, provided a detailed analysis of the molecular dynamics underlying Ewing's sarcoma. The integrated methodological approach adopted in this study allowed for an in-depth analysis of the intricate molecular interactions in the context of the G1/S checkpoint and their implications for the behavior of tumor cells. Thus, the present investigation significantly contributes to the expansion of knowledge regarding the molecular mechanisms that govern the aggressiveness of Ewing's sarcoma, opening new perspectives for the development of more effective therapeutic strategies.
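The idea of a fixed point in a Boolean network can be illustrated with a toy example. The three-node network below and its rules are entirely hypothetical (nodes `A`, `B`, `C` stand for no specific molecule) and are not the authors' model; the sketch only shows how fixed points can be found by exhaustive enumeration, which is feasible for small networks.

```python
from itertools import product

NODES = ("A", "B", "C")

# Hypothetical logical rules: each maps the current state to a node's next value.
RULES = {
    "A": lambda s: s["A"],                     # input node keeps its value
    "B": lambda s: s["A"] and not s["C"],      # activated by A, inhibited by C
    "C": lambda s: not s["A"] and not s["B"],  # inhibited by both A and B
}


def step(state):
    """Apply every rule once to obtain the successor state."""
    return {node: bool(RULES[node](state)) for node in NODES}


def fixed_points():
    """Enumerate all 2^n states and keep those that map to themselves."""
    found = []
    for values in product([False, True], repeat=len(NODES)):
        state = dict(zip(NODES, values))
        if step(state) == state:
            found.append(tuple(int(state[node]) for node in NODES))
    return found


points = fixed_points()  # each tuple lists the values of (A, B, C)
```

A state that satisfies every rule simultaneously is a fixed point regardless of whether nodes are updated synchronously or one at a time, which is why such states can be read as stable cell fates.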
Palavras-chave: Ewing Sarcoma, Checkpoint, microRNAs, Genetic Network.
#1122086

Bioinformatics for All: A Social Media-Driven Approach to Democratize STEM Education

Autores: Bruna Espiño dos Santos,Sandy Ingrid Aguiar Alves,Anna Sophia Melo de Omena,Amanda Porto,Veridiana Piva Richter
Apresentador: Bruna Espiño dos Santos • brunaespino@gmail.com
Resumo:
Bioinformatics is an essential and rapidly growing field in scientific research, as it enables the computational analysis of large-scale biological data. However, due to its technical complexity and the limited availability of resources in Portuguese, it remains difficult for non-specialists to understand, often becoming a niche area restricted to those with specific knowledge: a “bubble” that limits broader interdisciplinary collaboration and public engagement. To address this challenge, we developed an innovative methodology for scientific communication: a social media-based outreach initiative aimed at demystifying complex bioinformatics concepts and breaking out of the bubble. To evaluate the project’s impact, data collection included metrics such as follower growth and engagement rates, which were tracked via Instagram Insights to assess learning outcomes and audience reach. The project is being run through the Instagram platform, with content created using Canva. The posts are classified into three categories, each designed with language and content tailored to specific audience groups: (A) individuals of all ages and educational backgrounds, (B) students with at least a high school education, and (C) technical content for students with specific knowledge in bioinformatics. In general, our initiative aims to (i) develop concise, visually engaging content to explain bioinformatics concepts in Portuguese; (ii) explore common daily topics where bioinformatics plays a role, enabling audiences to apply these concepts to real-world contexts; and (iii) promote the visibility of Brazilian research. Over an 18-month period, the project produced more than 65 posts across Instagram feeds, stories, and reels, including 18, 15, and 17 posts from Groups A, B, and C, respectively. These covered over 15 distinct bioinformatics-related topics.
Quantitative analysis revealed that posts from Group C, characterized by a more informative tone and technical content, achieved higher engagement levels compared to Groups A and B. Additionally, Instagram Insights revealed the project’s substantial impact: it reached over 20,646 accounts in February-March 2025, with posts saved over 500 times in February, indicating a high perceived value of the content and its utility as a reference for learning. These insights showed that the followers adopted the content as a teaching aid, with many interactions through direct messages (DMs) and comments praising the accessibility and relevance of the published material. The project successfully demonstrated that accessible and engaging science communication can expand the reach of bioinformatics beyond academic circles. By tailoring content and leveraging social media, it promoted public understanding, increased engagement, and highlighted the importance of Brazilian research in the field. Future plans include scaling the model to other STEM fields, expanding outreach through events, and developing free online resources to further bridge the gap in science communication and education.
Palavras-chave: Bioinformatics, Science communication, Social media
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1122163

CONSERVATION OF intraRNAs IN PROKARYOTIC TRANSCRIPTOMES AND TRANSLATION OF NOVEL PROTEIN ISOFORMS

Autores: Nathalia Dos Santos Oliveira,Robson Francisco de Souza,Tie Koide
Apresentador: Nathalia Dos Santos Oliveira • nathalia-dso@usp.br
Resumo:
Mapping transcription start sites (TSS) of genes provides a more detailed understanding of gene regulation, allowing the location of promoter regions and the identification of transcript isoforms. The dRNAseq (differential RNA sequencing) technique has been used in prokaryotes for global TSS mapping, revealing a variety of transcript isoforms. One underexplored class of transcripts is the intraRNA: RNAs with TSS located within protein-coding genes (internal TSS or iTSS) and on the same strand, which can be translated into protein isoforms. The translation of intraRNAs increases the repertoire of compact prokaryotic genomes, highlighting the modularity of protein domains. Interesting cases of overlapping internal RNAs in bacteria and eukaryotes have been reported; however, no systematic study has been conducted to understand the prevalence and conservation of intraRNAs and the variety of proteins produced across different organisms. Given the increased availability of dRNAseq data, we propose to investigate the prevalence and conservation of intraRNAs in prokaryotes and their impact on the coding potential of the genome. To do that, we first evaluated two different methods for predicting TSSs: a machine-learning-based method that requires a training set (ANNOgesic) and a statistical method that takes into account the dRNAseq library distribution (TSSAR). Using Halobacterium salinarum dRNAseq data as the testing set, we set TSSAR as the preferred method for TSS mapping since it does not require manual curation and provides better localization of TSS. Database and literature searches showed the availability of dRNAseq datasets for 78 prokaryotic species. Using TSSAR, we have currently mapped and classified TSS for all available archaea and detected at least 26,778 iTSSs. Given these findings, this project aims to contribute to the understanding of overlapping transcripts and production of new proteins in prokaryotes.
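The definition of an iTSS given above (a TSS inside a protein-coding gene, on the same strand) can be expressed as a simple coordinate check. The sketch below is a simplification under stated assumptions: coordinates are 1-based and inclusive, a TSS exactly on a gene boundary is not counted as internal, and the `Gene` class and `internal_tss_hits` function are hypothetical names, not part of TSSAR or ANNOgesic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive, start <= end
    end: int
    strand: str  # '+' or '-'


def internal_tss_hits(tss_pos, tss_strand, genes):
    """Return names of coding genes that contain the TSS on the same strand.

    Assumption: positions equal to a gene boundary are treated as not internal.
    """
    return [
        g.name
        for g in genes
        if g.strand == tss_strand and g.start < tss_pos < g.end
    ]


# Illustrative annotation with two overlapping genes on opposite strands.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 350, 700, "-"),
]
hits_plus = internal_tss_hits(250, "+", genes)
hits_minus = internal_tss_hits(380, "-", genes)
```

A TSS at position 380 on the minus strand lies within both genes' coordinates, but only `geneB` shares its strand, so only that gene would be reported.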
Palavras-chave: intraRNA; Transcription Start Site; dRNAseq; protein isoforms.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1122412

Genome-scale metabolic model reconstruction of the stage-specific T. cruzi Epimastigote

Autores: Bruno Ribeiro Pinto,Mayke Bezerra Alencar,Gabriela Torres Montanaro,Ariel Mariano Silber
Apresentador: Bruno Ribeiro Pinto • brunorbqi@gmail.com
Resumo:
Chagas disease, caused by infection with the protozoan Trypanosoma cruzi, is a tropical illness prevalent in the Americas. Humans can contract the disease through contact between open wounds and infected feces of invertebrate vectors. To survive within its hosts, T. cruzi relies on a plethora of transporters, enzymes, and unique metabolic pathways to meet its energetic and biosynthetic demands. This parasite can utilize multiple substrates, including glucose, amino acids, and fatty acids, metabolizing them to produce ATP and several metabolic precursors. Understanding the systematic interactions within T. cruzi's metabolic network is essential for elucidating phenotypic adaptations and metabolic reprogramming. Such insights are useful for advancing knowledge of its metabolism and biomass production under dynamic environmental conditions. Genome-scale metabolic model (GEM) reconstruction is a systems biology approach that allows the mathematical modelling of an organism's metabolism based on omics and biochemical data. These networks comprise metabolites, genes, enzymatic reactions encoded in the genome, and non-enzymatic reactions, represented as a matrix of linear equations to calculate metabolic fluxes. Flux balance analysis (FBA) can then be applied for in silico phenotype prediction. The reconstruction of the stage-specific Epimastigote of T. cruzi, based on genomic and transcriptomic data, consists of 646 genes, 1106 reactions, and 1024 metabolites. Previously, a GEM of T. cruzi was generated and manually curated with information from literature and multiple databases, including KEGG, TriTrypDB, BRENDA, BiGG, and MetaCyc. An objective function was formulated to specifically simulate T. cruzi biomass generation and maintenance. Transcriptomic data were integrated into the model using the GIMME algorithm for the stage-specific reconstruction.
The model was validated against succinate, acetate, and CO2 excretion, as described in previously published experiments, and against amino acid and purine auxotrophy, using FBA. Single and double reaction essentiality analyses were conducted to identify in silico essential reactions and pathways, which may be explored as potential drug targets. The next step involves integrating transcriptomic data to reconstruct stage-specific GEMs for the amastigote and trypomastigote stages, followed by an in silico metabolic flux comparison between models and the development of in silico tools for integrating omics and biochemical data. All simulations were carried out using COBRApy and the Gurobi solver.
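For reference, FBA is conventionally posed as the linear program below. The notation is the standard textbook one and is an assumption here, since the abstract does not define symbols: S is the stoichiometric matrix (which would be 1024 x 1106 for the metabolite and reaction counts reported above), v is the vector of reaction fluxes, c selects the objective (here the biomass reaction), and the lower and upper bounds encode reversibility and uptake limits.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& v^{\mathrm{lb}}_{j} \le v_{j} \le v^{\mathrm{ub}}_{j}, \qquad j = 1,\dots,n
\end{aligned}
```

In this formulation, a reaction essentiality test amounts to fixing both bounds of one reaction (or a pair of reactions) to zero and checking whether the optimal objective value drops to zero or below a chosen threshold.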
Palavras-chave: Genome-scale Metabolic Models, Flux-balance Analysis, Chagas Disease, Metabolism
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1122543

A Pan-genome of the Saccharum complex

Autores: Gustavo Carvalho do Nascimento,Diego Mauricio Riaño Pachón
Apresentador: Gustavo Carvalho do Nascimento • Gustavocarvalhofortes@gmail.com
Resumo:
Sugarcane (Saccharum spp.) is a grass that has undergone extensive anthropogenic evolution, transforming from a wild species into a vital food and energy crop. It is currently the most harvested crop globally, surpassing even grains in production. Its evolutionary history is complex, marked by multiple instances of interspecific hybridization among species within the so-called “Saccharum complex.”
The Saccharum complex is a hypothetical group of species believed to have contributed to the origin of modern sugarcane. Originally, this group included the genera Saccharum, Erianthus (specifically, sect. Ripidium), Sclerostachya, Narenga, and Miscanthus (sect. Diandra). These species are characterized by large genomes and significant chromosomal alterations. Within this complex, chromosome numbers vary considerably; for example, some Miscanthus species are diploid while others are tetraploid. Moreover, hybrid sugarcane genomes are exceptionally complex due to interspecific hybridization and subsequent backcrossing. Consequently, sugarcane chromosomes often exhibit high heterozygosity, translocations, aneuploidy, or may even be identical by descent, reflecting their autoallopolyploid nature.
Breeding programs have traditionally focused on utilizing Saccharum officinarum and Saccharum spontaneum to improve sugarcane hybrids through a process known as nobilization. However, other species within the complex may also provide valuable genetic resources for sugarcane improvement. Recent advances in genomics have made it possible to obtain genome assemblies for these complex organisms.
A good way to explore these resources is through a pangenome approach, which can reveal the genomic variation within a group of organisms—at either the species or family level. We chose to explore some of the genomes available in public databases from the Saccharum complex to assemble a pangenome graph of this group.
Using a total of seven genomes—R570, KK3, AP85-441 (S. spontaneum), Np-X (S. spontaneum), LA-Purple (S. officinarum), Miscanthus lutarioriparius, and Erianthus fulvus—we assembled a pangenome using the Cactus-Minigraph pipeline v2.6.13. This pipeline employs Minigraph for the initial graph construction, Giraffe for read mapping, and Cactus to integrate structural variation into the final graph.
The final graph file, after filtering to remove low-coverage bubbles, was 8.7 Gb in size. It contained approximately 75 million nodes, 100 million edges, and 1.7 billion base pairs.
To analyze the pangenome, we developed Python scripts (available at https://github.com/labbces/sugarcanePanGenome/) to read the GFA file and extract the information necessary to identify genomic segment categories: core (present in all genomes), accessory (present in some genomes), and exclusive (present in only one genome).
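The core, accessory, and exclusive categories can be illustrated with a short sketch. It is not the authors' code from the repository above; it assumes a GFA 1.0 file in which each genome contributes `P` (path) lines whose names begin with the genome name followed by `#`, and it ignores `W` (walk) lines, which some pangenome graph builders emit instead. The function name `classify_segments` is hypothetical.

```python
from collections import defaultdict


def classify_segments(gfa_text):
    """Map each visited segment ID to 'core', 'accessory', or 'exclusive'."""
    genomes_by_segment = defaultdict(set)
    all_genomes = set()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            genomes_by_segment[fields[1]]  # register the segment
        elif fields[0] == "P":
            # Assumed naming convention: '<genome>#<sequence name>'.
            genome = fields[1].split("#")[0]
            all_genomes.add(genome)
            for visit in fields[2].split(","):
                # Drop the trailing orientation sign (+ or -).
                genomes_by_segment[visit.rstrip("+-")].add(genome)
    categories = {}
    for segment, genomes in genomes_by_segment.items():
        if not genomes:
            continue  # segment not traversed by any path
        if genomes == all_genomes:
            categories[segment] = "core"
        elif len(genomes) == 1:
            categories[segment] = "exclusive"
        else:
            categories[segment] = "accessory"
    return categories


# Tiny illustrative graph with three genomes (g1, g2, g3).
toy_gfa = (
    "S\t1\tACGT\n"
    "S\t2\tGG\n"
    "S\t3\tTTA\n"
    "S\t4\tC\n"
    "P\tg1#chr1\t1+,2+,3+\t*\n"
    "P\tg2#chr1\t1+,3+\t*\n"
    "P\tg3#chr1\t1+,2-,4+\t*\n"
)
categories = classify_segments(toy_gfa)
```

Counting genomes rather than paths matters for polyploid assemblies, where one genome may contribute several haplotype paths through the same segment.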
Palavras-chave: Pan-genome, Genomic, Graph, Sugarcane
#1122557

Biomarker Identification for Childhood Tuberculosis: An Integrative Approach exploring the Transcriptome and Metabolome

Autores: Eduardo Fukutani Rocha,Artur Queiroz,Tiago Feitosa Mota
Apresentador: Eduardo Fukutani Rocha • eduardofukutani@gmail.com
Resumo:
Tuberculosis (TB) remains one of the leading causes of infectious disease-related infant mortality worldwide. In children, diagnosis poses significant challenges due to the poor sensitivity and specificity of current methods. Recent studies have identified transcriptomic and metabolomic signatures as promising diagnostic tools, although these approaches require further validation for routine use. This study aimed to identify a biomarker signature capable of classifying childhood TB through the integration of transcriptomic and metabolomic data. Blood and plasma samples were collected from children up to 15 years old in Pune, Maharashtra, India. A total of 40 samples with both data types were included in this study: 16 from children with TB and 24 from healthy controls. Transcriptomic data were obtained via RNA-seq, and metabolites were measured using CL/EMAD. Differential expression analysis using DESeq2 identified 174 differentially expressed genes (DEGs) between active TB and control groups. A random forest algorithm selected five classifier genes with strong discriminative capacity: BPI, AZU1, C1QC, AC092580.4, and MPO. ROC curves generated on the training dataset for classifying TB versus control samples showed that the full DEG set achieved an AUC of 0.86 (95% CI: [0.75, 0.97]), while the five-gene signature reached an AUC of 0.91 (95% CI: [0.83, 0.99]). The signature was further validated in three independent public datasets: GSE39939 (91 samples), GSE39940 (159 samples), and GSE41055 (27 samples). GSE39939 and GSE39940 included samples from latent tuberculosis infection (LTBI), active TB, and active TB with HIV, and featured three genes from the signature (AZU1, BPI, and MPO). GSE41055 included samples from LTBI, active TB, and healthy individuals, and featured four genes (AZU1, BPI, C1QC, and MPO).
ROC analyses in GSE39939 and GSE39940 yielded AUCs of 0.80 (95% CI: [0.69, 0.91]) and 0.85 (95% CI: [0.78, 0.92]) in distinguishing LTBI from active TB, while GSE41055 achieved an AUC of 0.70 (95% CI: [0.44, 0.96]) in distinguishing healthy from active TB samples. Integration of transcriptomic and metabolomic data was performed through correlation analysis between the five-gene signature expression levels and metabolite abundance, revealing 27 moderate-to-strong correlations in the TB group and 33 in the control group. Pathway enrichment analysis of correlated metabolites highlighted metabolic pathways consistent with childhood TB pathophysiology. These findings underscore the biological relevance of the identified markers. The consistent performance of the five-gene signature across multiple datasets in classifying active TB from LTBI and control samples, combined with its correlation to disease-relevant metabolites and enriched pathways, supports its potential as a robust diagnostic tool. This integrative approach lays the groundwork for the development of multi-omics biomarkers and future validation studies aimed at improving childhood TB diagnostics.
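The AUC values reported above have a direct interpretation: the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. The sketch below computes it that way, by pairwise comparison. It is a generic illustration with invented labels and scores, not the study's analysis, and it does not compute the confidence intervals.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: 1 for cases (e.g. active TB), 0 for controls.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: three cases and three controls with classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

With few samples, as in the smallest validation set, a single misranked pair shifts the AUC noticeably, which is consistent with the wide interval reported for that dataset.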
Palavras-chave: Childhood Tuberculosis, Bioinformatics, Biomarker, Data integration, Transcriptomics, Metabolomics
★ Running for the Qiagen Digital Insights Excellence Awards
#1122576

A novel stemness-associated six-gene signature for molecular and prognostic stratification of pancreatic ductal adenocarcinoma

Autores: Daniela Bizinelli,Eduardo Moraes Reis
Apresentador: Daniela Bizinelli • daniela.bizinelli@usp.br
Resumo:
Pancreatic ductal adenocarcinoma (PDAC), the most common subtype of pancreatic cancer, originates in the exocrine compartment and accounts for over 90% of cases. Its aggressive behavior, lack of early detection, and resistance to therapy contribute to delayed diagnoses and a five-year survival rate below 10%, making it a leading cause of cancer-related mortality worldwide. Although multi-omics approaches have advanced our understanding of PDAC biology, resistance mechanisms, particularly those driven by cancer stem cells (CSCs), remain a major therapeutic challenge. Therefore, identifying clinically translatable stemness-related biomarkers could improve patient stratification and outcomes. To address this, we applied a published algorithm to estimate the stemness index (mRNAsi) of 150 TCGA-PDAC transcriptomic samples. Based on mRNAsi, samples were stratified into high (n=36) and low (n=36) stemness groups, followed by differential expression analysis (FDR < 0.05, |FC| ≥ 2). A weighted gene co-expression network (β=12) was then constructed using WGCNA to identify gene modules associated with stemness. Differentially expressed genes (DEGs) from the module most positively correlated with mRNAsi were used for unsupervised clustering via ConsensusClusterPlus (K-means, Euclidean distance, 500 bootstraps). For prognostic modeling, we applied right censoring at five years and retained 145 samples, which were split into training (n=89) and testing (n=56) sets. From the DEGs, univariate Cox regression identified genes significantly associated with overall survival (p<0.05). These were further refined using LASSO regression (10-fold cross-validation) to reduce overfitting and select the most prognostically informative features. A risk score was calculated for each sample as a linear combination of expression values weighted by the LASSO-derived coefficients, and risk group cutoffs were defined using the survminer package. 
The model was trained, internally validated, and externally validated using the ICGC PACA-AU dataset (n=69). Finally, biological features associated with risk groups were explored using CIBERSORT (immune profiling), single-sample GSEA (oncogenic and therapy-related pathways), and oncoPredict (drug sensitivity based on GDSC data). Stratification by mRNAsi revealed a significant association between tumor dedifferentiation and poor overall survival (p=0.02). WGCNA identified 17 modules, with the magenta module (n=257 genes; 101 DEGs) showing the strongest positive correlation with stemness (r=0.71, p=6.48e–13). These DEGs effectively clustered PDAC patients into three subtypes with distinct prognoses, corresponding to low, intermediate, and high stemness phenotypes. The resulting six-gene signature, derived from univariate Cox and LASSO regression, consistently stratified patients into high- and low-risk groups with significant survival differences across internal cohorts (p<0.05) and showed strong predictive performance for mRNAsi classification (AUROC=0.89–0.91). In external validation, the model accurately predicted stemness-related groups (AUROC=0.95), outperforming mRNAsi-based classification in survival analysis (p=0.05 vs. p=0.79). Low-risk patients exhibited higher infiltration of resting dendritic cells, monocytes, and CD8+ T cells, alongside downregulation of pathways involved in genome integrity and proliferation. Finally, we identified 11 drugs with significantly lower predicted IC50 values in high-risk samples. In summary, we present a novel six-gene stemness-associated signature that independently and accurately predicts high-risk PDAC patients. Further validation may support its use as a cost-effective tool for clinical risk stratification and personalized therapy selection.
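The risk score described above (a linear combination of expression values weighted by LASSO-derived coefficients) can be illustrated with a minimal sketch. The gene names, coefficients, and expression values below are hypothetical placeholders, not the six-gene signature from this work, and the median cutoff is an assumption standing in for the survminer-derived cutoff mentioned in the abstract.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of expression values weighted by per-gene model coefficients."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def assign_groups(scores, cutoff):
    """Label each sample 'high' if its score exceeds the cutoff, else 'low'."""
    return {sid: ("high" if s > cutoff else "low") for sid, s in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
samples = {
    "S1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "S2": {"GENE_A": 6.0, "GENE_B": 2.0},
    "S3": {"GENE_A": 4.0, "GENE_B": 0.0},
}

scores = {sid: risk_score(expr, coefficients) for sid, expr in samples.items()}
# Assumption: a median split replaces the data-driven cutoff used in the study.
cutoff = median(scores.values())
groups = assign_groups(scores, cutoff)
```

In practice the coefficients would come from the fitted penalized Cox model and the cutoff from the training set only, so that the test and external cohorts are scored without refitting.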
Palavras-chave: mRNAsi, cancer stem cells, molecular subtype, prognostic gene signature
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1122581

Disrupted genes and pathways in Schizophrenia: A robust analysis of the brain and blood

Autores: Artur Queiroz,Eduardo Fukutani Rocha,Tiago Feitosa Mota
Apresentador: Artur Queiroz • arturlopo@gmail.com
Resumo:
Schizophrenia (SCZ) is a complex and chronic neuropsychiatric disorder that remains only partially understood. It is characterized by a wide range of symptoms, including behavioral disturbances, emotional dysregulation, and cognitive impairments. The etiology of SCZ is multifactorial, involving both genetic and environmental components, yet its underlying molecular mechanisms are not fully elucidated. In this study, we sought to explore the biological basis of SCZ by analyzing publicly available gene expression datasets. Our goal was to identify key SCZ-associated genes and biological pathways that could serve as potential biomarkers or therapeutic targets. We conducted a comprehensive data collection process focused on transcriptomic datasets derived from the brain’s prefrontal cortex and various blood sources. After applying stringent inclusion and exclusion criteria, as well as performing quality control assessments, we retrieved a total of 17 datasets from six distinct tissue sources: brain biopsy (n = 6), whole blood (n = 2), peripheral blood mononuclear cells (PBMCs, n = 2), leukocytes (n = 1), lymphocytes (n = 1), and isolated neurons (n = 5). Our primary analysis used four brain-derived datasets as the discovery set to identify differentially expressed genes (DEGs) between SCZ patients and healthy controls. We identified a total of 532 DEGs, which were further analyzed through enrichment analysis to uncover biological pathways associated with SCZ. These analyses suggested strong enrichment in pathways linked to neurodegenerative processes, reinforcing the hypothesis of shared molecular mechanisms between SCZ and neurodegenerative disorders. To narrow down the most informative biomarkers for SCZ, we employed feature selection techniques, such as the random forest algorithm. This process identified three genes—HUWE1, PTGDS, and RPL31—that consistently distinguished SCZ samples from controls. 
We then constructed a predictive model based on this 3-gene signature and evaluated its performance across independent datasets from other tissues. The model demonstrated robust classification accuracy (>72%) in brain tissue, whole blood, PBMCs, and leukocyte samples. In conclusion, our findings highlight a promising 3-gene biomarker panel with cross-tissue predictive potential and reveal relevant biological pathways that may be central to the SCZ pathophysiology. These results not only enhance our understanding of SCZ at the molecular level but also open new avenues for developing targeted diagnostic tools and therapeutic strategies, with the potential to improve clinical outcomes and patient care.
Palavras-chave: Schizophrenia, Bioinformatics, IPD meta-analysis, Data mining, Neurodegenerative disorders.
#1122705

Uncovering therapeutic targets in cell-specific networks of Primary Sjögren’s Syndrome through multi-layer network propagation

Autores: João Vitor Ferreira Cavalcante,Rodrigo Juliani Siqueira Dalmolin,Diego Marques Coelho
Apresentador: João Vitor Ferreira Cavalcante • jvfecav@gmail.com
Resumo:
Primary Sjögren’s syndrome (pSS) is a chronic, complex and heterogeneous autoimmune disease which usually manifests as oral and ocular dryness, fatigue and joint pain caused by damage to the body’s exocrine glands. It has a significant impact on the quality of life of those affected, while also increasing the risk of associated pathologies, such as B-cell lymphoma, which is 14-15 times more prevalent in people with pSS than in the general population, and interstitial lung disease, which contributes significantly to morbidity in pSS. Although pSS pathogenesis is not fully understood, trans-endothelial migration of B and CD4+ T cells to the exocrine glands leads to the formation of tertiary lymphoid structures followed by ectopic germinal centers. Nonetheless, current treatment approaches for pSS seek only the alleviation of symptoms, and novel therapeutic targets are continuously being researched. In our analysis, we acquired public scRNA-seq datasets of pSS, inferred transcription factor activity, and identified cell surface receptors and differentially expressed genes. We used these three layers of information as seed nodes for a network propagation algorithm, phuEGO, which uses prior-knowledge interactions from OmniPath to build its networks, and obtained cell-specific signaling networks in both pSS and healthy controls. With the cell-specific networks, we queried DrugBank and ChEMBL for drugs that target the genes present in them. In the B-cell network, targets currently under evaluation in clinical trials for pSS were identified, such as CD40 (targeted by the drug Iscalimab), thereby supporting the validity of this methodology in uncovering biologically relevant genes. Moreover, it also led to the identification of novel targets, such as IL4R, for which approved drugs exist.
Additionally, this network resource could serve to further elucidate underlying mechanisms of the pathogenesis of pSS, highlighted by the appearance of genes such as TRAF2 in the B-cell network, which has been previously associated with B cell survival in pSS.
Palavras-chave: Single-cell, Sjögren's disease, Network Propagation, Drug repurposing, Cell-cell communication, Systems Biology
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1122791

Analysis of the evolutionary impact of retrocopies of coding genes in the phylum Chordata

Autores: Lorraine Christine de Oliveira,Helena Beatriz da Conceição,Pedro A F Galante
Apresentador: Lorraine Christine de Oliveira • lorraine.christine@usp.br
Resumo:
Retrocopies, also known as processed pseudogenes, are gene duplicates generated by the reverse transcription of mRNA into cDNA, followed by genomic integration. Although retrocopies were traditionally considered non-functional, recent studies suggest that some play significant roles in gene regulation, evolutionary innovation, and disease association. In this study, we systematically investigate the expression and conservation patterns of approximately 72,929 retrocopies across 14 species within the phylum Chordata, using RNA-Seq datasets from six tissues (brain, heart, kidney, liver, ovary, and testis). To achieve this, we used retrocopies identified by the RCPedia 2.0 database. Expression quantification was performed using the pseudo-alignment algorithm Kallisto. We calculated expression breadth to distinguish tissue-specific retrocopies from housekeeping ones. Additionally, retrocopy signatures were identified through differential expression analysis, detecting upregulated retrocopies in specific species and tissue samples. Our preliminary results show a high number of expressed retrocopies in the brain and testis in most species. The ratio between the number of expressed retrocopies and the total number of retrocopies, along with their median expression values, revealed that testis and brain consistently show the highest proportions of expressed retrocopies. Notably, humans, chickens, and sheep showed elevated expression in testis, while rats, zebrafish, and cats stood out for brain expression. In contrast, tissues such as kidney and liver displayed lower and more uniformly distributed proportions of expressed retrocopies across species. This research provides new insights into the evolutionary and functional dynamics of retrocopies in vertebrates, emphasizing their contribution to genomic diversity.
Further analyses will focus on characterizing specific retrocopies with evidence of functional activity, particularly those with conserved expression patterns across species.
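As a rough illustration of the expression breadth calculation mentioned above, the sketch below counts the tissues in which a retrocopy is expressed above a threshold and labels it accordingly. The 1.0 TPM threshold, the category names, and the example values are assumptions made for illustration; the abstract does not state the criteria actually used.

```python
TISSUES = ["brain", "heart", "kidney", "liver", "ovary", "testis"]


def expression_breadth(tpm_by_tissue, threshold=1.0):
    """Number of tissues in which expression meets or exceeds the threshold."""
    return sum(1 for value in tpm_by_tissue.values() if value >= threshold)


def classify(tpm_by_tissue, threshold=1.0):
    """Label a retrocopy by how many tissues express it (assumed categories)."""
    breadth = expression_breadth(tpm_by_tissue, threshold)
    if breadth == 0:
        return "not expressed"
    if breadth == 1:
        return "tissue-specific"
    if breadth == len(tpm_by_tissue):
        return "housekeeping-like"
    return "intermediate"


# Hypothetical TPM values for two retrocopies across the six tissues.
testis_only = dict(zip(TISSUES, [0.0, 0.1, 0.0, 0.2, 0.3, 15.0]))
everywhere = dict(zip(TISSUES, [5.0, 4.0, 3.5, 6.0, 2.0, 9.0]))

label_testis_only = classify(testis_only)
label_everywhere = classify(everywhere)
```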
Palavras-chave: Retrocopies, Evolution, Gene Regulation, RNA-Seq, Chordata, Phylogenetics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1123032

Causeway: A pipeline for genome-wide effector gene screening with Mendelian Randomization and colocalization

Autores: Julia Apolonio de Amorim,João Vitor Ferreira Cavalcante,Diego Marques Coelho,Rodrigo Juliani Siqueira Dalmolin,Vasiliki Lagou
Apresentador: Julia Apolonio de Amorim • apoloniojulia@gmail.com
Resumo:
The integration of quantitative trait loci (QTL) and disease genome-wide association studies (GWAS) to pinpoint candidate causal genes is a computationally demanding task, with pitfalls that depend on the methods used. To address these issues, we introduce Causeway, a novel Nextflow pipeline that performs summary statistics-based two-sample Mendelian Randomization (MR) for causal gene prioritization. The pipeline also runs sensitivity and colocalization analyses to interrogate the findings and provide robust results. The tool is designed to run tasks in a computationally efficient way even in low-resource environments, such as a personal computer. Furthermore, it can scale to web servers and high-performance computing clusters. The source code of Causeway is available on GitHub at https://github.com/juliaapolonio/Causeway, and the documentation and instructions for running the vignette are available at https://juliaapolonio.github.io/Causeway/.
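To make the idea of summary statistics-based two-sample MR concrete, the sketch below computes a Wald ratio, a standard single-instrument estimator. This is an assumption chosen for illustration: the abstract does not state which estimators Causeway implements, and the function name and input values are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate from GWAS/QTL summary statistics.

    The causal estimate is the variant-outcome effect divided by the
    variant-exposure effect. The standard error uses the first-order
    delta-method approximation, which ignores uncertainty in the
    exposure effect.
    """
    estimate = beta_outcome / beta_exposure
    se = abs(se_outcome / beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value


# Hypothetical summary statistics for one variant.
estimate, se, p_value = wald_ratio(
    beta_exposure=0.5,  # effect of the variant on gene expression (QTL)
    beta_outcome=0.1,   # effect of the same variant on the disease (GWAS)
    se_outcome=0.02,
)
```

With several independent instruments per gene, the per-variant ratios are usually combined (for example by inverse-variance weighting), and sensitivity and colocalization analyses help guard against pleiotropy and linkage.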
Palavras-chave: GWAS; Mendelian Randomization; Nextflow
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1123084

Gene Expression and Molecular Pathways Associated with Therapeutic Response in High-Grade Serous Ovarian Cancer

Autores: Glenerson Baptista,Nayara Gusmão Tessarollo,Nayara Evelin de Toledo,Alessandra Serain,Gabriel Fernando Costa Da Fonseca,Vitor Barbosa Paiva,Luciana de Castro Moreeuw,Diego Gomes,Cláudia Bessa,Fabiane Macedo,Carolina Furtado Torres da Silva,Andreia Melo,João Viola,Mariana Boroni
Apresentador: Glenerson Baptista • glenersonb@gmail.com
Resumo:
In Brazil, ovarian cancer ranks eighth in incidence, with 7,310 new cases estimated annually between 2023 and 2025. High-grade serous ovarian carcinoma (HGSOC), the most common epithelial subtype, accounts for 50–60% of cases. Although most patients initially respond to treatment, more than 70% experience recurrence within five years, negatively impacting prognosis. A deeper understanding of gene expression profiles associated with different clinical outcomes is essential for the development of targeted therapies and personalized treatment strategies. With advances in next-generation sequencing technologies, RNA sequencing has become a powerful method for transcriptome analysis. This study aimed to perform a differential expression analysis at both gene and isoform levels in HGSOC samples from patients enrolled at the National Cancer Institute (INCA), classified according to their response to chemotherapy. Samples from twenty HGSOC patients treated with an adjuvant chemotherapy regimen were selected and classified into two groups: (1) unfavorable response to platinum (9 patients who resumed chemotherapy within 12 months); and (2) favorable response to platinum (11 patients who resumed chemotherapy after 12 months). Bulk RNA-seq libraries were prepared using the Illumina Total RNA Prep with Ribo-Zero Plus kits and sequenced on the Illumina NovaSeq 6000 platform. Sequencing quality was assessed using FastQC, followed by alignment and gene expression quantification with STAR and Salmon, respectively. Differential expression analysis was performed using the DESeq2 R package, and functional enrichment analysis was conducted using the fgsea R package. Clinicopathological data were correlated with gene expression results, and overall survival was analyzed using the survminer R package. Differential gene expression analysis revealed 113 upregulated and 52 downregulated genes.
Functional enrichment analysis highlighted pathways involved in extracellular matrix remodeling, including genes such as COL9A3 and COL4A3, which may contribute to tumor progression and lead to a more invasive phenotype. The construction of new bulk RNA libraries from an additional 40 samples is currently underway. These findings contribute to a better understanding of the molecular mechanisms underlying therapy response in HGSOC.
Palavras-chave: Ovarian cancer, Therapy resistance, Gene expression
★ Running for the Qiagen Digital Insights Excellence Awards
#1123278

WGCNA in the Identification of Biomarkers in Glioblastoma

Autores: Marcelo Santos Lima Filho
Apresentador: Marcelo Santos Lima Filho • marceloslf@ufba.br
Resumo:
Glioblastoma multiforme (GBM) is the most common and lethal primary malignant tumor of the central nervous system in adults. It is characterized by infiltrative growth, therapeutic resistance, and high genetic and epigenetic heterogeneity, which makes the identification of consistent molecular biomarkers and therapeutic targets challenging. Despite advances in combined therapies—such as surgery, radiotherapy, and chemotherapy—patients’ median survival remains below 15 months. In this context, the search for computational strategies capable of exploring large transcriptomic datasets to discover new biomarkers associated with tumor progression, treatment response, and clinical outcomes becomes essential.
This project proposes the application of WGCNA (Weighted Gene Co-expression Network Analysis) to analyze transcriptome samples from glioblastoma patients, aiming to identify modules of co-expressed genes associated with relevant clinical features and hub genes with potential value as biomarkers or therapeutic targets. WGCNA is a correlation-based approach to gene expression that enables the construction of modular networks reflecting the functional organization of biological data.
Public datasets deposited in the GEO (Gene Expression Omnibus) from NCBI will be used, ensuring reproducibility, accessibility, and sample diversity. Analyses will be conducted using the R programming language and will include steps such as preprocessing, setting the connectivity parameter, network construction, module detection, and correlation analysis between modules and clinical traits such as survival, tumor grade, and treatment response. Subsequently, central genes (hub genes) within the most relevant modules will be identified.
The expected outcome of this analysis is the identification of gene signatures that are coordinated in glioblastoma patients, and hub genes with potential application as molecular biomarkers or targets in future therapies. Moreover, the use of public data and reproducible statistical tools supports the advancement of open and collaborative practices in biotechnological research.
Palavras-chave: Glioblastoma, WGCNA, Bioinformatics, Cancer
#1123293

Facing the Post-Pandemic Crisis: Integrating Multi-Omics Approaches to Tackle COVID-19 and Long COVID

Autores: Marcelo Santos Lima Filho
Apresentador: Marcelo Santos Lima Filho • marceloslf@ufba.br
Resumo:
The COVID-19 pandemic triggered a global health crisis whose effects extend beyond the acute phase of infection. The emergence of Long COVID and the rise of antimicrobial resistance, especially in hospital settings, constitute a new reality that demands innovative solutions. This project proposes the integrated application of multi-omics approaches — transcriptomics and proteomics — combined with bioinformatics, with the aim of elucidating the molecular mechanisms involved in COVID-19, Long COVID, and antimicrobial resistance in both pre- and post-pandemic scenarios.

The strategy involves a comparative analysis of the transcriptome of patients with COVID-19 and Long COVID using RNA-seq to identify differentially expressed genes, perform functional enrichment analysis, and construct protein-protein interaction (PPI) networks. In parallel, a bacterial resistome analysis will be conducted through gel-free proteomics, focusing on identifying resistance mechanisms in microorganisms isolated from healthcare-associated infections (HAIs).

Based on the data obtained, protein-drug interaction networks will be constructed, and molecular docking simulations will be performed to propose targeted therapeutic candidates. The project will also aim to identify potential diagnostic and prognostic biomarkers, as well as hub genes associated with clinical outcomes. The integration of these analyses will not only provide a better understanding of the biological processes involved but also support the development of more effective strategies for clinical management and antimicrobial resistance control.

In addition to scientific impact, the study includes the training of human resources in bioinformatics and molecular biology, the production of scientific publications, and the dissemination of results at national and international events. With a multidisciplinary team and collaboration among various institutions, the proposal stands out for its use of advanced methodologies and the clinical and epidemiological relevance of the topics addressed.
Palavras-chave: COVID, Transcriptome, Pandemic, Infection, Bioinformatics, RNA-seq
#1123386

The Single Cell Notebooks: A global initiative for inclusive and accessible single-cell transcriptomics data analysis education

Autores: Joyce Karoline Da Silva,Adolfo Rojas-Hidalgo,Bruno Vinagre,Sebastián Urquiza-Zurich,Diego Pérez-Stuardo,Erick Armingol,Mariana Boroni,Yesid Cuesta-Astroz,Vinicius Maracaja Coutinho
Apresentador: Joyce Karoline Da Silva • joyce.karol@hotmail.com
Resumo:
Single-cell RNA sequencing (scRNA-seq) is transforming our understanding of cellular diversity by enabling the study of gene expression at single-cell resolution. However, analyzing scRNA-seq data remains a significant challenge, particularly in regions with limited computational infrastructure and restricted access to specialized training in bioinformatics. These barriers contribute to global inequities in genomics research and education, especially in low-resource settings. To address these challenges and promote inclusive, accessible training, we developed a computational framework consisting of ten interactive notebooks implemented in Google Colab and Jupyter Notebooks. These tools are designed to run in the cloud or on local servers, eliminating the need for high-performance computing and allowing broader participation in single-cell data analysis. The notebooks provide a step-by-step learning experience, covering a complete analytical pipeline. Early modules focus on essential tasks such as quality control, normalization, dimensionality reduction, clustering, and cell type annotation. As the material progresses, more advanced topics are introduced, including differential expression analysis, integration of multiple datasets, trajectory inference, pseudotime estimation, and prediction of cell-cell communication. In addition to scRNA-seq, the framework includes modules on related omics technologies such as spatial transcriptomics, TCR-seq, CITE-seq, and ATAC-seq, offering a comprehensive view of single-cell and multi-omics data analysis. These materials are designed to be used both for self-directed learning and as educational resources for instructors, trainers, and professors. Their interactive nature enhances user engagement and allows learners to apply concepts directly through practical exercises and real data. 
To further expand accessibility, we are also developing a Docker-based implementation, enabling offline use and self-hosted deployment, which is particularly valuable in areas with limited internet connectivity. Originally developed in English, the notebooks are being translated into Spanish and Portuguese, aiming to serve broader audiences across Latin America, Africa, and other non-English-speaking regions of the Global South. By lowering language and infrastructure barriers, this initiative seeks to democratize access to cutting-edge bioinformatics training. Overall, this framework supports the development of local expertise, encourages the participation of underrepresented communities in genomics research, and contributes to a more equitable global scientific landscape. By combining practical, cloud-based tools with multilingual, open-access content, we aim to foster a more inclusive environment for single-cell transcriptomics education and research worldwide.
Palavras-chave: scRNA-seq, Bioinformatics training, Data analysis education, Infrastructure barrier, Inclusive transcriptomics
#1123442

Design of a multiepitope chimera protein with vaccine potential against Oropouche virus

Autores: Ana Camilena dos Santos,Rafael Trindade Maia,Tarcisio Jose Domingos Coutinho,Camila Franco Batista de Oliveira
Apresentador: Ana Camilena dos Santos • camilenasantos29@gmail.com
Resumo:
Emerging infectious diseases are those with the potential to cause outbreaks in the population, and their spread is closely related to human activities. The Oropouche virus has recently attracted worldwide attention: in 2024 the WHO included it on the List of Emerging Pathogens with Future Pandemic Potential. It causes a febrile disease that is often confused with dengue and chikungunya. In Brazil, it caused an outbreak in 2024 that spread throughout the country, with 13,783 cases, and a further 7,756 cases had been recorded in 2025 as of 04/07/2025. Therefore, the objective of this work is to propose a multiepitope chimera protein with vaccine potential against the Oropouche virus. For this, 70 complete sequences of the viral envelope Gc protein were obtained from NCBI, including the reference sequence (NC_005775.1). Using the reference sequence, B and T cell epitopes (MHC-I and II) were then predicted using BepiPred-2.0, NetMHCpan-4.1 and NetMHCIIpan-4.3, respectively. The MHC-I and II epitopes that had 100% correspondence with B cell epitopes were selected. Antigenicity, allergenicity, immunogenicity and toxicity were then predicted using VaxiJen v2.0, AllerTop 2.0, Class I Immunogenicity and ToxinPred, respectively. Epitope conservation was analyzed across all 70 Gc sequences using the Epitope Conservancy Analysis tool. Epitopes that were non-antigenic, allergenic, poorly immunogenic or toxic, and those with less than 90% identity conservation across sequences, were excluded. Finally, 6 MHC-I and 8 MHC-II epitopes were selected for the construction of the multiepitope chimera; GGSGG linkers were added between them, a methionine was included at the N-terminus and six histidines at the C-terminus. The physicochemical parameters of this chimera were calculated (ProtParam and Protein-Sol), followed by three-dimensional modeling (Robetta), refinement (Galaxy web) and validation of the 3D structure (PROCHECK and ProSA-web).
Finally, docking of the chimera with the T cell receptor TRL8 (PDB: 7R54) was performed using ClusPro. The results showed that the proposed multiepitope chimera has good physicochemical properties, being water-soluble, antigenic, non-allergenic and non-toxic. Structural validation revealed that most of the chimera residues are in favorable regions of the Ramachandran plot, and the ProSA-web Z-score indicated that the structure is within acceptable parameters for native proteins. The chimera was also able to bind to the T-cell receptor, thus showing that the multiepitope protein can stimulate the immune system and is a potential vaccine against the Oropouche virus.
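The assembly step described above (epitopes joined by GGSGG linkers, an N-terminal methionine and a C-terminal six-histidine tag) can be sketched as follows. The epitope sequences are hypothetical placeholders, not the 14 epitopes selected in this work, and the function name is illustrative.

```python
LINKER = "GGSGG"
HIS_TAG = "H" * 6  # six histidines at the C-terminus


def build_chimera(epitopes, linker=LINKER):
    """Join epitopes with a linker, then add the start residue and His tag."""
    if not epitopes:
        raise ValueError("at least one epitope is required")
    return "M" + linker.join(epitopes) + HIS_TAG


# Hypothetical 9-mer epitopes used only to show the construction.
example_epitopes = ["ACDEFGHIK", "LMNPQRSTV"]
chimera = build_chimera(example_epitopes)
```

The resulting string is the kind of sequence that would then be passed to tools such as ProtParam or a structure predictor.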
Palavras-chave: Oropouche virus, epitope, multiepitope protein, vaccine chimera.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1123506

Against the Tide of Resistance: Promising Peptides Against Candida glabrata Efflux Pumps

Autores: Lua Silva,Francisco Eilton Sousa Lopes,Maria Laína Silva,Matheus Henrique De Sousa Mariano,Rossana de Aguiar Cordeiro
Apresentador: Lua Silva • luasilvacontato@gmail.com
Resumo:
Candida glabrata is a yeast-like fungus that is part of the normal human microbiota and is often found on mucosal surfaces. In 2022, the World Health Organization classified it as a "high priority pathogen," mainly due to the increasing number of clinical isolates. This is concerning because it affects immunocompromised patients, leading to higher mortality rates. Furthermore, strains resistant to echinocandins and azoles have been reported. This resistance and increased virulence are primarily due to the overexpression of ATP-binding cassette (ABC) transporters, members of the multidrug transporter family located in the membrane, which act as efflux pumps. Therefore, it is necessary to develop new compounds capable of interfering with this mechanism. The aim of this study was to design peptides capable of targeting efflux pumps in C. glabrata. To this end, a database was created containing osmotin sequences with antimicrobial activity obtained from UniProt and the National Center for Biotechnology Information, from which peptides with a length of 12 amino acid residues (AA) were extracted. Peptides were selected following analysis with the CAMPR3, APD3, PDBsum and ToxinPred software, based on parameters established in the literature. These peptides were modeled using PEP-FOLD, refined with GalaxyWeb, and their three-dimensional structures were validated using MolProbity. Finally, for the peptides with the best results, molecular docking was performed with the efflux pump structure (CDR1) using ClusPro. Based on visualization with PyMOL, the Osmpep2 peptide interacted with the pump at AA residues ILE164, PRO184, GLY185, THR190, GLU793, ILE828, and ASP830, most of which are located in the ABC transporter domain 1. Thus, Osmpep2 shows potential to act on the azole resistance mechanism.
Palavras-chave: Bioinformatics, Antimicrobial, Molecular Modeling
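One simple way to extract 12-residue peptides from a set of protein sequences is a sliding window, sketched below. The abstract does not say how the peptides were extracted, so the window step of one residue is an assumption, and the input sequence is a hypothetical placeholder rather than a real osmotin.

```python
def extract_peptides(sequence, length=12):
    """Return every contiguous peptide of the given length (step of 1)."""
    return [
        sequence[i:i + length]
        for i in range(len(sequence) - length + 1)
    ]


# Hypothetical 14-residue sequence, for illustration only.
example_sequence = "ACDEFGHIKLMNPQ"
peptides = extract_peptides(example_sequence)
```

Sequences shorter than the window yield an empty list, so they drop out of the candidate set without special handling.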
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1123569

FROM PATHWAY ANALYSIS TO MOLECULAR DOCKING: IDENTIFYING NEUROINFLAMMATORY TARGETS FOR ISATIN DERIVATIVES IN EPILEPSY

Autores: Antonio Cardoso da Silva Neto,Raynan Veras de Freitas,Jessika de Oliveira Viana,Cláudio Gabriel Lima Júnior,Karen Cacilda Weber
Apresentador: Antonio Cardoso da Silva Neto • acsn2@academico.ufpb.br
Resumo:
Inflammation is a physiological process characterized by the release of pro-inflammatory mediators in response to tissue injury, infections, or harmful stimuli, aiming to restore bodily homeostasis. Although it serves a protective role, an exacerbated inflammatory response can lead to tissue damage and physiological dysfunction, particularly when endogenous anti-inflammatory molecule synthesis and secretion are insufficient. In the central nervous system microenvironment, this imbalance promotes a sustained inflammatory cascade, directly affecting neuronal hyperexcitability and ultimately contributing to the establishment of epileptic neural circuits. While many patients achieve adequate seizure control through antiepileptic drugs, approximately 30% develop pharmacoresistant epilepsy, as current therapies primarily target symptom management rather than addressing the underlying cause of epilepsy. This limitation worsens long-term clinical outcomes and significantly impairs patients' quality of life.

This study applies a bioinformatics-guided approach to identify inflammation-related molecular targets with high binding potential to isatin derivatives previously shown to have affinity for epilepsy-related receptors. The ultimate goal is to prioritize targets that enable dual modulation of epileptic and inflammatory processes through a single chemical scaffold.

To achieve this, a systematic literature review was conducted as a first step to identify key neuroinflammatory biomarkers, followed by biochemical pathway analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, integrating omics data with metabolic pathway maps. Among the identified targets, Caspase-1 and NLRP3 (NOD-like receptor family pyrin domain containing 3), core components of the inflammasome, were selected due to their role in initiating the inflammatory cascade and activating pro-inflammatory cytokines.
Additionally, classical inflammatory targets, including Cyclooxygenase-2 (COX-2) and tumor necrosis factor-alpha (TNF-α), were included. To further refine the analysis, transforming growth factor-beta type I and II receptors (TGF-β RI and RII) were also considered, as they are implicated in increased blood-brain barrier permeability, allowing systemic pro-inflammatory biomolecules to influence the neural microenvironment and trigger epilepsy.

To evaluate the interaction of these targets with candidate compounds, we employed structure-based virtual screening via molecular docking with the GOLD software. Docking protocols were validated by redocking co-crystallized ligands for each target protein. Three isatin derivatives, pre-selected from a previous in silico screening against epilepsy-relevant receptors, were docked against the selected inflammation-related proteins. Among these, COX-2 exhibited the highest binding affinity values compared to the other proteins. When benchmarked against celecoxib, the compounds demonstrated comparable affinity, suggesting occupation of the hydrophobic pocket and competitive inhibition of arachidonic acid binding to COX-2, thereby suppressing its pro-inflammatory activity.

Our results highlight how integrative bioinformatics workflows, combining target mining, pathway analysis, and structural modeling, can be useful in the identification of multi-target drug candidates for complex neurological conditions. These findings will be further validated through in vitro experimental assays to assess their anti-inflammatory potential, aiming to advance toward more effective treatments for pharmacoresistant epilepsy.
Palavras-chave: Neuroinflammation, Molecular Docking, Epilepsy, Virtual Screening, Inflammatory biomarkers
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1123897

Investigation of the Effects of Aging on the Dynamics of Macrophage Populations in the Course of Bone Regeneration

Autores: Bianca Braga Frade,Clara Forrer Charlier,Nayara Gusmão Tessarollo,Caio Sorrentino dos Santos,Gabriela Rapozo Guimarães,Mariana Boroni,Danielle Cabral Bonfim
Apresentador: Bianca Braga Frade • bibs.braga2@gmail.com
Resumo:
Bones have a significant regenerative capacity, completely reconstituting their shape and function after a fracture. However, fractures in the elderly have increased rates of failure, due to physiological decay in bone regeneration capacity. Given the ongoing expansion of the elderly population, these fractures have become a clinical challenge, since the available treatments are not optimized to overcome the regenerative deficiencies of this population, which then suffers from multiple invasive surgeries and long periods of physical disability. In this context, a better understanding of how ageing impacts the mechanisms of bone regeneration is needed. Evidence from several in vitro and in vivo studies indicates that macrophage (Mac) activity is key for successful bone regeneration. These cells concomitantly modulate the inflammatory response and instruct bone neoformation and revascularization. However, Mac are a group of highly heterogeneous cells, displaying different phenotypes and functions, and it is still unknown how such a diverse cell population orchestrates the distinct phases of bone healing and how it is impacted by ageing. In this study, we aimed to map the profile of Mac present during the inflammatory and tissue neoformation phases of fracture healing by analyzing two publicly available single-nucleus transcriptomic datasets from murine fracture callus (GSE268276 and GSE234451; Hachemi et al., 2024). These data were downloaded using GEOquery, and quality control included mitochondrial and ribosomal content thresholds of 2.5% and 1.5%, respectively. Cells with outlier values for nCounts and nFeatures were removed. Integration was performed using Harmony v. 1.2.3, and downstream analysis was carried out in Seurat v5. In this preliminary analysis, we observed 9,778 high-quality cells, including 1,632 immune cells, of which 300 were classified as Mac. These cells have an enriched inflammatory gene program, representing a more proinflammatory population.
However, the presence and function of different subpopulations in the early stages of bone regeneration remain unclear. To increase our capacity to resolve and map the Mac clusters in detail, we established a murine femur fracture model in young (10-12 weeks) and middle-aged (48-50 weeks) BALB/c mice. Micro-CT and histological analysis showed that aged mice had delayed cartilage and bone formation in comparison with young animals. Through flow cytometry, we observed that the frequency of F4/80+ Mac increases as healing progresses, forming a more distinct population at the final time points, expressing CD206 and Mac2, both in young and aged animals. Nuclei of cells present within the fracture calluses were isolated, and sequencing libraries are being prepared with the 10x Genomics technology. With these findings, we intend to contribute to the development of innovative treatments in Regenerative Medicine.
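As a rough illustration of the quality-control step described in this abstract, the sketch below applies the quoted 2.5% mitochondrial and 1.5% ribosomal thresholds to per-nucleus count summaries. It is not the authors' Seurat code: the field names and toy counts are hypothetical, the thresholds are assumed to be inclusive upper bounds, and the additional outlier filter on nCounts and nFeatures is not shown.

```python
MAX_PCT_MITO = 2.5  # threshold quoted in the abstract
MAX_PCT_RIBO = 1.5  # threshold quoted in the abstract

def percent(part, total):
    """Percentage of `part` in `total`; returns 0.0 for an empty barcode."""
    return 100.0 * part / total if total else 0.0

def passes_qc(cell):
    """cell: dict of UMI counts per nucleus (field names are hypothetical)."""
    pct_mito = percent(cell["mito_counts"], cell["total_counts"])
    pct_ribo = percent(cell["ribo_counts"], cell["total_counts"])
    return pct_mito <= MAX_PCT_MITO and pct_ribo <= MAX_PCT_RIBO

# Toy data for illustration only.
cells = [
    {"id": "nucleus_1", "total_counts": 4000, "mito_counts": 40, "ribo_counts": 20},   # 1.0% mito, 0.5% ribo
    {"id": "nucleus_2", "total_counts": 3000, "mito_counts": 150, "ribo_counts": 15},  # 5.0% mito, 0.5% ribo
    {"id": "nucleus_3", "total_counts": 5000, "mito_counts": 50, "ribo_counts": 100},  # 1.0% mito, 2.0% ribo
]
kept = [c["id"] for c in cells if passes_qc(c)]
print(kept)
```

Only the first toy nucleus passes both thresholds; the second fails on mitochondrial content and the third on ribosomal content.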
Palavras-chave: Bone regeneration, macrophages, aging, scRNAseq
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1123934

Genomic Basis of Antibiotic Resistance and Virulence in Acinetobacter baumannii: A Comprehensive Pangenome Analysis

Autores: Joyce de Souza,Helena Regina Salomé D'Espindula,Hellen Geremias dos Santos,Helisson Faoro
Apresentador: Joyce de Souza • JOYCEDESOUZAA@OUTLOOK.COM
Resumo:
Acinetobacter baumannii is an opportunistic pathogen that poses a significant threat to human health, causing infections in both community and nosocomial settings. The frequently observed multidrug-resistant (MDR) profile in this species highlights the urgent need for research and the development of novel treatment strategies. Aiming to better understand the mechanisms underlying antibiotic resistance and virulence in A. baumannii, we conducted a comprehensive pangenome analysis of clinical isolates by retrieving all publicly available genomes of A. baumannii isolated from the human host from the BV-BRC and NCBI Pathogen Detection databases. After eliminating redundancies, filters for assembly quality and metadata availability were applied. Genome completeness (>98%), contamination (<5%), and ambiguous bases (<1%) were assessed using CheckM v1.2.3. Species confirmation was achieved by calculating the average nucleotide identity (ANI) between the genomes and the A. baumannii type strain ATCC 19606 using fastANI v1.33. All genomes were annotated with Prokka v1.14.6, and multilocus sequence type was determined based on the PubMLST Pasteur scheme. The pangenome was constructed using Roary v3.13.0. Resistance and virulence genes were identified with RGI v6.0.3 and VFDB v4.0.0. Based on SNPs from the core genome, a phylogenetic tree was constructed using IQ-Tree2. We analyzed 10,754 clinical genomes of A. baumannii. The most prevalent sequence type was ST2 at 71.6%, followed by ST499 at 3%, and ST1 at 2.9%. On average, these isolates have 3,766 genes each, and the pangenome is composed of 30,255 genes. The core genome (consisting of genes shared by at least 95% of the isolates) comprises 2,666 genes. The resistome and virulome of A. baumannii represent 0.4% and 0.8% of the pangenome, with an average of 40 and 129.3 genes per genome, respectively.
The identified resistome is distributed across resistance mechanisms as follows: antibiotic efflux (53.7%), antibiotic inactivation (32.9%), antibiotic target alteration (4.8%), antibiotic target replacement (3.7%), reduced permeability to antibiotics (2.8%), and antibiotic target protection (2.2%). In addition to the OXA-51-like variants, which are intrinsic to the species, the following carbapenemases were identified: OXA-23-like in 67.1% of the genomes, OXA-24-like in 16.5%, OXA-134-like in 2.9%, OXA-58-like in 1.2%, OXA-143-like in 0.4%, OXA-48-like in 0.1%, NDM in 6.1%, GES in 0.7%, VIM in 0.09%, KPC in 0.07%, and IMP in 0.03% of the genomes. These enzymes are of great concern in the context of A. baumannii resistance, as they hydrolyze carbapenem antibiotics, which are among the last-resort treatments for hospital-acquired infections. Virulence genes are classified into adherence (21.8%), nutritional/metabolic factors (21.4%), immune modulation (21%), effector delivery systems (20%), biofilm formation (11.8%), exotoxins (2.3%), and others (1.7%). This study based on 10,754 genomes showed that A. baumannii has an extensive set of virulence and antibiotic resistance genes. These findings are key to understanding the genetic basis of antimicrobial resistance in this critical pathogen, highlighting possible treatment targets and the urgent need for surveillance and research strategies to combat A. baumannii infections.
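The core-genome definition used in this abstract (genes shared by at least 95% of the isolates) can be sketched from a gene presence/absence table. The example below is a minimal illustration with hypothetical gene and isolate names; it does not reproduce Roary's clustering, only the final thresholding step.

```python
def core_genes(presence, threshold=0.95):
    """presence: dict mapping gene -> set of isolate ids carrying that gene.

    Returns the sorted genes present in at least `threshold` of all isolates.
    """
    isolates = set().union(*presence.values()) if presence else set()
    n = len(isolates)
    return sorted(
        gene for gene, carriers in presence.items()
        if n and len(carriers) / n >= threshold
    )

# Toy presence/absence data for 20 hypothetical isolates.
all_isolates = {f"isolate_{i}" for i in range(20)}
presence = {
    "gene_A": set(all_isolates),                     # 20/20 isolates
    "gene_B": all_isolates - {"isolate_0"},          # 19/20 = 95%
    "gene_C": all_isolates - {"isolate_0", "isolate_1"},  # 18/20 = 90%
    "gene_D": {"isolate_5"},                         # 1/20
}
core = core_genes(presence)
print(core)
```

With the figures reported in the abstract, the 2,666 core genes correspond to roughly 8.8% of the 30,255-gene pangenome.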
Palavras-chave: Acinetobacter baumannii, Pangenome, Resistome, Virulome
★ Running for the Qiagen Digital Insights Excellence Awards
#1123991

Decoding the ECM-Anoikis-PI3K/Akt crosstalk: A Novel Gene Signature for Prognostic Stratification in Breast Cancer

Autores: Diego Mauro Carneiro Pereira,Bianca Zaia Franco Ferreira,Maria Aparecida Pinhal,Helena Bonciani Nader,Giuseppe Leite,Carla Cristina Lopes
Apresentador: Diego Mauro Carneiro Pereira • diegom8135@gmail.com
Resumo:
Breast Cancer (BC) remains the most commonly diagnosed cancer worldwide and the leading cause of cancer-related deaths among women, with 2.3 million new cases and 670,000 deaths reported in 2022. Current treatment strategies rely heavily on molecular classification based on HER2, estrogen receptor, and progesterone receptor expression. However, identifying robust tumor markers to improve patient stratification and management, particularly for triple-negative cases, remains a significant challenge. Tumor progression in BC is driven by the ability of cells to remodel the microenvironment and evade anoikis, a form of apoptosis triggered by loss of adhesion to the extracellular matrix (ECM). Resistance to anoikis, often mediated by the PI3K/Akt pathway, promotes tumor cell survival and dissemination, driving cancer aggressiveness in BC. Therefore, this study aimed to identify a gene signature associated with ECM, anoikis, and the PI3K/Akt pathway (key components in BC progression) and to evaluate its impact on patient survival. We performed a meta-analysis of BC gene expression datasets from the Gene Expression Omnibus using the MetaVolcanoR package, deriving a consensus gene signature linked to ECM, anoikis, and the PI3K/Akt pathway. Using the Breast Cancer (METABRIC, Nature 2012 & Nat Commun 2016) and Breast Invasive Carcinoma (TCGA, PanCancer Atlas) datasets via cBioPortal, we assessed the relationship between gene expression and patient survival and analyzed enriched pathways via MSigDB’s Hallmark collection. MetaVolcano analysis identified 4,279 differentially expressed genes, of which 10 were consistently represented in the ECM/Anoikis/PI3K/Akt signature. We performed unsupervised clustering analysis using the expression profiles of the 10-gene signature in the METABRIC dataset, identifying three distinct patient clusters.
This clustering pattern was independently validated in the Breast Invasive Carcinoma (TCGA) cohort, where a similar expression profile and three-cluster structure were observed. Notably, overall survival differed significantly among the clusters (p < 0.05) in both cohorts, supporting the prognostic utility and robustness of this gene signature. Pathway analysis highlighted processes strongly associated with cancer aggressiveness, including differential activation of epithelial-mesenchymal transition, angiogenesis, and apoptosis. Our findings propose a novel ECM/Anoikis/PI3K/Akt-derived gene signature capable of stratifying BC patients into subgroups with divergent survival outcomes. This signature may aid in refining therapeutic decisions, particularly for high-risk or triple-negative subsets.
Palavras-chave: Bioinformatics, Gene, Database
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124074

Chromosome-scale assembly of the polyploid genome of sugarcane SP80-3280

Autores: Gabriely Santos de Oliveira,Diego Mauricio Riaño Pachón
Apresentador: Gabriely Santos de Oliveira • gabriely.santos@usp.br
Resumo:
Sugarcane (Saccharum complex) is an important crop in global agriculture, accounting for around 80% of the world's sugar production and playing a crucial role in the generation of bioethanol and bioenergy. Its genome is highly complex due to the domestication process, involving crosses between different polyploid species, such as S. officinarum and S. spontaneum. Modern sugarcane varieties are interspecific hybrids, polyploid and aneuploid, with enormous genomes estimated to exceed 10 gigabases (Gb). This genomic complexity results from the high ploidy of ancestral species, which makes genome sequencing and assembly a significant challenge. Sequencing sugarcane genomes, due to polyploidy, requires the use of advanced technologies, which are essential for understanding and improving agronomic traits.
To address these challenges, and to generate a phased chromosome-scale assembly of the genome of the Brazilian sugarcane hybrid SP80-3280, we are using long-read PacBio HiFi sequencing and Illumina Hi-C data. HiFi reads combine long length with ≥99.9% accuracy, enabling reliable reconstruction of complex genomic regions. A total of 312.4 Gbp of PacBio HiFi data were obtained from 11 SMRT Cells on a Sequel II system, with an average read length of 18 kb and average Phred score of 32, providing an estimated haploid coverage of 312×. Two Hi-C libraries were sequenced, generating 1.6 billion 150 bp paired-end reads using four different restriction enzymes. Together, these datasets support robust chromosome-level scaffolding and phasing.
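The sequencing figures above can be cross-checked with simple arithmetic: coverage is total sequenced bases divided by the reference genome size, and a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below uses only numbers quoted in the abstract; treating the mean Phred score as a single per-base accuracy is a simplification, and the implied reference size is an arithmetic reading of the quoted coverage, not a value stated by the authors.

```python
TOTAL_HIFI_BP = 312.4e9           # total HiFi yield, from the abstract
MEAN_READ_LEN_BP = 18_000         # average read length, from the abstract
MEAN_PHRED = 32                   # average Phred score, from the abstract
REPORTED_COVERAGE = 312           # "haploid coverage", from the abstract
ESTIMATED_TOTAL_GENOME_BP = 10e9  # "estimated to exceed 10 Gb"

def coverage(total_bp, genome_bp):
    """Sequencing depth: sequenced bases per base of the reference."""
    return total_bp / genome_bp

def phred_to_accuracy(q):
    """Convert a Phred quality score into the corresponding base accuracy."""
    return 1.0 - 10 ** (-q / 10)

implied_genome_bp = TOTAL_HIFI_BP / REPORTED_COVERAGE
approx_reads = TOTAL_HIFI_BP / MEAN_READ_LEN_BP
depth_vs_full_genome = coverage(TOTAL_HIFI_BP, ESTIMATED_TOTAL_GENOME_BP)

print(f"Reference size implied by 312x: {implied_genome_bp / 1e9:.2f} Gb")
print(f"Approximate number of HiFi reads: {approx_reads / 1e6:.1f} million")
print(f"Depth against a 10 Gb genome: {depth_vs_full_genome:.1f}x")
print(f"Accuracy at Q{MEAN_PHRED}: {100 * phred_to_accuracy(MEAN_PHRED):.3f}%")
```

Read this way, the 312× figure corresponds to a reference of about 1 Gb, while the same data give roughly 31× against a 10 Gb polyploid genome, and Q32 is consistent with the quoted ≥99.9% read accuracy.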
Contig-level assemblies were generated with hifiasm, a state-of-the-art assembler that constructs phased assembly graphs to separate haplotypes and reduce sequence collapse, thus providing a more accurate representation of genetic variability. For scaffolding, we employed HapHiC, which leverages Hi-C contact maps to order and orient contigs at the chromosome scale.
Two phased assemblies were generated: one consisting of 104 pseudochromosome groups and another with 112. The assembly with 104 pseudochromosomes, approximately 9 Gb in size, showed superior metrics in terms of completeness and contiguity. This assembly reached an N50 of ~70 Mb, with the largest contig exceeding 170 Mb and an average GC content of ~44.7%. Dotplot alignments with ancestral genomes confirmed the structural integrity and collinearity of the assembly.
We are currently curating the chromosome-level assembly and proceeding with genome annotation. We will present structural features of key genes involved in plant development and regulatory networks, including LFY (Leafy), PHYC (Phytochrome C), and TOR kinase (Target of Rapamycin), exploiting the chromosome-scale and haplotypic resolution of our genome assembly.
This work underscores the importance of integrating high-fidelity sequencing, Hi-C-based scaffolding, manual curation, and comprehensive annotation to assemble complex polyploid genomes.
Palavras-chave: Sugarcane, Polyploid genome, Genome assembly, Phased assembly.
#1124078

Single-Cell RNA Sequencing Reveals Disruption of Thymocyte Developmental Trajectories in Aire-Mutant Mice

Autores: Gustavo Ronconi Roza,Geraldo Aleixo Passos,Romário de Sousa Mascarenhas
Apresentador: Gustavo Ronconi Roza • rronconigustavo@gmail.com
Resumo:
The development of T lymphocytes in the thymus involves a stringent selection process essential for establishing central tolerance. In the thymic medulla, negative selection eliminates autoreactive thymocytes that, if left unchecked, could trigger autoimmune responses in peripheral tissues. This process relies on medullary thymic epithelial cells (mTECs), which express the Aire gene. The Aire protein regulates the expression of tissue-restricted antigens (TRAs) in mTECs, which are then presented to developing thymocytes during negative selection. Mutations in Aire may impair this mechanism, allowing autoreactive T cells to escape into the periphery and potentially initiate autoimmune disease. Using the CRISPR-Cas9 system, our group generated a mutant mouse strain carrying a one-base-pair deletion in exon 6 of Aire (del3554G), which encodes the SAND domain of the protein. This deletion results in a truncated Aire protein and, in in vitro co-cultures, was found to exert broad effects on both mTECs and adherent thymocytes. In this study, we investigated whether the del3554G mutation also affects thymocyte development in vivo. We performed single-cell RNA sequencing (scRNA-seq) on thymocytes from Aire wild-type and Aire del3554G mutant mice. Single-cell libraries were prepared using the Chromium platform (10x Genomics) and sequenced on the Illumina NovaSeq 6000 system (150 bp paired-end). Data processing included demultiplexing with Cell Ranger v7.0, alignment with STAR v2.7.9a, and downstream analyses using R (v4.4), Seurat (v5.2.1), and Monocle3 (v1.3.7). Results showed that thymocytes from both genotypes lacked detectable Aire mRNA expression, confirming that observed effects were not due to intrinsic Aire expression in thymocytes. Four major thymocyte populations were identified in both samples, namely double-negative (DN), double-positive proliferative (DPp), double-positive quiescent (DPq), and maturing T cells, although the number of clusters per population varied.
Importantly, the Aire del3554G mutation altered RNA pseudotime trajectories, with thymocytes from wild-type mice showing greater transcriptional activity and diversification compared to those from mutants. These findings suggest that the Aire del3554G mutation influences thymocyte maturation in vivo, possibly through extrinsic effects mediated by mTECs.
Palavras-chave: Aire gene, thymocyte development, scRNA-seq, immunogenetics
#1124082

Soil Microbiome Structure and Function Across Diverse Cropping Systems: Insights from MAG-Based Metagenomics

Autores: Inaiá Ramos Aguiar,Cecilia Cerliani,Rafael Silva Rocha,María Eugenia Guazzaroni
Apresentador: Inaiá Ramos Aguiar • inaia.aguiar@alumni.usp.br
Resumo:
Shotgun metagenomic sequencing enables high-resolution analysis of microbial communities directly from environmental samples, offering an integrated view of both taxonomic composition and functional potential. One of its major advantages is the ability to reconstruct metagenome-assembled genomes (MAGs), which allows researchers to access genomic information from previously uncultivable microorganisms. These reconstructed genomes provide valuable insights into microbial contributions to biogeochemical cycles, ecological interactions, and environmental adaptation.
In agricultural soils, MAG analysis is particularly relevant. Microorganisms play central roles in nutrient cycling, soil structure maintenance, organic matter decomposition, and plant growth promotion, factors that collectively influence crop yield and long-term sustainability. A deeper understanding of soil microbiomes under different agricultural systems can therefore support the development of more productive and environmentally responsible farming practices. This study presents a comparative metagenomic analysis of soil samples collected from key agricultural regions in Argentina, Brazil, the United States, and China. The samples represent diverse cropping systems, including soybean, maize, sugarcane, and citrus. By recovering MAGs and analyzing both taxonomic and functional profiles, we aim to investigate how microbial community structure and metabolic potential vary across geographic locations and crop types.
Initial taxonomic profiling revealed distinct microbial signatures among soil samples, with notable differences in the relative abundance of dominant bacterial phyla. These patterns suggest that microbial community composition is shaped by a combination of environmental and agronomic factors, including soil properties, climate, and land management practices.
Functional analysis based on metagenomic data highlighted substantial variation in microbial metabolic potential. Pathways related to nitrogen and carbon cycling were among the most prominent across samples. These findings suggest that microbial communities contribute to essential ecosystem functions in distinct ways, depending on local conditions, and they reinforce the ecological importance of soil microbiomes and their dynamic response to agricultural practices.
Palavras-chave: Metagenome-assembled Genomes (MAGs), Agricultural Soils, Microbiota
#1124138

Expansion of sialic acid-utilizing bacteria in the dysbiotic gut microbiome of Clostridioides difficile-infected patients

Autores: Dorivado Marques da Silva Junior,Lívia Soares Zaramela
Apresentador: Dorivado Marques da Silva Junior • marques.dsjr@gmail.com
Resumo:
Clostridioides difficile infection (CDI) is a severe gastrointestinal disease whose prevalence is rising worldwide. It is caused by a Gram-positive, anaerobic bacterium and frequently occurs in nosocomial settings following broad-spectrum antibiotic therapy. Sialic acids are carboxylated monosaccharides located at the terminal ends of cell-surface biomolecules that serve as binding sites for pathogens and toxins. These molecules have been implicated in the onset and progression of CDI. To investigate gut microbial dynamics related to CDI and sialic acids, we reanalyzed data from four shotgun metagenomic studies, focusing on microbial genes involved in sialic acid metabolism and transport. A total of 63 samples from CDI patients and 70 control samples were processed and analyzed using bioinformatics tools. In brief, the relative abundances of 29 sialic acid-related genes were compared between the two groups. Using the Mann-Whitney test with Benjamini-Hochberg false discovery rate correction at a 0.05 significance threshold, 21 genes were found to differ significantly in relative abundance. These genes were associated with polysialic acid synthesis and export, sialic acid catabolism, specific transport systems, and lipopolysaccharide modification. Additionally, the Random Forest machine learning algorithm was applied to predict CDI based on the analyzed genes, achieving an area under the curve (AUC) of 0.99, indicating excellent predictive performance. Taxonomic analysis revealed a reduction in the richness of bacterial species carrying sialic acid-related genes in CDI patients, accompanied by an increase in pathogenic taxa and a decline in beneficial ones. Notably, several producers of short-chain fatty acids were diminished. Overall, these findings suggest that CDI is associated with a dysbiotic gut environment that promotes bacteria enriched in sialic acid metabolism genes.
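The Benjamini-Hochberg correction named in this abstract can be sketched in a few lines. The example below is only an illustration of the adjustment step: the gene labels and raw p-values are hypothetical stand-ins for per-gene Mann-Whitney results, and the Mann-Whitney test itself is not reimplemented here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, one per gene, for illustration only.
raw_p = {"gene_1": 0.001, "gene_2": 0.02, "gene_3": 0.04, "gene_4": 0.30}
ALPHA = 0.05  # significance threshold quoted in the abstract

adjusted = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))
significant = [gene for gene, q in adjusted.items() if q < ALPHA]
print(adjusted)
print(significant)
```

In this toy case gene_3 has a raw p-value below 0.05 but an adjusted value of about 0.053, so only gene_1 and gene_2 remain significant after correction.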
Palavras-chave: Clostridioides difficile infection, sialic acid, metagenomics
#1124146

Exploring the Evolutionary Diversity of Spike Proteins using Artificial Intelligence - Beyond the Coronaviridae

Autores: Pâmella Cristina Soares Santana,Dougles Da Silva Rocha De-Santi,Manuela Leal,Maria Fernanda Ribeiro Dias
Apresentador: Maria Fernanda Ribeiro Dias • marfedias@gmail.com
Resumo:
Viruses, considered to be the most abundant biological entities on the planet, have a profound impact on ecosystems and global health. With species infecting millions of hosts in the three domains of life, they are central objects of scientific research. Viral entry proteins, such as fusion and adhesion proteins like Spike from SARS-CoV-2, are essential for infection and are often explored as vaccine, diagnostic and therapeutic targets. However, in addition to viruses from the Coronaviridae family, several other viruses have proteins functionally analogous to Spike, whose diversity and evolutionary relationships are still poorly understood. In this study, we sought to investigate the diversity of these entry proteins and infer possible evolutionary relationships between them. To do this, we conducted a PubMed search using the descriptors: (“virus” AND “spike protein”) NOT “SARS” NOT “MERS” NOT “CORONAVIRUS” NOT “COVID-19”, in order to exclude the widely studied coronaviruses and focus on other viral families. After selecting the species and obtaining the FASTA sequences of the proteins in UniProt, we performed alignments using UGENE and ClustalO and built a phylogenetic tree using IQ-Tree. As a result, we obtained 329 sequences from 31 species belonging to 12 viral families, as well as three bacteriophages with no defined classification. To deepen the evolutionary analysis of the proteins, we will use the AAindex database, which gathers hundreds of physicochemical and biochemical properties of amino acids, such as hydrophobicity, polarity, and structural propensities (α-helices and β-sheets). The application of AAindex will allow the protein sequences to be represented numerically, enabling a comparative analysis based on structural properties, as well as complementing the phylogenetic data with physicochemical information. After this process, unsupervised machine learning algorithms will be applied to group the sequences according to their physicochemical properties.
By integrating these approaches, we hope to better understand the similarities and divergences between viral entry proteins, both from an evolutionary and structural point of view, and to identify patterns that may be related to spillover potential and the ability to infect multiple hosts. This analysis could contribute to the future development of antiviral strategies and surveillance of new emerging viruses.
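The planned numerical representation of sequences can be illustrated with a single amino acid property. The sketch below uses the Kyte-Doolittle hydropathy scale, which is one of the indices collected in AAindex; choosing this scale and summarizing a sequence by its mean value are assumptions made for illustration, since the abstract does not say which indices or encodings will be used. The peptide is a toy example.

```python
# Kyte-Doolittle hydropathy values per amino acid (one AAindex-style property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def encode(sequence, scale=KYTE_DOOLITTLE):
    """Map each residue to its property value; raises KeyError on unknown residues."""
    return [scale[residue] for residue in sequence.upper()]

def mean_property(sequence, scale=KYTE_DOOLITTLE):
    """Average property value over the sequence, a simple fixed-length descriptor."""
    values = encode(sequence, scale)
    if not values:
        raise ValueError("sequence must not be empty")
    return sum(values) / len(values)

toy_peptide = "AIL"  # hypothetical fragment, for illustration only
print(encode(toy_peptide))
print(round(mean_property(toy_peptide), 3))
```

Repeating this for several indices would give each protein a fixed-length numeric vector, which is the kind of input that unsupervised clustering methods expect.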
Palavras-chave: global health, Spike Proteins, Artificial Intelligence
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124159

Interaction of Polystyrene with Placental Membrane Proteins: A Computational Docking Approach

Autores: Iasmin Cristina Lira Cavalcante,Wallison dos Santos Dias,Julio Cosme Santos da Silva,Alexandre Urban Borbely
Apresentador: Iasmin Cristina Lira Cavalcante • iasmin.cavalcante@eenf.ufal.br
Resumo:
The presence of microplastics and nanoplastics (MNPs) in human tissues and organs has been increasingly investigated. Among these organs, the placenta is important due to its critical role in fetal development. Knowledge of the potential health damage caused by these particles remains limited, and the mechanisms enabling their entry remain poorly understood. In this context, receptor-mediated endocytosis emerges as a potential internalization pathway for smaller particles, as it allows specific selection for incorporation. Considering that MNPs are foreign bodies made of polymers, it is hypothesized that their monomers could interact with cell membrane proteins that serve as receptors involved in the endocytic process. Therefore, this study aims to employ Molecular Docking techniques to evaluate potential receptors on the placental syncytiotrophoblast membrane that could interact with polystyrene monomers through analysis of their binding energies. In this study, five cellular proteins were selected from the RCSB PDB database (https://www.rcsb.org/), including four membrane receptors and one integral membrane protein. Three conformations of polystyrene were chosen from the ChemTube3D database (https://www.chemtube3d.com/): isotactic (semicrystalline), atactic (amorphous), and syndiotactic (crystalline). After this selection, ligand structures were optimized using Gaussian09 software (https://gaussian.com/). Subsequently, blind docking was performed using the online HDOCK server (http://hdock.phys.hust.edu.cn/), with the previously optimized three-dimensional structures provided. Docking simulations performed on the HDOCK server yielded energy values (in kcal/mol) for each interaction. For Receptor 1, the values obtained were -203.50, -252.81, and -249.96 kcal/mol for the isotactic, atactic, and syndiotactic conformations, respectively. Interaction with one of its natural ligands (control) presented a value of -268.43 kcal/mol.
For Receptor 2, the energies were -144.19, -190.41, and -180.25 kcal/mol, while the control yielded -244.14 kcal/mol. In the case of Receptor 3, values were -116.39, -134.95, and -137.05 kcal/mol, with the control showing -726.36 kcal/mol. For Receptor 4, results were -238.38, -285.68, and -270.40 kcal/mol, whereas the control obtained -334.36 kcal/mol. Lastly, for the transport protein, values were -174.44, -200.27, and -133.94 kcal/mol in the same order of conformations, with the control resulting in -232.34 kcal/mol. These results indicate that polystyrene monomers display energetic affinity for the studied proteins, suggesting these interactions could occur spontaneously in biological environments. The theoretical results obtained suggest that MNPs have the potential to be internalized by cells through endocytosis, highlighting a possible pathway for these particles to enter tissues. This finding is concerning, particularly in the context of pregnancy, as the presence of these contaminants may pose risks to fetal development.
Palavras-chave: Molecular Docking, Placenta, Endocytosis Receptor, Nanoplastics.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124163

High Resolution Melting (HRM) for genotyping of Citrus rootstocks

Autores: Karolinni Bianchi Britto,Danielle Menezes Martins,Matheus Lopes Furtado Machado,Heberth de Paula,José Aires Ventura,Greiciane Gaburro Paneto
Apresentador: Greiciane Gaburro Paneto • greiciane_gp@hotmail.com
Resumo:
The rootstock is the component of the plant that gives the root system support, enables the plant to absorb water and nutrients, and has a direct impact on the resilience and vigor of citrus plants. Because it influences orchard yield, disease tolerance, and soil adaptation, its selection is essential. Fruit quality and yield are determined by the scion and rootstock combination, so it is crucial to take climate, soil type, and agricultural management into account to ensure effective cultivation that meets market expectations. The purpose of this work was to create a primer kit based on Single Nucleotide Polymorphisms (SNPs) acquired using Diversity Arrays Technology (DArTSeq) for the genetic identification of seven citrus rootstocks utilizing High Resolution Melting (HRM) analysis. Twenty-one citrus rootstock samples were gathered from the germplasm bank of the Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural (Incaper) in Espírito Santo, Brazil. These samples represented seven genotypes that are commonly used in citriculture. Genomic DNA was extracted from leaf samples in three biological replicates and analyzed by DArTSeq for SNP identification. To select the most informative markers, R and Python scripts were used to process the raw data, generate consensus sequences and select SNPs with high discriminatory capacity. The 64,442 identified SNPs were filtered based on criteria such as allele frequency, entropy and alignment to the reference genome. Based on the selected SNPs, primers were designed for HRM using the bioinformatics tools PRIMER3WEB, PERLPRIMER and PRIMER-BLAST. HRM analysis was performed in a LightCycler® 96 thermal cycler, with optimized conditions for amplification and detection of melting curves. Data processing enabled the precise differentiation of the analyzed genotypes, ensuring an efficient and reproducible method for citrus rootstock identification.
The HRM analysis successfully differentiated the seven evaluated genotypes using three distinct primers, which generated amplicons of 162 bp, 118 bp, and 132 bp in three sequential reactions. The proposed method demonstrated high efficiency, reducing both costs and analysis time. The integration of HRM with rigorous bioinformatics analysis proved to be an effective tool for certification and traceability in the citrus industry. This approach minimizes common errors in visual selection, ensuring greater reliability in genetic material identification.
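One way to read the entropy criterion mentioned in this abstract is as the Shannon entropy of the allele calls observed at a SNP across the seven genotypes: a SNP where every genotype carries the same allele has zero entropy and cannot discriminate between them. The sketch below assumes that interpretation; the abstract does not give the exact formula or cutoff, and the SNP names and calls are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(calls):
    """Shannon entropy (in bits) of the allele calls observed at one SNP."""
    counts = Counter(calls)
    n = len(calls)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical allele calls for seven rootstock genotypes at three SNPs.
snps = {
    "snp_monomorphic": ["A", "A", "A", "A", "A", "A", "A"],
    "snp_rare_allele": ["A", "A", "A", "A", "A", "A", "G"],
    "snp_balanced":    ["A", "A", "A", "G", "G", "G", "G"],
}

# Rank SNPs from most to least informative under this criterion.
ranked = sorted(snps, key=lambda name: shannon_entropy(snps[name]), reverse=True)
for name in ranked:
    print(f"{name}: {shannon_entropy(snps[name]):.3f} bits")
```

Under this reading, the balanced SNP ranks first and the monomorphic SNP would be discarded.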
Palavras-chave: Bioinformatics, Polymorphisms, Differentiation, Citriculture
#1124165

Analyzing the free energy landscape of HLA-DRB1 isoforms and their role in MS pathogenesis

Autores: Levy Bueno Alves,Silvana Giuliatti
Apresentador: Levy Bueno Alves • levybuenoalves@usp.br
Resumo:
Studies have identified an association between Multiple Sclerosis (MS) and Epstein–Barr virus (EBV), with the EBNA1 protein considered one of the main exogenous antigens involved in disease induction. The 400–413 epitope of this protein and the 85-98 region of myelin basic protein (MBP) have been implicated in molecular mimicry due to cross-reactive T cell responses. Both peptides are recognized by HLA-DR molecules, with the HLA-DRB1*15:01 allele being most strongly associated with MS. However, it is still unknown how these molecular interactions contribute to the breakdown of immunological tolerance. Considering that the HLA-DRB1 locus is highly polymorphic, the role of its polymorphisms has been studied to understand why some alleles confer predisposition while others confer resistance to MS. To investigate potential conformational and energetic differences in the binding groove, molecular dynamics (MD) simulations were conducted with predisposing and non-predisposing alleles, both in the free (APO) state and in complex with three relevant peptides: MBP(85–99), EBNA1(400–414), and CLIP(103–117). The complete model of the HLA-DRB1*15:01 heterodimer was constructed based on the 1YMM crystallographic structure, where the missing regions were filled using templates available in the AlphaFold database. From this model, five other allelic isoforms were generated: DRB1*01:01, DRB1*11:01, DRB1*15:02, DRB1*15:03 and DRB1*16:01, representing variants with distinct effects on MS susceptibility. Peptide docking into the binding groove was performed using the HPEPDOCK software. The resulting 24 systems were embedded into heterogeneous lipid bilayers using the CHARMM-GUI server and subjected to 500 ns MD simulations using the GROMACS package. Free energy landscape (FEL) analysis was based on the first two principal components (PC1 and PC2) of the trajectories, with estimates obtained using the SHAM (Stochastic Histogram Analysis Method).
Energy minima were identified in regions of lowest projected values in the PC1 × PC2 space. The results of the FEL analysis showed that the binding groove of the HLA-DRB1*15:01 complexes with MBP and EBNA1 has similar energetic conformational states, corroborating the hypothesis of functional mimicry. In the DRB1*15:03 allele, the EBNA1 peptide dissociated around 320 ns. However, the conformational states obtained after 400 ns coincided with the energy well observed in the MBP–DRB1*15:03 and MBP/EBNA1–DRB1*15:01 complexes. This indicates that even after EBNA1 dissociation, the DRB1*15:03 allele adopts conformational states conducive to the presentation of autoantigens. Furthermore, the DRB1*15:01 and DRB1*15:03 alleles already presented, in the APO state, conformations similar to those induced by MBP and EBNA1. Thus, the viral epitope becomes less determinant in the structural definition of the binding groove, but DRB1*15:01 can act as an antigenic trigger in an inflammatory environment. This conformational plasticity can generate similar pHLA surfaces at the interface with the TCR, even with peptides in different conformations, which supports the hypothesis of cross-reactivity. In conclusion, this study shows the importance of the structural flexibility of risk alleles in mediating cross-reactivity in MS, providing support for the development of therapies that target both antigenic peptides and the intrinsic conformational states of HLA-DRB1.
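As an illustration of the free energy landscape idea described in this abstract, the following minimal sketch bins PC1/PC2 projections into a 2D histogram and converts the populations into relative free energies with G = -kT ln(P/Pmax). It is not the authors' workflow: the function name, the bin count, the temperature, and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def free_energy_landscape(pc1, pc2, bins=4, temperature=300.0):
    """Return {(i, j): G} in kJ/mol for every populated PC1 x PC2 bin.

    Hypothetical helper: G = -kT * ln(count / max_count), so the most
    populated bin is the global minimum at G = 0.
    """
    lo1, hi1 = min(pc1), max(pc1)
    lo2, hi2 = min(pc2), max(pc2)

    def index(value, lo, hi):
        # Map a projection value to a bin index in [0, bins - 1].
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    counts = Counter(
        (index(x, lo1, hi1), index(y, lo2, hi2)) for x, y in zip(pc1, pc2)
    )
    max_count = max(counts.values())
    kt = K_B * temperature
    return {cell: -kt * math.log(n / max_count) for cell, n in counts.items()}


# Toy projections: three frames in one basin, one frame elsewhere.
fel = free_energy_landscape([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0], bins=2)
minimum = min(fel, key=fel.get)
```

Bins that are never visited are simply absent from the result, which corresponds to an undefined (effectively infinite) free energy in this estimate.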
Palavras-chave: Autoimmunity, HLA-DRB1, polymorphisms, structural bioinformatics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124167

Gut Microbiota Metabolites and Their Implications in Neurobiology and Treatment of Major Depressive Disorder

Autores: Vitória Stavis de Araujo,Silvana Giuliatti
Apresentador: Vitória Stavis de Araujo • vitoriastavis@gmail.com
Resumo:
Major Depressive Disorder (MDD) is a high-prevalence condition and one of the main causes of burden throughout life. It is a heterogeneous and multifactorial disorder that frequently resists treatment, and its causes are not yet fully understood. Due to its complexity, challenges include correct diagnosis, identification of subtypes for personalized treatment, and monitoring. The main hypotheses for MDD are related to the depletion of neurotransmitters, the HPA axis, neuroinflammation, and oxidative stress. Those processes lead to functional and structural alterations in neurons, as well as glial cells (especially microglia, astrocytes, and oligodendrocytes). The combined consequences of those changes include cell death, synaptic loss, neuroplasticity impairment, and volumetric reduction in brain areas, e.g. prefrontal cortex, hippocampus, hypothalamus, and amygdala. The most common treatments for MDD are monoamine reuptake inhibitors. There are alternatives acting on glutamatergic and GABAergic receptors and on cortisol metabolism. However, these treatments often present considerable side effects or face resistance, and some of them require more studies. A factor associated with the mentioned pathways is the gut microbiota, represented mainly by bacteria and fungi. Those microorganisms play a role in nutrient metabolism, immune and epigenetic regulation, pathogen resistance, intestinal mucosa permeability, and homeostasis. Among the metabolites produced or transformed by the microbiota are lipopolysaccharides, short-chain fatty acids, D-amino acids, neurotransmitter precursors, polyamines, indole compounds, phenolic compounds, and secondary bile acids. Some induce neuroinflammation and neurodegeneration, while others have anti-inflammatory effects and balance the HPA axis; for some, there are fewer studies. Overall, the results are inconclusive, so further research is needed to clarify the mechanisms and effects of these compounds.
This study aims to investigate the interaction between gut microbiota metabolites and proteins involved in MDD, in order to expand the existing knowledge and propose new strategies for auxiliary diagnostics, prognostics and treatment. The microbial metabolites were selected from the GutmGene v2.0 database. Their targets were predicted using Similarity Ensemble Approach (SEA) and Swiss Target Prediction (STP). The targets underwent functional enrichment with Over-Representation Analysis, using the following databases: Kyoto Encyclopedia of Genes and Genomes, Gene Ontology and DisGeNET. 246 unique metabolites from the human microbiota were obtained. Using SEA, 1289 targets were predicted, while 992 were predicted with STP. The intersection of those targets resulted in 303 unique targets. The enrichment analysis showed relevant pathways, such as glutamate and dopamine neurotransmission, insulin metabolism and pathways related to neuroplasticity and neuroinflammation. Some of the encoded proteins from the associated genes were already studied as drug targets for MDD, but present the mentioned side effects. For those reasons, treatment with metabolites from the gut microbiota or analog novel drugs can be proposed. Some of the observed pathways and their targets need more research as potential treatment strategies, considering microbial metabolites, as well as novel or preexistent drugs. These findings may contribute to the development of metabolomic signatures for auxiliary diagnostic tests and prognostic tools, in addition to joint therapeutic possibilities targeting gut microbiota modulation of MDD.
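The two computational steps described in this abstract, intersecting the targets predicted by two tools and testing the shared targets for pathway over-representation, can be sketched as follows. This is a minimal illustration under stated assumptions: the target identifiers, the pathway sizes, and the function name `ora_p_value` are hypothetical, and the one-sided hypergeometric test shown is a common choice for Over-Representation Analysis rather than the authors' confirmed implementation.

```python
from math import comb


def ora_p_value(universe_size, pathway_size, n_targets, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: number of genes in the background
    pathway_size:  number of background genes annotated to the pathway
    n_targets:     number of genes in the query list
    overlap:       query genes that fall in the pathway
    """
    total = comb(universe_size, n_targets)
    upper = min(pathway_size, n_targets)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, n_targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical predictions from two tools; only shared targets are kept.
sea_targets = {"T1", "T2", "T3", "T4"}
stp_targets = {"T3", "T4", "T5"}
shared_targets = sea_targets & stp_targets

# Toy enrichment: all 3 query genes hit a 3-gene pathway in a 10-gene universe.
p_value = ora_p_value(universe_size=10, pathway_size=3, n_targets=3, overlap=3)
```

In practice the resulting p-values would also be corrected for multiple testing across all pathways examined.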
Palavras-chave: Major Depressive Disorder, intestinal microbiota, functional enrichment analysis, target prediction
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124171

Comparative Structural Profiling of Spike Protein Processing by Host Proteases Across SARS-CoV-2 Variants

Autores: Felipe James de Almeida Vasquez,Silvana Giuliatti
Apresentador: Felipe James de Almeida Vasquez • felipevasquez@usp.br
Resumo:
The entry of SARS-CoV-2 into human cells is a multistep process involving the binding of the viral Spike protein to the ACE2 receptor, followed by proteolytic activation by host enzymes such as Furin and TMPRSS2. While ACE2-mediated recognition is well characterized, the roles of Furin and TMPRSS2, particularly in the context of emerging variants, remain less explored. Furin cleaves the Spike protein at the S1/S2 site, and TMPRSS2 acts at the S2′ site, both facilitating membrane fusion and viral entry. In this study, molecular dynamics (MD) simulations were employed to investigate the structural dynamics of ACE2–Spike–Furin and ACE2–Spike–TMPRSS2 complexes across multiple SARS-CoV-2 variants. Structures of ACE2, Spike, and Furin were retrieved from the Protein Data Bank (PDB), and TMPRSS2 was modeled using AlphaFold2. Spike variants, including Alpha, Beta, Gamma, Delta, and Omicron sublineages (BA.1, BA.2, BA.5, XBB.1.5, and BA.2.86), were modeled using MODELLER and glycosylated with Glycan Reader & Modeler. Protein–protein docking was performed with HADDOCK, followed by 200 ns MD simulations using GROMACS with the CHARMM36m force field. Trajectory analyses included Root Mean Square Deviation (RMSD), hydrogen bonding, and per-residue energy decomposition with the gmx_MMPBSA tool. RMSD values averaged 1.5 ± 0.5 nm for ACE2–Spike–Furin and 1.9 ± 0.7 nm for ACE2–Spike–TMPRSS2 complexes, indicating conformational flexibility. ACE2 and Furin interactions with Spike were more strongly affected by Spike mutations than those involving TMPRSS2, particularly in the Omicron sublineages, which exhibited increased hydrogen bonding. Per-residue energy decomposition, performed using the MM/PBSA method, revealed how specific mutations in Spike and host proteins influence binding affinity and structural stability.
Furin showed variant-specific energy contributions near the S1/S2 cleavage site, particularly associated with the P681H/R mutations, which may enhance viral infectivity and transmission. TMPRSS2 complexes, in contrast, exhibited a relatively stable binding pattern with fewer mutation-dependent variations at the S2′ site. ACE2 interactions were more significantly impacted by the N501Y mutation, which altered the binding affinity at the interface. While the TMPRSS2–Spike interface remained largely conserved, ACE2 and Furin demonstrated more dynamic interaction profiles in response to variant-specific mutations. These results improve our understanding of viral entry mechanisms and the effects of variant-specific mutations, which may support the development of targeted antiviral therapies and vaccines, especially against emerging strains. Overall, the findings clarify the mechanisms of Spike activation and identify ACE2 and Furin as dynamic mediators of variant-specific infectivity, suggesting their potential as targets for antiviral strategies.
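For readers unfamiliar with the RMSD metric reported in this abstract, the sketch below computes it for two sets of atomic coordinates. It assumes the two structures are already superimposed (trajectory tools normally perform a least-squares fit first, which is omitted here), and the function name and coordinates are illustrative only.

```python
import math


def rmsd(reference, frame):
    """Root Mean Square Deviation between two equally sized coordinate lists.

    Each argument is a list of (x, y, z) tuples in the same units (e.g. nm).
    No superposition is performed; inputs are assumed to be pre-aligned.
    """
    if len(reference) != len(frame):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for ref_atom, frame_atom in zip(reference, frame)
        for a, b in zip(ref_atom, frame_atom)
    )
    return math.sqrt(squared / len(reference))


# Toy example: every atom is displaced by 2 units along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
value = rmsd(reference, frame)
```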
Palavras-chave: SARS-CoV-2, Spike protein, Host–virus Interaction, Molecular Dynamics Simulations
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124195

Bioinformatics in agribusiness: Computational approaches for genetic identification of Pera Orange

Autores: Karolinni Bianchi Britto,Raphaella Luisa Fernandes de Almeida,Heberth de Paula,José Aires Ventura,Greiciane Gaburro Paneto
Apresentador: Karolinni Bianchi Britto • karolbbritto@hotmail.com
Resumo:
The phenotypic resemblance of seedlings in the juvenile stage makes it difficult to distinguish between different cultivars of sweet orange (Citrus sinensis L. Osbeck). Although molecular techniques have been used to overcome this limitation, many of them entail intricate procedures that make precise sample identification difficult. Diversity Arrays Technology Sequencing (DArTSeq) enables the detection of Single Nucleotide Polymorphisms (SNPs), which can be explored using techniques such as High Resolution Melting (HRM). Based on melting curve analysis, HRM provides a rapid and precise approach for cultivar differentiation, reducing both time and costs compared to whole-genome sequencing. In this study, 24 Pera sweet orange cultivar samples were examined; they were taken from the germplasm collection of the Capixaba Institute for Research, Technical Assistance, and Rural Extension (Incaper) in Espírito Santo, Brazil. SNPs were identified using DArTSeq after genomic DNA was isolated from leaves. To select the most informative markers, computational strategies were applied, using R and Python scripts to process raw data, generate consensus sequences, and filter SNPs with high discriminatory power based on criteria such as allele frequency, entropy, and alignment to the C. sinensis reference genome. From the selected SNPs, specific primers were designed and subsequently validated by HRM, allowing the samples to be separated into distinct genetic profiles. To ensure effective discrimination between Pera orange cultivars, seven of the 13,224 SNPs found by DArTSeq analysis were chosen using computational methods. Amplicon sizes ranged from 87 to 255 bp, and only seven HRM runs were required to discriminate all cultivars, reducing analytical time and complexity compared to conventional methods. The resulting primer kit showed excellent efficiency, speed, and low cost, making it a promising tool for seedling certification and traceability in the citrus industry.
The suggested methodology improves the security and precision of genetic characterisation for Pera orange cultivars by offering a user-friendly and dependable framework for researchers, producers, and seedling distributors.
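The entropy criterion mentioned in this abstract can be illustrated with a short sketch that ranks SNPs by the Shannon entropy of their calls across samples. The abstract does not give the authors' exact scoring or thresholds, so the function names, the threshold, and the toy genotype calls below are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(calls):
    """Shannon entropy (in bits) of the calls observed at one SNP."""
    counts = Counter(calls)
    n = len(calls)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_informative(snps, min_entropy):
    """Return SNP IDs whose entropy is >= min_entropy, most informative first.

    snps: dict mapping a SNP ID to the list of calls across samples.
    """
    scored = {snp_id: shannon_entropy(calls) for snp_id, calls in snps.items()}
    kept = [snp_id for snp_id, h in scored.items() if h >= min_entropy]
    return sorted(kept, key=lambda snp_id: scored[snp_id], reverse=True)


# Hypothetical calls for three SNPs across four samples.
snps = {
    "snp_a": ["A", "A", "G", "G"],  # evenly split: 1 bit
    "snp_b": ["C", "C", "C", "C"],  # monomorphic: 0 bits
    "snp_c": ["T", "T", "T", "G"],  # skewed: about 0.81 bits
}
informative = select_informative(snps, min_entropy=0.9)
```

A monomorphic SNP scores zero and cannot separate cultivars, which is why an entropy cut-off is a natural first filter before primer design.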
Palavras-chave: Genotyping, Polymorphisms, Bioinformatics, Certification
★ Running for the Qiagen Digital Insights Excellence Awards
#1124252

Genomic Aspects and Transcriptomics under Root Dehydration of Terpene Synthase in Cowpea (Vigna unguiculata)

Autores: Maria Luiza Carvalho Farias,Ana Luíza Trajano Mangueira de Melo,Manassés Daniel da Silva,Ana Maria Benko-Iseppon,José Ribamar Costa Ferreira Neto
Apresentador: Maria Luiza Carvalho Farias • schaelleger@gmail.com
Resumo:
Terpenes are synthesized by a gene family known as Terpene Synthase (TPS), which plays a fundamental role in plants by contributing to hormone biosynthesis and cellular regulation. These compounds are critical components of plant adaptation to drought stress and are considered key factors in mitigating productivity losses in several cultivated species within the agronomic sector, including cowpea (Vigna unguiculata), a strategically important legume. The present study aims to investigate the transcriptomics under root dehydration and genomics of the TPS gene family in cowpea, thereby supporting the development of strategies for its effective management through a comprehensive understanding of its complexity. A total of 29 loci were identified, exhibiting variability in the number of isoforms ranging from 1 to 9 isoforms per locus, which may suggest alternative mechanisms of gene expression. Such plasticity is likely attributable to alternative splicing, a common regulatory mechanism activated in response to stress. Using MCScanX to evaluate the mechanisms of gene duplication, a total of nine genes were identified as segmental duplications, nine as tandem duplications, eight as proximal, and three as dispersed. To investigate the mechanisms underlying gene family expansion and orthogroup formation across other Fabaceae species, OrthoFinder and CAFE5 were employed. A total of 18 orthologs were identified in Phaseolus vulgaris, 20 in Glycine max (L.), and 20 in Medicago truncatula, suggesting a high level of conservation in these gene families across closely related legume species, with limited evolutionary divergence in terms of copy number variation. No significant gene family expansions or contractions were observed within the cowpea genome when compared to its immediate ancestral group. 
Additionally, RNA-seq data provided by the Cowpea Genomics Consortium were analyzed for cowpea subjected to root dehydration for 50 and 150 minutes, considering |log2FC| > 1, p < 0.05, and FDR < 0.05. At 50 minutes, five transcripts were up-regulated and fourteen down-regulated, whereas at 150 minutes, only three transcripts were down-regulated and twenty-one up-regulated. These findings suggest an early transcriptional response to drought, characteristic of immediate-response genes, followed by a stabilization of gene expression potentially reflecting the activation of root dehydration tolerance mechanisms or adaptation to prolonged stress conditions. Collectively, the results highlight the regulatory importance and dehydration-responsiveness of TPS genes under adverse environmental conditions. These insights contribute to a deeper understanding of their functional roles and offer promising avenues for the development of drought-tolerant cultivars through genetic improvement and breeding strategies.
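The differential expression thresholds quoted in this abstract can be expressed as a small classification rule. The sketch below assumes that the fold-change cut-off applies in both directions (which is implied by the reporting of down-regulated transcripts); the function name and example values are hypothetical.

```python
def classify_transcript(log2fc, p_value, fdr, fc_cutoff=1.0, alpha=0.05):
    """Label a transcript as 'up', 'down', or 'ns' (not significant).

    A transcript is called differentially expressed only if it passes the
    p-value and FDR thresholds and its |log2FC| exceeds the cut-off.
    """
    if p_value >= alpha or fdr >= alpha:
        return "ns"
    if log2fc > fc_cutoff:
        return "up"
    if log2fc < -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical transcripts: (log2FC, p-value, FDR).
examples = {
    "tx_1": (2.3, 0.001, 0.01),
    "tx_2": (-1.5, 0.01, 0.03),
    "tx_3": (3.0, 0.20, 0.30),
    "tx_4": (0.4, 0.001, 0.001),
}
labels = {name: classify_transcript(*values) for name, values in examples.items()}
```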
Palavras-chave: gene family expansion, transcriptome analysis, comparative genomics.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124276

Comparative Genomics and Stable Isotope Evidence for a Saprotrophic–Pathogenic Lifestyle in Phellinotus piptadeniae

Autores: Luiz Marcelo Ribeiro Tomé,Gabriel Quintanilha-Peixoto,Diogo Rezende,Carlos Alberto Salvador Montoya,Domingos Cardoso,Daniel S. Araújo,Jorge Marcelo Freitas,Gabriela Nardoto,Genivaldo Alves da Silva,Elisandro Ricardo Drechsler dos Santos,Aristóteles Góes-Neto
Apresentador: Gabriel Quintanilha-Peixoto • gabrielnilha@gmail.com
Resumo:
Fungi in forest ecosystems can occupy a wide spectrum of ecological niches, ranging from decomposers of dead organic matter to pathogens and symbionts of living plants. In the Hymenochaetaceae (Basidiomycota), distinguishing between saprotrophic and pathogenic lifestyles is particularly challenging, as many species may exhibit overlapping or flexible trophic modes. In this study, we investigated the ecological lifestyle of Phellinotus piptadeniae, a poorly known neotropical fungus predominantly associated with living trees of Piptadenia gonoacantha (Fabaceae) in the Atlantic Forest of South America. To elucidate the trophic behavior of P. piptadeniae, we adopted an integrative approach combining genome sequencing and annotation, comparative phylogenomics, pangenome analysis, and stable isotope profiling (δ¹³C and δ¹⁵N). The genome was assembled using a hybrid Illumina and Oxford Nanopore strategy, resulting in a 32.97 Mbp assembly with high completeness (95.6% BUSCOs) and 9,771 predicted protein-coding genes. The phylogenomic analysis placed P. piptadeniae within a well-supported clade comprising both saprotrophic and pathogenic taxa, including Inonotus, Sanghuangporus, and Fomitiporia mediterranea. Comparative genomic analyses revealed the presence of a large repertoire of Carbohydrate-Active enZymes (CAZymes), consistent with wood decay capabilities. Additionally, the pangenome analysis identified a set of species-specific gene families, many of which include predicted effectors and secreted proteins involved in detoxification, secondary metabolism, and host interaction. Stable isotope analysis of fungal basidiomata and host tissues provided further insight into the nutritional strategy of P. piptadeniae. Isotopic signatures did not align with those typically observed for strict saprotrophs, but instead fell within clusters containing ectomycorrhizal and bryophilous fungi, suggesting a more complex, possibly facultative pathotrophic lifestyle. 
These findings were consistent across multiple clustering models and were supported by comparative CAZy profiling with other fungal taxa. Ecological data indicate that P. piptadeniae exhibits a marked preference for humid regions and is frequently found on high branches of living host trees, which often lack heartwood. This spatial distribution and genomic and isotopic evidence support the hypothesis that P. piptadeniae may access carbon from living sapwood, pointing to a pathotrophic adaptation. Our results provide the first comprehensive genomic and ecological characterization of P. piptadeniae and contribute to a growing body of evidence highlighting trophic plasticity in wood-inhabiting fungi. This work challenges traditional binary classifications of fungal trophic modes and underscores the utility of integrated molecular and ecological methods in resolving fungal lifestyle strategies.
Palavras-chave: fungus-plant interactions, Hymenochaetaceae, Fabaceae, CAZy, C13/N15 stable isotopes
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124277

MirtronDB 2.0: expanding the knowledge about mirtrons

Autores: Vitor Gregorio,Fabiana Rodrigues de Góes,Matheus Fujimura Soares,Bruno Thiago de Lima Nichio,Alisson Gaspar Chiquitto,Flávia Lombardi Lopes,Mark Basham,Douglas Silva Domingues,Alexandre Rossi Paschoal
Apresentador: Vitor Gregorio • vitor-gregorio@hotmail.com
Resumo:
MirtronDB is a specialized database dedicated to centralizing and organizing mirtrons, a class of microRNAs derived from splicing. Mirtrons are a subclass of microRNAs that originate from short intronic sequences and bypass the canonical microRNA biogenesis pathway. Instead of being processed by Drosha, mirtrons are directly spliced, debranched, and then enter the microRNA maturation pathway, where they are further processed by Dicer. These molecules have been identified in various species and are potential sources of several human pathologies (Qu and Adelson, 2012), whereas in plants, research suggests a feedback loop for the autoregulation of miRNA biogenesis (Budak and Akpinar, 2015). In this work, we present a major update to MirtronDB, incorporating new data and functionalities to improve usability and accessibility. The database has been updated with new literature published between 2017 and 2023, significantly expanding the number of documented mirtrons across various species. In this update, we have included 7 new research articles, 13 additional species, and a total of 126 new mirtrons, comprising 12 mature sequences and 114 precursor sequences. This ensures that MirtronDB remains a comprehensive and up-to-date resource for the scientific community. Additionally, we have implemented new features, including the ability to download mirtron annotations in BED format, facilitating integration with genome browsers and other bioinformatics tools. To further enhance user experience, we have developed an interactive dashboard that allows researchers to explore mirtron data in a more intuitive and visual manner. This dashboard provides dynamic filtering, statistical summaries, and graphical representations, enabling users to analyze trends and patterns efficiently. By integrating these improvements, MirtronDB continues to be an essential resource for mirtron research. 
We believe these enhancements will contribute to a deeper understanding of mirtrons and their biological significance.
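To illustrate what a BED export involves, the sketch below converts 1-based, inclusive genomic coordinates into a six-column BED line, which uses 0-based, half-open intervals. The abstract does not specify the column layout of the MirtronDB download, so the function name, the chosen columns, and the example record are assumptions.

```python
def to_bed_line(chrom, start_1based, end_1based, name, strand, score=0):
    """Format one feature as a tab-separated BED6 line.

    Input coordinates are assumed to be 1-based and inclusive; BED stores
    a 0-based start and an exclusive end, so only the start is shifted.
    """
    if strand not in {"+", "-", "."}:
        raise ValueError("strand must be '+', '-' or '.'")
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("invalid coordinates")
    fields = [chrom, start_1based - 1, end_1based, name, score, strand]
    return "\t".join(str(field) for field in fields)


# Hypothetical mirtron precursor spanning positions 101-160 on chr1.
line = to_bed_line("chr1", 101, 160, "example-mirtron", "+")
```

Because the interval is half-open, the feature length is simply end minus start (60 nt in this example), which is what genome browsers expect.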
Palavras-chave: mirtron, microRNA, splicing, database, dashboard
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124285

Detective Challenge: a card-game approach to teach bioinformatics and gene expression in high schools

Autores: Dayana Turquetti,Gabriel Quintanilha-Peixoto,Ana Luiza Martins-Karl,Thiago M Venancio
Apresentador: Gabriel Quintanilha-Peixoto • gabrielnilha@gmail.com
Resumo:
This study presents an innovative educational activity that integrates bioinformatics and molecular biology through a gamified approach using the Soybean Expression Atlas (SEA). The current version of SEA includes more than 5,000 publicly available RNA-seq samples derived from a wide range of soybean tissues, such as leaves, flowers, and seeds. This comprehensive dataset provides high-resolution gene expression profiles and serves as both a scientific and educational resource. Inspired by the potential of SEA to engage learners, we developed an educational card game entitled "Detective Challenge". This activity is designed to introduce high school students to core concepts of gene expression using soybean as a model organism. By adopting a gamification strategy, the Detective Challenge bridges theoretical knowledge and practical application, transforming the classroom into a space for active inquiry and problem-solving. In the Detective Challenge, students become "gene detectives" tasked with investigating the functions of specific soybean genes based on their expression patterns within SEA. Participants learn to interpret data, identify tissue-specific expression, and evaluate transcript abundance using TPM (Transcripts Per Million) values. Through this hands-on interaction with a real-world dataset, students are introduced to fundamental concepts in genetics, molecular biology, and bioinformatics. The activity promotes scientific curiosity and critical thinking while addressing a gap in accessible bioinformatics educational tools for pre-university students. It also provides students with early exposure to computational tools and analytical methods commonly used in current biological research. We aim to demystify complex genetic topics and foster engagement with authentic scientific data. Evaluation of the activity revealed that the Detective Challenge is highly effective in improving students’ understanding of gene expression. 
Students actively developed hypotheses, explored functional genomics data, and demonstrated biological reasoning. The overwhelmingly positive feedback and high participation rates underscore the value of gamification in science education. In particular, students’ ability to analyze expression profiles and infer gene function reflects a strong integration of data interpretation with conceptual learning. These outcomes align with educational research advocating for active, inquiry-based learning as a strategy to enhance understanding and retention. Furthermore, qualitative feedback highlighted areas for improvement, including increasing accessibility through translations and broader implementation. Future developments of the Detective Challenge could incorporate interdisciplinary elements and real-world scenarios, reinforcing the relevance of genetics and bioinformatics in addressing societal challenges. In summary, this work demonstrates the effectiveness of gamified, data-driven educational strategies in enhancing biological literacy. The Detective Challenge provides a scalable and engaging model for integrating bioinformatics into secondary education, fostering curiosity, critical thinking, and a deeper appreciation of gene expression and functional genomics.
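The TPM values that students interpret in this activity follow a simple normalization: read counts are divided by transcript length, then rescaled so that each sample sums to one million. The sketch below is a minimal illustration with made-up counts and lengths; it is not taken from the Soybean Expression Atlas pipeline.

```python
def transcripts_per_million(counts, lengths_bp):
    """Compute TPM for one sample.

    counts:     dict mapping gene ID to read count
    lengths_bp: dict mapping gene ID to transcript length in base pairs
    """
    # Reads per kilobase corrects for longer transcripts collecting more reads.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rpk.values())
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale * 1_000_000 for gene, value in rpk.items()}


# Toy sample: gene_b has twice the reads but is twice as long as gene_a.
tpm = transcripts_per_million(
    counts={"gene_a": 10, "gene_b": 20},
    lengths_bp={"gene_a": 1000, "gene_b": 2000},
)
```

Both toy genes end up with the same TPM, showing why length normalization matters when comparing expression between genes.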
Palavras-chave: RNA-seq, Soybean Expression Atlas, Gamification
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124485

The mobilome of Vigna unguiculata: a genomics approach

Autores: Flávia Layse Belém Medeiros,Cínthia Carla Claudino Grangeiro Nunes,Ana Luíza Trajano Mangueira de Melo,Agnes Angélica Guedes de Barros,Eliseu Binneck,José Ribamar Costa Ferreira Neto,Ana Maria Benko-Iseppon
Apresentador: Flávia Layse Belém Medeiros • flavia.lmedeiros@ufpe.br
Resumo:
Transposable elements (TEs) are interspersed repeats capable of moving within the genome. They are present in most eukaryotes and abundant in plant species. Research has consistently shown the potential of TEs to generate diversity, for example through their important role in gene regulation. Given their importance, the aim of this study was to annotate and classify TE families in cowpea (Vigna unguiculata), a globally recognized Fabaceae species cultivated for human consumption. Genomes of genotypes from V. unguiculata ssp. unguiculata (Pingo de Ouro, IT85F-76, BR-14-Mulato, Santo Inácio) and of the subspecies V. unguiculata ssp. dekindtiana were sequenced and assembled by the Cowpea Genomics Consortium. The reference cowpea genome and that of V. unguiculata ssp. sesquipedalis were downloaded from public databases. We used a RepeatModeler2 and RepeatMasker pipeline to search for and annotate transposable elements, and MCHelper to curate the library. These steps generated a consensus sequence library for V. unguiculata. Means and standard deviations were calculated in Excel. Additionally, to compare TE family distribution at the chromosome level, we focused on two contrasting genotypes: BR-14-Mulato (resistant to Cowpea aphid-borne mosaic virus) and IT85F-76 (susceptible to this pathogen). TBtools was used to calculate gene density and generate a Circos plot. Other plots were made in Python and edited in Inkscape. Across the seven genomes, on average 43.5% of the cowpea genome was composed of repetitive DNA. The repetitive fraction ranged from 40.08% (BR14-Mulato) to 50.82% (reference genome) and was mostly composed of TEs. The most abundant families in cowpea were LTR/Gypsy and LTR/Copia, as is commonly seen in plants. LINEs and SINEs generally represented less than 2% of the studied genomes. Regarding DNA transposons, the TIR superfamily was the most abundant in all genomes, ranging from 5.94% (sesquipedalis) to 8.21% (IT85F-76).
LTR and TIR distribution was consistent in all genotypes. These elements are often associated with gene regulation and stress adaptation and can contribute to cowpea stress responses. SINEs, LINEs, HELITRON and Maverick, which are present in smaller proportions, showed a more heterogeneous scenario. Their variable distribution could be due to a possible recent mobilization, which can generate diversity among the cultivars. Regarding distribution in the contrasting genotypes BR14-Mulato and IT85F-76, the above-mentioned TE superfamilies are distributed across all eleven chromosomes. Although TEs are generally enriched in centromeric and pericentromeric regions, their distribution appears relatively even in the two genomes. Both genotypes showed a similar pattern of TE distribution: DNA transposons (HELITRON, Maverick, TIR and MITE) were located in gene-poor regions of the chromosomes, likely to avoid disrupting essential genes. In contrast, LTR/Copia and Gypsy showed peaks in gene-dense regions of chromosomes 4 and 5, possibly due to an LTR preference for insertion near or within genes. The data suggest no significant differences in TE distribution or abundance between the contrasting genomes. To further investigate whether TEs contribute to genetic diversity in cowpea cultivars, a transposon insertion polymorphism analysis could be a useful tool to assess genotype-specific insertions. Overall, our findings are important steps towards characterizing the cowpea mobilome and understanding the crop's genetic diversity.
Palavras-chave: Transposable elements, Cowpea, Genomics, Gene
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124492

The mobilome of V. unguiculata: transcriptomic insights into root dehydration response

Autores: Flávia Layse Belém Medeiros,Cínthia Carla Claudino Grangeiro Nunes,Ana Luíza Trajano Mangueira de Melo,Agnes Angélica Guedes de Barros,Eliseu Binneck,José Ribamar Costa Ferreira Neto,Ana Maria Benko-Iseppon
Apresentador: Flávia Layse Belém Medeiros • flavia.lmedeiros@ufpe.br
Resumo:
Transposable elements (TEs) are interspersed repeats capable of moving within the genome. They are present in most eukaryotes and abundant in plant species. Beyond their capacity to shape genome structure, TEs influence gene expression through insertions within or near genes. Given their importance, this study aimed to identify TEs and analyze their transcriptional activity in cowpea (Vigna unguiculata), a globally recognized Fabaceae species cultivated for human consumption. The experiment was conducted under greenhouse conditions. Seedlings of “Pingo de Ouro”, a drought-tolerant cultivar, were subjected to root dehydration in a hydroponic system. RNA-Seq libraries were synthesized, and differential gene expression analysis was performed (25 and 150 minutes) using edgeR within the GenPipes pipeline. Transcripts with log₂FC > 1 or < −1, p < 0.05, and FDR < 0.05 were considered differentially expressed genes (DEGs). We used a RepeatModeler2 and RepeatMasker pipeline to search for, annotate and classify transposable elements in the transcriptome, and MCHelper to curate the resulting library. TE transcripts were matched to DEGs via transcript ID using VLOOKUP in Excel. This step generated a subset of differentially expressed TE-related transcripts (DE TE-transcripts). Gene Ontology (GO) enrichment analysis was performed on DE TE-transcripts using TBtools. Plots were made in Python and edited in Inkscape. A total of 77,085 TE-related transcripts were identified, accounting for 39.19% of the transcriptome. LTR elements were the most abundant (22.77%), followed by DNA transposons (11.02%) and retrotransposons (5.4%). The dominance of LTR elements is consistent with reports from other plant species, where these elements are often stress-responsive and may modulate the expression of adjacent genes. We observed 2,820 up-regulated and 1,342 down-regulated TE-related transcripts after 25 minutes.
Interestingly, the 150-minute analysis revealed a reversed trend, with 1,325 up-regulated and 2,616 down-regulated elements. This suggests an initial activation of TEs in response to sudden stress, potentially contributing to early transcriptional changes, followed by a regulatory mechanism to control TE expression and prevent genomic instability. GO enrichment analysis revealed that TE-related transcripts were significantly involved in biological processes such as response to stimulus (GO:0050896), response to stress (GO:0006950), response to chemical (GO:0042221) and defense response (GO:0006952). The consistency of enriched GO terms across all stress time points suggests a contribution of TEs to transcriptional activity in cowpea under drought stress. These enriched TE-transcripts were predominantly LTR elements (Gypsy, Copia, TRIM, LARD), although enrichment was also observed in TIR and LINE elements. The presence of multiple TE families suggests a complex role for TEs in regulating stress-responsive gene expression. Although the regulatory mechanisms remain to be fully elucidated, our results support the hypothesis that TEs are dynamically regulated in response to abiotic stress and may contribute to adaptive gene expression changes. Therefore, our work contributes to the characterization of the cowpea mobilome and its role in stress response.
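The step in which TE-annotated transcripts are matched to differentially expressed transcripts by ID, performed in the study with a spreadsheet lookup, is equivalent to an inner join on the transcript ID. The sketch below shows one scripted way to do this; the identifiers, TE classes, and fold-change values are invented for illustration.

```python
def match_te_to_deg(te_annotation, deg_table):
    """Inner join of TE annotations and DEG results on transcript ID.

    te_annotation: dict mapping transcript ID to TE classification
    deg_table:     dict mapping transcript ID to log2 fold change
    Returns a dict of transcript ID -> (TE class, log2FC) for IDs in both.
    """
    shared_ids = te_annotation.keys() & deg_table.keys()
    return {
        transcript_id: (te_annotation[transcript_id], deg_table[transcript_id])
        for transcript_id in sorted(shared_ids)
    }


# Hypothetical inputs.
te_annotation = {"tx_001": "LTR/Gypsy", "tx_002": "DNA/TIR", "tx_003": "LINE"}
deg_table = {"tx_001": 2.4, "tx_003": -1.8, "tx_999": 3.1}

de_te_transcripts = match_te_to_deg(te_annotation, deg_table)
up_regulated = [tid for tid, (_, fc) in de_te_transcripts.items() if fc > 0]
```

Scripting the join keeps the matching reproducible and avoids silent mismatches that can occur when identifiers are reformatted in spreadsheets.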
Keywords: Mobile elements, Cowpea, Transcriptome, Gene Ontology
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124674

Comprehensive Transcriptomic Profiling of Ferroptosis Highlights Diagnostic and Therapeutic Targets in Human Sepsis

Authors: Miguel Victor Bringel Sales, Helder Takashi Imoto Nakaya, Fernando de Queiroz Cunha, Antonio Edson Rocha Oliveira
Presenter: Miguel Victor Bringel Sales • mivibrisa@gmail.com
Abstract:
Sepsis, a life-threatening condition marked by a dysregulated host response to infection, remains a major cause of mortality worldwide. Emerging evidence has implicated ferroptosis, a regulated form of cell death driven by iron-dependent lipid peroxidation, in sepsis pathogenesis. However, the transcriptomic regulation of ferroptosis-related genes (FerGs) in sepsis remains poorly characterized. In this study, we conducted an integrative transcriptomic analysis of 18 publicly available datasets, including microarray, bulk RNA sequencing (bRNA-seq), and single-cell RNA sequencing (scRNA-seq) data, comprising whole blood and PBMC samples from septic patients or mice and healthy controls. Our analysis revealed a dynamic regulation of ferroptosis across the course of sepsis. Adult patients exhibited a more pronounced differential expression of FerGs compared to children, with ferroptosis-related activity increasing in later disease stages. Meta-analysis of 33 microarray comparisons identified 87 differentially expressed FerGs (DEFerGs), with 125 DEFerGs identified upon integration with bRNA-seq data. A subset of 11 genes (ACSL1, ALOX5, CAPG, CREB5, G6PD, MAPK14, MGST1, MICU1, QSOX1, SAT1, and SLC2A3) were consistently upregulated across all three platforms, underscoring their robustness and potential as biomarkers for sepsis diagnosis and monitoring. Moreover, 15 other potential biomarkers were identified, regulated in at least 90% of microarray comparisons, with 12 upregulated (ACVR1B, CAMKK2, CAPG, GALNT14, LCN2, MAFG, MAPK14, MTF1, PGD, POR, SLC2A3 and SLC40A1) and 3 downregulated (ATM, SLC38A1 and TRIB2). Among these, CAPG, MAPK14, and SLC2A3 overlapped with the 11 core biomarkers, reinforcing their significance. Additionally, 4 FerGs (G6PD, JDP2, PGD, and QSOX1) exhibited stage-dependent expression, suggesting their relevance for prognostic applications. 
scRNA-seq analyses of PBMCs further delineated the cellular context of FerG regulation, with monocytes emerging as the most prominent regulators. Moreover, transcriptomic profiling of a murine microarray dataset demonstrated that 43 of the 125 DEFerGs identified in human sepsis were also differentially expressed in septic mice. Notably, 12 of the 24 previously highlighted candidate biomarkers (ACVR1B, ALOX5, ATM, CREB5, G6PD, JDP2, LCN2, MGST1, PGD, POR, QSOX1 and SAT1) were conserved across species, showing differential expression in both human and murine datasets. This cross-species consistency underscores the utility of these genes in preclinical model validation. Pathway enrichment analyses revealed that DEFerGs were involved not only in classical ferroptotic processes such as oxidative stress, lipid peroxidation, and iron homeostasis, but also in immune modulation via IL-17, IL-18, and IL-4 signaling, transcription factor networks (FoxO, p53), and pathways of cellular senescence and autophagy. These data extend the mechanistic relevance of FerGs beyond ferroptosis, highlighting their role in systemic inflammatory responses and immune dysfunction during sepsis. Crucially, three of the identified biomarkers (ALOX5, ACSL1 and QSOX1) are currently targetable with pharmacological agents that can modulate ferroptosis. Zileuton, an ALOX5 inhibitor, has demonstrated efficacy in ameliorating neuronal damage and pulmonary dysfunction in murine sepsis models. Triacsin C inhibits ACSL1 and reduces ferroptosis in viral infection models. Ebselen, a known QSOX1 inhibitor, suppresses ferroptosis-related oxidative stress and has shown promise in LPS-induced models. Further exploration of compounds targeting the additional biomarkers identified here may lead to more comprehensive and personalized treatment strategies for septic patients.
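The gene-list bookkeeping in this abstract (11 core genes, 15 frequently regulated genes, 4 stage-dependent genes, 24 candidates in total) can be checked with plain set operations. The gene symbols are copied from the text; the variable names are illustrative only.

```python
# Gene lists as stated in the abstract.
core = {"ACSL1", "ALOX5", "CAPG", "CREB5", "G6PD", "MAPK14",
        "MGST1", "MICU1", "QSOX1", "SAT1", "SLC2A3"}
frequent_up = {"ACVR1B", "CAMKK2", "CAPG", "GALNT14", "LCN2", "MAFG",
               "MAPK14", "MTF1", "PGD", "POR", "SLC2A3", "SLC40A1"}
frequent_down = {"ATM", "SLC38A1", "TRIB2"}
stage_dependent = {"G6PD", "JDP2", "PGD", "QSOX1"}
conserved_in_mouse = {"ACVR1B", "ALOX5", "ATM", "CREB5", "G6PD", "JDP2",
                      "LCN2", "MGST1", "PGD", "POR", "QSOX1", "SAT1"}

frequent = frequent_up | frequent_down          # the 15 "other" biomarkers
overlap = core & frequent                       # genes found in both lists
candidates = core | frequent | stage_dependent  # all highlighted candidates

print(sorted(overlap))
print(len(candidates), len(conserved_in_mouse & candidates))
```

The intersection returns CAPG, MAPK14 and SLC2A3, and the union contains 24 genes, 12 of which are in the murine list, matching the counts given in the text.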
Keywords: sepsis, ferroptosis, transcriptomics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124702

Application of 3D Zernike Descriptors in Structural Clustering of Antibodies

Authors: Diego da Silva Almeida, Matheus do Vale Almeida, Jean Vieira Sampaio, Andrielly Henriques dos Santos Costa, Aline de Oliveira Albuquerque, Eduardo Menezes Gaieta, Geraldo Rodrigues Sartori, João Herminio Martins Da Silva
Presenter: Diego da Silva Almeida • diego.dealmeida@fiocruz.br
Abstract:
Antibody clustering is a valuable strategy that reduces sample space, optimizes computational analysis pipelines, and reveals relevant biological patterns, such as identifying functionally similar antibodies that interact with the same epitope. Currently, the most widely used methods are based on sequences or structures. However, highly divergent sequences may still exhibit similar binding conformations, making structure-based methods more suitable because they reflect antibody function more directly. Nevertheless, techniques based on atomic superposition, such as RMSD, have proven insufficiently sensitive for detecting functionally equivalent pairs. In this study, our goal was to evaluate and propose a new strategy for the structural clustering of antibodies using 3D Zernike descriptors, which represent protein surfaces through mathematical functions and eliminate the need for atomic superposition. To the best of our knowledge, this approach is unprecedented in the context of clustering antibodies that bind to the same epitopes. We used a dataset containing 256 functional antibody pairs modeled using ABodyBuilder2, divided into four distinct groups: CDRH3, all CDRs, paratopes, and epitopes. Paratope and epitope information was extracted from the experimental structures, and the surfaces were represented using FP-Zernike, which assesses similarity based on the Euclidean distance between surface descriptors. As a reference for comparison, we employed SPACE2, a specialized antibody clustering software. The results revealed striking differences between the two approaches. SPACE2 clustered only 19 pairs with high precision (0.89). 
In contrast, the Zernike-based method showed superior performance in terms of sensitivity, recovering 74 functional pairs with a precision of 0.80 using paratopes, 43 pairs with a precision of 0.78 considering only CDRH3, 352 pairs from all CDRs (with a precision of 0.57), and 110 pairs when analyzing epitopes, thus achieving the highest precision of the study (0.91). These results demonstrate that our approach can detect more functional pairs than SPACE2, with particularly strong performance in paratope and epitope analysis. The application of 3D Zernike descriptors proved to be more effective than RMSD-based methods, particularly in the functional context, where the surface shape is more relevant than the exact atomic alignment. This strategy enhances the sensitivity of detecting functional antibodies and offers a robust tool for applications in immunoinformatics.
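The comparison logic described above (Euclidean distance between descriptor vectors, then precision over the recovered pairs) can be sketched as follows. This is not the FP-Zernike or SPACE2 implementation: the three-component vectors, the distance cutoff, and the antibody names are invented for illustration, and real 3D Zernike descriptors are much longer.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predicted_pairs(descriptors, cutoff):
    """Pair every two antibodies whose descriptors lie within the cutoff (assumed rule)."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(descriptors), 2)
        if euclidean(descriptors[a], descriptors[b]) <= cutoff
    }


def precision(predicted, functional):
    """Fraction of predicted pairs that are known functional pairs."""
    if not predicted:
        return 0.0
    return len(predicted & functional) / len(predicted)


# Toy descriptors (hypothetical values).
descriptors = {
    "ab1": [0.10, 0.40, 0.20],
    "ab2": [0.12, 0.38, 0.22],
    "ab3": [0.90, 0.10, 0.50],
    "ab4": [0.88, 0.12, 0.47],
    "ab5": [0.15, 0.35, 0.25],
}
functional = {frozenset(("ab1", "ab2")), frozenset(("ab3", "ab4"))}

pairs = predicted_pairs(descriptors, cutoff=0.1)
print(len(pairs), precision(pairs, functional))
```

In this toy case both functional pairs are recovered, but ab5 also falls within the cutoff of ab1 and ab2, so precision drops. The same trade-off between pairs recovered and precision appears in the all-CDR results reported above.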
Keywords: Shape-based clustering, paratope, epitope
#1124765

Cross-Species Single-Cell Transcriptomic Atlas Of The Hypothalamus: Revealing Conserved Genes And Communication Patterns

Authors: Victor Jardim Duque, Yuyao Song, João Victor Silva Nani, Irene Papatheodorou, André de Souza Mecawi
Presenter: Victor Jardim Duque • victor.jr.duque@gmail.com
Abstract:
The hypothalamus is an important brain structure that controls many functions essential for the body’s survival. Its main roles include regulating energy and water balance, as well as other processes such as reproduction and blood pressure control. Because of its importance, this region is highly conserved across species. Despite this conservation, little is known about how far these similarities extend to the transcriptomic profile of each cell type and to the patterns of cellular communication. Therefore, this study aims to understand the similarities between hypothalamic cell types in different species and the conserved pathways of cellular communication. We integrated over one million cells from single-nucleus RNA sequencing of the hypothalamus from different species: Homo sapiens (N = 433,369, DOI: 10.1038/s41586-024-08504-8), Macaca fascicularis (N = 174,150, DOI: 10.1016/j.cmet.2024.01.003), Callithrix jacchus (N = 83,403, DOI: 10.1126/sciadv.adk3986), and Mus musculus (N = 351,187, DOI: 10.1038/s42255-022-00657-y). After data acquisition, genes were harmonized to their human Ensembl ortholog symbols using the pipeline from ‘Benchmarking Strategies for Cross-Species Integration of Single-Cell RNA Sequencing Data’ (BENGAL). Once converted, the data were integrated by reciprocal principal component analysis (RPCA) in Seurat, as implemented in the BENGAL pipeline. This integration successfully aligned cells by type rather than by species, identifying the major populations: astrocytes, endothelial cells, ependymocytes, immune cells, mural cells, neurons, oligodendrocytes, oligodendrocyte precursor cells and tanycytes. To identify the marker genes for each cell type, we applied Model-Based Analysis of Single-Cell Transcriptomics (MAST), taking species as a statistical covariate. 
This method revealed more than 200 enriched genes (log₂ fold change > 0.25 and adjusted p‑value < 0.05) per cell type, representing key conserved genes for their characteristics and functionality. For example, the Myelin Oligodendrocyte Glycoprotein (MOG) gene, which is essential for the maintenance of the myelin sheath, was enriched in oligodendrocytes. Furthermore, some endogenous peptides were also evident (IUPHAR databases), such as Neuregulin 1 (NRG1) and Proopiomelanocortin (POMC), enriched in neurons, and Angiotensinogen (AGT), enriched in astrocytes. These findings are also reflected in the enriched pathways (clusterProfiler package), which highlight myelination in oligodendrocytes and synapse-related pathways in neurons. Cellular communication patterns conserved across species were inferred with the Ligand-Receptor Analysis Framework (LIANA) package, revealing more than 950 enriched pathways, including well-established interactions such as that between APOE and TREM2 in astrocytes and immune cells, and between NRG3 and ERBB3 in neurons and oligodendrocytes. This comprehensive cross-species hypothalamic atlas has the potential to elucidate the conserved cellular architecture and molecular mechanisms essential for organismal survival. In addition, it provides a valuable resource for evolutionary biology and translational medicine by identifying genes and pathways conserved among species, enhancing the utility of model organisms for studying human hypothalamic function and dysfunction.
Keywords: Hypothalamus, Cross-Species, Single-Cell, Transcriptomics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124785

ParaDB-v2: Shiny app version of ParaDB - manually curated database for the human pathogenic fungi Paracoccidioides spp.

Authors: David Aciole Barbosa, Fabiano Menegidio, Daniela Leite Jabes, Regina Costa de Oliveira, Luiz Roberto Nunes
Presenter: David Aciole Barbosa • aciole.d@gmail.com
Abstract:
The genus Paracoccidioides comprises thermodimorphic fungi responsible for paracoccidioidomycosis (PCM), a systemic mycosis that affects more than 10 million people in Latin America. Previously classified as only two species, Paracoccidioides is currently considered to comprise five distinct species: P. brasiliensis, P. lutzii, P. americana, P. restrepiensis and P. venezuelensis. Genomic sequencing of these five Paracoccidioides species provided an important framework for the development of post-genomic studies with these fungi. A reannotation effort, based on bioinformatics approaches and extensive manual curation, reduced the proportion of “Hypothetical Proteins” to ~28% in all Paracoccidioides genomes. These results were compiled in a relational database called ParaDB, centralizing updated protein annotations. Unfortunately, technical problems related to hosting services made ParaDB increasingly unavailable. To overcome these issues, we present ParaDB-v2, a Shiny web app comprising all data from ParaDB, where users can once again find the relevant data and tools for Paracoccidioides genomes, with updated annotations. The app is freely hosted and available at shinyapps.io, and was built using a dashboard layout of data tables, allowing fast searching, filtering and download of thousands of genes and proteins, based on many annotation sources, such as Gene Ontology, InterPro, Pfam and Swiss-Prot. ParaDB-v2 table searches are connected to the sequences of each genome, so users can display and/or download single or multiple FASTA files, along with partitioned tabular data. Furthermore, a BLAST search is also available in ParaDB-v2, allowing users to search and compare sequences against any or all Paracoccidioides genomes, control basic BLAST parameters, and display or download BLAST alignments and results. 
The Shiny version of ParaDB will help researchers interact with manually curated genomic information for Paracoccidioides species and serves as a proof of concept for providing omics data through Shiny apps.
Keywords: Paracoccidioides, Genome, Functional reannotation, Database
#1124806

Unlocking Biosynthetic Potential from Brazilian Manatee Microbiomes through Metagenomics

Authors: Giulio Mendes Braatz, Claudio Benício, Joaquim Martins Junior, Rodrigo Silva Araujo Streit, Prof. Dr. Mário Tyago Murakami, Gabriela Persinoti
Presenter: Giulio Mendes Braatz • giuliobraatz@gmail.com
Abstract:
Microbial communities exhibit significant potential to produce a wide range of bioactive compounds, particularly those found in distinct ecological niches. Among the diverse metabolic capabilities of these communities, Biosynthetic Gene Clusters (BGCs) related to the production of secondary metabolites are particularly interesting due to their potential biotechnological applications. Microbiomes from endangered species within Brazilian biodiversity, such as the Amazonian manatee (Trichechus inunguis) and the Antillean manatee (Trichechus manatus manatus), present intriguing yet relatively unexplored sources of BGCs due to their evolutionary adaptations to specialized diets and habitats. Investigating these microbiomes is crucial for biodiversity conservation and unlocking untapped scientific potential.
The present study employed advanced metagenomic approaches to identify and characterize putative BGCs within these microbiomes. Using antiSMASH, which predicts BGCs through the identification of signature protein domains and conserved biosynthetic gene architectures, we initially detected 1,172 putative, mostly novel BGCs in the Amazonian manatee and 1,313 in the Antillean manatee. Subsequent dereplication and clustering based on protein sequence similarity and conserved domains into gene cluster families (GCFs) were performed using BiG-SCAPE, refining these findings to 1,830 unique putative BGCs grouped into 366 GCFs. Ribosomally synthesized and post-translationally modified peptides (RiPPs) emerged as the predominant class, indicating significant potential for novel bioactive compound discovery, including antimicrobials, anticancer agents, and antivirals. Additionally, Desulfobacterota stood out as the phylum with the highest density of BGCs per metagenome-assembled genome.
Currently, efforts are directed toward evaluating the novelty of these BGCs by comparison with experimentally validated databases, such as MIBiG, to uncover unique or previously unexplored biosynthetic capabilities. Selected promising BGC candidates will then undergo heterologous expression in bacterial hosts, enabling functional validation and further characterization.
Our findings highlight how studying microbiomes from endangered Brazilian species can lead to the discovery of novel compounds with substantial biotechnological potential.
Keywords: Biosynthetic Gene Clusters, Metagenomics, Manatee Microbiome
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124820

Transcriptogram-Based Analysis of EMT Dynamics Using Single-Cell RNA-Seq of MCF10A Cells Treated with TGF-β1

Authors: Odilon Julio dos Santos, Rodrigo Juliani Siqueira Dalmolin, Rita Maria Cunha de Almeida
Presenter: Odilon Julio dos Santos • odilonjulio@ufrn.edu.br
Abstract:
Epithelial-Mesenchymal Transition (EMT) is a dynamic process involved in development, tissue repair, and cancer progression. In this study, we explore transcriptomic changes associated with EMT using single-cell RNA-seq data from MCF10A epithelial cells treated with TGF-β1 across multiple time points. The dataset, derived from a publicly available study, includes two experimental batches (using 10x Genomics kits v2 and v3) with samples collected from days 0 to 8. To address technical variability, gene expression was normalized by total transcript count per cell and batch effects were corrected by adjusting control means. We applied the Transcriptogramer package, which organizes gene expression profiles along protein-protein interaction networks, allowing the detection of functional gene clusters and transcriptional patterns over time. Principal Component Analysis (PCA) was performed without scaling to retain biological variability. We analyzed transcriptogram smoothing with radii of 0 and 30, revealing both local and global expression shifts. Our preliminary results show distinct transcriptional signatures between batches and identify genes potentially associated with intermediate EMT states. These findings support the continuum model of EMT and highlight the importance of normalization in multi-batch analyses. Future steps include transcriptogram reconstruction using principal components and R²-based error estimation to assess dimensionality reduction. This approach offers a high-resolution view of EMT regulation and may support virtual modeling of dynamic tissue processes.
Keywords: Epithelial-Mesenchymal Transition, Single-Cell RNA-seq, TGF-β1, Transcriptogramer, MCF10A, Protein-Protein Interaction, Batch Effect, PCA, Gene Expression Dynamics
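Two of the steps named in this abstract, normalization by total transcript count per cell and smoothing with a given radius, can be illustrated with a minimal sketch. It is not the Transcriptogramer package API: the function names are hypothetical, the gene ordering is assumed to be given, and the window is simply truncated at the ends of the list, which may differ from the boundary handling used by the authors.

```python
def normalize_by_total(counts):
    """Divide each gene count in one cell by that cell's total transcript count."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def transcriptogram(values, radius):
    """Average expression over a window of genes centred on each position.

    `values` must already follow the network-based gene ordering.
    radius=0 returns the input unchanged; larger radii smooth over more neighbours.
    Windows are truncated at the list ends (an assumption of this sketch).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - radius): min(n, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


cell = normalize_by_total([2, 2, 4])
print(cell)
print(transcriptogram([0.0, 3.0, 6.0], radius=1))
```

A radius of 0 keeps gene-level detail, while a radius such as 30 averages over neighbouring genes in the ordering, which is why the two settings expose local and global shifts respectively.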
★ Running for the Qiagen Digital Insights Excellence Awards
#1124825

The Extended N-Terminal Domain of VPAC1 Isoform 2 Acts as a Self-Inhibitory Element: Insights from Molecular Dynamics Simulations

Authors: Matheus Henrique Figueiredo Reis, Deborah Antunes, Ingrid Bernardes Santana Martins, Ernesto Cafarena
Presenter: Matheus Henrique Figueiredo Reis • matheushfreis@gmail.com
Abstract:
G protein-coupled receptors (GPCRs) represent one of the largest and most diverse superfamilies of membrane proteins, serving as crucial mediators in cellular signal transduction pathways. Among these, the VPAC1 receptor stands as a significant member of the secretin (class B1) subfamily, playing essential roles in numerous physiological processes including immune modulation, neuroprotection, and metabolic regulation. This comprehensive study focuses on elucidating the structural and molecular mechanisms responsible for the non-functional characteristics of VPAC1 isoform 2, which exhibits an extended N-terminal domain compared to its canonical counterpart, isoform 1.
Through sophisticated computational methodologies encompassing advanced molecular modeling techniques and extensive molecular dynamics simulations, our research demonstrates that the distinctive 17-residue α-helical insertion in the extracellular domain (ECD) of isoform 2 functions effectively as an endogenous antagonist. The detailed structural analyses reveal that this inserted α-helix adopts a specific conformation that physically occupies the binding pocket normally reserved for the Vasoactive Intestinal Peptide (VIP), the natural ligand of VPAC1. Furthermore, this inserted sequence establishes hydrogen bonds with critical receptor residues that are typically involved in VIP recognition and binding.
Remarkably, sequence analysis shows that the inserted α-helix exhibits significant similarity to certain portions of the VIP peptide itself, suggesting an evolutionary relationship or molecular mimicry mechanism. Our computational investigations additionally demonstrate that the inserted sequence fundamentally alters the conformational dynamics of the extracellular domain, stabilizing it in a configuration that prevents the essential "opening" movement required for proper peptide accommodation and subsequent receptor activation.
These novel findings provide valuable insights into an unusual auto-inhibitory mechanism in GPCRs that has not been previously characterized in detail. The implications of this research extend beyond basic structural biology, potentially contributing to our understanding of receptor diversity in various pathophysiological conditions and opening new avenues for targeted therapeutic interventions aimed at modulating GPCR signaling pathways.
Keywords: Vasoactive intestinal peptide, VPAC1, GPCR, isoforms, Molecular Dynamics
#1124837

Evolution of Sensor Proteins in the Ammonium Transporter (AMT) Family

Authors: Eduardo Pereira Soares, Anacleto Silva de Souza, Raphael Luiz Lobo da Silva Souza, Gilberto Hideo Kaihami, Cristiane Rodrigues Guzzo Carvalho, Robson Francisco de Souza
Presenter: Eduardo Pereira Soares • eduardo_soares@usp.br
Abstract:
Nitrogen is essential for life, but for many organisms its availability depends on specialized transporters that mediate the uptake and efflux of ammonium (NH₄⁺/NH₃). The AMT/MEP/Rh superfamily, conserved across prokaryotes and eukaryotes, plays a central role in nitrogen homeostasis, but the molecular mechanisms governing transport and responses to varying ammonium concentrations remain debated. To understand how fusions of AMT to regulatory proteins affect the protein's functions in transport and signal transduction, we combined comparative genomic analysis and molecular dynamics simulations to elucidate the evolutionary diversification and functional adaptations of these transporters. To detect all instances of fused and solo AMT/MEP/Rh domains, we searched the NCBI non-redundant protein database for members of the AMT/MEP/Rh superfamily using two hidden Markov models, obtained from the Pfam and TIGRFAMs databases. The search was conducted with the hmmsearch program from the HMMER 3.0 package. Conserved protein domains were identified for all hits using hmmscan. AMT homologs were classified by hierarchical clustering, based on the default MMseqs2 clustering algorithm, and multiple sequence alignments of cluster representatives were built with MAFFT and FAMSA and curated in AliView. Phylogenetic inference was performed using IQ-TREE and the resulting trees were analyzed using TreeViewer. For other dataset analyses, such as taxonomic classification and estimation of the relative frequencies and occurrences of domain architectures in the gene neighborhood and in the entire genome, we used Python in the interactive IPython environment together with the ROTIFER (Rapid Open-source Tools and Infrastructure for Data Exploration and Research) toolkit. For molecular dynamics simulations, we used the CHARMM-GUI server to build the system and GROMACS to run the simulations. 
Phylogenetic reconstruction of 4,618 representative sequences reveals distinct clades (AMT, MEP, Rh) with divergent domain architectures, taxonomic distributions and residue compositions. Of the three families, the AMT family has the most occurrences and the most diverse fusions, including regulatory domains such as P-II and other signal transduction domains common in bacteria. After P-II, GGDEF and histidine kinase are the most frequently observed signal transduction domains, and sequences carrying these domains form specific clades, revealing new subfamilies. Our data also revealed misclassification of homologs previously reported in the literature. Comparative structural dynamics simulations demonstrate that AMT variants exhibit distinct transport mechanisms: while MEP transporters facilitate rapid NH₃ diffusion, AMTs fused to signal transduction domains show slower, conformationally regulated transport, suggesting a dual role in sensing and translocation. Truncation of the histidine kinase domain in AMT disrupts unidirectional flux, implicating this domain in enforcing transport directionality. These findings reveal how structural innovations, including domain fusions and residue substitutions, have fine-tuned transporter and sensor functions across evolution. Our study provides a framework for understanding nitrogen transport regulation and highlights the dynamic interplay between structure, function, and evolution in the AMT/MEP/Rh superfamily.
Keywords: Ammonium transporter, domain fusions, signal transduction, comparative genomics, molecular dynamics
#1124857

Impact of Livestock-Associated Antibiotics on Microbial Communities and Resistance Genes in the Pantanal Wetlands

Authors: André Rodrigues de Oliveira, Nelson Kotowski, Alberto Dávila, Rodrigo Jardim
Presenter: André Rodrigues de Oliveira • andre.rodrigues.oliveira.1999@gmail.com
Abstract:
Antimicrobial use in livestock farming goes beyond treating infectious diseases and extends to prophylaxis and the promotion of weight gain. However, this practice can have significant consequences for ecosystems, because excreted antibiotic residues affect local microbial communities, compromising the biodiversity and functionality of these ecosystems. In addition, the indiscriminate use of antibiotics contributes to the emergence of antibiotic-resistant bacteria (ARBs) and antibiotic resistance genes (ARGs), which can spread and infect humans. This study was carried out in two lagoons in the Pantanal region of Mato Grosso: one in an environmental reserve in the Pantanal of Abobral and another in a cattle ranching area in the Pantanal of Aquidauana. Water column samples were collected near and 10 meters from the floating macrophyte Eichhornia crassipes and stored in 10-liter autoclaved bottles. The samples were filtered through membranes with pore sizes of 1.2 μm, 0.8 μm, 0.45 μm, and 0.22 μm. The DNA, extracted with the QIAGEN DNeasy PowerWater Kit, was sequenced by the shotgun method using the Illumina HiSeq-2500 platform. Quality assessment, read cleaning, assembly, and taxonomic classification of the metagenomic samples were performed with FastQC, Trimmomatic, MEGAHIT, and Kraken, respectively, within the MetaWRAP pipeline with default parameters. ARGs were identified by the consensus of the DeepARG, Resfams, ResFinder, and CARD tools, in both the clean reads and the assembled contigs. ARGs were identified in all samples, with the Abobral sample close to the macrophyte showing the highest abundance. The comparative analysis also revealed that the Abobral lagoon had a greater diversity of ARGs than the Aquidauana lagoon. The most abundant resistance gene at all sites was RPOB2, which is associated with multidrug resistance. 
In addition, several phyla and genera were found, with Pseudomonadota being the dominant phylum at all four sampling sites.
Keywords: Antibiotic resistance genes, Pantanal, Metagenomics, Livestock farming, Antimicrobial resistance
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124859

The Genomic and Phylogenetic Characterization of Monocotyledons from the Poaceae Family Reveals a Novel Multigene Group of Chimeric Lectins with High Biotechnological Potential

Authors: Carlos Eduardo Reinaldo Custódio, Antonio Edson Rocha Oliveira, Ana Cristina
Presenter: Carlos Eduardo Reinaldo Custódio • carloseduardoreinaldo13@gmail.com
Abstract:
Lectins are found in all living organisms and viruses as either soluble proteins or membrane-bound proteins, occurring both as single-domain polypeptides and as part of multidomain arrangements. These proteins are characterized by their specific ability to bind carbohydrates, interacting with glycoconjugates present on the cell surface, which results in a wide range of biological functions. A unique subfamily of lectins found in monocotyledonous plants of the Poaceae family, known as monocot chimeric jacalin-related lectins (MCJs), is formed by the fusion of Jacalin and Dirigent proteins. Studies suggest that MCJs are implicated in defense mechanisms against a broad spectrum of pathogens. Moreover, they may contribute to plant development and tolerance to abiotic stresses, making them promising targets for the development of biotechnological applications, such as transgenic crops with enhanced resistance to pathogens and adverse environmental conditions. Although previously studied, the investigation of MCJs still presents several gaps, such as the number of genes in each species, their genomic locations, gene expression profiles under developmental and stress conditions, carbohydrate-binding specificity among different representatives within the Poaceae family, and the phylogenetic relationships between paralogous and orthologous genes. This project aims to characterize the genomic and phylogenetic profiles of MCJs in the Poaceae family.
Curation of the genomes deposited in the NCBI database for the Poaceae family revealed a total of 501 available genomes, corresponding to 89 species distributed across 31 genera. With respect to genome assembly levels, 73 species have chromosome-level assemblies, 12 are at the scaffold level, and 4 at the contig level. The Poaceae species Aegilops tauschii subsp. strangulata, Brachypodium distachyon, Hordeum vulgare, Oryza sativa, Setaria italica, Sorghum bicolor, Triticum aestivum, and Zea mays were selected for preliminary analyses. These species exhibit different levels of genome assembly (contig, scaffold, and chromosome), derived from various sequencing and assembly techniques, with genome coverage ranging from 71× to 234×. Manual gene annotation was performed using the amino acid sequence of the TaJA1 protein (AY372111), from Triticum aestivum, as a query to identify MCJ-coding genes, resulting in the identification of 3 to 48 genes in the genomes of the selected species. Phylogenetic analysis suggested that these genes cluster into three subfamilies. Two species contained representatives of all three subfamilies, while the remaining species had genes distributed across only one or two subfamilies. Additionally, in species with chromosome-level genome assemblies, such as Aegilops tauschii subsp. strangulata, Sorghum bicolor, and Triticum aestivum, MCJ genes were distributed across multiple chromosomes, with some forming tandem clusters. Notably, Oryza sativa was the only species in which all identified genes were located on a single chromosome. The presence of MCJ genes in all species and their classification into three subfamilies suggest strong evolutionary conservation in the Poaceae family. Gene annotation will be performed in other representatives of the Poaceae family. To elucidate the transcriptomic profile of MCJ genes in these species, publicly available RNA-seq data will be analyzed. 
Additionally, molecular dynamics analyses will be employed to evaluate differences in carbohydrate-binding patterns among the various MCJ subtypes.
Keywords: Lectins, Genomic, Phylogenetic, Poaceae family
★ Running for the Qiagen Digital Insights Excellence Awards
#1124873

Transfer Learning of Molecular Embeddings for the Prediction of Antimicrobial Synergistic Interactions

Autores: Alex Sanchez Yumbo,Thiago Souza,André Borges Farias,José Eduardo Henriques da Silva,Maria Carolina Sisco,Isabella Alvim Guedes,Dr. Laurent Emmanuel Dardenne,Marisa Fabiana Nicolás
Apresentador: Alex Sanchez Yumbo • alexsy@posgrad.lncc.br
Resumo:
The rise of antimicrobial resistance (AMR) and the slow development of new treatments have increased the need for novel therapies. Synergistic combinations of two or more antibiotics can mitigate resistance owing to their potentiated effect and lower required doses. However, the vast combinatorial space makes exhaustive experimental screening infeasible. In this study, we developed a novel sampling strategy to improve the predictions of the minority class, i.e., the synergistic activity of antibiotic pairs against specific bacterial strains, in machine learning models.
We utilized an experimental dataset comprising 1 million compound combinations screened against susceptible and resistant strains of Klebsiella pneumoniae, Acinetobacter baumannii, and Pseudomonas aeruginosa (Tse et al., 2024). Synergy was quantified using the Bliss Independence model, measuring the combined drug effect relative to the expected additive effect (Cantrell et al., 2022). Following the authors' recommendation, we used a Bliss score threshold of 0.3 to categorize drug pairs as synergistic (BLISS > 0.3; 3,784), additive (-0.3 ≤ BLISS ≤ 0.3; 782,714), or antagonistic (BLISS < -0.3; 36,727). Data processing involved standardizing and canonicalizing unique SMILES strings from 27,754 compounds using RDKit, followed by the generation of chemical embeddings with the ChemBERTa model pre-trained on 77 million PubChem compounds. We extracted the classification token embedding of each molecule from the final hidden layer and concatenated these to represent each drug pair. Focusing on models for K. pneumoniae strains, we observed a significant class imbalance, with additive-to-synergistic ratios of 183:1 for AR0097 and 167:1 for ATCC 43816.
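The thresholding described above can be illustrated with a minimal sketch. It assumes the common definition of the Bliss excess on fractional inhibitions in [0, 1]; the abstract does not give the exact formula used for the dataset, so the function names and the input scale are illustrative assumptions.

```python
def bliss_score(effect_a, effect_b, effect_combo):
    """Bliss excess: observed combined effect minus the effect expected
    if the two drugs acted independently.

    Assumption: effects are fractional inhibitions in [0, 1].
    """
    expected = effect_a + effect_b - effect_a * effect_b
    return effect_combo - expected


def classify_pair(score, threshold=0.3):
    """Label a drug pair using the symmetric threshold from the abstract."""
    if score > threshold:
        return "synergistic"
    if score < -threshold:
        return "antagonistic"
    return "additive"


# Hypothetical pair: 20% and 30% inhibition alone, 90% in combination.
score = bliss_score(0.2, 0.3, 0.9)
label = classify_pair(score)
```

With the hypothetical values above, the expected independent effect is 0.44, so the observed 0.90 exceeds it by 0.46 and the pair is labeled synergistic.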
To address this imbalance, we developed a novel cluster-based undersampling method designed to reduce the majority class while preserving the distribution of the discretized Bliss score and retaining chemical space information. Our method involves first clustering unique compound embeddings and then sampling combinations based on the frequency of each cluster within the combination dataset. Statistical analysis using ANOVA on the Bliss score distribution of the original and undersampled data, across various clustering methods and sampling sizes, showed no significant differences (p > 0.05), supporting the preservation of the target variable's distribution.
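One way to picture a cluster-based undersampling step is the sketch below. It is not the authors' implementation: it assumes each compound already has a cluster label (for example, from clustering its embedding) and that the Bliss score has been discretized into bins, then samples the majority class proportionally within each (cluster pair, Bliss bin) stratum. The names `cluster_undersample`, `cluster_of`, and `keep_fraction` are hypothetical.

```python
import random
from collections import defaultdict


def cluster_undersample(pairs, cluster_of, keep_fraction, seed=0):
    """Undersample majority-class drug pairs stratum by stratum.

    pairs: list of (compound_a, compound_b, bliss_bin) tuples.
    cluster_of: dict mapping compound id -> cluster label (assumed given).
    keep_fraction: share of each stratum to keep (at least one pair is kept).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for a, b, bliss_bin in pairs:
        # The order of the two compounds in a pair should not matter.
        cluster_pair = tuple(sorted((cluster_of[a], cluster_of[b])))
        strata[(cluster_pair, bliss_bin)].append((a, b, bliss_bin))

    sampled = []
    for key in sorted(strata):
        members = strata[key]
        n_keep = max(1, round(len(members) * keep_fraction))
        sampled.extend(rng.sample(members, n_keep))
    return sampled


# Hypothetical toy data: four compounds in two clusters.
cluster_of = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
pairs = [("c1", "c2", 0)] * 8 + [("c1", "c3", 1)] * 4 + [("c3", "c4", 0)]
kept = cluster_undersample(pairs, cluster_of, keep_fraction=0.5)
```

Sampling within strata keeps every cluster combination and every Bliss bin represented, which is the property the distribution check in the abstract is meant to support.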
We trained LightGBM models with Bayesian hyperparameter tuning and 5-fold cross-validation, varying the clustering method and the undersampling reduction percentage as well as model-specific parameters. Models trained on the original dataset yielded a recall of 0.41 and an F1-score of 0.45 for synergistic pairs. Our undersampling approach improved recall to 0.61 and balanced accuracy to 0.61, despite a slight drop in precision (F1-score: 0.34).
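For readers less familiar with the metrics reported here, the sketch below shows how recall, precision, F1-score, and balanced accuracy follow from a binary confusion matrix. The counts in the example are hypothetical and are not taken from the study; the function assumes none of the denominators is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics for the positive (synergistic) class from confusion counts."""
    recall = tp / (tp + fn)            # share of true synergistic pairs found
    precision = tp / (tp + fp)         # share of predicted pairs that are real
    specificity = tn / (tn + fp)       # recall of the negative class
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts for an imbalanced test fold.
metrics = binary_metrics(tp=6, fp=14, fn=4, tn=976)
```

The example illustrates the trade-off mentioned above: recall can rise while F1 falls if the extra true positives come with many more false positives.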
These results demonstrate that addressing class imbalance is critical for improving the prediction of antimicrobial synergy. Our method enhances the identification of promising drug combinations, contributing to more effective computational strategies in antibiotic discovery.
Palavras-chave: antimicrobials combination, transfer learning, data imbalance, cluster undersampling
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124908

CHARACTERIZATION OF CIRCULAR RNAS CONTAINING THE E7 GENE IN CERVICAL TUMORS WITH HPV16

Autores: Carolina França Padilha De Araujo,Nicole Scherer,Daniel Andrade Moreira,Mariana Boroni,Miguel Moreira
Apresentador: Carolina França Padilha De Araujo • carolpadilha166@gmail.com
Resumo:
Circular RNAs (circRNAs) are covalently closed, single-stranded RNA molecules that form a continuous loop, lacking 5’ and 3’ ends. Their circular structure confers resistance to exonuclease activity, resulting in increased molecular stability. CircRNAs have attracted considerable interest due to their potential roles in gene regulation through interactions with microRNAs, genomic DNA and proteins. Recently, circRNAs derived from the human papillomavirus (HPV) genome were described. A specific circRNA named circE7, which encompasses the entire E7 ORF (open reading frame), was shown to be translated into the E7 oncoprotein, suggesting a potential role in HPV-driven carcinogenesis. However, the biological significance of circE7, particularly in terms of its contribution to oncogene expression and cancer development, remains a subject of ongoing scientific debate. Given the limited understanding of HPV-derived circRNAs, the aims of this study were to: (i) determine the presence and frequency of circE7 in HPV16-positive cervical tumor samples; (ii) identify and characterize other viral circRNAs involving the E7 gene but distinct from circE7; and (iii) evaluate potential associations between viral circRNAs (including chimeric forms) and clinical or pathological data of tumors and patients. We analyzed cDNA synthesized from 234 HPV16-positive cervical tumors, obtained from women diagnosed with squamous cell carcinoma or adenocarcinoma at the Brazilian National Cancer Institute (INCA). We used divergent primers targeting the backsplice junction of circE7, allowing exclusive amplification of circular RNAs. Samples showing a single amplicon were subjected to Sanger sequencing (n=32), while those with multiple amplicons underwent next-generation sequencing (NGS; n=138). For NGS data, we applied de novo transcriptome assembly using the Trinity software (v2.15.1). 
Assembled contigs were analyzed and annotated by BLAST alignment against public databases including NCBI, UCSC Genome Browser, and PaVE (Papillomavirus Episteme). As a result, PCR amplicons were detected in 170 tumors. CircE7 was identified in 142 samples, other viral circRNAs were found in 100 samples, and 86 tumors had chimeric circRNAs formed by the fusion of viral and human genomic sequences, involving 106 distinct human genes. Interestingly, chimeric circRNAs were more frequently detected in tumors lacking E5 mRNA expression (p < 0.0007) and in those with disrupted E1/E2 genes (p < 0.0247), suggesting a link between chimeric isoforms and HPV genome integration. These findings confirm the high prevalence of circE7 and reveal a diverse landscape of viral and chimeric circRNAs in cervical tumors. The existence of chimeric circRNAs may indicate HPV integration events and could contribute to cervical carcinogenesis through neoantigen formation or modulation of host gene expression. The roles of viral circRNAs remain unclear; however, they represent promising biomarkers and provide novel insights into the molecular complexity of HPV-related cervical cancer.
Palavras-chave: Cervical Cancer, HPV, circular RNA, De Novo assembly
#1124922

Discovery of potential anti-angiogenic compounds using Machine Learning

Autores: Isabel Cristine,PICCIRILLO, E,João Carlos Setubal,Ricardo José Giordano
Apresentador: Isabel Cristine • isabelcristine.cg@usp.br
Resumo:
Angiogenesis is a process that plays a crucial role in both health and disease. It is responsible for the formation of blood vessels from pre-existing ones, with the most well-known mechanism in the literature being mediated by vascular endothelial growth factor (VEGF) and its receptors [1]. As a result, anti-VEGF therapies have emerged as a promising approach for treating various conditions, including cancer and retinopathies. However, despite advances in pharmacotherapy, current drugs face challenges in cancer treatment, while cost and administration issues limit their effectiveness against retinopathies [2]. Due to these difficulties, the search for new pharmacological alternatives is necessary.
Currently, drug discovery approaches require the screening of large chemical libraries to identify a pharmacological prototype of interest. The in silico approach, using machine learning, enables the screening of these libraries for the selection of compounds in a fast and cost-effective manner [3]. The current project aims to curate a database preserving the variability and representativeness of chemical space and train machine learning models to identify non-peptidic organic compounds as potential angiogenesis inhibitors, followed by bench validation through in vitro assays.
Two algorithms were trained and evaluated to identify the best-performing model: the Directed-Message Passing Neural Network (D-MPNN) and Random Forest. The D-MPNN is capable of reconstructing the molecular graph from the SMILES (Simplified Molecular-Input Line-Entry System) representation of each compound, while Random Forest, a robust and interpretable algorithm, utilizes Morgan fingerprints as molecular descriptors to generalize effectively to new data.
The main challenge faced during model training was the strong imbalance in the dataset between molecules active against angiogenesis (minority class) and inactive molecules (majority class). This data profile reduces the likelihood of predicting the class of interest, as it is treated by the model as a rare event, completely ignored, or considered noise, causing bias and impairing the model's generalization ability [4]. To address this issue, different class balancing methods were evaluated, including SMOTE (oversampling), Random Oversampling, and Random Undersampling, as well as molecular clustering to reduce data volume while preserving structural diversity.
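Of the balancing methods listed, random undersampling is the simplest to illustrate. The sketch below is a generic version using only the standard library, not the project's actual code; the record layout, the `active` field, and the `ratio` parameter are assumptions made for the example.

```python
import random


def random_undersample(records, label_key, minority_label, ratio=1.0, seed=0):
    """Keep all minority records and a random subset of the majority class.

    ratio: number of majority records kept per minority record (assumed knob).
    """
    rng = random.Random(seed)
    minority = [r for r in records if r[label_key] == minority_label]
    majority = [r for r in records if r[label_key] != minority_label]
    n_keep = min(len(majority), int(len(minority) * ratio))
    balanced = minority + rng.sample(majority, n_keep)
    rng.shuffle(balanced)  # avoid class-ordered training data
    return balanced


# Hypothetical dataset: 3 active and 20 inactive molecules.
records = [{"smiles": f"A{i}", "active": 1} for i in range(3)]
records += [{"smiles": f"I{i}", "active": 0} for i in range(20)]
balanced = random_undersample(records, "active", minority_label=1, ratio=2.0)
```

Random undersampling discards majority-class molecules without regard to structure, which is one reason a clustering step may be preferred when structural diversity has to be preserved.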
Based on the evaluation of ROC and PRC metrics, along with validation using a known database, the best-performing model was applied to a commercial compound database. This process generated a list of 241 compounds, from which 33 were selected based on their high scores and 'drug-like' characteristics. In the next phase, experimental assays will be performed on the top 10 scoring molecules to assess their effects on epithelial and endothelial cells.

References

1. Cao, Y., Langer, R. & Ferrara, N. Targeting angiogenesis in oncology, ophthalmology and beyond. Nat Rev Drug Discov 22, 476–495 (2023). https://doi.org/10.1038/s41573-023-00671-z
2. Jain, R. K. Antiangiogenesis strategies revisited: from starving tumors to alleviating hypoxia. Cancer Cell 26, 605–622 (2014).
3. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).
4. Guan, S. & Fu, N. Class imbalance learning with Bayesian optimization applied in drug discovery. Sci Rep 12, 2069 (2022).
Palavras-chave: Drug discovery, Machine Learning, Angiogenesis, Cheminformatics.
#1124925

A Graph-Based Computational Framework to Predict Cross-Reactivity of T Cell Receptors from pMHC Structural Interfaces

Autores: Carlos Daniel Marques Santos Simões,Helder Veras Ribeiro Filho
Apresentador: Carlos Daniel Marques Santos Simões • carlos23001@ilum.cnpem.br
Resumo:
T cell receptor (TCR) cross-reactivity poses a major challenge for T cell-based immunotherapies, such as TCR-T, due to the risk of recognizing self-antigens presented by the Major Histocompatibility Complex (MHC) and inducing autoimmune responses. Current computational tools for predicting such cross-reactivity primarily rely on sequence similarity, with limited integration of 3D structural data. In this work, we propose a novel graph-based method to identify structurally and physicochemically similar regions between peptide-MHC (pMHC) complexes that may induce cross-reactivity. The 3D structures of the pMHC surfaces that interact with TCRs, derived from statistical analyses of available TCR–MHC complexes, are represented as graphs, with residues as nodes and spatial interactions as edges. Node attributes include descriptors such as solvent accessibility, depth, and Atchley factors. These attributes are used to search for identical or similar nodes across multiple pMHCs, which are then connected by edges based on spatial information. To identify pMHC regions topologically compatible with TCR–pMHC interfaces, we apply the breadth-first search (BFS) algorithm combined with graph property descriptors derived from experimentally solved TCR–pMHC complexes. We tested the ability of the BFS to recapitulate graphs of experimental MHC interfaces with TCRs starting from general MHC surfaces, achieving an average recovery rate of 80%. We then validate our tool through the detection of potential cross-reactive regions in solved structures, particularly in classical cases such as the cross-reactivity between MAGE-3 and Titin antigens, and through large-scale validation using modeled structures of sequences associated with cross-reactivity from databases such as IEDB and VDJdb. 
The tool will be made available as a web server and a Python package, enabling its integration into pipelines for TCR-based immunotherapy and vaccine development, while also enhancing our understanding of the molecular basis of TCR cross-reactivity.
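The graph traversal mentioned in the abstract can be pictured with the following minimal sketch. It grows a region over a residue graph by breadth-first search, admitting only neighbors that pass a compatibility test. The adjacency format, the `is_compatible` predicate, and the `max_nodes` cap are illustrative assumptions and do not reflect the tool's actual descriptors or stopping criteria.

```python
from collections import deque


def bfs_region(adjacency, seed, max_nodes, is_compatible):
    """Grow a connected region from a seed residue in breadth-first order.

    adjacency: dict mapping residue id -> iterable of neighboring residue ids.
    is_compatible: callable deciding whether a residue may join the region.
    """
    region = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(region) < max_nodes:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in seen or not is_compatible(neighbor):
                continue
            seen.add(neighbor)
            region.append(neighbor)
            queue.append(neighbor)
            if len(region) >= max_nodes:
                break
    return region


# Hypothetical residue graph; residue "D" is treated as buried.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
}
exposed = {"A", "B", "C", "E"}
region = bfs_region(adjacency, "A", max_nodes=10,
                    is_compatible=lambda r: r in exposed)
```

In a real setting the predicate would compare node attributes such as solvent accessibility or Atchley factors against thresholds; here it is reduced to a set membership test.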
Palavras-chave: T cells, cross-reactivity, structural bioinformatics, graph theory, pMHC, immunotherapy
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124958

One-Click Drug Design? Not Quite Yet: Lessons from Engineering Experimentally Validated EGFR Protein Inhibitors

Autores: João Sartori,Paula Fernandes da Costa Franklin,Ana Carolina Ramos Guimarães,Lucas Machado
Apresentador: João Sartori • joaosartori2@gmail.com
Resumo:
Protein binders, including antibodies, peptides, peptide PROTACs, and protein-drug conjugates, play a major role in therapeutics, yet designing effective binders remains a challenge. Advances in deep learning have introduced generative AI methods such as protein language models and diffusion models, but these often require generating hundreds of candidates to identify a viable binder. Additionally, successful binders must exhibit high expression yields and stability, requiring multi-objective optimization. To evaluate the state of the art in binder design, AdaptyvBio organized a competition in which EGFR binder designs were tested in vitro. Here, we describe our approach, which ranked 15th out of more than 100 protein designers. This project aimed to develop and evaluate computational strategies for designing effective EGFR binders, optimizing binding affinity, stability, and expression while adhering to sequence and length constraints. Specifically, binders needed to be at an edit distance of at least 10 amino acids from published sequences and no longer than 250 residues. Three design strategies were developed using a TGFα segment (a known EGFR binder) as the starting point. The first approach combined ProteinMPNN for sequence generation using the scaffold, Rosetta for redesigning the binding region, and Markov chain Monte Carlo (MCMC) sampling to maximize ESM2 log-likelihood. The second approach focused solely on non-interface optimization, refining non-interface regions while preserving the binding region. The third approach balanced affinity and expression by integrating Rosetta-driven interface redesign with non-interface optimization of ESM2 log-likelihood. AlphaFold3 predictions of the binder-EGFR complex were performed, and binders were ranked based on ESM2 log-likelihood and ipTM scores. Our strategy placed us among the top 15 competitors, producing an EGFR binder with a KD of 4.20e-6 nM. 
While 95% of designs submitted by all participants were successfully expressed, only 14% bound EGFR, despite being generated using cutting-edge methods. This highlights a key limitation in computational binder design: even with advanced AI-driven strategies, translating high-affinity predictions into functional binders remains challenging. Our results underscore the need for improved approaches to enhance expression and functional success.
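The sequence and length constraints stated in the abstract can be checked with a short script. The sketch below assumes that "edit distance" means the standard Levenshtein distance (insertions, deletions, substitutions) and that a candidate must satisfy the threshold against every published sequence; the competition's exact checking procedure is not described in the abstract, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


def passes_constraints(candidate, published, min_distance=10, max_length=250):
    """True if the design is short enough and far enough from known binders."""
    if len(candidate) > max_length:
        return False
    return all(edit_distance(candidate, seq) >= min_distance
               for seq in published)


# Hypothetical sequences, used only to exercise the checks.
novel_ok = passes_constraints("A" * 20, ["C" * 20])
too_similar = passes_constraints("A" * 20, ["A" * 20])
```

A filter like this would typically run before structure prediction, since it is far cheaper than scoring a complex.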
Palavras-chave: Protein engineering, EGFR, binder design
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1124987

Multi-omic Single-Cell Profiling of Immunoproteasome Gene Expression and Regulation in Pediatric B-Cell Precursor Acute Lymphoblastic Leukemia

Autores: Ana Carolina Pires e Silva,Gabriela Rapozo Guimarães,Gabriel Fernando Costa Da Fonseca,Mariana Boroni,Mariana Emerenciano
Apresentador: Ana Carolina Pires e Silva • acpiresesilva@gmail.com
Resumo:
B-cell precursor acute lymphoblastic leukemia (B-ALL) is a common hematologic malignancy, particularly among children, characterized by mutational, phenotypic, and genetic alterations that impact cellular differentiation and prognosis. The immunoproteasome complex (IP) is a specialized form of the proteasome that plays a critical role in immune responses and antigen presentation to CD8+ T cells. This proteolytic complex is regulated by inflammatory mediators and cellular stress. Previous research has highlighted its role in tumor cell adaptation and immune evasion in solid and hematologic cancers. However, the relationship between IP expression and clinical progression in B-ALL, particularly across different disease stages, remains poorly understood. Our central hypothesis is that IP expression differs across distinct stages and subtypes of pediatric B-ALL, and that these differences may be associated with alterations in other regulatory pathways. The first step is to examine the molecular changes and pathways associated with the IP in B-ALL, focusing on samples from cells at different disease stages (diagnosis, remission and relapse) to identify distinct transcriptomic signatures in publicly available data from bone marrow and peripheral blood samples of pediatric B-ALL patients, with a focus on genetic fusion subtypes. The expression of proteasome catalytic genes (PSMB5, PSMB6, PSMB7) and immunoproteasome catalytic subunits β1i, β2i, and β5i (encoded by the PSMB9, PSMB10, and PSMB8 genes, respectively) will be assessed and correlated with cellular composition across disease stages and molecular subtypes. Additionally, to validate the findings in our cohort, a multi-omic approach will be employed using bone marrow samples from infants and children diagnosed with B-ALL, obtained from the Hematology Service of INCA and other collaborating institutions. 
This will enable an integrated analysis of gene expression and chromatin accessibility at the single-nucleus level. The integration of these complementary strategies is intended to provide a more comprehensive understanding of the regulatory networks involving the immunoproteasome and its contribution to B-ALL progression and resistance to treatment.
Palavras-chave: Immunoproteasome, Leukemia, Single-cell RNA-sequencing, Multi-omics
★ Running for the Qiagen Digital Insights Excellence Awards
#1125000

Genomic Characterization of Paenibacillus sp. 210: A Multifunctional Microbial Resource for Sustainable Crop and Animal Production

Autores: João Victor dos Anjos Almeida,Carlos Miguel Nóbrega Mendonça,Leandro Márcio Moreira,Ricardo Pinheiro de Souza Oliveira,Alessandro Varani,Mauro de Medeiros Oliveira
Apresentador: Mauro de Medeiros Oliveira • mauro.oliveira@unesp.br
Resumo:
The novel bacterial strain Paenibacillus sp. 210, isolated from Brazilian crude oil, exhibits genomic traits that indicate substantial potential for sustainable agricultural applications, particularly in plant growth promotion and enhancement of animal feed nutrition. The genome encodes over 250 carbohydrate-active enzymes (CAZymes), including glycoside hydrolases (GH5, GH43) and polysaccharide lyases (PL9, PL10), which may effectively degrade recalcitrant plant polysaccharides such as cellulose, xylan, and pectin. These enzymes are promising for converting agricultural residues (e.g., crop straw, husks) into nutrient-rich animal feed additives by breaking down lignocellulosic biomass, thereby improving feed digestibility and nutrient accessibility for livestock. Structural validation of selected enzymes, such as GH5 cellulase, confirmed functional homology to well-characterized counterparts from Bacillus spp., supporting their potential application in feed pre-processing. Plant growth-promoting properties identified in the genome include indole-3-acetic acid (IAA) biosynthesis via a complete tryptophan-dependent pathway (ipdC and trp genes), phosphate solubilization mediated by alkaline and acid phosphatases (phoN, phoA genes) and specialized phosphate transporters (phn and pts operons), as well as biological nitrogen fixation through a complete nif gene cluster. Collectively, these genetic capabilities could enhance soil fertility, decrease dependence on synthetic fertilizers, and improve crop resilience in nutrient-limited or degraded agricultural soils. Strain 210 also produces antimicrobial compounds (fusaricidin, paenilan, tridecaptin) effective against pathogens like Fusarium oxysporum and Staphylococcus aureus, suggesting utility as biocontrol agents and natural antibiotic alternatives in agriculture and animal husbandry. 
The genome also harbors complete biosynthetic pathways for multiple B vitamins (B1, B3, B5, B6, B7, B9, B12), potentially enriching soil microbial communities, enhancing plant nutrient uptake, and serving as microbial-derived nutritional supplements in animal feed formulations. Phylogenomic analysis positioned strain 210 as a novel species (ANI <93% to the closest described relatives) adapted specifically to hydrocarbon-rich environments, indicating potential utility in bioremediation and rehabilitation of contaminated agricultural lands. Importantly, the chromosomal localization and clustering of functional genes related to these beneficial traits, and their absence from regions associated with mobile genetic elements such as prophages, transposons, or genomic islands, suggest strong genetic stability and trait maintenance, a crucial factor for successful practical field applications. The multifunctional genomic profile presented here positions Paenibacillus sp. 210 as an ecologically sustainable microbial tool for integrated enhancement of crop productivity, soil health, and livestock nutrition. Future research should prioritize rigorous field trials to validate its effectiveness in agricultural settings and explore potential synergies within microbial consortia for comprehensive soil-plant-animal health management.
Palavras-chave: Animal feed applications, Antimicrobial biocontrol, Plant growth promotion, Sustainability and waste valorization
#1125016

Identification of prototype antibodies targeting the PD-1 FG loop with potential impediment of interaction with PD-L1

Autores: EDUARDO MENEZES GAIETA,Diego da Silva de Almeida,Jean Vieira Sampaio,Andrielly Henriques dos Santos Costa,Lília Oliveira Santos,Geraldo Rodrigues Sartori,João Herminio Martins Da Silva
Apresentador: EDUARDO MENEZES GAIETA • eduardomenezesgaieta@gmail.com
Resumo:
The PD-1/PD-L1 immune checkpoint pathway is integral to modulating the immune response against tumor cells and is increasingly being recognized as a promising therapeutic target. Monoclonal antibodies targeting PD-1 and PD-L1 have been developed to inhibit this pathway by blocking the interaction between these proteins and are currently approved for the treatment of various cancers. To date, the epitopes of PD-1 recognized by these approved antibodies have been concentrated at the PD-L1/PD-1 binding interface, indicating that these antibodies directly compete with PD-L1. However, in a study by Gao et al. (2019), the FG loop of PD-1 was identified as a potential hotspot for antibody recognition, capable of blocking PD-L1 binding through steric hindrance without directly overlapping the classical interface. Gao et al. (2019) found that, within this loop, residues P130, K131, Q133, and I134 are critical for recognition by novel antibodies. Thus, our study aimed to explore the FG loop region of PD-1 to identify naive antibodies capable of binding to this loop, thereby preventing formation of the PD-1/PD-L1 complex. To this end, PD-1 was subjected to virtual screening using an automated pipeline developed by our group. This is a modular pipeline that can be applied to any target of interest. It uses an in-house naive antibody database, combining different scoring functions (REF-15 and Haddock) to increase the success rate of selecting reliable candidate poses. To increase the chance of finding a true positive, the best docking poses were subjected to molecular dynamics simulations with a temperature plateau that distinguishes poses with unstable interfaces from those with greater stability. For this initial screening, 290 antibodies comprising variants with lambda and kappa light chains were used. For each docking-derived pose, three replicate simulations of 75 ns were performed. 
As a result, three naive prototype antibodies were identified: SFBB083, SFBB198, and SFBB220, with average interface RMSD (i-RMSD) values of 2.55 Å, 3.57 Å, and 1.73 Å, respectively, throughout the entire simulation. Structural analysis revealed that SFBB083 interacts with the FG loop exclusively through its heavy chain, with CDR-H3 forming hydrogen bond interactions with residues L99 and E101. In contrast, SFBB198 and SFBB220 use both light and heavy chains to establish hydrogen-bonding interactions with these critical FG loop residues. These findings show that: (i) all selected prototype antibodies possess a kappa light chain, a feature also observed in two previously crystallized antibodies known to target this epitope, suggesting a potential structural preference of the PD-1 FG loop for kappa light chain-containing antibodies; (ii) the identified antibodies formed multiple hydrogen bond interactions with critical FG loop residues and remained stable under thermal perturbation during MD simulations; and (iii) our pipeline is efficient at identifying prototype naive antibodies capable of forming stable binding interfaces with a structural epitope. These characteristics suggest that the identified antibodies represent promising candidates for further affinity maturation and structural optimization, aimed at developing novel antibodies capable of inhibiting the PD-1/PD-L1 checkpoint pathway. Currently, maturation is being performed using the Ab-Seldon pipeline by Sampaio et al. (2024).
Palavras-chave: Naive Antibody, FG loop of PD-1, Molecular Docking, Immune Checkpoint, Molecular Dynamics Simulations.
#1125023

Comparative Topological Analysis of Transcriptional and Post-transcriptional Regulatory Networks (TPTRNs) in Pathogenic Bacteria

Autores: Thiago Souza,Alex Sanchez Yumbo,André Borges Farias,Maria Carolina Sisco,Maiana de Oliveira Cerqueira e Costa,Diogo Antonio Tschoeke,MARISA FABIANA NICOLÁS
Apresentador: Thiago Souza • thiagoms@posgrad.lncc.br
Resumo:
sRNAs are small non-coding RNA (ncRNA) molecules transcribed from intergenic regions or processed from messenger RNAs (mRNAs), acting as post-transcriptional regulators by preferentially binding to target mRNAs and modulating their stability and/or translation. Advances in high-throughput sequencing, along with in vitro approaches for mapping sRNA–mRNA interactions, have highlighted the broad regulatory impact of sRNAs on cellular metabolism. These datasets enable the integration of sRNA-mediated regulation into traditional Transcriptional Regulatory Networks (TRNs), which include transcription factors (TFs), their binding sites (TFBS), and protein-coding genes (CDSs). The resulting Transcriptional and Post-Transcriptional Regulatory Networks (TPTRNs) capture regulatory layers at both transcriptional and post-transcriptional levels.
In this work, we collected and standardized curated data from the literature and public databases for Salmonella enterica, Escherichia coli, and Klebsiella pneumoniae, constructing TPTRNs for each species and storing them in a Neo4j graph database. We then performed comparative analyses of topological properties, network motifs, modular structure, and other graph-based metrics to assess similarities and differences across species. These analyses reveal distinct regulatory patterns and help clarify the interplay between TFs and sRNAs in pathogenic bacteria, contributing to a broader understanding of bacterial gene regulation.
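Two of the graph-based measures mentioned above, regulator out-degree and feed-forward loop motifs, can be sketched on a plain edge list. This is an illustration in standard-library Python rather than the Neo4j-based workflow of the study; the node names and the choice of the feed-forward loop as the example motif are assumptions.

```python
from collections import defaultdict


def build_network(edges):
    """Map each regulator (TF or sRNA) to the set of targets it regulates."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    return targets


def out_degrees(targets):
    """Number of direct targets per regulator."""
    return {regulator: len(found) for regulator, found in targets.items()}


def feed_forward_loops(targets):
    """Find triples (x, y, z) where x regulates y, y regulates z, and x regulates z."""
    loops = []
    for x in sorted(targets):
        for y in sorted(targets[x]):
            if y == x:
                continue
            for z in sorted(targets.get(y, ())):
                if z not in (x, y) and z in targets[x]:
                    loops.append((x, y, z))
    return loops


# Hypothetical mixed network: a TF controls an sRNA and both act on geneA.
edges = [
    ("TF1", "sRNA1"),
    ("TF1", "geneA"),
    ("sRNA1", "geneA"),
    ("TF2", "geneB"),
]
network = build_network(edges)
degrees = out_degrees(network)
loops = feed_forward_loops(network)
```

Mixed TF-sRNA feed-forward loops like the one in the toy example are the kind of motif that only becomes visible once post-transcriptional edges are added to a TRN.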
Palavras-chave: RNA interactome, Regulons, Prokaryotes, Regulatory Networks, Graph Database, Network Analysis
#1125024

BIOPROSPECTING AND PHYLOGENETIC ANALYSIS OF GENES AND PROTEINS FROM Schwanniomyces polymorphus FOR THE CONSTRUCTION OF GENETICALLY MODIFIED ORGANISMS FOR BIOREMEDIATION APPLICATIONS

Autores: Antonio Marcio Barbosa Junior,Victor Henrique Santos França,João Augusto Nascimento Conceição
Apresentador: Antonio Marcio Barbosa Junior • amjunior@academico.ufs.br
Resumo:
This study aimed to bioprospect enzymes from Schwanniomyces polymorphus and perform in silico analyses to characterize genes and proteins with biotechnological potential, especially for use in bioremediation. The research focused on identifying candidate sequences for use in genetically modified organisms (GMOs), utilizing standardized plasmid-based expression systems to enhance the expression of enzymes such as proteases in yeast models. These systems simplify the cloning and manipulation processes, making them practical tools for both research and industrial applications. Initial analyses targeted the ribosomal DNA (rDNA) region of S. polymorphus, focusing on the D1/D2 domains of the 26S rRNA subunit. BLAST (NCBI) and the UniProt database were used for gene and protein identification. MAFFT and NJ/UPGMA phylogenetic methods, supported by the MEGA software, were employed to group sequences, and UniProt tools were used for protein modeling. Additional genome annotation was performed using Stingray Galaxy and Fgenesh platforms. All Digital Sequence Information (DSI) was registered in the SISGEN system (registration code: A94A127). BLAST analysis revealed high sequence similarity between S. polymorphus and related species such as S. occidentalis, S. vanrijiae, and S. etchellsii. Nine nucleotide sequences with annotated and functionally classified proteins were selected as targets for plasmid vector construction. Phylogenetic analysis revealed three main gene clusters. The first contained only the bgl1 gene, encoding beta-glucosidase. The second cluster was subdivided into: (a) J07 and amy1, both coding for alpha-amylase 1 (similarity range 0.3–0.8%), and (b) hak1 and D0X5-1, associated with potassium transport and fructofuranosidase (0.2–0.6% similarity). The third cluster included: (a) gam1 and sod1 (glucoamylase 1 and zinc superoxide dismutase, 0.1–0.3% similarity), and (b) leu2 and xyl1 (3-isopropylmalate dehydrogenase and xylose reductase, 0.1–0.4% similarity). 
Taxonomic analysis using Kraken 2 and Krona confirmed 100% homology with the Ascomycota phylum across 221–278 reads. FGenesh predicted nine transcription-controlled regions in all genes, associated with enzymes of biotechnological interest. Following the phylogenetic screening, sequence ENA HQ166039.1 and UniProt entry E5KJ07_SCHOC (encoding alpha-amylase) were selected for plasmid design. Protein modeling via AlphaFold identified three binding sites. Using the ApE Plasmid Editor, start and stop codons were defined along with seven forward and eleven reverse primer binding sites. The restriction enzyme BsaAI (position 739) showed 99.7% similarity in the cleavage site, confirming compatibility for plasmid construction. In conclusion, this study successfully identified several enzymes from S. polymorphus with potential for biotechnological and environmental applications. The integration of these enzymes into expression vectors can support the degradation of petroleum-derived hydrocarbons, offering sustainable solutions for pollution control and ecosystem preservation.
Palavras-chave: Debaryomycetaceae, plasmid, microbial phylogenesis
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125025

Gene-based rare variant analysis reveals a novel genomic locus associated with sporadic luminal breast cancer susceptibility

Autores: Marco Antonio Campanário,Ana Carolina Rodrigues,Michelle Orane Schemberger,Beatriz Rosa De Azevedo,Bruno Janke do Nascimento,Valter Antonio de Baura,Emanuel Maltempi de Souza,Enilze M.S.F. Ribeiro,Daniela F. Gradia,Hellen Geremias dos Santos,Jaqueline Carvalho de Oliveira,Fabio Passetti
Apresentador: Marco Antonio Campanário • MACSCAMPANARIO@GMAIL.COM
Resumo:
In Brazil, genomic research on breast cancer (BC) has primarily focused on hereditary and familial cases, which account for only ~15% of all diagnoses, while sporadic BC remains less explored. Although BC risk is recognized as heritable, a key challenge in sporadic cases is the issue of missing heritability, as most patients do not carry high-penetrance genetic variants in known cancer susceptibility genes. Part of this gap may be explained by rare genetic variants, which are harder to detect and often overlooked by standard genomic approaches such as genome-wide association studies (GWAS). Our study used a gene-based approach to investigate the association between rare germline single-nucleotide variants (SNVs) and luminal breast cancer (LBC) susceptibility. To perform this analysis, whole-exome sequencing (WES) data were obtained from peripheral blood samples of 89 women (mean age: 59.3 years) from Curitiba (Paraná, Brazil), all diagnosed with luminal breast cancer (ER-positive/HER2-negative), and 94 women without comorbidities (mean age: 81.8 years) selected as controls. Data preprocessing and variant calling were based on the GATK Best Practices. SNVs and INDELs were annotated with ANNOVAR to assess genomic context, functional impact, clinical classification and population frequency. Genotype quality control was performed using PLINK, and population structure was inferred through multidimensional scaling (MDS) and ADMIXTURE analysis. Our rare, high-impact SNV dataset included only missense SNVs, which were considered rare if their minor allele frequency (MAF) was below 5% both in our case cohort and in at least one public population database. Genes harboring at least three rare, high-impact SNVs were selected for evaluation with the Optimal Sequence Kernel Association Test (SKAT-O), a hybrid regression-based method that combines burden and variance-component tests to assess aggregate effects of rare variants in a gene or region on the trait of interest. 
Age and the first three MDS dimensions were included as covariates in the model. Genomic control was assessed by calculating the genomic inflation factor (λ) from the distribution of SKAT-O p-values. After quality control, 72 cases and 93 controls remained for analysis. In total, 410,783 variants were called, of which 13,171 rare, high-impact SNVs across 1,292 genes were tested with SKAT-O. A gene related to RNA cap synthase activity showed a putative association with LBC susceptibility (genomic p-value = 1.443 × 10⁻⁴; FDR = 0.186), harboring six rare, high-impact independent (LD < 0.2) SNVs. These variants appeared in different heterozygous combinations in 9 of the 72 individuals with LBC. Two SNVs are phosphovariants predicted to disrupt phosphorylation sites, potentially affecting protein post-translational regulation. Additionally, four other SNPs have been reported at higher frequencies in somatic tissues from breast, lung, and colon tumors, with a risk allele frequency (RAF) of ~48.5%, compared to the germline average RAF of 0.81%. Moreover, one of the SNPs is considered a cancer vulnerability variant in the context of loss of heterozygosity (LOH). Given the protein’s role in spliceosomal assembly and telomerase maintenance, its gene may represent a novel susceptibility locus for LBC, but further replication is needed. Financial support: CAPES, Fiocruz, CNPq and Fundação Araucária.
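The genomic inflation factor mentioned above can be illustrated with a minimal sketch. It assumes the common definition of λ as the median of the chi-square (1 df) statistics implied by the p-values, divided by the expected median under the null (about 0.455); the abstract does not state which implementation the authors used, and the function name and p-values below are hypothetical.

```python
from statistics import NormalDist, median

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.454936


def genomic_inflation(p_values):
    """Return lambda: median implied chi-square (1 df) over the null median.

    Assumes every p-value lies in (0, 1]; a p-value of exactly 0 is rejected
    by NormalDist.inv_cdf.
    """
    norm = NormalDist()
    # A p-value p corresponds to chi2 = z**2 with z = Phi^-1(1 - p/2).
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical, evenly spaced p-values standing in for a well-calibrated test.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(uniform_p), 3))
```

Values of λ close to 1 suggest that the p-value distribution is not systematically inflated, for example by residual population structure.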
Palavras-chave: luminal breast cancer, rare SNVs, SKAT-O
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125028

PHYLOGENETIC ANALYSIS OF GENES AND PROTEINS IN Enterococcus faecalis INVOLVED IN ORAL BIOFILM FORMATION

Autores: Antonio Marcio Barbosa Junior,ALICE LAÍS BARBOZA SEVERO,Gabriel Silva
Apresentador: Antonio Marcio Barbosa Junior • amjunior@academico.ufs.br
Resumo:
Enterococcus faecalis is a Gram-positive, facultative anaerobic bacterium frequently associated with persistent endodontic infections and failures in root canal treatments. Its resilience in hostile environments, such as treated root canals, is largely attributed to its capacity to form complex biofilms and to survive under conditions of high alkalinity and nutrient limitation. The colonization and pathogenicity of E. faecalis are facilitated by specific adhesion proteins and virulence genes, presenting a notable clinical challenge. This study aimed to explore the phylogeny of genes and proteins involved in biofilm formation in E. faecalis. Various in silico tools were utilized, including BLAST for sequence alignment, MAFFT for multiple sequence alignment, and phylogenetic tree construction using NJ/UPGMA methods in MEGA software. Protein modeling was conducted via UniProt, and gene analysis and functional annotation were performed using Stingray Galaxy and Fgenesh. A total of 25 genes and 9 proteins—plus 16 predicted elements—were identified as related to biofilm formation. Key proteins included surface-associated proteins, FsrB, serine protease, aggregation substance, biofilm regulatory elements, and minor pilus subunits. While all proteins were functionally annotated in UniProt, many lacked clearly defined structural morphology. Gene sequences were confirmed through BLAST searches in the NCBI database. Phylogenetic analysis revealed three major gene clusters. Cluster 1 included ym/yH, esp, and fsrB, with sequence similarities ranging from 99.5% to 99.8%. Cluster 2 comprised esp and spreE, and Cluster 3 included ace, gelE, asa1, lpxtg-prgB, brpA, and brpB, showing similarities between 99.3% and 99.7%. Protein phylogeny revealed high similarity rates, ranging from 94.44% to 99.88%. Analysis using Stingray Galaxy with Kraken2 and Krona identified between 112 and 329 sequence reads with 100% homology to the Firmicutes phylum. 
Fgenesh revealed nine transcription-regulated regions among the identified genes, linked to the expression of biofilm-related proteins, surface proteins, proteases, and virulence factors. Further analysis targeted the esp and brpA genes for ORF and siRNA prediction. Using checktrans, no orthologs or confirmed gene expression were detected. With getorf and geecee tools, six ORFs and a breakpoint codon were identified in esp (GC content: 37%), while twelve ORFs, one breakpoint codon, and 42% GC content were detected in brpA. In the siRNA analysis, eleven potential sites were identified in brpA, but none in esp. These findings highlight the brpA gene as a model candidate for further research. Despite the clinical relevance of E. faecalis, comprehensive genomic and proteomic studies remain limited. Expanding phylogenetic and molecular modeling analyses is essential to uncover the mechanisms driving biofilm formation and develop targeted therapies for endodontic and periapical infections.
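As a rough illustration of the GC-content figures reported above, the calculation can be sketched as follows. This is a minimal stand-in, not the geecee implementation; the function name and the toy sequence are hypothetical.

```python
def gc_content(sequence):
    """Return the percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Hypothetical toy fragment, not a real esp or brpA sequence.
toy_fragment = "ATGGCATTAACGTTAGCATAA"
print(f"GC content: {gc_content(toy_fragment):.1f}%")
```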
Palavras-chave: Enterococcaceae, bacterial aggregates, microbial clustering
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125029

Application of Machine Learning Models in the Detection of HAM/TSP and Its Association with Intestinal Parasitosis Infections in People Living with HTLV

Autores: Marília Gabriela Barbosa da Silva,Laryssa Bandeira de Melo Silva,Matheus Azevedo Bomfim,Gabriel Freitas Araújo,Patricia Muniz Mendes Freire de Moura,José Anchieta de Brito,Paula M. Magalhães,ELIANE CAMPOS COIMBRA,Adauto Barbosa Neto,João Pacifico Bezerra Neto
Apresentador: Marília Gabriela Barbosa da Silva • marilia.gabrielabarbosa@upe.br
Resumo:
The Human T-cell Lymphotropic Virus (HTLV), identified in 1980, was the first isolated human retrovirus, being associated with diseases such as Adult T-cell Leukemia/Lymphoma (ATL) and HTLV-Associated Myelopathy/Tropical Spastic Paraparesis (HAM/TSP). HTLV-1 transmission occurs vertically and horizontally. The virus preferentially infects CD4+ T lymphocytes, integrating into the host's DNA. This integration compromises the immune response, affecting immune pathways such as Th1 and Th2. As a result, infected individuals become more vulnerable to opportunistic infections, such as strongyloidiasis. In symptomatic cases, such as HAM/TSP, significant immune dysregulation is observed, with changes in cytokine production, abnormal eosinophil activation, and decreases in antibodies such as IgE. These changes interfere with the response against parasites, facilitating gastrointestinal infections, especially by helminths. Therefore, this study aimed to investigate the association between HAM/TSP and gastrointestinal parasitic infections, using Machine Learning (ML) techniques to predict the most severe outcomes of the disease. The research was approved by the Ethics Committee (CAAE: 57785822.3.0000.5192), according to Resolution CNS No. 466/2012. Patients treated regularly at the Infectious and Parasitic Diseases Service of Oswaldo Cruz University Hospital were recruited. After the consent form was signed, biological samples were requested and medical records were analyzed for history of parasitic infections and HAM/TSP diagnoses. After pre-processing, the variables were used to train seven machine learning models. The results were analyzed using ROC curves, along with accuracy, recall, and precision metrics. Of the 37 patients analyzed, six were infected with Strongyloides stercoralis, one with Ascaris lumbricoides, one with Endolimax nana, and one with Entamoeba coli. 
In terms of ML application, the results showed that the Gradient Boosting (GB) and Random Forest (RF) classifiers had the best performance across all evaluated metrics (1.000), which may be indicative of overfitting. In contrast, models such as Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) demonstrated low accuracy. The Decision Tree (DT) and Multilayer Perceptron (MLP) algorithms performed poorly across several metrics. Furthermore, the frequency chart showed that 91.7% of patients do not have HAM/TSP, while 8.3% are affected by the condition, indicating a relatively low prevalence of the disease, which may contribute to overfitting and prediction errors. These findings indicate that the low prevalence of the HAM/TSP condition, combined with the limited number of cases, makes accurate prediction difficult and highlights the need for larger and more diverse samples to improve the robustness of predictive models. The inclusion of additional data and the exploration of alternative approaches may enhance predictive accuracy and contribute to a better understanding of the relationship between HTLV, HAM/TSP, and parasitic infections. The correlation between HAM/TSP and parasites indicates that the immune dysregulation caused by HTLV renders patients more vulnerable to infections such as Strongyloides stercoralis. This susceptibility is related to changes in the regulation of T lymphocytes and in the balance of cytokines.
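The effect of class imbalance on these metrics can be shown with a small sketch. The labels below are hypothetical and were chosen only to reproduce a 91.7% / 8.3% split (33 negatives, 3 positives); they are not the study data. A model that always predicts the majority class reaches high accuracy while detecting no HAM/TSP case, which is why accuracy alone can be misleading here.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Report 0.0 when a ratio is undefined (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 33 patients without HAM/TSP (0) and 3 with it (1).
labels = [0] * 33 + [1] * 3
always_negative = [0] * len(labels)  # trivial majority-class "model"

accuracy, precision, recall = binary_metrics(labels, always_negative)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```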
Palavras-chave: Virus, Immunology, Myelopathy, Bioinformatics.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125047

A Deep Learning-Based Classification Approach for Identifying Transcription Factor Families Binding Regions in Plant Genomes

Autores: Lorrana Verdi Flores,Renan Terassi Pinto,Renato Ramos da Silva,Joaquim Quinteiro Uchoa,Luciano Vilela Paiva
Apresentador: Lorrana Verdi Flores • lorrana.vf@gmail.com
Resumo:
The interaction between transcription factors (TFs) and cis-regulatory elements (cREs) is critical for understanding gene regulatory networks and enabling the prospection of biotechnologically relevant genes, as TFs play key roles in responses to environmental factors such as biotic and abiotic stresses. Most studies focus on developing machine learning models for individual TFs, predominantly in the model species Arabidopsis thaliana, limiting applications in complex genomes. This study proposes classification models for cREs based on TF families, leveraging the similarity of binding sites among family members to improve cross-species generalization. In this context, 866 ChIP/DAP-seq datasets (PCBase2.0: 298; PlantCistromeDB: 568) were collected spanning 53 TF families (PlantTFDB) and six species: A. thaliana (787), Glycine max (39), Zea mays (34), Solanum lycopersicum (4), Oryza sativa (1), and A. lyrata (1). Binding peaks were processed to extract 150 bp regions centered on enriched motifs identified by MEME/FIMO, while background sequences (intergenic, intronic, and UTR regions) generated balanced negative classes (1:1 ratio). DNA sequences were converted into 3-mers and encoded as one-hot matrices (1x12x148). The data were split into training (60%), validation (20%), and test (20%) sets. A hybrid CNN-Transformer architecture was used: parallel convolutional layers (3x3/32, 5x5/64, 7x7/128, ReLU) extracted multiscale features. A 3x3 max-pooling (stride=1) and dropout (0.2) preceded a 4-head 256-dimension transformer capturing global dependencies. Subsequent 1x3 convolutions (128 filters, ELU, BatchNorm), 1x4 max-pooling (stride=1x2), dropout (0.2), and flattening led to a 2-unit dense layer with softmax classification. Pooling layers and dropout were used to reduce overfitting. The models were trained using binary cross-entropy loss, Adam optimizer (batch size 32), and early stopping based on validation performance. 
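One way to obtain a 12x148 matrix from a 150 bp region is sketched below. It assumes overlapping 3-mers (150 - 3 + 1 = 148 columns) with each 3-mer encoded as three stacked 4-element one-hot vectors (12 rows). This layout is an assumption made for illustration; the abstract does not specify the exact encoding, and the function name and input sequence are hypothetical.

```python
BASES = "ACGT"


def one_hot_3mers(sequence):
    """Encode overlapping 3-mers as columns of 12 values (3 positions x 4 bases)."""
    seq = sequence.upper()
    k = 3
    n_kmers = len(seq) - k + 1
    # matrix[row][col]: 12 rows, one column per overlapping 3-mer.
    matrix = [[0] * n_kmers for _ in range(4 * k)]
    for col in range(n_kmers):
        for offset, base in enumerate(seq[col:col + k]):
            if base in BASES:  # ambiguous bases such as N stay all-zero
                matrix[4 * offset + BASES.index(base)][col] = 1
    return matrix


region = "ACGT" * 37 + "AC"  # hypothetical 150 bp region
encoded = one_hot_3mers(region)
print(len(encoded), len(encoded[0]))  # 12 148
```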
The models demonstrated robust classification accuracy for 25 transcription factor families with ≥50,000 training instances, achieving a mean F1-score of 0.95 ± 0.04 and Matthews Correlation Coefficient (MCC) of 0.91 ± 0.08. To evaluate cross-species generalizability, independent tests were conducted on the BZIP family, comparing performance across distinct training datasets: A. thaliana-only training (M1): Moderate generalization to G. max (MCC: 0.86) but complete failure for Z. mays (MCC: 0.0); A. thaliana + G. max training (M2): High accuracy for G. max (MCC: 0.96); A. thaliana + Z. mays training (M3): Marginal improvement for Z. mays (MCC: 0.08); Multi-species training (A. thaliana + G. max + Z. mays, M4): Sustained high accuracy for G. max (MCC: 0.96) but minimal gain for Z. mays (MCC: 0.1). These results demonstrate the model’s potential for species like G. max. Still, its poor performance on Z. mays suggests challenges tied to genomic complexity or training data bias. While cross-species data integration is essential for broader applicability, current datasets remain heavily skewed toward A. thaliana. Grouping TFs by families is a computationally efficient and scalable strategy but requires adjustments for complex genomes. This framework may accelerate the discovery of regulatory networks and biotechnological targets in non-model species, pending further refinement and diverse data generation.
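For reference, the Matthews Correlation Coefficient reported above can be computed from a binary confusion matrix as sketched below. The counts are hypothetical. The sketch follows the common convention of returning 0 when the denominator is zero, which happens, for example, when a model assigns every sequence to a single class; whether that explains the Z. mays result is not stated in the abstract.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical balanced test set with 50 positives and 50 negatives.
print(mcc(tp=50, tn=50, fp=0, fn=0))   # perfect classifier: 1.0
print(mcc(tp=0, tn=50, fp=0, fn=50))   # everything predicted negative: 0.0
```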
Palavras-chave: Transcription Factor Binding Site, Plant Cistrome, Transcription Factor Families, Deep Learning
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125048

Genetic Profile of Antimicrobial Resistance in Acinetobacter baumannii: An In Silico Study of Genomes and Plasmids

Autores: Khaylann Batista De Lima,Daniela Santos Pontes
Apresentador: Khaylann Batista De Lima • khaylannbl@gmail.com
Resumo:
Acinetobacter baumannii is a Gram-negative bacterium from the ESKAPEE group, recognized by the World Health Organization (WHO) as a priority nosocomial pathogen due to its multidrug resistance. This study aimed to characterize the resistome of A. baumannii by analyzing 537 complete genomes and plasmids deposited in the National Center for Biotechnology Information (NCBI), predominantly from clinical human isolates. The Resistance Gene Identifier (RGI) tool was used to detect antibiotic resistance genes (ARGs), identifying a total of 227 distinct ARGs in the genomes analyzed. According to the dataset analyzed, most chromosomal ARGs were associated with enzymatic inactivation mechanisms, followed by efflux pump-related genes. The most frequent genes were blaADC (93.8%) and blaOXA (82.44%), both conferring resistance to β-lactam antibiotics. The high genetic variability observed in these genes may influence the bacterium's resistance spectrum. Among efflux pump mechanisms, the AmvA gene (98.7%) and genes related to the AdeIJK (81.53%) and AdeFGH (59.23%) systems were notably prevalent, suggesting that efflux represents a recurrent and widely distributed resistance mechanism in A. baumannii. Other ARGs such as sul1 (44.6%), parC (97.3%), and msrE (39.3%) were also detected, associated with target substitution, target alteration, and target protection, respectively. LpsB was the only gene linked to reduced permeability, with a high frequency (97.44%). Among 24 antibiotic classes, ARGs were most frequently associated with cephalosporins, followed by carbapenems and penams. All carbapenem-associated genes were also linked to resistance to penams, possibly due to the cross-action of β-lactamases such as blaOXA. Out of the 537 genomes analyzed, 79.14% harbored at least one plasmid, and 29.16% of the 840 plasmids examined carried ARGs. 
The APH(3')-VIa gene (6.9%) was the most frequent and is associated with the drug inactivation or modification mechanism in the examined plasmids. Efflux pump and target alteration mechanisms were also significant, with highlights including tetB (2.02%) and armA (2.26%). Other mechanisms accounted for less than 3% of identified genes. Plasmid resistome analysis revealed ARGs primarily associated with penams (33.33%), carbapenems (26.47%), and aminoglycosides (21.56%). Although the presence of ARGs indicates genetic potential for resistance, these findings do not confirm phenotypic resistance, as no susceptibility tests were performed. Gene identification indicates that A. baumannii harbors the genetic determinants required for resistance, with their expression potentially modulated by regulatory and environmental factors. These preliminary analyses reveal the diversity of resistance mechanisms, while the ongoing mobilome assessment will provide further insights into the dynamics and dissemination of ARGs. The identified genes underscore the urgency of more targeted control strategies and could inform future research aimed at identifying potential therapeutic targets in A. baumannii. These findings have significant implications for combating antibiotic resistance and are consistent with the One Health approach, which integrates human, animal, and environmental health.
Palavras-chave: Resistome, Bioinformatic, Antibiotic Resistance Genes, Beta-lactamases, Efflux pumps, Plasmid-borne
#1125061

Transcription Factor Expression Analysis in Gastric Cancer in the State of Pará

Autores: Kauê Sant'Ana Pereira Guimarães,Jéssica Manoelli Costa da Silva,Ronald Matheus da Silva Mourão,Valéria Cristiane Santos da Silva,Diego Pereira,Fabiano Moreira,Paulo Assumpção
Apresentador: Kauê Sant'Ana Pereira Guimarães • kauesant.ana0804@gmail.com
Resumo:
Gastric cancer (GC) remains a significant global health issue, characterized by late-stage diagnosis and complex molecular heterogeneity, contributing to poor prognosis and treatment challenges. Transcription factors (TFs) are proteins crucial for regulating gene expression, often showing aberrant expression in cancers, thereby influencing tumor progression, metastasis, and therapeutic responses. Given their regulatory roles, TFs represent promising biomarkers and potential targets for novel therapeutic strategies. The objective of this study was to evaluate the differential expression of transcription factors in gastric adenocarcinoma compared to healthy gastric mucosa from patients in the state of Pará, Brazil, and thereby to identify TFs significantly associated with gastric cancer that could serve as biomarkers for early detection and therapeutic targeting. Total RNA samples were isolated from 145 gastric adenocarcinoma tissues and matched healthy gastric mucosa collected at HUJBB, adhering to ethical standards (CAAE 47580121.9.0000.5634). RNA sequencing was carried out utilizing the TruSeq Stranded Total RNA Library Prep Kit with Ribo-Zero Gold on the NextSeq® platform, ensuring high-quality and comprehensive transcriptomic data. Bioinformatic analyses involved initial quality checks using FastQC, removal of adapters and low-quality sequences via Trimmomatic v0.39, and transcript quantification through Salmon v1.5.2. Differential expression analysis was conducted using DESeq2 in R v4.1.0 to robustly identify transcription factors significantly dysregulated between cancerous and normal tissues. Our analyses identified numerous transcription factors exhibiting significant differential expression. Principal Component Analysis (PCA) and hierarchical clustering via heatmaps distinctly separated gastric cancer tissues from healthy controls, indicating strong transcriptional differentiation. 
The volcano plot revealed notably upregulated TFs, including ZNF860, POU3F2, and REL, alongside significantly downregulated factors such as NFE2L2 and ZNF513. Further, Receiver Operating Characteristic (ROC) analysis identified transcription factors with high diagnostic accuracy (AUC ≥ 0.95), underscoring their potential as reliable biomarkers. The heatmap visualization corroborated these findings, highlighting clear expression discrepancies and reinforcing the biological relevance of TF dysregulation in gastric cancer. These findings provide critical insights into transcriptional regulation specific to gastric carcinogenesis within the Pará population, illuminating region-specific molecular signatures of gastric cancer. The transcription factors identified herein, especially those demonstrating high discriminative power through ROC analyses, hold substantial promise as biomarkers for early diagnosis, prognosis evaluation, and targeted therapeutic interventions. This research significantly enhances our understanding of the transcriptional landscapes influencing gastric cancer, fostering advancements in personalized oncology.
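The AUC values cited above can be read as the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample. A minimal sketch of that rank-based calculation follows; the expression values are hypothetical and the function is not the authors' ROC implementation.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    correct = 0.0
    for positive in scores_positive:
        for negative in scores_negative:
            if positive > negative:
                correct += 1.0
            elif positive == negative:
                correct += 0.5
    return correct / (len(scores_positive) * len(scores_negative))


# Hypothetical normalized expression of one upregulated TF.
tumor = [8.1, 7.4, 9.0, 6.9, 8.6]
normal = [5.2, 6.1, 7.0, 5.8, 6.4]
print(roc_auc(tumor, normal))  # 0.96
```

For a downregulated factor the same calculation yields an AUC below 0.5, so the groups would be swapped (or 1 - AUC reported) to express its discriminative power.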
Palavras-chave: Gastric Cancer, Transcriptomics, Bioinformatics, Transcription Factors
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125084

Literature review and distribution analysis of virulence factors in Prescottella and Rhodococcus bacteria

Autores: João Pedro Vasques da Conceição,Fábio Mota
Apresentador: João Pedro Vasques da Conceição • jpvasquesc@gmail.com
Resumo:
Prescottella is a genus of Gram-positive bacteria that emerged from a phylogenetic branch originally composed of Rhodococcus species, whose members were formerly classified within that genus. The genus includes the species Prescottella equi, an opportunistic pathogen which causes severe respiratory infections in animals and in immunocompromised humans. Research into its infective capacity has focused largely on genes carried by virulence plasmids, which enable survival within macrophage lysosomes. However, these plasmids are not strictly necessary for infection, as there are reported cases of human infections caused by plasmid-free strains of P. equi. Few studies have been carried out to identify other virulence factors in this pathogen. In the present work, we conducted a systematic literature review to identify experimentally validated virulence factors in P. equi and analysed their distribution in different Prescottella and Rhodococcus isolates. The review was carried out following the PRISMA 2020 guidelines. To find relevant studies, we first queried the PubMed database using the terms ((virulence AND factors) OR virulence factors) AND ((rhodococcus) OR (prescottella)). This query yielded 203 studies. These were manually screened to identify studies where virulence factors were experimentally validated. This screening yielded only 14 relevant studies. Across these studies, 59 unique virulence-related genes were identified. We then wrote Python scripts to survey the presence of these genes in over 300 genomes of clinical and non-clinical Prescottella and Rhodococcus isolates available in the RefSeq database. Our analysis revealed that 49 of the 59 genes occur in both clinical and non-clinical strains. These findings suggest that most genes labeled as virulence factors may actually serve broader and essential roles in the survival and fitness of these bacteria, regardless of their environmental or host context.
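The final comparison step, finding which genes occur in both clinical and non-clinical isolates, can be sketched as below. This is not the authors' script: the genome identifiers, gene names, and data layout are hypothetical, and the upstream step that decides whether a gene is present in a genome is assumed to have run already.

```python
def genes_in_both_groups(presence, isolate_group):
    """Return genes found in at least one clinical and one non-clinical genome.

    presence: dict mapping genome id -> set of detected gene names.
    isolate_group: dict mapping genome id -> "clinical" or "non-clinical".
    """
    found = {"clinical": set(), "non-clinical": set()}
    for genome, genes in presence.items():
        found[isolate_group[genome]] |= genes
    return found["clinical"] & found["non-clinical"]


# Hypothetical presence calls for four genomes and three placeholder genes.
presence = {
    "genome_1": {"geneA", "geneB", "geneC"},
    "genome_2": {"geneA"},
    "genome_3": {"geneA", "geneB"},
    "genome_4": {"geneB"},
}
isolate_group = {
    "genome_1": "clinical",
    "genome_2": "clinical",
    "genome_3": "non-clinical",
    "genome_4": "non-clinical",
}
shared = genes_in_both_groups(presence, isolate_group)
print(sorted(shared))  # ['geneA', 'geneB']
```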
Palavras-chave: vap, nitrate reductase
★ Running for the Qiagen Digital Insights Excellence Awards
#1125097

Genomic analysis of Serratia sp. reveals potential CDI system's role in antagonistic activity against Trypanosoma cruzi

Autores: Jeniffer Evangelista da Fonseca,Daniele Pereira de Castro,Fabio Mota
Apresentador: Jeniffer Evangelista da Fonseca • jeniffer_evangelista@hotmail.com
Resumo:
The protozoan Trypanosoma cruzi, the etiological agent of Chagas disease in humans, colonizes the intestinal tract of hematophagous insects, including the triatomine vector Rhodnius prolixus. Previous studies have shown that the bacteria Serratia sp. RPA1 and Serratia sp. RPH1, isolated from the intestinal microbiota of R. prolixus, exhibit in vitro antagonistic activity against T. cruzi. To elucidate the mechanisms underlying this interaction, we conducted comparative genomic analyses using Artemis/ACT software to identify genes associated with cellular adhesion and parasitic lysis. The data for these analyses were obtained through sequence alignment using blastn or tblastx. Our investigation revealed the presence of genes belonging to the CDI (contact-dependent inhibition) system, a protein machinery that confers competitive advantages in interbacterial interactions. This system includes components with signal sequences directing translocation across the bacterial membrane and subsequent protein maturation in the periplasm. The identified genes encode predicted hemagglutinins and hemolysins, which were further analyzed for functional domains using the Conserved Domain Database (CDD). Sequence comparisons with experimentally characterized systems, such as the class I CDI system in Escherichia coli strain 93 (EC93), revealed structural and functional similarities. Three-dimensional structural predictions generated via AlphaFold 3 highlighted conserved motifs in these proteins while distinguishing them from homologous sequences in Serratia sp. DB11, a non-antagonistic strain. These structural divergences suggest evolutionary adaptations that may enhance the lytic competence of Serratia sp. RPA1 and RPH1 against T. cruzi observed in in vitro experiments, providing insights into potential mechanisms for controlling Chagas disease vectors. 
Additionally, the comparative analysis underscores the importance of understanding interbacterial interactions in the gut microbiota of disease vectors.
Palavras-chave: Serratia sp., Trypanosoma cruzi, CDI system, hemagglutinins, hemolysins.
#1125122

EVOLUTIONARY HISTORY OF THE METHIONINE PATHWAY IN TRYPANOSOMATIDS

Autores: KARINA PAOLA BUITRON BUITRON,Percy Omar Tullume Vergara,Ariel Mariano Silber
Apresentador: KARINA PAOLA BUITRON BUITRON • kpbuitron@icb.usp.br
Resumo:
The metabolism of sulfur-containing amino acids such as cysteine (Cys) and methionine (Met) plays crucial roles in protein synthesis, biomolecule methylation, and the biosynthesis of polyamines and glutathione. In Trypanosoma cruzi, Cys can be synthesized through the transsulfuration pathway, linked to Met metabolism, which involves three enzymatic steps: i. the conversion of Met into S-adenosyl-L-methionine by the enzyme S-adenosylmethionine synthetase (ADEMETS); ii. its transformation into S-adenosylhomocysteine and subsequent hydrolysis by S-adenosylhomocysteine hydrolase (ADEHCYSH); and iii. the remethylation of homocysteine to Met, catalyzed by homocysteine S-methyltransferase (HMT). While the functional characterization of these enzymes is well documented in model organisms, their distribution and evolutionary history among early-diverging eukaryotes such as trypanosomatids remain poorly understood. To address this, we performed a comprehensive survey of protein data to assess the distribution and evolutionary dynamics of these enzymes in the family Trypanosomatidae. We also incorporated recently described genera (e.g., Lafontella, Jaenimonas, Obscuromonas, Sergeia, Wallacemonas, and Vickermania), thereby expanding the taxonomic breadth and contributing to a deeper understanding of the family's evolutionary diversity. Phylogenetic reconstructions of individual trees and a concatenated tree were conducted using Maximum Likelihood (ML) and Bayesian Inference (BI). Additionally, we performed microsynteny and selective pressure analyses. The distribution analysis revealed the presence of the complete set of three enzymes, with broad conservation among both monoxenous and dixenous taxa, as well as in the ancestral organism Paratrypanosoma confusum (outgroup). ADEMETS and HMT were detected in 60 taxa, while ADEHCYSH was found in 63. 
Single tree phylogenies generated through Bayesian inference (BI) (values = 1.0) and maximum likelihood (ML) (internal nodes > 50%) showed congruent topologies for the Trypanosomatidae and Leishmaniinae taxa, while exhibiting greater variability among monoxenous taxa. In contrast, the trees generated for HMT were consistent between methods and reflected the expected species tree topology. The concatenated topology, inferred using both BI and ML, revealed a phylogenetic structure marked by strong conservation among dixenous clades and the evolutionary transition of monoxenous genera. Microsynteny analysis revealed duplications of the ADEMETS gene in Trypanosoma species and a high conservation of the genomic context surrounding ADEHCYSH. In contrast, HMT exhibited greater variability, suggesting possible gene loss or functional diversification events. Selective pressure analysis indicated that ADEMETS and ADEHCYSH are under strong purifying selection, whereas HMT shows signs of episodic positive selection, possibly related to specific metabolic adaptations in certain lineages. The observed phylogenetic relationships and microsynteny analyses point to ancient origins and lineage-specific diversification events. The integration of recently described taxa further enhances the taxonomic representation and contributes to a deeper understanding of the evolutionary dynamics of metabolic pathways in parasitic protists. Our results support the hypothesis that the methionine pathway is an ancestral and conserved trait, yet functionally subject to evolutionary modifications across the Trypanosomatidae family.
Palavras-chave: Methionine pathway, Trypanosomatids, conservation, S-adenosylmethionine synthetase (ADEMETS), S-adenosylhomocysteine hydrolase (ADEHCYSH), Homocysteine S-methyltransferase (HMT)
★ Running for the Qiagen Digital Insights Excellence Awards
#1125123

BENCHMARKING METATROPICS: A BIOINFORMATICS PIPELINE FOR METAGENOMIC VIRAL DETECTION

Autores: Ane de Souza Novaes,Tessa de Block,Sandra Coppens,Kevin Ariën,Marjan Van Esbroeck,Philippe Selhorst,Daan Jansen,Koen Vercauteren,Antônio Mauro Rezende
Apresentador: Antônio Mauro Rezende • antonio.rezende@fiocruz.br
Resumo:
Long-read sequencing technologies, especially those from Oxford Nanopore Technologies (ONT), are playing an increasingly important role in metagenomic studies, with particular relevance to viral diagnostics and genomic surveillance. ONT sequencing offers key benefits, such as real-time data processing, the ability to work with small sample batches, and the generation of long reads that help resolve repetitive regions of viral genomes, which are challenging for short-read technologies. Portable and relatively low-cost devices like the MinION make ONT a practical option in settings with limited resources. However, the accuracy and usefulness of ONT data depend heavily on effective bioinformatics analysis, and there is a shortage of pipelines specifically designed and validated for long-read viral datasets. To help fill this gap, we developed MetaTropics, a custom bioinformatics pipeline built to analyze viral metagenomic data from ONT sequencing. It is implemented using the Nextflow framework and nf-core templates, ensuring reproducibility and flexibility. We benchmarked MetaTropics against five public pipelines published between 2020 and 2024 (CZ ID, NanoSPC, VirMinION Kaiju, VirMinION Kraken2, and VirPipe). Benchmarking was conducted using 42 files from viral metagenomic Oxford Nanopore sequencing runs (27 virus positive and 14 negative controls) retrieved from the Sequence Read Archive (SRA), selected using the search: “virus[All Fields] AND clinical[All Fields] AND "metagenomic"[All Fields] AND "platform Oxford Nanopore"[Properties]”. We assessed sensitivity, specificity, precision, recall, and F1 score across a range of relative abundance thresholds to reflect different detection stringencies. MetaTropics achieved 100% precision and ≥93% specificity across all settings, with recall reaching 92% under permissive criteria and remaining above 74% under strict conditions. Its F1 scores exceeded 0.90 in permissive scenarios and stayed above 0.74 even at higher stringency. 
Compared to other tools, MetaTropics matched the highest performers while offering additional advantages in portability and ease of use.
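As an illustration of how detection metrics can be computed across relative abundance thresholds, the following is a minimal Python sketch. It is not the MetaTropics benchmarking code: the function names, the input layout (one relative abundance and one ground-truth label per sample), and the rule that a sample is called positive when its abundance is at or above the threshold are all assumptions made for this example.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return sensitivity (recall), specificity, precision and F1 from raw counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def evaluate(samples, threshold):
    """Score one threshold.

    samples: list of (relative_abundance_of_target_virus, is_truly_positive).
    A sample is called positive when its abundance is >= threshold
    (an assumed calling rule, for illustration only).
    """
    tp = fp = tn = fn = 0
    for abundance, truly_positive in samples:
        called_positive = abundance >= threshold
        if called_positive and truly_positive:
            tp += 1
        elif called_positive and not truly_positive:
            fp += 1
        elif not called_positive and truly_positive:
            fn += 1
        else:
            tn += 1
    return detection_metrics(tp, fp, tn, fn)


# Invented toy data: two positive samples and two negative controls.
toy_samples = [(0.30, True), (0.02, True), (0.0, False), (0.001, False)]
for threshold in (0.01, 0.05):  # permissive vs. strict, hypothetical values
    print(threshold, evaluate(toy_samples, threshold))
```

Raising the threshold can only turn calls from positive to negative, so recall can drop while precision stays high, which is the trade-off the abstract describes between permissive and strict settings.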
Palavras-chave: nanopore sequencing, custom pipeline, viral metagenomics, pathogen detection
#1125129

Integrative Analysis of Copy Number Variations and Gene Expression in Acral Melanoma

Autores: Tamires Caixeta Alves,Antonio Carlos Facciolo Filho,Annie Cristhine Moraes Sousa Squiavinato,Martin Del Castillo VH,Patricia Basurto,Jesus Rene Wong-Ramirez,Daniela Robles,Dave Adams,Patricia Abrao Possik
Apresentador: Tamires Caixeta Alves • alvesctamires@gmail.com
Resumo:
Acral melanoma is a rare and aggressive subtype of cutaneous melanoma that primarily arises in sun-protected areas such as the palms, soles, and nail beds. Unlike other melanoma subtypes, which are frequently driven by point mutations, acral melanoma displays a distinct genomic landscape, characterized by a high prevalence of structural alterations—particularly copy number variations (CNVs). In this study, we aimed to explore the relationship between CNVs and gene expression levels by integrating Whole Exome Sequencing (WES) and RNA sequencing (RNA-seq) data obtained from the same tumor samples in Brazilian patients collected at the National Cancer Institute (INCA). This integrative approach allows us to assess whether structural alterations at the DNA level translate into transcriptional changes, particularly for cancer-related genes, where changes in expression may signal functional relevance. Methodologically, both Whole Exome Sequencing (WES) and RNA-seq data underwent rigorous preprocessing to ensure high-quality downstream analyses. Raw sequencing reads from both platforms were initially subjected to quality control, including trimming of adapter sequences and removal of low-quality bases. Cleaned reads were then aligned to the human reference genome (GRCh38) using appropriate alignment tools — BWA for WES and STAR for RNA-seq — tailored to each data type. For the WES data, copy number variations (CNVs) were inferred using the Sequenza algorithm, which produces allele-specific copy number profiles while adjusting for tumor purity and ploidy. In the case of RNA-seq, post-alignment processing included quantification of gene-level expression, followed by normalization to ensure comparability across samples. To enable integrative analysis, gene identifiers were harmonized using HGNC-approved symbols, allowing for direct mapping and correlation between genomic alterations and transcriptional activity across matched tumor samples. 
We then conducted gene-by-gene correlation analyses between CNVs and expression levels using Spearman’s rank correlation coefficient. As expected, some genes with copy number gains or losses exhibited corresponding increases or decreases in expression. However, this relationship was not consistent across all genes. In many instances, high expression levels were observed without underlying copy number alterations, and vice versa. Although gene dosage is generally expected to influence expression—especially in the case of oncogenes and tumor suppressors—not all genes are actively transcribed at all times, and expression levels can be shaped by diverse regulatory mechanisms. For example, a copy number gain in an oncogene may only be biologically meaningful if it leads to increased transcription; otherwise, it may be subject to transcriptional repression or epigenetic silencing. Thus, integrating genomic and transcriptomic data helps distinguish between structural variants with potential biological impact and those without functional consequences. The observation that many CNVs do not correlate with expression changes further underscores the complexity of gene regulation in cancer and reinforces the need for multi-omic approaches to identify key drivers and therapeutic targets in acral melanoma.
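The gene-by-gene correlation step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the dictionary layout (gene symbol mapped to per-sample values, in the same sample order in both tables), the function names, and the toy values are hypothetical and are not taken from the study. Spearman's coefficient is computed here as the Pearson correlation of average ranks.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))


def cnv_expression_correlation(copy_number, expression):
    """Spearman rho per gene, for genes present in both tables."""
    return {
        gene: spearman(copy_number[gene], expression[gene])
        for gene in copy_number
        if gene in expression
    }


# Invented toy data: values per matched tumor sample for two placeholder genes.
toy_cn = {"GENE_A": [2, 3, 4, 6], "GENE_B": [2, 2, 2, 2]}
toy_expr = {"GENE_A": [5.1, 6.0, 7.2, 9.9], "GENE_B": [1.0, 3.0, 2.0, 8.0]}
print(cnv_expression_correlation(toy_cn, toy_expr))
```

A gene with no copy number variation across samples (GENE_B in the toy data) has an undefined correlation, which is one simple way expression changes can appear without any matching copy number alteration.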
Palavras-chave: CNVs, WES, RNA-seq, multi-omics integration, cancer, acral melanoma
#1125142

Clear Cell Renal Cell Carcinoma Regulatory Network Reveals Master Regulators Associated with Tumor Aggressiveness and Patient Survival

Autores: Epitácio Dantas de Farias Filho,Rafaella Sousa Ferraz,Gleison Medeiros de Azevedo,Rodrigo Juliani Siqueira Dalmolin
Apresentador: Epitácio Dantas de Farias Filho • epitacio.farias.101@ufrn.edu.br
Resumo:
Clear cell renal cell carcinoma (ccRCC) has a poor prognosis in its more aggressive stage, owing to the size of the site of origin and its silent development. The aggressive stage encompasses metastasis development, which is a complex and cancer-type-specific process characterised by the presence of several markers. Recent studies have analysed gene regulatory networks (GRNs) to describe the regulatory mechanisms underlying the onset of ccRCC, but none has presented a GRN associated with disease aggressiveness. This work aims to build a GRN for ccRCC and identify transcription factors (TFs) that act as master regulators (MRs) of disease aggressiveness. The GRN was constructed using data from The Cancer Genome Atlas (TCGA) cohort, and the MR analysis was conducted using a ccRCC signature from the International Cancer Genome Consortium (ICGC-RECA) cohort. The Boruta feature selection algorithm was used to select the most informative MRs associated with the classification of patients who developed metastasis. Regulatory activity and risk analyses were subsequently performed to elucidate the findings further. Boruta identified a set of 11 MRs capable of distinguishing patients in a more aggressive stage of the disease: E2F2, FOXM1, HOMEZ, MYBL2, NFE2L3, PAX6, PBX1, TFAP2A, ZFP28, ZNF136, and ZNF658. The expression of these MRs yielded an accuracy of 79.5% and an area under the curve of 86.6% in the classification of metastasis-positive patients. The regulatory activity analysis revealed that the 11 MRs stratify into two clusters that exhibit antagonistic activity. Moreover, the expression of these TFs was associated with patient survival, indicating their potential as biomarkers and diagnostic targets.
Palavras-chave: Gene Regulatory Network, Master Regulators, Aggressiveness, Renal Cancer, Patient Survival
★ This work is running for the Next Generation Bioinfo Award
#1125158

In Silico Design of a Mutant Antibody Targeting Galectin-3 Binding Protein for Cancer Immunotherapy

Autores: Julia Silva Souza,Andrielly Henriques dos Santos Costa,Aline de Oliveira Albuquerque,Diego da Silva Almeida,EDUARDO MENEZES GAIETA,Jean Vieira Sampaio,Geraldo Rodrigues Sartori,João Herminio Martins Da Silva
Apresentador: Julia Silva Souza • juliassouza40@gmail.com
Resumo:
Cancer is characterized by the uncontrolled proliferation of cells and is one of the leading causes of death worldwide. Commonly used treatments, such as surgery, radiotherapy, and chemotherapy, present significant limitations, including low selectivity, severe side effects, and damage to healthy tissues. In this context, immunotherapy has emerged as a more effective and less invasive alternative. In ongoing immunotherapy studies, Galectin-3 Binding Protein (Gal-3BP) has gained prominence as a promising biomarker, especially in tumor types such as breast and lung cancers. It is a large, highly glycosylated protein that modulates cell interactions and influences essential processes such as tumor cell adhesion, migration, and survival. Nonetheless, its multiple glycosylation sites interfere with the interaction of therapeutic molecules, hindering their effectiveness. Thus, the aim of this study is to propose a novel antibody mutant targeting Gal-3BP with potential application as an immunotherapeutic agent against cancer. The Gal-3BP protein sequence was obtained from the UniProt database. Its three-dimensional structure was modeled using the AlphaFold Server, and ChimeraX software was used for visualization. QMEANDisCo and VoroMQA were employed to validate and assess the quality of the model. Additionally, HADDOCK was used to perform docking against a database of 800 naïve antibody structures, aiming to identify potential antibodies compatible with Gal-3BP. To validate the system, the docking-derived structure was subjected to heated molecular dynamics using the Amber 24 software suite. The model showed well-structured regions of reliable quality alongside low-confidence regions that coincide with segments lacking defined secondary structure, as previously reported in the literature; in this study, these segments lie outside the region corresponding to the epitope of interest.
The structure was positively assessed by the validation programs, with a QMEANDisCo score of 0.71 and a VoroMQA score of 0.51. The docking process yielded a list of the top 30 antibodies, one of which was selected for further analysis. Heated molecular dynamics was applied to assess the stability of the antibody-antigen complex by gradually increasing the temperature over time, using an RMSD value of 5 Å as a reference threshold. Although this limit was exceeded in our experiment, the structure still underwent an internal refinement protocol aimed at improving its binding affinity. The data obtained can pave the way for the use of Gal-3BP as a potential target for immunotherapy, contributing to the rational development of new biopharmaceuticals and validating the use of in silico approaches for screening and optimization of therapeutic antibodies.
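The RMSD check against a reference threshold can be sketched as below. This is a simplified illustration, not the Amber-based analysis used in the study: it assumes coordinates are given as plain (x, y, z) tuples in Å, that every frame has already been superimposed on the reference, and that the atoms are listed in the same order; the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate sets.

    No superposition is performed here; frames are assumed pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def frames_exceeding(reference, trajectory, threshold=5.0):
    """Return the indices of frames whose RMSD to the reference exceeds threshold (Å)."""
    return [
        index
        for index, frame in enumerate(trajectory)
        if rmsd(reference, frame) > threshold
    ]
```

In a heated simulation, the first frame index returned by `frames_exceeding` gives a rough indication of when the complex starts to deviate beyond the 5 Å reference value.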
Palavras-chave: Docking, Molecular Dynamics, Antibody
#1125166

Bistability in snail Gene Expression in Drosophila melanogaster: a possible contribution to epithelial-mesenchymal transition

Autores: Luara Vieira Guimarães,Gilson Giraldi,FRANCISCO JOSE PEREIRA LOPES
Apresentador: Luara Vieira Guimarães • luara.guimaraes02@gmail.com
Resumo:
Cancer is currently one of the leading causes of mortality in the Americas, and according to projections by the National Cancer Institute (INCA), its incidence is expected to increase significantly in the coming decades (INCA, 2022). This scenario underscores the urgency of a deeper understanding of the cellular and molecular mechanisms underlying tumor progression, especially metastasis, which accounts for a large proportion of cancer-related deaths. Among the processes involved in this context, epithelial-mesenchymal transition (EMT) stands out as an essential phenotypic change that allows epithelial cells to acquire mesenchymal characteristics, increasing their mobility and invasive capacity. EMT is regulated by complex gene networks and intracellular signaling pathways, in which the nuclear factor kappa B (NF-κB) transcription factor plays a central role, acting on several target genes involved in the induction of cellular plasticity, such as twist1, slug, and sip1.
To systematically investigate these mechanisms, we used Drosophila melanogaster as an experimental model. This choice is justified by the high evolutionary conservation of many genes and signaling pathways between Drosophila and mammals, making it an efficient model organism for studying complex cellular phenomena such as NF-κB-mediated gene regulation. In this study, we combined mathematical modeling, used to quantitatively describe interactions among components of the gene regulatory network and predict the dynamic behavior of the system, and biological experimentation to analyze the formation of the expression pattern of the snail gene, homologous to slug in mammals, during Drosophila embryonic development. We focused on the gene network composed of Dorsal (homologous to NF-κB), Twist (homologous to Twist1), and Snail (homologous to Slug), from which we built our mathematical model to understand the dynamic principles that govern its function.
We hypothesized that the sharp, on-off spatial pattern of snail expression results from a bistable behavior in the regulatory system. This hypothesis was supported by the identification of three potential autoregulatory sites in the snail gene promoter, suggesting the existence of a self-activation mechanism. Through computational simulations, we verified that the model faithfully reproduces the expression patterns observed experimentally, both in wild-type embryos and in mutant strains. The results confirm the presence of multiple steady states, a typical feature of bistability, conferring robustness to the system and a differentiated response to external stimuli.
These findings help explain the elevated levels of slug, the snail homolog, observed in metastatic breast cancer cells independently of NF-κB activation (Pires et al., 2017). The identification of a possible snail self-activation mechanism in Drosophila suggests that similar mechanisms may occur in humans, contributing to the autonomous maintenance of the mesenchymal phenotype. Thus, by extrapolating the regulatory mechanisms observed in the experimental model to their human homologs, this study aims to deepen the understanding of metastasis and support the development of more effective and targeted therapies.
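To illustrate how self-activation can produce bistability, the sketch below integrates a generic one-variable model with a basal production term, a Hill-type self-activation term, and linear decay. It is not the authors' model of the Dorsal/Twist/Snail network: the equation form, the parameter values, and the function names are assumptions chosen only so that two stable steady states exist.

```python
def dx_dt(x, basal=0.05, vmax=1.0, k_half=0.5, hill=4, decay=1.0):
    """Rate of change for a self-activating gene product x.

    basal production + Hill-type self-activation - linear decay.
    All parameter values are illustrative, not fitted to data.
    """
    activation = vmax * x**hill / (k_half**hill + x**hill)
    return basal + activation - decay * x


def steady_state(x0, dt=0.01, steps=5000):
    """Integrate with the forward Euler method and return the final value."""
    x = x0
    for _ in range(steps):
        x += dt * dx_dt(x)
    return x


# Two initial conditions on either side of the unstable steady state.
low = steady_state(0.2)   # relaxes to the low ("off") state
high = steady_state(0.7)  # relaxes to the high ("on") state
print(low, high)
```

With these parameters, initial conditions below the unstable intermediate state decay to a low level while those above it climb to a high level, which is the kind of on-off behavior a bistable system can generate from a graded input.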
Palavras-chave: epithelial-mesenchymal transition, snail, autoregulation, bistability, Drosophila
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125169

The Antagonistic Role of the NF-κB Inhibitor DHMEQ in Breast Cancer Progression

Autores: Mariana Teixeira de Freitas,Karina De Menezes Leitão,Mayara Duarte de Araujo Caldas,Mayla Abrahim Costa,FRANCISCO JOSE PEREIRA LOPES
Apresentador: Mariana Teixeira de Freitas • mariana.mbtf@gmail.com
Resumo:
Breast cancer is the most common type of neoplasia among women in Brazil (Nogueira-Rodrigues et al., 2023). The disease is characterized by genetic and phenotypic heterogeneity and is classified into four main molecular subtypes: luminal A, luminal B, HER2-positive, and triple-negative, according to the expression levels of estrogen (ER), progesterone (PR) hormone receptors, or the transmembrane receptor HER2. The triple-negative breast cancer (TNBC) subtype lacks ER, PR, and HER2 expression, making it the most aggressive form, with poor therapeutic response and high recurrence rates. Different subtypes require specific therapeutic strategies. Therefore, heterogeneity, defined by the presence of cells from different subtypes within the same tumor, is one of the main challenges for treatment, contributing to therapy resistance and recurrence.
Recent results from our group (Lopes et al., 2024) indicate the occurrence of spontaneous phenotypic transitions between the HER2+ and TNBC subtypes over time. These stochastic transitions appear to be related to sustained activation of the NF-κB signaling pathway. To investigate this hypothesis, we developed a gene regulatory network model, which was converted into a system of ordinary differential equations (ODEs). The model exhibits bistability, that is, two stable steady states whose basins of attraction are separated by a boundary. Calibration of the model using experimental quantitative data on NF-κB expression levels allowed us to associate the two steady states with the HER2+ and TNBC subtypes. Computational simulations suggest that intermittent administration of the DHMEQ inhibitor, followed by its withdrawal, can cause compensatory accumulation of NF-κB, capable of shifting the cell from the HER2+ basin of attraction to the TNBC basin, promoting the phenotypic transition.
This project aims to experimentally validate the model’s predictions. In the bioinformatics stage, which has already been completed, we analyzed public transcriptomic datasets (GEO) and identified genes differentially expressed between HER2+ and TNBC subtypes. These genes will be used as transition markers and support the next experimental steps. We will use the HCC-1954 and SK-BR-3 cell lines treated with DHMEQ for different durations, with time-course collections for qRT-PCR analysis of the marker genes. Additionally, immunohistochemistry analyses will be performed to assess HER2 expression and confirm the phenotypic transition. Simultaneously, we will refine the computational model, including sensitivity analysis, stochastic simulations, and identification of critical parameters regulating the transition between steady states. These theoretical and experimental results will be integrated to validate the role of NF-κB in promoting phenotypic heterogeneity in breast cancer.
Palavras-chave: Breast cancer, NF-κB, Mathematical modeling, HER2, TNBC, DHMEQ
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125174

Long non-coding RNAs as Key Regulators of Trophoblast Differentiation and Placental Development

Autores: Thalles Souza Lopes,Ana C. Tahira,murilo amaral,Sergio Verjovski-Almeida
Apresentador: Thalles Souza Lopes • thalleslopess@gmail.com
Resumo:
Recent studies have investigated the molecular mechanisms underlying placental development and the role of the placenta as an immune and mechanical barrier. Despite this, pathogens can still cross the placenta, disrupting fetal homeostasis and embryogenesis. Amaral et al. (2020) isolated erythroblasts from dizygotic twins discordant for Congenital Zika Syndrome (CZS) and reprogrammed them into human induced pluripotent stem cells (hiPSCs), subsequently differentiating them into trophoblasts (TB). The study revealed significant differences in protein-coding gene expression between CZS-affected and unaffected twins. Although no pathogenic variants explained CZS susceptibility, the role of long non-coding RNAs (lncRNAs) in TB development was not explored. The aim here is to identify novel and known lncRNAs involved in the differentiation of hiPSCs into TB. To that end, the RNA-Seq data of Amaral et al. were re-mapped to the human reference genome (GRCh38) using a hiPSC-TB-specific transcriptome, the STAR alignment tool, and a dedicated pipeline to identify lncRNA expression. The GffCompare utility was used to annotate novel lncRNAs; the CPC2, CPAT, and FEELnc tools to predict their coding potential; the edgeR and limma R packages for differential expression analysis; WGCNA and clusterProfiler for weighted correlation network and Gene Ontology (GO) analyses; and the igraph R package to build the networks of co-expression modules. The newly designed pipeline identified 393 novel intronic lncRNAs, 317 novel sense lncRNAs, 102 novel intergenic lncRNAs, and 247 novel antisense lncRNAs. When comparing hiPSCs and TB, 115 known and 14 novel differentially expressed (DE) lncRNAs were found (FDR < 1%). WGCNA revealed two modules with significant negative correlation (-0.99*** and -0.59*) and another two with significant positive correlation (0.99*** and 0.78**) with TB differentiation.
GO analysis revealed only one co-expression module with significant negative correlation (-0.99***) that was enriched with GO terms with negative Enrichment Scores (ES), related to trans-synaptic signaling, nervous system process, and neuron differentiation, amongst others. This co-expression module contained 83 known DE lncRNAs, three novel intronic DE lncRNAs, five novel antisense DE lncRNAs, two novel intergenic DE lncRNAs, and one novel sense DE lncRNA. Of these DE lncRNAs, 11 (six known and five novel) were identified as hub genes (module membership > 0.8 and intramodular connectivity above the 70th percentile) and were selected for experimental validation. Likewise, only one co-expression module with positive correlation (0.99***) was enriched with GO terms with positive ES, related to tube development, angiogenesis, circulatory system development, and wound healing, amongst others. This co-expression module contained 35 known DE lncRNAs, one novel intronic DE lncRNA, and two novel antisense DE lncRNAs. Of the known DE lncRNAs, six were identified as hub genes and were selected for experimental validation. Additionally, the three novel DE lncRNAs were also selected for experimental validation even though they were not identified as hub genes. These data may be used to predict possible functions for lncRNAs as key factors in the molecular basis of the development of TB and the human placenta and its function as a prominent immune and mechanical barrier.
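The hub gene criterion stated above (module membership > 0.8 and intramodular connectivity above the 70th percentile) can be expressed as a short filter. The sketch below is illustrative only: the input layout, the function names, and the linear-interpolation definition of the percentile are assumptions, and the WGCNA-derived values would in practice come from the R analysis.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def select_hub_genes(module, min_membership=0.8, connectivity_percentile=70):
    """Return genes passing both hub criteria.

    module: dict mapping gene id -> (module_membership, intramodular_connectivity).
    The percentile cutoff is computed over all genes in the module.
    """
    cutoff = percentile([k for _mm, k in module.values()], connectivity_percentile)
    return sorted(
        gene
        for gene, (membership, connectivity) in module.items()
        if membership > min_membership and connectivity > cutoff
    )


# Invented toy module with placeholder gene ids.
toy_module = {
    "lnc_a": (0.95, 10.0),
    "lnc_b": (0.85, 8.0),
    "lnc_c": (0.90, 2.0),
    "lnc_d": (0.50, 9.0),
    "lnc_e": (0.82, 1.0),
}
print(select_hub_genes(toy_module))
```

Requiring both conditions excludes genes that are highly connected but weakly correlated with the module eigengene (lnc_d in the toy data), as well as well-correlated genes with few connections.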
Palavras-chave: Cellular development, human trophoblast, long non-coding RNA
★ Running for the Qiagen Digital Insights Excellence Awards
#1125178

Modeling Cancer Stem Cell Regulation by EWSR1::FLI1 in Ewing Sarcoma: Crosstalk Between Stemness and Epithelial–Mesenchymal Transition

Autores: Daner Acunha Silveira,Shantanu Gupta,André Tesainer Brunetto,José Carlos Merino Mombach,Marialva Sinigaglia
Apresentador: Daner Acunha Silveira • daner.silveira@gmail.com
Resumo:
Ewing sarcoma (ES) is characterized by the presence of the fusion protein EWSR1::FLI1, which regulates several biological processes, including the induction of cancer stem cells (CSCs) and the epithelial–mesenchymal transition (EMT). Our group has previously predicted the existence of hybrid cellular phenotypes during EMT in ES, which may exhibit metastatic potential up to 50 times greater than fully differentiated cell states. However, the connection between CSC induction and EMT remains unclear in the literature. This study aims to explore the signaling dynamics of CSC induction and EMT through a computational model that integrates molecules known to regulate these phenotypes in ES, based on current literature. These data were translated into a molecular interaction network. The activation and inactivation dynamics of the nodes were modeled by assigning logical operators to each component, representing the regulatory effects of upstream molecules. The model was validated by comparing the biological effects of its components with experimental data reported in the literature. Additionally, transcriptomic data from ES cell lines available in public databases were analyzed, further supporting the model's validity. Model simulations revealed the presence of distinct cellular states according to differentiation level: undifferentiated, intermediate, and differentiated. Dynamic analysis suggested that the Let-7/LIN28 regulatory axis plays a central role in controlling these cellular states. Moreover, mutual regulation between EWSR1::FLI1 and microRNA-145 was identified as a key modulator of intermediate states via Let-7 and LIN28. Another relevant finding was the role of c-Myc, which promotes full acquisition of stemness features. This research enhances our understanding of the link between EMT and CSCs in ES, highlighting the existence of intermediate cellular states driven by underexplored mechanisms that may serve as potential targets for novel therapeutic strategies.
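The logical modeling approach described above, in which each node is updated by a logical rule over its regulators, can be illustrated with a toy Boolean network. The three rules below are hypothetical placeholders written for this example; they are not the authors' network, which integrates many more components, including EWSR1::FLI1 and microRNA-145. The sketch enumerates all states and reports those that are unchanged by a synchronous update (fixed points).

```python
from itertools import product

# Hypothetical toy rules (NOT the published model): each node's next value
# is a logical function of the current state of its regulators.
RULES = {
    "LIN28": lambda s: not s["LET7"],   # assumed: Let-7 represses LIN28
    "LET7": lambda s: not s["LIN28"],   # assumed: LIN28 represses Let-7
    "MYC": lambda s: not s["LET7"],     # assumed: Let-7 represses c-Myc
}


def step(state):
    """Apply all rules simultaneously (synchronous update)."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def fixed_points():
    """Return every state that maps onto itself under one update."""
    nodes = list(RULES)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if step(state) == state:
            found.append(state)
    return found


for state in fixed_points():
    print(state)
```

In this toy network, the mutual repression between two nodes yields two fixed points, one with Let-7 active and one with LIN28 and c-Myc active, which shows how a double-negative feedback loop can separate distinct cellular states in a logical model.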
Palavras-chave: Ewing sarcoma, Stemness, Epithelial–Mesenchymal Transition, Computational Modeling, EWSR1::FLI1
#1125190

Genomic and functional diversity in members of the Lachnospiraceae family in the gut microbiota

Autores: Kelly Yovani Olivos Caicedo,João Marcelo Pereira Alves
Apresentador: Kelly Yovani Olivos Caicedo • kellyolivos@usp.br
Resumo:
The Lachnospiraceae family is one of the main groups in the human intestinal microbiota. It includes obligate anaerobic bacterial species that are important for the production of short-chain fatty acids such as butyrate (an energy source for other microorganisms and for the host's epithelial cells), the conversion of primary bile acids into secondary ones, immune regulation, and the microbiota's resistance against intestinal pathogens.
To better characterize the family and better understand the genomic and functional diversity among its members, a set of 200 genomes belonging to the Lachnospiraceae family obtained from NCBI, representing 80 genera and 183 species, from gastrointestinal tract samples of human and animal hosts, was analyzed using a comparative genomics and phylogenomics approach.
The results show variability in genome size and GC content. Genome-wide GC content varies widely within the family, from a minimum of 25.8% to a maximum of 56.3%.
The distribution of the relative standard deviation of the total number of CDSs, whether annotated as known proteins (with or without an associated COG identifier) or as hypothetical proteins, shows greater variability in the number of CDSs annotated as hypothetical than in those with a known annotation. Genomic annotation shows diversity among species in the genetic capacity to produce butyrate. The enzymes involved in each step of the butyrate synthesis pathways from acetyl-CoA were identified; one of the pathways is driven by butyrate kinase and the other by butyryl-CoA transferase, which uses acetoacetate or acetate. The identification of these genes reveals a broad genetic repertoire among species from different hosts: some species have a complete pathway for butyrate synthesis, others an incomplete one, and the number of genes in the pathway varies.
In addition, mobile genetic elements such as plasmids were observed in species from different hosts. Orthology analysis identified 297 groups of orthologous genes present in all species, which are important for the subsequent evaluation of the core genome of the family.
Finally, it is important to identify specific functions and metabolic pathways, such as butyrate production, that likely affect an isolate's ability to influence host health. This study demonstrates the level of diversification among species in the Lachnospiraceae family and provides support for further genomic and metabolomic analyses to understand many aspects of the gut microbiota.
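Identifying groups of orthologous genes that contain every species, as in the orthology step of this abstract, amounts to a set-inclusion test. The sketch below assumes a hypothetical input layout (orthogroup id mapped to a list of species and gene id pairs) and hypothetical names; it does not reproduce the orthology inference itself, which requires a dedicated tool.

```python
def core_orthogroups(orthogroups, all_species):
    """Return ids of orthogroups that include at least one gene from every species.

    orthogroups: dict mapping orthogroup id -> list of (species, gene_id) pairs.
    all_species: iterable with the names of all species in the analysis.
    """
    required = set(all_species)
    return sorted(
        group_id
        for group_id, members in orthogroups.items()
        if required <= {species for species, _gene in members}
    )


# Invented toy data with placeholder species and gene ids.
toy_groups = {
    "OG1": [("sp1", "g1"), ("sp2", "g7"), ("sp3", "g2")],
    "OG2": [("sp1", "g3"), ("sp1", "g4"), ("sp3", "g9")],
    "OG3": [("sp1", "g5"), ("sp2", "g6"), ("sp2", "g8"), ("sp3", "g0")],
}
print(core_orthogroups(toy_groups, ["sp1", "sp2", "sp3"]))
```

This version keeps groups with paralogs (more than one gene per species); restricting the result to single-copy groups would need an additional check on the number of genes per species.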
Palavras-chave: Lachnospiraceae, gut microbiota, short-chain fatty acids, butyrate.
★ This work is running for the Next Generation Bioinfo Award
#1125217

Genomic characterization of somatic SNVs and their association with chemotherapy response in Brazilian High-Grade Serous Ovarian Cancer (HGSOC) patients using Whole-Exome Sequencing (WES)

Autores: Michelle Marcela Paredes Escobar,Nayara Gusmão Tessarollo,Alessandra Freitas Serain,Luciana de Castro Moreeuw,Diego Gomes,Cláudia Bessa,Andreia Melo,João Viola,Mariana Boroni
Apresentador: Michelle Marcela Paredes Escobar • mipared01@gmail.com
Resumo:
Ovarian cancer (OC) is the most lethal gynecologic malignancy in women, ranking as the seventh most common cancer globally. In Brazil, over 7,000 new cases are expected annually between 2023 and 2025. High-grade serous ovarian carcinoma (HGSOC) is the most prevalent and aggressive subtype, with 5-year survival rates of 20-30%, largely due to late diagnosis and chemotherapy resistance. Genomic studies have identified somatic and germline mutations in tumor suppressor genes and DNA repair pathways, contributing to genetic instability and chemotherapy susceptibility. This project aims to characterize the somatic mutational landscape of a Brazilian HGSOC cohort to identify variants and mutation patterns associated with prognosis and treatment response. A cohort of 41 patients was established and stratified into three groups based on their response to platinum-based chemotherapy and progression-free interval (PFI): platinum-resistant (PFI < 6 months), platinum-sensitive (PFI 6–24 months), and very platinum-sensitive (PFI > 24 months or no relapse). Kaplan-Meier survival analysis confirmed that overall survival differs significantly among the three groups (p = 0.00013). Whole-exome sequencing (WES) was performed to identify somatic variants, following best practices from the Broad Institute and using the nf-core/Sarek workflow. Variant calling of somatic single nucleotide variants (SNVs) was carried out with GATK’s Mutect2. Subsequently, a custom bioinformatics pipeline was implemented for variant enrichment in the tumor-only setting, in accordance with the Clinical Genome Resource (ClinGen) guidelines, ensuring robust classification of somatic variants in a clinical context. As expected, TP53 was the most frequently mutated gene, observed in 75% of cases.
Additionally, mutations in homologous recombination deficiency (HRD)-related genes such as BRCA1 were detected in 15% of the cohort, particularly in platinum-sensitive and very platinum-sensitive groups. These findings suggest that patients with HRD-related mutations may be potential candidates for PARP inhibitor treatments. The median tumor mutational burden (TMB) in our cohort was comparable to that in the TCGA-OV cohort, indicating similar mutational processes and genomic instability. Our findings provide a comprehensive genomic profile of Brazilian HGSOC patients and highlight the utility of integrative bioinformatics approaches to uncover potential biomarkers for treatment response and prognosis, especially in underrepresented populations.
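The stratification rule stated in this abstract (PFI < 6 months, 6–24 months, and > 24 months or no relapse) can be written as a small function. The function name and the `relapsed` flag are assumptions made for this sketch; only the cutoffs come from the text.

```python
def platinum_group(pfi_months, relapsed=True):
    """Assign a platinum-response group from the progression-free interval (PFI).

    pfi_months: PFI in months (ignored when the patient has not relapsed).
    relapsed:   False when no relapse was recorded.
    """
    if not relapsed:
        return "very platinum-sensitive"
    if pfi_months < 6:
        return "platinum-resistant"
    if pfi_months <= 24:
        return "platinum-sensitive"
    return "very platinum-sensitive"


# Example calls on invented values.
for pfi, relapsed in [(3, True), (12, True), (30, True), (None, False)]:
    print(pfi, relapsed, platinum_group(pfi, relapsed))
```

Checking the no-relapse case first avoids comparing a missing PFI value against the numeric cutoffs.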
Palavras-chave: Ovarian cancer, Whole-exome sequencing, Somatic mutations, Homologous recombination deficiency, Tumor mutational burden
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125224

Analysis of the Resistome and Mobile Elements in Lactococcus lactis: Potential for Resistance Dissemination

Autores: Débora Camilly Satro Avelino,Daniela Santos Pontes
Apresentador: Débora Camilly Satro Avelino • dcamilly847@gmail.com
Resumo:
Lactococcus lactis is a lactic acid bacterium widely used in food fermentation and recognized as safe for human consumption, classified as GRAS (Generally Recognized As Safe). However, it can serve as a reservoir for antibiotic resistance genes (ARGs), making it relevant in the context of One Health. The indiscriminate use of antimicrobials promotes the spread of these genes, especially through horizontal gene transfer. This study aims to characterize the resistome and mobile genetic elements of L. lactis genomes from isolates, evaluating their role in the dissemination of antimicrobial resistance. A total of 76 complete genomes available in the National Center for Biotechnology Information (NCBI) repository were analyzed, representing dairy products, fermented foods of plant origin, natural environments, laboratory cultures, and human and animal sources, reflecting the widespread distribution of the species. ARG identification was performed using the Resistance Gene Identifier (RGI) tool, applying the “strict” and “perfect” criteria of the CARD (Comprehensive Antibiotic Resistance Database). The results showed low diversity of resistance genes across the analyzed genomes, with three genes detected: vanY (96%), lmrD (67.1%), and qacJ (34.7%). However, for vanY and qacJ, the identity with reference sequences was below 50%, suggesting insufficient evidence to confirm their functional presence. The absence of reports linking L. lactis to these genes, combined with the low similarity, indicates that these predictions may represent bioinformatics artifacts, such as distant homology or conserved domains with an unknown function in resistance. For this reason, both genes were excluded from further analysis. On the other hand, the gene lmrD showed high reliability (identity >90%) and is widely recognized as a multidrug transporter in L. lactis, associated with resistance to lincosamides. 
Its prevalence (67.1%) reinforces its potential as a marker of intrinsic resistance in this species. Regarding mobile genetic elements, 71.1% of the isolates contained between 1 and 10 plasmids. Of the 237 annotated plasmids, only five harbored resistance genes, totaling seven genes: tet(S) (n=4), associated with tetracycline; vanY/vanB (n=1), associated with vancomycin; catA8 (n=1), associated with chloramphenicol; and qacJ (n=2), associated with quaternary ammonium disinfectants. Despite their low prevalence, the detection of these genes in L. lactis underscores the species' capacity to harbor resistance determinants, emphasizing the necessity for genomic surveillance even in organisms traditionally considered low-risk. The ongoing analysis of genomic islands and other mobile elements is essential to understand their potential to transfer and acquire resistance genes through horizontal gene transfer, representing a risk for the spread of resistance. Characterizing the resistome and mobile elements is crucial for biosafety strategies, enabling a critical evaluation of microorganisms classified as safe in the context of antimicrobial resistance and One Health.
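The filtering of low-identity predictions and the per-gene prevalence calculation described in this abstract can be sketched as follows. The record layout (dictionaries with `genome`, `gene`, and `identity` keys) and the function name are hypothetical and do not reflect the actual RGI output format; the 50% identity cutoff follows the exclusion criterion mentioned in the text, and the toy identity values are invented.

```python
def gene_prevalence(hits, n_genomes, min_identity=50.0):
    """Percentage of genomes carrying each gene, after an identity filter.

    hits: list of dicts with keys "genome", "gene" and "identity" (percent).
    Hits with identity below min_identity are discarded before counting.
    """
    genomes_by_gene = {}
    for hit in hits:
        if hit["identity"] >= min_identity:
            genomes_by_gene.setdefault(hit["gene"], set()).add(hit["genome"])
    return {
        gene: 100.0 * len(genomes) / n_genomes
        for gene, genomes in genomes_by_gene.items()
    }


# Invented toy hits across four hypothetical genomes.
toy_hits = [
    {"genome": "g1", "gene": "lmrD", "identity": 95.0},
    {"genome": "g2", "gene": "lmrD", "identity": 93.0},
    {"genome": "g1", "gene": "vanY", "identity": 40.0},
]
print(gene_prevalence(toy_hits, n_genomes=4))
```

Counting genomes through a set means that a gene detected more than once in the same genome contributes only once to its prevalence.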
Palavras-chave: Lactic acid bacterium, Antibiotic Resistance Genes (ARGs), Plasmids, Genomic Islands, Biosafety
#1125235

Comparative analysis of α-latrotoxin in spider species of the Theridiidae family: a bioinformatics approach

Autores: Leví Carneiro Oliveira,Tarcisio José Domingos Coutinho
Apresentador: Leví Carneiro Oliveira • levicarneiro75@gmail.com
Resumo:
Spiders of the Latrodectus genus, belonging to the family Theridiidae, are widely known for their medical significance due to the neurotoxic effects of their venom, primarily attributed to the protein α-latrotoxin (α-LTX). This toxin acts on vertebrate neuronal cells by promoting neurotransmitter exocytosis, a mechanism triggered by its interaction with membrane receptors such as neurexins and latrophilins. Given its potent activity and high specificity, α-LTX has become a subject of biotechnological and pharmacological interest. In this study, we performed a comparative bioinformatic analysis of α-LTX sequences from seven spider species: five from Latrodectus and two from related genera (Steatoda grossa and Parasteatoda tepidariorum), in order to evaluate structural divergence and physicochemical properties. Protein sequences were retrieved from the UniProt and NCBI databases and subjected to physicochemical analysis using ExPASy's ProtParam. Sequence alignments were performed with Clustal Omega, and 3D structural predictions were generated using SWISS-MODEL. Structural comparisons and visualizations were carried out using PyMOL. The alignment results revealed high conservation of functional domains among Latrodectus species, with P. tepidariorum presenting the most divergent sequence. Significant differences were observed in amino acid composition, particularly in cleavage sites and functionally important regions such as the N-terminal domain and the helical bundle core. Structural models confirmed that S. grossa and P. tepidariorum exhibit notable deviations in key residues, including substitutions of negatively charged amino acids with neutral ones in binding pocket regions, which may reduce toxicity. Disulfide bridges and ankyrin repeats, critical for protein stability and receptor interaction, were largely conserved across species, except in P. tepidariorum. 
Physicochemical parameters showed that all α-LTX proteins are stable, with instability indexes below 40. Molecular weights ranged from 151 to 159 kDa, and theoretical isoelectric points (pI) varied from 5.61 to 6.09, reflecting similar biochemical behaviors. Notably, variations in estimated half-lives suggest different protein stabilities in mammalian reticulocytes among the species analyzed. This study highlights the utility of computational approaches in the structural characterization of venom proteins and provides insights into functional divergences that may underlie species-specific venom effects. These findings contribute to a broader understanding of spider venom biology and its potential for pharmaceutical exploration.
Palavras-chave: α-latrotoxin, Latrodectus, venom
#1125256

Computational Analysis via Molecular Dynamics Simulations of Frequent Spike Mutations in SARS-CoV-2 Genomes Reported in Brazil

Autores: Isabelle Alves Pereira,Ana Ligia Scott
Apresentador: Isabelle Alves Pereira • isabelle.p@ufabc.edu.br
Resumo:
SARS-CoV-2 has spread rapidly worldwide, causing millions of infections and
deaths, and has undergone numerous mutations. Some variants have shown increased
infectivity and transmissibility compared to the original strain. Tracking these mutations
is crucial for understanding viral infectivity, improving vaccine efficacy, assessing drug
resistance, and identifying immune escape mechanisms. The viral Spike protein, a
homotrimeric structure, plays a critical role in the virus’s ability to invade human cells,
facilitating entry and infection. Several mutations in the Spike protein have been
identified in SARS-CoV-2 variants, and are associated with higher transmissibility,
pathogenicity and resistance to neutralizing antibodies. Updating our understanding of
the structural and functional dynamics of this protein is essential, as the virus continues
to evolve and adapt.
For this study, metadata from GISAID was used to identify the most frequent
SARS-CoV-2 lineages in Brazil between May 2023 and May 2024. After verifying
lineage frequencies, sequences were selected based on the following criteria: complete
genomes identified as CoV-19, collected between January 1, 2023 and May 27, 2024,
and originating exclusively from Brazil. Using Python, the sequencing metadata was
processed into a dataframe, and each patient's genome file in TSV format was parsed
to extract the “AA Substitutions” column. Only mutations starting with “Spike” were
retained, resulting in 1,567 mutations. The most frequent 5% of mutations (40 in total)
were selected, excluding those that appeared only once. The CHARMM-GUI platform
was used to prepare both mutants and Wild-Type (WT) structures based on the PDB file
6VW1 (RBD-ACE2 complex), selecting only chains A and C, completing missing
residues, and modifying the sequence according to the wild-type reference from
UniProt. Each system was solvated, and a 50-nanosecond molecular dynamics
simulation was performed using NAMD. In the next step, analyses of several aspects
are being carried out, including: a) Root Mean Square Deviation (RMSD); b) Hydrogen
Bonding patterns; c) Salt Bridges; d) Contacts; e) Frustration Degree. These analyses
aim to determine the impact of these mutations on the interaction between ACE2 and
Spike. Experimental studies in the literature have reported this phenomenon.
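The metadata-processing step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, and the sketch assumes that the “AA Substitutions” column holds a parenthesized, comma-separated list such as `(Spike_D614G,N_R203K)`.

```python
import csv
import io
from collections import Counter


def spike_mutation_counts(tsv_text, column="AA Substitutions"):
    """Count how often each Spike substitution occurs across all rows."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumed cell format: "(Spike_D614G,N_R203K,...)"
        raw = (row.get(column) or "").strip().strip("()")
        for item in raw.split(","):
            item = item.strip()
            if item.startswith("Spike"):
                counts[item] += 1
    return counts


def top_fraction(counts, fraction=0.05, min_count=2):
    """Keep the most frequent `fraction` of distinct mutations,
    excluding those seen fewer than `min_count` times."""
    keep = max(1, round(len(counts) * fraction))
    ranked = [(m, n) for m, n in counts.most_common() if n >= min_count]
    return ranked[:keep]


# Toy input with three hypothetical genomes.
sample = (
    "Virus name\tAA Substitutions\n"
    "A\t(Spike_D614G,N_R203K)\n"
    "B\t(Spike_D614G,Spike_N501Y)\n"
    "C\t(NSP3_T183I)\n"
)
counts = spike_mutation_counts(sample)
selected = top_fraction(counts, fraction=0.5)
```

Because the ranking is sorted by frequency, dropping singletons before or after taking the top fraction gives the same selection.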
Palavras-chave: Spike protein, SARS-CoV-2, Molecular Dynamics, Brazil, Protein-protein interactions, Mutations
★ Running for the Qiagen Digital Insights Excellence Awards
#1125264

Genomic and computational approach to support the diagnosis of primary immunodeficiencies

Autores: Sophia Pereira Ozorio,Wilson Araújo da Silva Jr
Apresentador: Sophia Pereira Ozorio • sophiaozorio@usp.br
Resumo:
Primary immunodeficiencies (PIDs) are a group of rare, congenital genetic diseases that compromise the immune system's ability to act properly, making individuals more susceptible to recurrent infections. Approximately 300 distinct forms of PIDs have been recorded, with great genetic heterogeneity. In addition, different forms of PIDs often share similar symptoms, contributing to the overlapping of clinical manifestations, making an accurate clinical diagnosis difficult. Due to their allelic heterogeneity, it is necessary to use integrative approaches that combine clinical, genetic and computational data, providing physicians with clarity in the diagnosis of PIDs. In this context, this project aims to develop a computational approach based on a decision support system, using machine learning algorithms to assist in the differential diagnosis of PIDs. To this end, the following objectives were defined: (i) obtaining public exome data from patients with a clinical diagnosis of PIDs; (ii) developing a pipeline for processing and analyzing exome data; (iii) creating an algorithm for classifying pathogenic variants associated with PIDs; (iv) modeling a database for storing and integrating genetic variants with consolidated data from the literature; and (v) developing an interactive web platform to aid diagnosis. Germline variants were analyzed from the dataset provided by Ferreira et al. (2023), comprising 13 unrelated patients treated at three hospitals in Rio de Janeiro. The pipeline developed included the following steps: quality control (FastQC), trimming (Trim Galore), alignment to the GRCh38/hg38 reference genome (BWA), duplicate marking and quality recalibration (GATK), variant calling (GATK HaplotypeCaller) and functional annotation (SnpEff and SnpSift). 
The resulting VCF files were converted to MAF format, filtered by a genetic panel of 92 genes related to PIDs and analyzed by pathogenicity predictors (CADD, SIFT, PolyPhen, FATHMM and SpliceAI), according to guidelines in the literature. The preliminary results obtained through manual classification of the pathogenicity of the variants revealed mutations in twelve different genes, distributed among nine samples from the cohort, with the BTK gene standing out, identified in three of them. In five of these samples, only a single variant was detected, which reinforces the relevance of this finding for diagnosing the diseases. However, limitations were observed in the prediction stage, especially in the treatment of frameshift mutations, for which there are still no widely validated and specific predictors. Even so, all the variants described in the base article were also identified in this study. Based on this manual analysis and the difficulties encountered, a strategy was defined for the development of an automated classification algorithm, focusing on diagnostic inference and pattern recognition among the predictors, with a view to categorizing the pathogenicity of variants in genes associated with PIDs.
Palavras-chave: Primary immunodeficiency, exome, genomic analysis, bioinformatics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125282

Exploring Genetic Diversity Associated with COVID-19 and MIS-C in Children: Findings from Whole-Exome Sequencing

Autores: Bruno Janke do Nascimento,Michelle Orane Schemberger,Beatriz Rosa De Azevedo,Carla Adriane Royer,Bárbara Reis,Flavia Anisio,Roberta Faccion,Marco Antonio Campanário,Letícia Graziela Costa Santos,Esdras Matheus Gomes da Silva,Vinícius Da Silva Coutinho Parreira,Alexandre Rossi Paschoal,Rubens Cat,Arnaldo Prata,Prof. Dr. Benilton de Sá Carvalho,Luiz Lehmann Coutinho,Patricia Savio de Araujo-Souza,Flávia Cristina de Paula Freitas,Tiago Minuzzi Freire da Fontoura Gomes,Alysson Henrique Urbanski,Michel Satya Naslavsky,Jaqueline Yu Ting Wang,Victor Hugo Calegari De Toledo,Daniela F. Gradia,Jaqueline Carvalho de Oliveira,Zilton Farias Meira De Vasconcelos,Helisson Faoro,Hellen Geremias Dos Santos,Fabio Passetti
Apresentador: Bruno Janke do Nascimento • bruno.nascimento@fiocruz.br
Resumo:
Despite the end of the COVID-19 pandemic, the causes of disease severity and the factors associated with it are still not well established. Although the pediatric population rarely develops the severe form of the disease, some children present with gastrointestinal and/or multiple organ dysfunction, accompanied by high levels of inflammatory markers, a condition classified as Multisystem Inflammatory Syndrome in Children (MIS-C). This study generated whole-exome sequencing (WES) data from samples of children affected by mild or severe COVID-19 or MIS-C. In order to identify variants with differential distribution among these groups, we obtained samples from 39 children living in Rio de Janeiro, São Paulo and Salvador, between 2019 and 2020, 10 of whom developed severe COVID-19 and 29 MIS-C. Between 2020 and 2023, samples were obtained from 163 children living in the state of Paraná. Of these, 9 had severe COVID-19 and 154 had the mild form of the disease. DNA was extracted from all samples and WES was performed with coverage ranging from 40 to 150x, with a mean of 72.5x. The data were processed following the Genome Analysis Toolkit (GATK) protocol, version 4.2.1.0. Annotation was carried out using ANNOVAR, and filtering was adapted using the CEGH Filter signaling algorithm, created by the Online Brazilian Archive of Mutations (ABraOM) project. After classification by the CEGH filter, 92% (n=628,970) of the variants were indicated as high confidence. Association analyses for common (MAF > 5%) and rare (MAF ≤ 5%) genetic variants were carried out using Firth's logistic regression and the SKAT-O method, respectively. Only biallelic autosomal single nucleotide polymorphisms (SNPs) were considered for these analyses, with quality control carried out in PLINK (v.1.9 and 2.0), which retained variants with no missing data and in Hardy-Weinberg equilibrium, and samples that were unrelated and whose heterozygosity was within 3 standard deviations of the mean. 
The population structure was assessed by keeping only independent SNPs, performing multidimensional scaling (MDS) and keeping the first 10 principal components (PCs), whose distributions were compared between groups. Ancestry proportions of the study population were estimated using ADMIXTURE (v4.4.0) and compared to reference populations from the 1000 Genomes Project (1kG). Finally, an over-representation analysis (ORA) was performed for genetic variants that were rare in at least one of the following public databases (1kG, gnomAD and ABraOM) and whose Rare Exome Variant Ensemble Learner (REVEL) pathogenicity score was > 0.932, using the resulting list of genes as input to the Enrichr software. The Gene Ontology, KEGG and Reactome databases were evaluated, considering as significant the prioritized terms with a corrected p-value < 1%. After correction for PCs, the association analysis did not identify SNPs at a genome-wide significance level. The ORA prioritized terms from the metabolism pathways of fatty acids, cholesterol, lipids, diterpenoids and amino acids, as well as cell differentiation, removal of apoptotic cells and lipoprotein. These characteristics may guide future efforts to elucidate the immunopathology of COVID-19 and MIS-C in children.
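Two of the quality-control rules mentioned in this abstract, the common/rare split at a MAF of 5% and the exclusion of samples whose heterozygosity lies more than 3 standard deviations from the mean, can be illustrated with a short Python sketch. It is not the PLINK workflow used by the authors; the function names and the toy heterozygosity values are hypothetical.

```python
from statistics import mean, stdev


def frequency_class(maf, threshold=0.05):
    """Classify a variant as common (MAF > threshold) or rare (MAF <= threshold)."""
    return "common" if maf > threshold else "rare"


def heterozygosity_outliers(het_rates, n_sd=3.0):
    """Return the IDs of samples whose heterozygosity rate deviates from
    the cohort mean by more than n_sd sample standard deviations."""
    values = list(het_rates.values())
    mu, sd = mean(values), stdev(values)
    return sorted(s for s, h in het_rates.items() if abs(h - mu) > n_sd * sd)


# Toy cohort: 19 samples with similar heterozygosity and one clear outlier.
het = {f"S{i:02d}": 0.30 for i in range(19)}
het["S19"] = 0.60
flagged = heterozygosity_outliers(het)
```

In practice the heterozygosity rates would come from the genotype data; here they are made up only to show the rule.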

Financial support: Inova Fiocruz/Fundação Oswaldo Cruz, Fundação Araucária and CNPq
Palavras-chave: COVID-19, Whole-exome sequencing, MIS-C
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125310

Characterization of mutational signatures in tumor tissue from cervical cancer infected with different HPV types

Autores: Caroline Carvalho,Michelle Marcela Paredes Escobar,Rafaela de Barros Vieira Santos,Shayany P. Felix,Ayslan C. Brant,Miguel Moreira,Marcelo Alves Soares,Mariana Boroni,Juliana D. Siqueira,Livia Ramos Goes
Apresentador: Caroline Carvalho • cdsacaroline@gmail.com
Resumo:
The human papillomavirus (HPV) is a double-stranded DNA virus from the Papillomaviridae family that infects epithelial cells. Chronic infection by HPV is associated with the development of different anogenital cancers, including virtually all cases of cervical cancer. This is the second most common cancer in the world and the second cause of death from cancer in women of reproductive age. Based on the genetic diversity in the viral L1 gene, more than 450 HPV types have been characterized. The HPV types 16 and 18 are the most common, responsible for 70% of cervical cancer cases. Viral infections can induce mechanisms of innate immunity, such as APOBEC enzymes. This family of proteins has a cytidine deaminase domain, causing nucleotide mutations from cytosine to thymine/uracil in DNA and RNA, and acting as a restriction factor for HPV and several other viruses. However, APOBEC editing can also produce mutations on the host genome and mutational signatures characteristic of these enzymes have been found in cervical cancer genomes. Recently, our group demonstrated that levels of APOBEC3B expression in tissue from cervical cancer are different according to the HPV types present. More specifically, HPV18-positive tumors appeared to have higher expression of APOBEC3B compared to other HPV types. Identifying the mutational profile present in cervical cancer infected with different HPV types can increase the comprehension of the oncogenic process and identify novel biomarkers for this neoplasia. Therefore, the current study explores the mutational profiles present in the DNA of cervical cancer cells and compares the role of APOBEC enzymes in these profiles according to HPV types. This study is part of a larger study, and to date, 22 cervical cancer cases from the Brazilian National Cancer Institute had their genomes sequenced by shotgun strategy. 
In this study, the nf-core Sarek (v.3.5.1) pipeline was applied for read preprocessing, somatic variant calling (Mutect2) and functional annotation of mutations. HPV genotyping was assessed using a reverse hybridization platform. Moreover, the TCGA-CESC cohort was analyzed for its mutational signature profile by extracting preprocessed exome mutation data with the TCGAmutations R package, alongside clinical data obtained through the TCGAbiolinks R package, and combining both with available HPV genotyping information. For mutational signature extraction in the TCGA cohort, the tools SigProfilerMatrixGenerator and SigProfilerExtractor were used. The characterization of the mutational profile in the samples from the TCGA-CESC cohort generated three de novo signatures, which were then decomposed into COSMIC signatures. Among those, a contribution of both APOBEC signatures (SBS2 and SBS13) was observed. When comparing the tumors infected by distinct HPV types and species, there was no statistically significant difference between groups (adjusted for tumor staging and age) in the absolute number of mutations attributed to each APOBEC signature. Samples from INCA’s patients are being analyzed for their mutational profile. Later, mutational signatures will be extracted using the same workflow applied to the TCGA cohort, and the results will be compared statistically across tumor-infecting HPV types and clinical information.
Palavras-chave: cervical cancer, genome, hpv, mutational signatures, cancer
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125361

Comparative analysis of cAMP/PKA signaling and mitochondrial dynamics in exercise and muscle atrophy models

Autores: Marylu Mardegan de Lima,Luciane Carla Alberici,Ricardo Roberto da Silva
Apresentador: Marylu Mardegan de Lima • marymardegan@usp.br
Resumo:
Muscle atrophy is common in conditions such as neuromuscular diseases and aging, characterized by the exacerbated activation of autophagy and the ubiquitin-proteasome system. This process leads to progressive loss of muscle mass and is often accompanied by alterations in mitochondrial dynamics, including fusion, fission, and mitophagy. Physical exercise acts as a therapeutic strategy through molecular pathways such as cyclic adenosine monophosphate/protein kinase A (cAMP/PKA); however, the exact molecular mechanisms of this pathway in muscle atrophy and its interaction with mitochondrial regulation remain poorly understood. This study investigated the expression of genes related to the cAMP pathway and modulation of mitochondrial dynamics in animal models of atrophy and exercise. We analyzed RNA-seq data from Drosophila melanogaster and from the gastrocnemius muscle of mice subjected to resistance exercise. In atrophic animals, we observed increased expression of autophagy-related genes (Atg2, in Mef2-Gal4 > UAS-Deaf1) and negative regulators of the cAMP pathway (PDE, in Mef2-Gal4 > UAS-Gsk3CA), along with reduced expression of genes associated with mitochondrial dynamics (PARK), suggesting catabolic activation and potential mitochondrial impairment. In exercised animals, genes involved in mitochondrial fission (DRP2), cAMP metabolism (ADCY, PDEs), and autophagy (FOXO3, FOXO1, ATG) were specifically modulated in females; in males, the same genes showed variations in expression direction and intensity, along with additional induction of Pink1 (mitochondrial quality control) and FBXO32 (ubiquitin-proteasome system). These findings indicate that both atrophy and physical exercise promote coordinated modulation of cAMP/PKA, autophagic, and mitochondrial pathways, with a hypothesized influence of biological sex. 
Further studies are needed to clarify the potential role of cAMP signaling as an integrative axis in these processes and its relevance for therapeutic strategies targeting muscle atrophy.
Palavras-chave: Transcriptome, Muscle atrophy, Resistance training, Mitochondrial dynamics
#1125375

Functional and Structural insights of AlphaMissense, SIFT, and PolyPhen-2 in Classifying CFTR Missense Variants

Autores: Ana Katarina Campos Nunes,Arthur Felipe Vasconcelos Ferreira Reis,Camila Forte,Gustavo Barra Matos,Gilderlanio Santana de Araujo
Apresentador: Ana Katarina Campos Nunes • nunesmusic12@gmail.com
Resumo:
Accurate classification of missense variants remains a significant challenge in medical genomics, especially in distinguishing between pathogenic mutations and benign variants in monogenic disorders such as Cystic Fibrosis (CF). This limitation highlights the need for robust computational and functional approaches to improve diagnostic accuracy. Widely used predictors, such as SIFT, PolyPhen-2, and, more recently, AlphaMissense (AM), have shown varying usefulness. AlphaMissense, for example, is a deep learning-based tool that integrates sequence, structural and evolutionary information, showing superior performance compared to sequence-based methods. This study comprehensively evaluated the performance of AlphaMissense, SIFT, and PolyPhen-2 in classifying clinically significant CFTR missense variants, using 164 variants from the CFTR2 database as ground truth. Our results demonstrate that SIFT achieved the highest accuracy (0.99) for predicting CF-causing variants, while AlphaMissense outperformed other tools (accuracy: 0.78) for non-CF-causing variants. Notably, AlphaMissense showed the strongest correlation with FoldX-derived ΔΔG values (r = 0.5; p ≤ 2.2e−16), suggesting its predictions are more closely tied to protein destabilization energetics than SIFT (r = 0.21) or PolyPhen-2 (r = 0.39). Further regression analyses revealed that each unit increase in the AlphaMissense score corresponded to a 4.3-fold rise in ΔΔG (p ≤ 2.69e−08), and logistic regression confirmed that ΔΔG itself is a significant predictor of pathogenicity (OR = 1.376 per unit; 95% CI: 1.103–1.792; p ≤ 0.0095), increasing the odds of a variant being CF-causing by 37.6%. Collectively, these findings suggest that AlphaMissense’s pathogenicity predictions extend beyond simple destabilization effects, potentially capturing broader functional impacts of missense variants. 
The integration of AlphaMissense with FoldX free energy calculations emerges as a robust complementary strategy for interpreting CFTR variants, especially the ones of uncertain significance in CF patients, offering more accuracy and a structural overview of pathogenicity significance.
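The step from the reported odds ratio (OR = 1.376 per unit of ΔΔG) to a 37.6% increase in the odds can be checked with a few lines of Python. This is only an arithmetic illustration; the function names are hypothetical, and the multi-unit case assumes the usual logistic-regression property that odds ratios multiply across units.

```python
import math


def percent_change_in_odds(odds_ratio, units=1.0):
    """Percent change in the odds for an increase of `units` in the predictor."""
    return (odds_ratio ** units - 1.0) * 100.0


def log_odds_coefficient(odds_ratio):
    """Logistic-regression coefficient (log-odds per unit) implied by an odds ratio."""
    return math.log(odds_ratio)


one_unit = percent_change_in_odds(1.376)      # change in odds for +1 unit of ΔΔG
two_units = percent_change_in_odds(1.376, 2)  # change in odds for +2 units of ΔΔG
beta = log_odds_coefficient(1.376)
```

Note that the percentage applies to the odds, not to the probability, of a variant being CF-causing.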
Palavras-chave: Pathogenicity predictors, CFTR, Gibbs free energy, missense variants, Databases.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125411

Genome-wide Identification of RNA Virus–Derived Endogenous Viral Elements in Culicidae

Autores: Bianka Lopes Da Silva Paulino,Yago José Mariz Dias,Filipe Zimmer Dezordi,Gabriel da Luz Wallau
Apresentador: Bianka Lopes Da Silva Paulino • biankapaulino15y@gmail.com
Resumo:
The presence of endogenous viral elements (EVEs), which originate through horizontal gene transfer from viruses to eukaryotes, in the genomes of eukaryotic organisms represents a significant evolutionary phenomenon, particularly due to their potential role in shaping antiviral immunity and virus-host coevolution. Once fixed as part of the genome of a species, EVEs may become degraded, remain latent, or be co-opted by the host, acquiring regulatory or functional roles. Notable examples include the syncytin gene in mammals, which is essential for placenta formation, and EVEs that contribute to the biogenesis of antiviral piRNAs in insects. The systematic identification of EVEs across different taxa has gained increasing attention in recent decades. Failure to properly annotate EVEs can lead to false positives in metagenomic analyses, misleading the detection of circulating viruses and impacting viral surveillance. In this context, the present study applies a standardized and automated approach to screen for EVEs in insects of the Culicidae family, a medically and ecologically important group due to its role in transmitting arboviruses such as dengue, Zika, and Chikungunya. We employed the recently developed EEfinder tool, designed for automated EVE screening based on a sequence alignment strategy. A total of 131 mosquito genomes were retrieved from the NCBI database, encompassing genera with public health importance such as Aedes, Culex, and Anopheles. Specifically, to identify potential integrations, a database containing 680,634 viral proteins was used. To avoid potential false positives, a secondary database containing 721,685 non-redundant proteins from the Culicidae family was employed as a filter. A total of 8,963 endogenous viral elements (EVEs) derived from RNA viruses were identified. Among these, the majority were classified as ssRNA(-) (6,291 elements), followed by unclassified RNA (1,287), dsRNA (746) and ssRNA(+) (639). 
The predominance of ssRNA(-) EVEs reflects the long history of coevolution between ssRNA(-) viruses and mosquitoes. The EVE distribution per mosquito genus revealed distinct patterns, with species of the Aedes genus showing the highest average number of RNA-derived EVEs per genome (279.3), followed by Ochlerotatus (258), Armigeres (204), Malaya (176), Uranotaenia (106), Toxorhynchites (83), Sabethes (63.5), Wyeomyia (63), Anopheles (28.7), Culex (25.4), and Topomyia (23). Additionally, endogenization of viral elements from Flaviviridae was observed. Flaviviridae-derived EVEs were found in Aedes (300), Anopheles (8), Ochlerotatus (6), Wyeomyia (5), Sabethes (2), and Malaya (1). These findings reinforce the evidence for a prolonged evolutionary interaction between arboviruses and their natural mosquito vectors, especially within the Aedes genus, and expand the current knowledge of the EVEome of mosquitoes. As a result, an EVE database was generated that can support future screenings, functional studies, and serve as a filter for false positives in metagenomic analyses. This database also contributes as a resource for investigating virus-mosquito coevolution. In conclusion, the present findings demonstrate a significant diversity of RNA virus-derived EVEs in Culicidae genomes, with genus-specific distributions that suggest different evolutionary histories.
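As a quick consistency check, the counts reported in this abstract can be tallied in Python. The numbers are taken from the text above; the variable names are hypothetical and the percentages are simple shares of the reported totals.

```python
# RNA virus-derived EVEs by genome type, as reported in the abstract.
eves_by_genome_type = {
    "ssRNA(-)": 6291,
    "unclassified RNA": 1287,
    "dsRNA": 746,
    "ssRNA(+)": 639,
}

# Flaviviridae-derived EVEs by mosquito genus, as reported in the abstract.
flaviviridae_by_genus = {
    "Aedes": 300,
    "Anopheles": 8,
    "Ochlerotatus": 6,
    "Wyeomyia": 5,
    "Sabethes": 2,
    "Malaya": 1,
}

total_eves = sum(eves_by_genome_type.values())
flaviviridae_total = sum(flaviviridae_by_genus.values())

# Share of each genome type among all RNA virus-derived EVEs, in percent.
shares = {k: round(100 * v / total_eves, 1) for k, v in eves_by_genome_type.items()}

for genome_type, share in shares.items():
    print(f"{genome_type}: {share}%")
```

The four genome-type counts add up to the reported total of 8,963 elements.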
Palavras-chave: Paleovirology, Horizontal gene transfer (HGT), Mosquito vectors, RNA viruses
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125427

Micropollutants in Landfill Leachate: Prospection of Degradable Molecules by Laccase from Pleurotus ostreatus

Autores: Isac Antonio Alcantara Queiroz,Antenor Linhares,George Lacerda,José Cordeiro Nascimento Júnior,Douglas Thomaz,Adriel Martins Borges,Rafael Trindade Maia,Glauciane Danusa Coelho
Apresentador: Adriel Martins Borges • adriel852456@gmail.com
Resumo:
Laccases are ligninolytic enzymes produced by basidiomycetes that are recognized for
detoxifying contaminated environments. The Basidiomycete Pleurotus ostreatus
demonstrates the ability to degrade a variety of xenobiotic compounds by the action of
the laccase produced by it. This study aimed to evaluate the capacity of a theoretical
model of P. ostreatus laccase to degrade compounds classified as toxic and
environmentally hazardous present in landfill leachate. The molecules evaluated were
identified in leachate from the metropolitan landfill in the city of João Pessoa-PB. This
leachate has recalcitrant characteristics, and 31 compounds were identified in the natural
leachate that have toxic characteristics, of which 9 also pose an environmental hazard,
according to the GHS. The molecules were prepared in UCSF Chimera and the
docking simulation was performed with AutoDock Vina. The analysis of the dockings
was carried out in Discovery Studio. The free energy of binding was negative for all
compounds studied, which indicates that the degradation reaction of the compounds
studied by P. ostreatus laccase is thermodynamically viable. Of the 9 compounds
evaluated, seven showed the occurrence of hydrogen bonds, with the exception of 1,4-
naphthoquinone and Kepone. The negative binding energy and the occurrence of
hydrogen interactions are indicative of the enzymatic degradation action of pollutants.
In the docking between laccase and the ligands 2-methyl-4,6-dinitrophenol, 4-chloro-2-
nitroaniline, strychnine, hexachlorophene, levomethamphetamine, and m-Toluidine, the
formation of a conventional hydrogen bond with the amino acids HIS 458 and/or ASP
208 was observed. In the analysis of the active site, these two amino acids are of great
importance in the molecular docking between laccase and several compounds because
they are electron and proton acceptors during the enzymatic reaction, respectively. Since
the 6 compounds interact with the residues of the laccase active site and have negative
binding free energy, they would probably be degraded by the enzymatic action of
laccase. The compound 1-Chloro-3-nitrobenzene showed hydrogen bond formation only
involving the residue ASN 210. The compounds 1,4-naphthoquinone and kepone did
not show hydrogen bond formation; however, several other types of interactions were
formed. In 1,4-naphthoquinone, a Pi-sigma bond was formed with the amino acid LEU
267; in the interaction of laccase with kepone, a carbon-hydrogen bond was formed, and
a halogen bond also occurred. In addition to these interactions, the formation of several
Van der Waals interactions was also seen in both compounds, which are decisive for the
formation of protein-ligand complexes and are relevant for molecular recognition
between the ligand and the amino acids in the binding pocket of a target protein. From
this, it is inferred that the two compounds analyzed may also be degraded by
enzymatic action. Thus, the molecular dockings presented indicate the potential for
applying P. ostreatus laccase in vivo and/or in vitro in landfill leachate
treatment.
Palavras-chave: Pleurotus ostreatus, Hydrogen bonding, Binding free energy
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125438

AI-Assisted Clinical and Genetic Decision-Making: Application of Large Language Models in Healthcare Delivery

Autores: Jean Paes Landim de Lucena,Isaac Marlon da Silva Lourenço,Patrick Cesar Alves Terrematte
Apresentador: Patrick Cesar Alves Terrematte • patrick.terrematte@ufrn.br
Resumo:
The healthcare workflow demands the analysis of vast volumes of information, including medical records, laboratory results, and genomic data. The availability of such data grows daily, often overwhelming professionals tasked with aligning clinical and laboratory inputs with established protocols and guidelines to generate effective outputs for individualized diagnostic and therapeutic propaedeutics, tailored to the specific clinical context. In this scenario, tools based on large language models (LLMs) enable the handling of genomic data complexity and support clinical reasoning by optimizing information triage within an evidence-based practice framework. These tools aid in generating differential diagnoses and recommending personalized treatments. Within precision medicine, genomic data analysis facilitates the application of personalized therapies and preventive strategies. However, interpreting genomic data remains challenging, even for specialized professionals. Generative AI and large language models can assist in analyzing genetic variants and diseases, improving diagnostic accuracy and enhancing physician-patient communication. This project will present an API infrastructure based on intelligent agents implemented in LangGraph, utilizing an adaptive and self-corrective strategy via Retrieval-Augmented Generation (Adaptive RAG). The system employs the Llama 3.1 (70B) LLM model, with vectorized tokens stored in ChromaDB and metrics managed in LangSmith. The objective is to support clinical decision-making by integrating patient data with the Brazilian Ministry of Health’s Clinical Protocols and Therapeutic Guidelines (PCDT), alongside continuously updated genomic data from the MalaCards platform. This infrastructure aims to enhance diagnostic propaedeutics across healthcare tiers and enable precise, individualized clinical follow-up, particularly in resource-limited settings. 
As a project outcome, we will demonstrate a proof-of-concept chatbot addressing queries related to Type 2 Diabetes Mellitus PCDT guidelines and corresponding genetic testing options available in MalaCards.
Palavras-chave: Artificial Intelligence Agent, Generative AI, LLM, Healthcare, Precision Medicine.
#1125448

REEVALUATION OF RARE GENETIC VARIANTS IN BRAZILIAN POPULATIONS USING DEEP LEARNING TOOLS

Autores: Gabrielle Germek Coelho Santos,Daniel Henrique Ferreira Gomes,Julio Cesar da Silva Filho,Jorge Estefano Santana De Souza,Beatriz Stransky
Apresentador: Gabrielle Germek Coelho Santos • bibie.nuit@gmail.com
Resumo:
Genomic variants are alterations in the DNA sequence that occur at the individual or population level. Understanding their implications, mechanisms, and origin is essential for advancing precision medicine strategies. The Brazilian population, shaped by a history of extensive miscegenation, harbors unique variants and phenotypes. However, Brazilian genomic data remain underrepresented in major international databases, largely due to the population’s genetic complexity and the limited functional interpretation of available data. This gap hinders the progress of genomics-based studies in Brazil.
This study aims to identify and characterize rare genetic variants in Brazilian individuals using next-generation sequencing (NGS) data, as well as to reclassify the pathogenicity of previously reported variants using deep learning-based approaches implemented in DTreePred—a new mobile application developed by Gomes et al. (2025), to be presented at this congress.
As a case study, we reanalyzed the variants identified in a Brazilian cohort described by Amorim et al. (2025), consisting of 17 samples and 128 genetic variants, including single nucleotide variants (SNVs) and small insertions/deletions (indels). The variant calling pipeline followed a standard protocol, beginning with quality control of the reads using Trimmomatic. Filtered reads were aligned to the hg38 reference genome using BWA-MEM, followed by sorting, duplicate marking, and indexing with Samtools and Picard Tools.
Variant calling was performed using DeepVariant in WES mode, based on a custom BED file to define target regions. Functional annotation of the variants was done using a custom Perl script, integrating information from public databases. Finally, pathogenicity prediction was performed with DTreePred, which applies machine learning models to classify variants into clinically relevant categories, such as benign or pathogenic.
Through this approach, we expect to identify novel variants that may have been missed in previous analyses, and to reclassify variants previously labeled as variants of uncertain significance (VUS), those with conflicting interpretations, or absent from the ClinVar database. The effectiveness of pathogenicity predictors in clinical settings depends on the accurate identification and interpretation of potentially pathogenic genetic variants. Such understanding is crucial for enabling genetic findings to inform treatment decisions and clinical recommendations.
Palavras-chave: Genetic Variants, Brazilian population, Pathogenicity, Mutation, Machine Learning, Deep Learning, Precision medicine
#1125450

Associations of mtDNA Complex I Alterations, Heteroplasmy and Reduced ND5 Expression in Parkinson Disease

Autores: Gustavo Barra Matos,Camille Sena dos Santos,Letícia Cota Cavaleiro de Macêdo,Juliana Paiva dos Santos Diniz,Tatiane Piedade de Souza,Giovanna Chaves Cavalcante,Caio Santos Silva,Rebecca Lais da Silva Cruz,Dafne Dalledone Moura,ANDREA KELY CAMPOS RIBEIRO DOS SANTOS,Bruno Lobato,Gilderlanio Santana de Araújo
Apresentador: Gustavo Barra Matos • gustavobarra16@gmail.com
Resumo:
Mitochondrial DNA (mtDNA) alterations are associated with the risk and progression of Parkinson’s disease (PD) and levodopa-induced dyskinesia (LID), both of which are linked to mitochondrial oxidative stress. Insertions and deletions (INDELs) in mtDNA, by affecting coding and regulatory regions, can impact mitochondrial gene expression and impair the synthesis of proteins involved in oxidative phosphorylation, thereby contributing to mitochondrial dysfunction. However, the role of INDELs in PD and LID remains poorly explored, especially in understudied admixed populations whose genetic architecture differs from that of well-characterized populations. Thus, this study investigated the contribution of mitochondrial INDELs to PD and LID and their effects on gene expression, focusing on an underrepresented admixed population. We analyzed mtDNA INDELs from the blood of 87 admixed individuals from the Brazilian Amazon (45 with PD, 20 with LID and 25 without, and 42 controls) using FastQC, MultiQC, FastP, BWA, and the mtDNA-Server 2 workflow. Additionally, we analyzed blood bulk RNA-seq data from a similar cohort of 46 individuals (26 with PD, 15 with LID and 11 without, and 20 controls), with a focus on mitochondrial gene expression. The processing included quality control with FastQC/MultiQC, trimming with FastP, alignment with STAR, read counting with HTSeq, and differential expression analysis with DESeq2. We identified an increased repertoire of INDELs in PD patients without LID compared to both controls and the LID group, particularly in the ND5 and RNR2 genes. INDEL occurrences in Complex I were significantly associated with PD without LID (OR = 2.18; P = 0.022). Moreover, elevated levels of heteroplasmy in ND4 and ND5 were mainly observed in the PD without LID group. Conversely, a reduction in INDELs in Complex I was associated with greater motor severity as assessed by the MDS-UPDRS Part III in LID patients (R² = –0.64; P = 0.026).
We also found that increased heteroplasmy of INDELs in mtDNA regulatory regions was positively correlated with age in PD patients without LID (R² = 0.95; P = 2.6 × 10⁻⁹) and with LID (R² = 0.35; P = 0.019), while a negative correlation was observed in controls (R² = –0.77; P = 3.3 × 10⁻⁹). Finally, we observed that high INDEL burden in ND5 may be associated with its reduced expression in PD patients without LID (log₂ fold change = –0.857; P = 2.73 × 10⁻⁶; adjusted P = 0.007). Altogether, this study reveals that mtDNA INDELs, particularly in ND5 and Complex I, are associated with Parkinson’s disease without LID, potentially impacting mitochondrial gene expression and contributing to mitochondrial dysfunction. Correlations with age, clinical severity, and gene expression suggest that these variants may hold value as molecular biomarkers in underrepresented populations.
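As a reading aid for the ND5 result, the reported log₂ fold change can be converted to a linear expression ratio. The short sketch below only performs that arithmetic on the value quoted in the abstract; the function name is illustrative and is not part of the authors' DESeq2 workflow.

```python
def fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear expression ratio."""
    return 2.0 ** log2_fc

# Value reported in the abstract for ND5 in PD patients without LID.
nd5_ratio = fold_change(-0.857)

print(f"linear fold change: {nd5_ratio:.3f}")
print(f"relative reduction: {(1 - nd5_ratio) * 100:.1f}%")
```

A log₂ fold change of –0.857 therefore corresponds to a ratio of roughly 0.55, that is, ND5 expression about 45% lower than in the comparison group.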
Palavras-chave: Mitogenome, INDELs, Transcriptome, Gene Expression, Parkinson’s disease, Levodopa-induced dyskinesia
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125493

SIBILANT: AN INTERACTIVE SYSTEM FOR PRELIMINARY STRUCTURAL ANNOTATION

Autores: Breno de Andrade Tauil,Alan Durham
Apresentador: Breno de Andrade Tauil • tauil.breno@gmail.com
Resumo:
Accurate characterization of coding sequences (CDS), transcription start sites (TSS), and start codons remains a significant bottleneck in genome structural annotation. Although efforts have been made, annotation pipelines often struggle to include ways to handle these bottlenecks. Furthermore, recent advances in next-generation sequencing (NGS) projects have considerably increased the number of unannotated species in all taxa. This study presents SIBILANT, a new annotation pipeline that incorporates specialized software tools into a standard annotation framework, which addresses these bottlenecks with a hybrid ab initio and similarity-based approach while being trained on kingdom-level data. By integrating transcriptomic CDS prediction and BUSCO protein evidence with the genomic data, we can assume a higher level of confidence in the gene predictions. Additionally, SIBILANT differentiates itself by implementing a TSS predictor, which not only characterizes TSS regions but also suggests possible start codon locations for the gene predictor. SIBILANT is pre-trained on reference genes with curated transcripts and proteins from the NCBI database, including twenty eukaryotic species subdivided into four training models: vertebrates, invertebrates, plants, and fungi. However, our pipeline can easily be customized for different training models. This thesis aims for more accurate gene prediction than standard annotation approaches, especially regarding the characterization of CDSs and TSSs. SIBILANT is written entirely in Nextflow, a domain-specific language and pipeline manager optimized for bioinformatics analyses. Moreover, SIBILANT may be implemented as an nf-core community pipeline for easier distribution, operation, and modularity, so it can better contribute to sequencing projects as a primary annotation pipeline for unannotated eukaryotic sequences.
Palavras-chave: gene annotation, pipeline, CDS prediction
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125509

Long non-coding RNA: A comprehensive analysis of expression related to triple-negative breast cancer

Autores: Emanuely Maria Santana,Tatianne Costa Negri Rocha
Apresentador: Emanuely Maria Santana • santanamanu2001@gmail.com
Resumo:
Breast cancer (BC) is the most common type of cancer and the leading cause of mortality among women worldwide, accounting for 11.7% of global cases. The triple-negative subtype (TNBC) stands out for its high mortality rate and aggressiveness, characterized by the absence of hormone receptors and HER2, as well as genetic and epigenetic instability. Long non-coding RNAs (lncRNAs) have shown promise as biomarkers and therapeutic targets for TNBC, as they regulate gene expression at multiple levels, influencing processes such as tumor growth, metastasis, and drug resistance. This project aims to identify epigenetically relevant lncRNAs related to TNBC through analysis of public databases and literature reviews. For this purpose, RNA-Seq files of the basal-like TNBC cell lines MDA-MB-436 (SRX17694387), MDA-MB-231 (SRX22882404), SUM149 (SRX21301768) and the normal cell line MCF10A (SRX22778572) will be retrieved from the NCBI Gene Expression Omnibus (GEO) database in SRA format. In addition, data from the lncRNAs RMST (NR_186095.1), SNHG12 (NR_146382.1), MALAT1 (NR_144568.1) and ADAMTS9-AS2 (NR_038264.1) will also be accessed. The SRA files will be processed on the Galaxy platform for conversion to FASTQ format, with quality verification via FastQC and subsequent conversion to FASTA. Nuclear sequences will be trimmed and superimposed for alignment. Sequence alignment will be performed by BLASTn using the human genome reference GRCh38. This will allow the identification of genetic alterations and regulatory elements associated with differential gene expression in TNBC. The subcellular localization of lncRNAs will be mapped via AnnoLNC2, providing planned annotations and interactive visualizations. The coding potential of the lncRNAs will be assessed with CPC2, which categorizes sequences as coding or non-coding. PLEK will be used in a Linux environment to classify RNA-Seq sequences based on standard parameters, including k-mer analysis.
This is expected to expand knowledge about the molecular mechanisms of TNBC and contribute to the development of new therapeutic strategies targeting this aggressive subtype of breast cancer.
Palavras-chave: TNBC, HER2, lncRNA, biomarker, aggressive tumor
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125565

Selective Mechanisms of RND Efflux Pumps in Pseudomonas aeruginosa: A Preliminary Exploration

Autores: Vinnícius Machado Schelk Gomes,Pedro Henrique Monteiro Torres,Manuela Leal da Silva
Apresentador: Vinnícius Machado Schelk Gomes • vinniciusmschelk@gmail.com
Resumo:
Pseudomonas aeruginosa is a Gram-negative opportunistic pathogenic bacterium with a broad habitat distribution, ranging from soil to surgical instruments. It exhibits significant antimicrobial resistance, an emerging concern that may result in up to 10 million deaths annually by 2050. One of the mechanisms conferring resistance in P. aeruginosa involves efflux systems, such as the Resistance-Nodulation-Division (RND) system. This study aims to investigate the key residues involved in antibiotic recognition by MexB. Protein sequences were retrieved from the Transporter Classification Database (TCDB), and BLASTp was used to select a template by aligning the MexAB-OprM efflux pump sequences against the RCSB database. Once a suitable template was identified, protein models were constructed using MODELLER 10.3 and validated through DOPE score, Ramachandran plots (PROCHECK), and MolProbity scores. The resulting models were subjected to molecular docking with AutoDock Vina and GOLD (CHEMPLP and ASP functions) against an in-house database of 96 antibiotics related to P. aeruginosa resistance (yet to be published) and decoys generated with DUD-E. Docking poses were ranked based on binding energies and scoring functions. The best-scoring pose for each ligand per program and scoring function was selected to form a population of complexes, whose interactions were then analyzed using PLIP and Discovery Studio Visualizer. High-quality structural models were obtained. The number of interactions, the identity of residues involved, and their physicochemical properties were used as criteria to identify key residues involved in antibiotic selection. A previous analysis suggested that acidic and basic residues interfere less with antibiotic selection than residues of other physicochemical classes; however, further analysis with this new methodology is needed to confirm that preliminary conclusion.
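The pose-selection step (best-scoring pose for each ligand per program and scoring function) can be sketched as below. This is a minimal illustration, not the authors' code: the record layout, ligand names, and scores are hypothetical. It assumes the usual conventions that AutoDock Vina reports a binding energy in kcal/mol (lower is better) while GOLD's CHEMPLP and ASP are fitness scores (higher is better).

```python
# Score direction per (program, scoring function); an assumption of this sketch.
HIGHER_IS_BETTER = {
    ("vina", "affinity"): False,  # binding energy: more negative is better
    ("gold", "chemplp"): True,    # fitness score: larger is better
    ("gold", "asp"): True,
}

def best_poses(poses):
    """Keep the best pose for each (ligand, program, scoring function) group."""
    best = {}
    for pose in poses:
        key = (pose["ligand"], pose["program"], pose["function"])
        higher = HIGHER_IS_BETTER[(pose["program"], pose["function"])]
        current = best.get(key)
        if current is None:
            best[key] = pose
        elif higher and pose["score"] > current["score"]:
            best[key] = pose
        elif not higher and pose["score"] < current["score"]:
            best[key] = pose
    return best

# Hypothetical docking output, for illustration only.
example_poses = [
    {"ligand": "ligand_A", "program": "vina", "function": "affinity", "pose": 1, "score": -7.2},
    {"ligand": "ligand_A", "program": "vina", "function": "affinity", "pose": 2, "score": -8.1},
    {"ligand": "ligand_A", "program": "gold", "function": "chemplp", "pose": 1, "score": 61.0},
    {"ligand": "ligand_A", "program": "gold", "function": "chemplp", "pose": 2, "score": 55.3},
]

selected = best_poses(example_poses)
for key, pose in selected.items():
    print(key, "-> pose", pose["pose"], "score", pose["score"])
```

Handling the score direction explicitly avoids mixing energy-like and fitness-like scales when the selected complexes are pooled for interaction analysis.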
Palavras-chave: antibiotic resistance; efflux pumps; Gram-negative.
#1125580

A WORKFLOW FOR FUNCTIONAL ANNOTATION OF BIODIVERSITY LATIN AMERICAN GENOMES

Autores: MARIA EDUARDA DE OLIVEIRA SILVA,Alan Durham,Vinicius Maracaja Coutinho
Apresentador: MARIA EDUARDA DE OLIVEIRA SILVA • mariaed.biologia@gmail.com
Resumo:
Genomic data has played a transformative role in the advancement of science, offering fundamental insights into the biological functions of diverse organisms. Therefore, understanding the functional roles of these genomic sequences is essential for interpreting them biologically. Species found in Latin America exhibit great biodiversity, as multiple biomes are present throughout its territory. The genomes of species across the twenty countries in this region have direct relevance to human health, neuroscience, conservation, pharmaceuticals, and agriculture. However, the annotation of eukaryotic species from many groups remains scarce or even nonexistent. In this context, studying the genomes of these species becomes particularly valuable, as biological data contain information that may be crucial for addressing future global challenges. In this study, we aim to extract genomic data from The Earth BioGenome Project (EBP), a global initiative in which scientists have joined efforts to sequence the genomes of all eukaryotic species on Earth, rather than just one representative from each family or genus. The annotation of these organisms will be conducted using a pipeline developed in Nextflow, a workflow manager. The analyses to be performed will include ten rigorously selected programs for: Structural Gene Prediction, Identification of Repetitive Regions, Identification of Functional Domains, Prediction of Transmembrane Regions, Detection of GPI (Glycosylphosphatidylinositol), GO Term Assignment, and Metabolic Pathway Analysis. Subsequently, a database will be developed to store the information provided by the annotation pipeline. This database will serve as a centralized resource to facilitate access, analysis, and comparison of annotated genomic data from Latin American species. It will be a key component of the project, enabling long-term data integration, public access, and future large-scale analyses.
Considering Latin America as one of the regions with the greatest biodiversity, the described pipeline will address the urgent need for genomic information on numerous species found in these countries, expanding research opportunities. Therefore, its development, along with the storage of information in an automated database, will be of utmost importance in filling gaps in data on various species in Latin America. Additionally, it will contribute to advancements in the application of annotation data, enhancing fields such as biotechnology, conservation, public health, and the economy.
Palavras-chave: Genomic annotation, Latin America, Biodiversity, Genome database, Nextflow
★ Running for the Qiagen Digital Insights Excellence Awards
#1125626

Genomic and Functional Characterization of a Chitin-Specific Jacalin-Related Lectin in Moraceae Species

Autores: Miguel Victor Bringel Sales,Maurício Mendonça Girão,Carlos Eduardo Reinaldo Custódio,Aila Maria Melo Correia,Julia Geyziana Oliveira Costa Araújo,Louhanna Pinheiro Rodrigues Teixeira,José Hélio Costa,Ana Cristina,Antonio Edson Rocha Oliveira
Apresentador: Carlos Eduardo Reinaldo Custódio • carloseduardoreinaldo13@gmail.com
Resumo:
Lectins are a diverse group of proteins capable of forming specific and reversible interactions with glycoproteins, glycolipids, or oligosaccharides, playing crucial roles in biological signaling. Jacalin-related lectins (JRLs), a subset of these proteins, were first identified in jackfruit (Artocarpus heterophyllus) seeds, followed by discoveries in breadfruit and Morus nigra (black mulberry), all members of the Moraceae family. JRLs are typically classified as either galactose-specific or mannose-specific, depending on their carbohydrate-binding specificity. Numerous studies have highlighted the involvement of lectins in a wide range of biological processes, including carbohydrate recognition, host-pathogen interactions, cell targeting, cell-cell communication, apoptosis induction, metastasis, cancer differentiation, and antimicrobial activity. As a result, lectins have attracted increasing interest as promising targets for pharmaceutical and biotechnological applications. A previous study suggested the existence of a chitin-specific JRL (cJRL) in breadfruit; however, there is still a lack of genomic and biochemical information regarding this protein in other Moraceae species. In this project, we aim to characterize the cJRL at both the genomic and biochemical levels in plant species belonging to the Moraceae family. Our curation identified 14 distinct plant species within the Moraceae family, representing the genera Antiaris, Artocarpus, Morus, and Ficus, all of which possess high-quality genome assemblies, with sequencing coverage ranging from 66.5× to 303.2×. To delineate the genomic profile of the cJRL, we performed manual annotation of lectin genes across these Moraceae genomes. Our analysis revealed the presence of a single copy of the cJRL gene in all species examined.
Gene structure analysis showed that cJRL consists of seven exons and encodes a protein of approximately 626 amino acids, in contrast to previously described JRLs, which typically contain three exons and encode proteins ranging from 149 to 217 amino acids. To validate our genomic findings, we conducted biological assays using Morus nigra. Proteins were extracted from black mulberry seeds and isolated through affinity chromatography on a chitin-based matrix. The retained fraction was eluted using 0.1 M glycine buffer (pH 10.0), and purity was assessed via SDS-PAGE, which revealed a prominent band at 45 kDa, indicating a high degree of purification. To evaluate biological activity, a hemagglutination assay was performed using erythrocytes from the ABO system in serial dilutions. Additionally, microbial agglutination assays were conducted using Pseudomonas aeruginosa. The cJRL extract showed significant hemagglutination activity, with a titer of 10⁵, and demonstrated effective binding and agglutination of P. aeruginosa. We identified a putative cJRL gene in all examined Moraceae species and confirmed its functionality in Morus nigra, with strong hemagglutinating and microbial agglutination activity, reinforcing its biotechnological and therapeutic potential. Further biochemical validations will be carried out in other Moraceae representatives to confirm the structural and functional conservation of cJRL and explore its broader applications.
Palavras-chave: Lectins, Genomic, Moraceae family
★ Running for the Qiagen Digital Insights Excellence Awards
#1125633

Tracking the Dispersion of Oropouche Virus in Pernambuco Through Phylogeographic Analysis

Autores: Elverson Soares de Melo,Gustavo Barbosa de Lima,Adalúcia da Silva,Alexandre Freitas da Silva,Verônica Gomes da Silva,Elisa de Almeida Neves Azevedo,Daniele Barbosa de Almeida Medeiros,Keilla Maria Paz E Silva,Diego Arruda Falcão,Andreza Pâmela Vasconcelos,Mayara Matias de Oliveira Marques da Costa,Marcelo H. S. Paiva,Bartolomeu Acioli Dos Santos,Clarice Neueschwander Lins de Morais,Túlio Campos,Gabriel da Luz Wallau
Apresentador: Elverson Soares de Melo • elverson.melo@ufpe.br
Resumo:
Orthobunyavirus oropoucheense, also known as Oropouche virus (OROV), is a segmented single-stranded RNA virus that causes Oropouche fever, an acute febrile illness that may occasionally be fatal. Previously mostly confined to the Amazon region, OROV has expanded since 2023, with more than 13,000 cases reported in different areas of Brazil in 2024, including nearly 1,500 cases in the Northeast region and 176 confirmed cases in the state of Pernambuco alone. To elucidate the virus introduction and spread pathways within Pernambuco, we sequenced 79 OROV-positive patient samples collected in 14 municipalities of Pernambuco and one of Sergipe on an Illumina MiSeq platform. Reads were processed with the ViralFlow pipeline, and genomes showing <70% coverage for any segment were discarded from further analyses. To contextualize our sequences for viral dispersal within Brazil, we incorporated publicly available OROV genomes from ten additional Brazilian states, downloaded from GISAID, into a phylogeographic analysis. After concatenating the three viral segments, sequences were aligned with MAFFT and trimmed to remove unaligned and non-coding regions. A time-scaled Bayesian phylogeographic analysis was performed in BEAST v1.10.4 using a GTR+G+I substitution model, with the sample collection dates incorporated as tip dates, an uncorrelated log-normal relaxed clock, and a Bayesian Skyline tree prior. Sampling city was treated as a discrete trait under an asymmetric substitution model with the Bayesian stochastic search variable selection (BSSVS) procedure. The maximum clade credibility tree was obtained with TreeAnnotator and visualized in SpreaD3 to infer viral dispersal routes. OROV genomes from the state of Pernambuco clustered into two clearly distinct clades in the phylogeny, whose most recent common ancestor dates to March 19–August 29, 2023 (95% HPD interval), evolving as independent lineages for 11 months until the first confirmed case in the state.
Our Bayesian phylogeographic model therefore supports two separate introductions into Pernambuco. The first introduction (Clade I) originated from cases in the central region of Amazonas state (100% probability), arriving in the Zona da Mata Sul region of Pernambuco between January 30 and April 12, 2024, specifically in the city of Jaqueira (72% probability). Jaqueira then acted as the epicenter for this lineage within Pernambuco (70 cases). After arriving in Jaqueira, Clade I spread rapidly toward coastal municipalities and cities in the Zona da Mata Norte, as well as outside the state, toward Sergipe (94% probability). The second OROV lineage (Clade II) was introduced later, between April 30 and June 28, 2024, in the city of Timbaúba, likely originating from the state of Rio de Janeiro (98% probability). In contrast to Clade I, Clade II remained geographically confined to Timbaúba. Although our phylogeographic analysis included ~30% of confirmed cases in the state, the sample distribution largely mirrors the geographic burden of cases across cities. The results suggest that Clade I was the main driver of infections reported between May and September 2024, playing a dominant role in the local outbreak and demonstrating a clear potential to spread to neighboring states.
Palavras-chave: Oropouche, Phylogeography, Viral Dispersion
★ Running for the Qiagen Digital Insights Excellence Awards
#1125646

An alignment-free strategy for feature selection and HIV sequence classification

Autores: Tatiana Mari Saita,Matheus Henrique Pimenta-Zanon,Glaucia Maria Bressan,Artur Queiroz,Fabricio Martins Lopes
Apresentador: Tatiana Mari Saita • tatianasaita@gmail.com
Resumo:
This work presents a new feature selection method for classifying HIV (human immunodeficiency virus) sequences by subtypes. Identified in the 1980s, HIV currently affects millions of people worldwide and is the cause of AIDS (Acquired Immunodeficiency Syndrome). In-depth knowledge of HIV is essential for developing effective treatments and vaccines, as the virus exhibits a high mutation rate, resulting in various strains. HIV is classified into two types: HIV-1 and HIV-2. HIV-1 is subdivided into groups M, N, O, and P, with the largest group M containing subtypes A (subdivided into A1 and A2), B, C, D, F (subdivided into F1 and F2), G, H, J, K, and more recently L. HIV-2 is also categorized into subtypes, but with fewer reported cases. In addition to the classic strains, the high mutation rate of HIV leads to the formation of circulating recombinant forms (CRF) and unique recombinant forms (URF), resulting from the combination of one or more subtypes. Monitoring the evolution of the virus is ongoing, presenting challenges in collecting and analyzing genetic sequences, especially in rare subtypes or regions that are difficult to access. Traditional sequence classification methods rely on alignment, which can significantly increase processing time, especially with large data volumes and difficulties aligning sequences of varying lengths. To overcome these limitations, the proposed method employs an alignment-free approach based on the representation of sequences as directed graphs. Each sequence is transformed into a graph where triplets of consecutive nucleotides form vertices, and the edges connect these vertices according to the original sequence. Each pair of vertices linked by an edge represents a motif (6 nucleotides), allowing for the analysis of motif frequency independent of their location in the sequence. 
Since viruses of the same subtype can exhibit genetic variations, the method identifies motifs that capture these differences, enhancing classification accuracy. Hierarchical clustering is used to group sequences, identifying clusters with subtype homogeneity greater than 0.7. The proposed feature selection is performed by extracting the most frequent motifs from each cluster and motifs exclusive to specific subtypes, based on adjacency matrices that reflect the motif density. These most frequent motifs are used as features (input) in classical classification models, such as Random Forest, SVM, and XGBoost. The methodology has been tested on 6,764 sequences of HIV-1, encompassing subtypes A, B, C, D, F1, G, and H, varying from 200 to 12,000 base pairs. The feature selection process reduced the number of features from 4,096 to 496. In a model trained with 1,260 sequences, this process achieved an accuracy ranging from 0.94 to 0.96, depending on the classification method used, indicating its adequacy and efficiency. Further work includes applying the proposed method to other viruses and organisms.
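The graph representation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a sliding window of step 1 and that each directed edge links a triplet to the triplet starting three positions later, so that every edge spells six consecutive nucleotides. Under that reading there are 4⁶ = 4,096 possible motifs, which matches the initial feature count given in the abstract. The function names and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"

# Every possible 6-nt motif (4**6 = 4096), in a fixed order for feature vectors.
ALL_MOTIFS = ["".join(p) for p in product(NUCLEOTIDES, repeat=6)]

def motif_counts(sequence: str) -> Counter:
    """Count directed edges (triplet -> following triplet) along a sequence.

    Windows containing ambiguous bases (anything other than A, C, G, T)
    are skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 5):
        motif = sequence[i:i + 6]
        if set(motif) <= set(NUCLEOTIDES):
            counts[(motif[:3], motif[3:])] += 1
    return counts

def feature_vector(sequence: str) -> list:
    """Relative motif frequencies, independent of motif position in the sequence."""
    counts = motif_counts(sequence)
    total = sum(counts.values()) or 1  # avoid division by zero for short input
    return [counts[(m[:3], m[3:])] / total for m in ALL_MOTIFS]

toy_counts = motif_counts("ACGTACGT")
toy_vector = feature_vector("ACGTACGT")
print(len(ALL_MOTIFS), dict(toy_counts))
```

Because the vector depends only on motif frequencies, sequences of different lengths map to the same feature space without alignment; the clustering and feature-selection steps of the abstract would then operate on these vectors.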
Palavras-chave: HIV, feature selection, motif, classification
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125659

Integration of a Multi-Approach Strategy for Uncovering sRNA–mRNA Interactions in Staphylococcus aureus Biofilm

Autores: CAROLINA ALBUQUERQUE MASSENA RIBEIRO,Guadalupe del Rosario Quispe Saji,Maiana de Oliveira Cerqueira e Costa,J. Eduardo Martinez-Hernandez,Marisa Fabiana Nicolás
Apresentador: CAROLINA ALBUQUERQUE MASSENA RIBEIRO • Ca.lbuquerquemr@gmail.com
Resumo:
Small regulatory RNAs (sRNAs) are non-coding RNAs that play central roles in post-transcriptional gene regulation in bacteria, allowing fine-tuned responses to environmental stimuli. In Staphylococcus aureus, a major human pathogen and leading cause of hospital-acquired infections, sRNAs regulate essential processes, including metabolism, virulence, antimicrobial resistance, and the transition between planktonic and biofilm lifestyles. Understanding the role of sRNAs in biofilm formation is strategic for tackling chronic infections and antibiotic resistance. In this study, we analyzed six S. aureus strains: the Brazilian strain Bmb9393 and five USA-derived clones (USA100/N315, USA200/MRSA252, USA300/LAC, USA400/MW2, and USA500/NRS385). For strains lacking sRNA annotations, we applied an integrative prediction strategy combining sequence homology, covariance models, and transcriptomic profiling. A custom Python pipeline was used to classify identified sRNAs by genomic context (intergenic, antisense, intragenic, or UTR-associated), revealing that most sRNAs are located in intergenic or antisense regions. Transcriptomic data (RNA-seq) were generated for Bmb9393 and retrieved from the Gene Expression Omnibus (GEO) for the USA strains. Differential expression analysis between planktonic and biofilm conditions identified 14 sRNAs consistently regulated across all strains, suggesting a conserved regulatory role during biofilm development. Notably, the expression patterns of Sau-41 support an antagonistic role in the transcriptional regulation of alpha-hemolysin (Hla), an important biofilm-associated virulence factor. We hypothesize that this is due to a sponge-like interaction with the sRNA RNAIII, which enhances Hla translation. Our study proposes a refined functional model for the dynamic interaction between Sau-41 and RNAIII, which fine-tunes alpha-hemolysin expression and contributes to the regulatory control of biofilm-associated virulence.
We then constructed a weighted gene co-expression network (WGCNA) integrating coding genes and sRNAs, identifying 23 expression modules, with seven associated with biofilm conditions. sRNAs such as Teg20 and sbrC emerged as central regulatory hubs, with functional enrichment suggesting roles in stress response and metabolic adaptation. To predict sRNA–mRNA interactions, we employed four computational tools (sRNARFTarget, TargetRNA3, IntaRNA, and RNAplex), generating over one million candidate interactions. These were filtered through a multi-step pipeline using thresholds based on validated interactions, DEG significance, WGCNA co-expression modules, and interaction scores. Final manual curation, supported by literature, annotation context, and expression patterns, prioritized 24 high-confidence sRNA–mRNA interactions for experimental validation. This integrative strategy, combining prediction, transcriptomics, and network analysis, provides a comprehensive framework for studying sRNA-based regulation in S. aureus.
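The classification of sRNAs by genomic context mentioned in this abstract can be sketched as below. The authors' custom Python pipeline is not described in detail, so the rules here are assumptions made for illustration: an sRNA overlapping a gene on the same strand is called intragenic, one overlapping a gene on the opposite strand is antisense, one lying within a hypothetical 100 bp window of a same-strand gene is UTR-associated, and everything else is intergenic. The `Feature` type, the window size, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"

def overlaps(a: Feature, b: Feature) -> bool:
    """True when the two intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end

def classify_srna(srna: Feature, genes: list, utr_window: int = 100) -> str:
    """Assign a genomic context; the first overlapping gene decides the class."""
    for gene in genes:
        if overlaps(srna, gene):
            return "intragenic" if gene.strand == srna.strand else "antisense"
    for gene in genes:
        if gene.strand == srna.strand:
            # Extend the gene on both sides to approximate its UTRs (assumption).
            flank = Feature(gene.start - utr_window, gene.end + utr_window, gene.strand)
            if overlaps(srna, flank):
                return "UTR-associated"
    return "intergenic"

# Hypothetical annotation with two genes.
genes = [Feature(1000, 2000, "+"), Feature(5000, 6000, "-")]

print(classify_srna(Feature(1500, 1600, "+"), genes))  # inside a same-strand gene
print(classify_srna(Feature(5200, 5300, "+"), genes))  # opposite strand of a gene
print(classify_srna(Feature(2020, 2080, "+"), genes))  # just downstream, same strand
print(classify_srna(Feature(3000, 3100, "+"), genes))  # far from any gene
```

A real pipeline would more likely derive UTR boundaries from transcript-level evidence than from a fixed window; the fixed window simply keeps the sketch self-contained.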
Palavras-chave: sRNA, sRNA-mRNA, Target Prediction, Biofilm, S. aureus
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125685

In silico characterization of TcPAQR4: A novel T. cruzi receptor in the treatment of Chagas disease

Autores: Maria Clara Esteves Monachesi,Pedro Henrique Monteiro Torres,ANGELA HAMPSHIRE DE CARVALHO SANTOS LOPES
Apresentador: Maria Clara Esteves Monachesi • mcemonachesi@biof.ufrj.br
Resumo:
Trypanosoma cruzi is the etiologic agent of Chagas disease, a public health problem that greatly affects many countries, including Brazil. TcPAQR4 (T. cruzi Progestin and AdipoQ Receptor) is a putative receptor for PAF (Platelet-Activating Factor) and LPC (Lyso-phosphatidylcholine) recently discovered in T. cruzi by our group. TcPAQR4 is homologous to members of the human progestin and adiponectin receptors (hPAQRs), a class of 7TM receptors. The life cycle of T. cruzi involves many cyclic differentiation steps, both in its vector (triatomine bugs) and its definitive host (mammals, including humans), with three main morphotypes: amastigotes, trypomastigotes and epimastigotes. TcPAQR4 is important in the differentiation of trypomastigotes into amastigotes during the infection of mouse peritoneal macrophages. In vitro assays showed that TcPAQR4 knockdown parasites were not able to respond to PAF and LPC stimuli in cell differentiation, and that its knockout rendered the protozoan nonviable. In a previous study, we modeled TcPAQR4 and screened over 500k molecules from Enamine libraries for inhibitors through molecular docking and subsequent ligand-receptor analysis, and the 20 most promising molecules were chosen for further studies. In the present study, in vitro and in vivo analyses will be performed to assess cell viability and mortality, and the molecules that perform best in those analyses will be further investigated with molecular dynamics (MD) simulations. The MD simulations will be run with the apo protein, as well as with the receptor bound to its natural ligands (PAF and LPC) and to the inhibitor WEB2086, in a lipid bilayer. The simulations will be performed at both all-atom (AA) and coarse-grained (CG) resolution, to better understand how the molecules interact with the receptor.
Palavras-chave: Trypanosoma cruzi, molecular dynamics, molecular docking, SBDD, ligand-receptor interactions
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125695

APPLICATION OF MACHINE LEARNING IN THE PREDICTION OF HTLV PROGRESSION BASED ON INTESTINAL MICROBIOTA DYSBIOSIS

Autores: Marília Gabriela Barbosa da Silva,Laryssa Bandeira de Melo Silva,Matheus Azevedo Bomfim,Matheus Rodrigues Torres Farias,Gabriel Freitas Araújo,Patricia Muniz Mendes Freire de Moura,Silvana de Fátima Ferreira da Silva Caires,José Anchieta de Brito,Paula M. Magalhães,ELIANE CAMPOS COIMBRA,Adauto Barbosa Neto,João Pacifico Bezerra Neto
Apresentador: Marília Gabriela Barbosa da Silva • marilia.gabrielabarbosa@upe.br
Resumo:
The Human T-Cell Lymphotropic Virus (HTLV), identified in 1980, is classified into four types, with HTLV-1 being associated with clinical manifestations. Transmission occurs via two routes, vertical and horizontal, the latter mainly through sexual intercourse. Studies indicate that HTLV infection may be related to changes in the intestinal microbiota, influencing immunoregulation and pain sensitivity, as the virus can lead to a chronic pro-inflammatory state. This state may favor the proliferation of bacteria associated with inflammatory processes. Such imbalance can affect the production of essential metabolites, such as short-chain fatty acids, which play protective roles in the intestine and nervous system. This study aimed to apply Machine Learning (ML) models to predict the risk of dysbiosis and clinical progression in People Living with HTLV (PLHTLV). Patients were recruited from the Infectious and Parasitic Diseases Service of the Oswaldo Cruz University Hospital; participants signed the Free and Informed Consent Form and answered a questionnaire on the risk of dysbiosis. The research followed the guidelines of CNS Resolution No. 466, of 2012, and CNS Operational Standard No. 001, of 2013, and was approved on Plataforma Brasil under CAAE: 57785822.3.0000.5192. The questionnaire results were converted to numerical variables and assembled into a dataframe used for model training and validation. After pre-processing, all variables were used to train seven ML models: Gradient Boosting (GB), Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Multilayer Perceptron (MLP). A total of 92 patients were included in the study. 
According to the ML model predictions, 88.8% of patients had a moderate to very high risk of intestinal microbiota dysbiosis, while only 12% were classified as low risk. The generated heatmap summarizes the evaluation metrics of the models applied to predict the risk of dysbiosis in patients with HTLV. Among the models, GB showed the best performance, standing out especially for its recall of 0.628, the highest among the models. The other GB results were: accuracy of 0.613, precision of 0.458, F1 score of 0.529 (indicating the best overall performance), ROC AUC of 0.577 and G-Mean of 0.616 (demonstrating the best balance between sensitivity and specificity). The correlation matrix indicates that the variable “chronic diseases” had the strongest correlation with dysbiosis. The study suggests that, in most PLHTLV, infection may be associated with changes in the microbiota that have a pro-inflammatory impact. GB stood out by presenting the best performance indices, especially regarding recall, F1 score and G-Mean, which are relevant metrics for imbalanced data. These findings highlight the potential of artificial intelligence as a tool to support clinical monitoring and to inform personalized intervention strategies in the care of patients with HTLV-1.
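For readers less familiar with the metrics reported above, the following sketch shows how recall, precision, accuracy, F1 score and G-Mean follow from a binary confusion matrix. The function names and the example counts are hypothetical; only the precision and recall passed to `f1_from` come from the abstract. The sketch has no guards against empty classes.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "accuracy": accuracy,
            "f1": f1, "g_mean": g_mean}


def f1_from(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not the study's data).
print(binary_metrics(tp=6, fp=2, tn=8, fn=4))

# Precision and recall reported for Gradient Boosting in the abstract.
print(round(f1_from(0.458, 0.628), 3))
```

With the reported precision (0.458) and recall (0.628), the harmonic mean comes to about 0.53, in line with the reported F1 score of 0.529 given rounding of the inputs.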
Palavras-chave: Dysbiosis, Microbiota, HTLV-1.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125709

ATLAS PROJECT: INTEGRATING NEXTFLOW PIPELINES FOR PLANT TRANSCRIPTOMICS RESEARCH THROUGH A WEB-BASED SERVICE

Autores: Luiz Victor Soriano da Conceição,Lorrana Verdi Flores,Gustavo José Rodrigues Pereira,Arthur Catarino de Oliveira,Renan Terassi Pinto,Renato Ramos da Silva,Joaquim Quinteiro Uchoa,Luciano Vilela Paiva
Apresentador: Luiz Victor Soriano da Conceição • luizvictor0205@gmail.com
Resumo:
RNA-Seq experiments produce vast sequencing data that must be processed through multiple stages to yield reliable expression estimates. In plant research, these expression data are fundamental for differential expression analysis and reconstructing gene coexpression networks (GCNs) that underpin processes such as stress response, development, and secondary metabolism. However, disparate scripts and manual steps across tools impede reproducibility and scalability, while spurious alignments from unannotated genomic regions can bias network inference. To address these challenges, we developed ATLAS — a Nextflow workflow orchestrated via a FastAPI-Celery-driven web platform that enables asynchronous, decoupled execution of RNA-Seq analysis. It has the potential to standardize and reproduce the full pipeline within containerized environments, from automated data acquisition to transcript quantification and integrated quality reporting. Genome and annotation data are retrieved automatically from the JGI Data Portal based on species name and genome accession. First, a metadata query generates JSON output linking species to genome IDs and version information. Next, annotation and assembly version details are extracted into a second JSON descriptor, which, combined with a session token, drives the download of annotation files (GTF, primary transcripts, exons) and genome FASTA. RNA-Seq reads are then fetched from the SRA by library ID, obtained via JSON metadata produced by pre-query scripts, and downloaded for downstream processing. 
The pipeline encompasses GTF generation via gffread; construction of a combined transcriptome with decoy sequences to produce gentrome.fa and the associated decoys.txt; Salmon indexing with decoy handling for fast and accurate quasi-mapping; read trimming using Trimmomatic to remove adapters and low-quality bases; transcript quantification with Salmon quant to obtain TPM and raw count estimates; and quality control reporting through FastQC aggregated by MultiQC. Integrating decoy sequences enhances quantification accuracy by filtering out spurious mappings, while unified reporting streamlines the evaluation of both data quality and analysis outcomes. The ATLAS pipeline runs asynchronously through a decoupled architecture managed by Celery, with Redis acting as the message broker between services. A FastAPI server exposes a lightweight API that accepts HTTP requests containing execution parameters and enqueues them as Celery tasks. The full service stack, including FastAPI, Redis, Celery workers, and database, is containerized and orchestrated using Docker Compose, ensuring portability and consistent environments. By integrating each step, from annotation conversion and decoy-enhanced indexing to adaptive read trimming and rigorous quality assessment, into a cohesive Nextflow workflow, ATLAS delivers reliable, reproducible expression data optimized for plant gene coexpression network analysis. Its asynchronous, Celery-driven execution supports scalability and responsiveness for multi-user platforms. Finally, ATLAS has the potential to accelerate transcriptomic studies based on bulk RNA-seq in plants of biotechnological interest.
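The decoy-aware step described above (a combined gentrome.fa plus decoys.txt) can be sketched as follows. This is an illustrative sketch of the commonly documented Salmon procedure, in which the transcript sequences come first, the genome sequences follow as decoys, and decoys.txt lists the genome sequence names. It is not the ATLAS code, and the function names `fasta_names` and `build_gentrome` are assumptions.

```python
def fasta_names(path: str) -> list[str]:
    """Collect sequence names (first token of each '>' header) from a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def build_gentrome(transcripts_fa: str, genome_fa: str,
                   gentrome_fa: str, decoys_txt: str) -> list[str]:
    """Write decoys.txt (genome sequence names) and gentrome.fa
    (transcripts followed by genome), streaming line by line."""
    decoys = fasta_names(genome_fa)
    with open(decoys_txt, "w") as out:
        for name in decoys:
            out.write(name + "\n")
    with open(gentrome_fa, "w") as out:
        # Order matters: transcripts first, decoy (genome) sequences last.
        for source in (transcripts_fa, genome_fa):
            last = "\n"
            with open(source) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            if not last.endswith("\n"):
                out.write("\n")  # keep the next header on its own line
    return decoys
```

The two output files would then typically be passed to Salmon with a command along the lines of `salmon index -t gentrome.fa -d decoys.txt -i <index_dir>`; how ATLAS wires this step into its Nextflow processes is not specified in the abstract.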
Palavras-chave: Nextflow, RNA-Seq, Salmon, MultiQC, decoy transcriptome, Trimmomatic, reproducible pipeline, Celery, ATLAS.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125716

EEfinder: Automated Detection of Bacterial and Viral Horizontal Gene Transfer Events in Eukaryotic Genomes

Autores: Yago José Mariz Dias,Filipe Zimmer Dezordi,Gabriel da Luz Wallau
Apresentador: Yago José Mariz Dias • yag.dias@gmail.com
Resumo:
Genetic information is usually passed from parent to offspring through a process known as vertical gene transfer. However, horizontal gene transfer (HGT) represents an alternative mechanism, enabling the exchange of genetic material without the need for a parental relationship. It can occur between different species, genera, families, or even orders. This phenomenon has been observed across distant lineages, including between prokaryotes/viruses and eukaryotes. When HGT events occur in germ line cells and do not reduce the fitness of the host species, the bacterial or viral sequences may increase in prevalence and become fixed in the population. These sequences, known as endogenous bacterial elements (EBEs) and endogenous viral elements (EVEs), are the result of those endogenization events. They serve as genomic fossils, offering insights into past infections and providing valuable information about host-pathogen interactions and coevolutionary processes; they also aid in filtering out false positives in metagenomic analyses. Despite their importance, the field has lacked standardized methodologies for the discovery of endogenous elements, resulting in studies reporting varying numbers of elements even within the same genome version. To address this, we developed EEfinder, a general-purpose tool for the identification and classification of endogenous bacterial and viral sequences in eukaryotic genomes. Our approach is based on commonly used methods described in the literature and comprises six steps: data cleaning, similarity search through sequence alignment, filtering of candidate elements, taxonomy assignment, merging of truncated elements, and extraction of flanking regions. To assess the sensitivity of our tool, we benchmarked EEfinder by reproducing two published studies: an EVE screening of the Aedes aegypti Aag2 genome, and the identification of a large EBE in the Armadillidium vulgare genome. 
In the EVE benchmark using the Aedes aegypti (Aag2) genome, we recovered 357 of the 365 elements previously reported, and additionally identified 66 novel elements exclusively detected by EEfinder. In the EBE benchmark, EEfinder detected a locus consistent with the position previously identified through wet-lab experiments. Benchmarking also indicated that EEfinder does not require significant computational resources and can be run efficiently on personal computers. Hence, EEfinder is the first general-purpose open-source tool that provides a reproducible, automated, and low-resource approach for systematically identifying endogenous elements in eukaryotic genomes.
Palavras-chave: Paleovirology, Horizontal gene transfer (HGT), Software Benchmark, Endogenous Viral Elements (EVEs), Endogenous Bacterial Elements (EBEs), Viral Integration
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125738

Identification of Potential Niches of Metastatic Cells in Single-Cell Expression Data in Uveal Melanoma

Autores: Felipe Haddad,Wilson Araújo da Silva Jr
Apresentador: Felipe Haddad • felipehaddad.h@gmail.com
Resumo:
Uveal melanoma (UM) is a rare malignant neoplasm of the eye, originating in the intermediate layer of the uvea, a structure composed of the ciliary body, choroid, and iris. Although it accounts for only 3 to 5% of all melanoma cases, UM presents a high rate of metastasis, with a strong hepatic tropism, which significantly contributes to its high lethality. Despite arising from melanocytes—cells also present in cutaneous melanoma—UM exhibits distinct biological and molecular characteristics that directly influence its metastatic behavior and the scarcity of effective therapeutic alternatives. In this context, transcriptomic approaches using single-cell RNA sequencing (scRNA-seq) have proven essential for investigating intratumoral heterogeneity, enabling the precise identification of cellular subpopulations involved in the metastatic process of UM. This resolution allows for the characterization of tumor cells with gene expression signatures compatible with epithelial-mesenchymal transition (EMT), a fundamental biological process in which epithelial cells acquire mesenchymal characteristics, including increased migratory and invasive capacity. The detection of EMT signatures in individual tumor cells may significantly contribute to the identification of new therapeutic targets and more effective clinical protocols, with potential impact on the prognosis and management of uveal melanoma.
The present study aims to analyze single-cell transcriptomic data to identify gene regulatory circuits associated with EMT in UM. For this purpose, we used publicly available data from the GEO repository (accession code GSE139829), comprising a total of 59,995 cells. Data processing was carried out using the Seurat package, including quality filtering, normalization, and clustering analysis based on gene expression profiles, which allowed discrimination between tumor cells and components of the tumor microenvironment. Subsequently, cell cycle analysis was performed based on the expression of genes specific to the S and G2/M phases, according to the gene sets available in the Seurat library. The investigation of potential niches of metastatic cells employed an EMT signature composed of the following canonical EMT markers: SNAI1, SNAI2, CDH1, JUN, VIM, FN1, PRRX1, TWIST1, TWIST2, ZEB1, and ZEB2, extracted from the Hallmarks collection of the Broad Institute.
As preliminary results, two cellular groups associated with EMT activity were identified (clusters 17 and 21). These two sets of cells display mesenchymal phenotypic behavior, typical of metastatic cells. Cluster 21 showed proportions of cells in the G1, S, and G2/M cell cycle phases of 88%, 8%, and 4%, respectively, consistent with invasive yet non-proliferative subpopulations that may be responsible for metastasis initiation. Further analyses will be conducted to identify potential genetic markers applicable to the monitoring or treatment of metastatic tumors.
Palavras-chave: Uveal Melanoma, Single-cell RNA-seq, Epithelial-Mesenchymal Transition
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125748

Analysis of Medulloblastoma Regulatory Networks from Single-Cell Transcriptomes

Autores: Gustavo Lovatto Michaelsen,João Vitor Almeida da Costa,André Tesainer Brunetto,Marialva Sinigaglia,Rodrigo Juliani Siqueira Dalmolin
Apresentador: Gustavo Lovatto Michaelsen • guga.micka@gmail.com
Resumo:
Medulloblastoma (MB), the most common malignant pediatric brain tumor, comprises four molecularly distinct subgroups (WNT, SHH, Group 3, and Group 4) with divergent clinical outcomes. While bulk genomic studies have characterized subgroup-specific drivers, the regulatory networks governing cellular intratumoral heterogeneity and regulation remain poorly resolved. Identifying transcriptional regulators, known as master regulators (MRs), can elucidate the dysregulated pathways underlying MB intratumoral heterogeneity and uncover potential treatment targets. In this study, we used a single-cell RNA sequencing dataset with 28 primary MB tumors totaling 39,946 cells to investigate subgroup-specific regulatory architectures and their links to cellular intratumoral heterogeneity, focusing on the SHH, Group 3, and Group 4 subgroups. Cell states were annotated using consensus non-negative matrix factorization (cNMF), identifying the transcriptional programs active in each cell. Cells belonging to metaprogram (A) contained markers of cell cycle activity, metaprogram (B) was primarily characterized by ribosomal and translational initiation/elongation genes reflecting undifferentiated progenitors, and metaprogram (C) contained well-recognized neuronal differentiation markers reflecting a more differentiated neuronal-like program. Additionally, with cNMF we identified high cellular plasticity (HCP) state cells. This subpopulation exhibits stem-like properties and also active proliferation, therefore differing from traditional cancer stem-like cells. Afterwards, we used SCENIC (Single-Cell rEgulatory Network Inference and Clustering), a method capable of inferring gene regulatory networks (GRNs) and the regulatory activity of each regulon in each individual cell. With SCENIC, we inferred the GRN of MB and identified the MR activity in all cell states present across the MB subtypes. 
Cell-cell communication analysis using the R/Bioconductor CellChat package revealed the main signaling pathways active between cell states. Altogether, our results help elucidate the complex regulatory architectures underpinning intratumoral heterogeneity and cellular plasticity in MB subgroups (SHH, Group 3, and Group 4) through single-cell transcriptomic analysis. The regulatory networks inferred via SCENIC and the cell-cell communication pathways further highlight subgroup-specific transcription factors and signaling interactions that may orchestrate cellular plasticity and regulation. HCP-associated regulons or intercellular signaling hubs may serve as actionable targets for disrupting resilience and progression. Future therapeutic strategies aimed at eradicating HCP cells or modulating their regulatory drivers could mitigate disease progression and improve outcomes in this pediatric malignancy.
Palavras-chave: medulloblastoma, single-cell RNA-seq, regulatory networks, cell-cell communication
#1125751

Cross-species analysis of Neuronal Activity Prediction in magnocellular neurons based on single-cell transcriptomic data

Autores: Beatriz Andrade de Souza,Victor Jardim Duque,João Victor Silva Nani,André de Souza Mecawi
Apresentador: Beatriz Andrade de Souza • andrade.beatriz@unifesp.br
Resumo:
The hypothalamus integrates interoceptive and exteroceptive sensory signals, regulating neural circuits that control endocrine, autonomic, and behavioral responses essential for maintaining homeostasis. Its cytoarchitecture includes specialized nuclei, such as the supraoptic and paraventricular nuclei, which are central to hydromineral regulation. These nuclei are predominantly composed of magnocellular neurons (MCNs), which produce the hormones vasopressin (AVP) and/or oxytocin (OXT), depending on the physiological stimulus. To investigate the evolutionary conservation of MCNs, we performed an integrative multi-species analysis using single-nucleus transcriptomic data from the hypothalamus of humans (Homo sapiens), macaques (Macaca fascicularis), marmosets (Callithrix jacchus), rats (Rattus norvegicus), and mice (Mus musculus). Data processing and clustering were conducted using the Seurat package (v5.0). MCNs were identified based on previously established markers, and subpopulations were classified according to the relative expression of AVP and OXT using normalized data. Genes were mapped to their human orthologs and integrated using the BENGAL pipeline (BENchmarking StrateGies for Cross-species Integration of Single-cell RNA Sequencing Data). Neuronal activity was estimated using NEUROeSTIMator, a deep learning model that integrates transcriptomic signals to infer neuronal activation, excluding 22 marker genes from the analysis. Neuronal activity scores were correlated with gene expression profiles using Spearman correlations. Our analyses of the integrated multi-species dataset revealed that the gene PCSK1, known for its role in hormonal regulation, neuroendocrine development, and metabolic responses, showed a positive correlation with neuronal activity in both MCNs-AVP and MCNs-OXT. In MCNs-AVP, we identified 75 conserved genes positively associated with neuronal activity, including FOSB, SYNPR, CDH13, and NELL1. 
These genes are primarily linked to pathways involved in synapse organization and structural regulation. In MCNs-OXT, 21 conserved genes were found to correlate with neuronal activity, such as CRELD1, ITM2C, and CCNC. These genes are significantly associated with actin cytoskeleton dynamics, including actin filament capping, depolymerization regulation, and cytoskeletal organization. Additional conserved genes were observed: FOSB, SLIT2, and NPAS4 were enriched in MCNs-AVP, whereas DNM3, CNTN5, ADGRB3, and NRXN1 were consistently found in MCNs-OXT. Furthermore, we observed significantly higher neuronal activity in MCNs-AVP compared to MCNs-OXT (p<0.05) in humans, macaques, marmosets, and rats. In mice, however, both clusters exhibited similar levels of activity. Correlation analyses showed that the genes most associated with neuronal activity varied across clusters and species. An exception was observed in MCNs-OXT of marmosets and macaques, which shared a positive correlation with PCK1 and a negative correlation with DENND6A. In MCNs-AVP, genes such as NPAS4 (human), NR4A3 (marmoset), COX4I1 (macaque), GEM (rat), and AC134222.4 (mouse) showed a positive correlation with neuronal activity. This study enhances our understanding of the functional conservation of MCNs across species and highlights key genes preserved throughout evolution and their potential roles in magnocellular neuron physiology.
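The abstract correlates neuronal activity scores with gene expression using Spearman correlations. As a reminder of what that statistic computes, here is a minimal pure-Python sketch: values are replaced by their ranks (ties share the average rank), and the Pearson correlation of the ranks is returned. The function names and the example vectors are hypothetical and are not taken from the study. The function is undefined when either input is constant.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-cell values: an activity score and one gene's expression.
activity = [0.10, 0.35, 0.20, 0.80, 0.55]
expression = [1.2, 2.9, 2.0, 6.5, 3.1]
print(spearman(activity, expression))
```

Because it operates on ranks, the statistic captures monotonic rather than strictly linear relationships, which suits normalized expression values.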
Palavras-chave: Hypothalamus, Evolutionary Conservation, Neuronal Activity, Single-nucleus Transcriptomics (snRNA-seq)
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125764

Insights into Hereditary Cancer Risk: Characterization of Germline Variants in a Clinical Diagnostic Setting in Northeast Brazil

Autores: IAGHO JOSÉ LIMA DINIZ,Ana Paula Almeida Cunha,DÉBORA CRISTINA SANTOS SILVA,ISABELLA ROMEIRO DE PAULA SENA,SUZANY DE SOUSA MORAES,CAMILA NORONHA MARQUES DE ARAUJO,ROSSY-ERIC SOARES,FLAVIA CASTELLO BRANCO VIDAL CABRAL
Apresentador: Ana Paula Almeida Cunha • ana.cunha@labcedro.com.br
Resumo:
Next-generation sequencing (NGS) has revolutionized genetic diagnostics by offering high-throughput, cost-effective approaches that enhance the detection of germline alterations in hereditary diseases. Hereditary cancer predisposition panels (HCPPs) play a critical role in uncovering clinically relevant variants associated with tumor susceptibility. In this study, we evaluated genetic variants detected in 26 patients with a diagnosis or a family history suggestive of hereditary cancer syndromes, who underwent genetic testing using HCPPs over a 13-month period (January 2024 to January 2025). Variant classification was performed using ACMG guidelines. The study was approved by the Research Ethics Committee under protocol 7.452.185. Patient data, including age, sex, and personal and familial cancer history, were collected and integrated with variant findings using R (version 4.5). A total of 28 distinct variants were identified across 21 genes implicated in cancer predisposition, such as BRCA1, BRCA2, APC, TP53, MUTYH, and RAD51C. Most alterations were missense (67.9%), followed by frameshift, nonsense, and splice-site variants. Variant classification revealed that 46.4% of variants were of uncertain significance (VUS), and 35.7% were likely pathogenic or pathogenic, primarily in high-penetrance genes such as BRCA2 and TP53. Notably, one patient harbored two concurrent variants in different genes, highlighting the potential for multigene interactions in hereditary cancer syndromes. The mean age of participants was 54 years, and 92% were female. Personal history of cancer was reported by 76.9% of individuals, predominantly breast and ovarian cancers, and family history of cancer was documented in 69.2% of cases. Among individuals with both personal and familial cancer history, 61.5% carried variants classified as VUS or higher. 
This preliminary analysis underscores the importance of multigene panels in identifying germline variants with potential clinical implications for cancer predisposition. It also highlights the interpretive challenges posed by variants of uncertain significance and the need for continuous curation and functional validation. Our findings contribute to the understanding of hereditary cancer risk in a clinical setting involving patients from Northeast Brazil, which is an underrepresented population in genomic research, and support the integration of genetic and phenotypic data to enhance variant interpretation and patient care.
Palavras-chave: Hereditary cancer, genetic variants, panel testing
#1125787

Cross-Tissue Regulatory Network Inference Identifies Deregulation of EGR2 and FOXJ1 in Alzheimer’s Disease

Autores: MARCELLA VITORIA BELEM SOUZA,Gilderlanio Santana de Araujo
Apresentador: MARCELLA VITORIA BELEM SOUZA • marcellabelem12@gmail.com
Resumo:
Alzheimer’s disease (AD), the most prevalent form of dementia, is characterized by progressive neurodegeneration affecting key brain regions, including the hippocampus, entorhinal cortex, and cingulate cortex, each exhibiting distinct molecular and pathological profiles. Mounting evidence highlights the pivotal role of transcription factors (TFs) in regulating the complex pathological cascade underlying AD. A comprehensive understanding of the mechanistic drivers of AD pathogenesis is crucial, particularly in elucidating how TFs influence phenotypic outcomes and exhibit region-specific regulatory dynamics. We conducted differential expression analysis of TFs and performed brain regulatory network reconstruction across four cohorts: ROSMAP, MAYO, MSBB, and GSE125283. Differential expression analysis was carried out using the edgeR package, and the sensitivity and specificity of differentially expressed transcripts were assessed using ROC curves with the pROC package in R. For network reconstruction, we employed a three-step pipeline from the RTN package: (1) computation of mutual information between each regulator and all potential targets, (2) elimination of non-significant associations via permutation analysis, and (3) removal of unstable interactions through bootstrapping, followed by application of the ARACNe algorithm. Furthermore, we applied one-tailed gene set enrichment analysis (GSEA-1T) to identify regulons associated with specific phenotypes, and two-tailed GSEA (GSEA-2T) to determine whether these regulons exhibited positive or negative associations with the phenotypes. To support functional interpretation, we annotated our findings with transcription factor evidence data from TFLink and integrated transcriptome-wide association signals obtained from the TWAS Atlas for AD. We identified 12 transcription factors that were deregulated across all examined brain regions: HIPK2, TSC22D4, MAFK, WWTR1, FOXJ1, GATA2, EGR2, FOXC1, MSX2, EGR4, FOSB, and ADCYAP1. 
Notably, FOXJ1 exhibited downregulation across multiple regions, including the Posterior Cingulate Cortex, Inferior Frontal Gyrus, and Parahippocampal Gyrus. Likewise, EGR2 was upregulated in the Inferior Frontal Gyrus, Parahippocampal Gyrus, and Fusiform Gyrus. The regulon associated with FOXJ1 exhibited a differential enrichment score (dESm) of -1.39, indicative of transcriptional repression in Alzheimer's disease (AD). Conversely, the EGR2 regulon showed a dESm of 1.41, suggesting transcriptional activation during AD progression. These opposing patterns of regulatory activity are consistent with insights derived from transcriptional network analyses. FOXJ1 was found to regulate 12 negatively regulated genes and 2 positively regulated genes and showed moderate predictive values across tissues (0.72 ≤ AUC ≤ 0.78), while EGR2 targets 4 negatively regulated genes and 2 positively regulated genes and showed lower predictive values (0.63 ≤ AUC ≤ 0.67). Notably, within these regulons, two genes - FOSB and AEBP1 - emerged as particularly significant. FOSB, itself a transcription factor regulated by EGR2, and AEBP1, regulated by FOXJ1, both demonstrated robust association signals with AD across multiple transcriptome-wide association studies, specifically in the Anterior Cingulate Cortex and Dorsolateral Prefrontal Cortex. Functional enrichment analysis of these regulatory networks revealed significant associations with key biological processes implicated in AD, including synaptic plasticity, transcriptional activation, GABAergic signaling, humoral immune regulation, and extracellular matrix organization. Collectively, these findings highlight EGR2 and FOXJ1 as potential transcriptional modulators with pivotal roles in the molecular mechanisms underlying Alzheimer's disease.
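The predictive values quoted above are areas under ROC curves (AUC). The authors used the pROC package in R; the sketch below is only a plain-Python illustration of what the statistic measures, namely the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name `roc_auc` and the expression values are hypothetical.

```python
def roc_auc(case_scores: list[float], control_scores: list[float]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number
    of pairs; tied pairs contribute 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values for one transcription factor.
ad_samples = [5.1, 6.3, 4.8, 7.0]
control_samples = [4.2, 5.0, 4.9, 3.7]
print(roc_auc(ad_samples, control_samples))
```

For a factor that is downregulated in disease, the comparison would be flipped (or the scores negated) so that the AUC stays above 0.5; which convention was applied in the study is not stated in the abstract.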
Palavras-chave: Alzheimer's disease, Transcription factors, Regulatory networks, Differential expression, Transcriptomics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125813

Bioinformatic analyses of coding and non-coding RNAs in Brevipalpus yothersi to study the interaction with citrus leprosis virus C, acaricide response and virome composition.

Autores: Daniel Gonzalez Ibeas,Pedro Luis Ramos-González,ALINE DANIELE TASSI,Laura Rossetto Pereira,Daniel Carrillo,Eric Roberto Guimarães Rocha Aguiar,Federico Ariel,Eduardo Chumbinho de Andrade,Daniel Júnior de Andrade,Juliana Freitas-Astúa,Ricardo Harakava,Valdenice Moreira Novelli,Elliot Watanabe Kitajima
Apresentador: Daniel Gonzalez Ibeas • gonzalez.ibeas@protonmail.com
Resumo:
Brevipalpus mites cause economic losses in a large number of horticultural and ornamental plants through the transmission of viruses. Affected crops include citrus and coffee, both of relevance for Brazil. In particular, Brevipalpus yothersi is responsible for the transmission of citrus leprosis, a viral disease especially problematic in the state of São Paulo due to the significance of orange crops. Much effort has been devoted to understanding the plant-virus interaction, but Brevipalpus mites remain less studied at the genetic level. In genome and transcriptome data, protein-coding sequences have usually been the center of research, but they typically account for a minor percentage of the whole gene space, whereas several studies suggest that the vast majority of a genome is made of structural and regulatory elements that are essential for studying the biology of species and for addressing fundamental questions or practical problems in both academic and applied research. Since a large proportion of those non-coding areas are transcribed, they can be characterized by transcriptomic approaches, including long non-coding RNAs and several types of small RNAs. This work includes the scaffolding of the current genome draft of Brevipalpus yothersi, and transcriptome analyses for the annotation of coding and non-coding areas of its genome, as well as for answering questions regarding the virus-mite interaction, pesticide response and gene silencing pathways in mites. Building on those research outputs, the proposal also addresses the development of new-generation pesticides based on RNA interference technology, as an alternative to conventional chemical acaricides. Additionally, exploring the whole set of viruses that these vectors are able to host is relevant for evaluating their potential as a source of emergent pathogens and inter-crop virus spread, due to the polyphagous nature of these organisms. 
Forty-five Illumina transcriptome libraries corresponding to 6 Brevipalpus species from 9 different countries and several plant species, including field and laboratory populations, were sequenced to study virome composition. More than 1 billion reads were used for a homology-based comparison at the protein sequence level against public repositories with Kaiju, a focused analysis of citrus leprosis viral species at the nucleotide level by read mapping with Bowtie2, and a de novo transcript assembly with Trinity followed by identification of contigs of viral origin with the machine learning classifier VirSorter2. All the software was run under a Linux OS. Preliminary results show that the virome is primarily structured by geographic location rather than by host plant or mite species and that, despite the high complexity of the virome, plant viruses other than those traditionally reported for Brevipalpus mites were rare or of low abundance, suggesting specificity as vectors.
Palavras-chave: Bioinformatics, transcriptomics, genomics, acarology, virology, virome
★ Running for the Qiagen Digital Insights Excellence Awards
#1125819

Integrative Transcriptomic Profiling Reveals Novel lncRNA–miRNA–mRNA Regulatory Networks in a Neurohormonal Model of Cardiac Hypertrophy

Autores: Sebastián Urquiza-Zurich,Sebastián Leiva-Navarrete,Francisco Sigcho-Garrido,Rodrigo Juliani Siqueira Dalmolin,Paulo de Paiva Rosa Amaral,Sergio Lavandero,Vinicius Maracaja Coutinho
Apresentador: Sebastián Urquiza-Zurich • seba.urquiza.z@gmail.com
Resumo:
Pathological cardiac hypertrophy (PCH), a key feature of cardiac remodeling in response to chronic stress stimuli such as mechanical and neurohormonal activation, is a major risk factor for heart failure and sudden cardiac death. This process is characterized by increased cardiomyocyte size, reactivation of fetal gene expression programs, and widespread transcriptomic reorganization. While protein-coding genes have been extensively studied in cardiac hypertrophy (CH), growing evidence suggests that non-coding RNAs (ncRNAs), including long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), play crucial regulatory roles in modulating gene expression and signaling pathways involved in cardiac growth, metabolism and cell survival. In this study, we performed total and small RNA-seq in neonatal rat ventricular myocytes (NRVMs) stimulated with norepinephrine (NE), a well-established in vitro model of neurohormonal stress-induced hypertrophy. Additionally, given the poor ncRNA annotation in rats, we implemented a comprehensive transcriptome assembly that included coding potential estimation of new lncRNAs with FEELnc and miRNA identification with miRge3.0. Our data revealed extensive remodeling of the transcriptome, including the upregulation of 196 lncRNAs out of 6,434 identified. Many of these were novel and, based on a 100-kilobase (kb) genomic window, either intergenic or intronic relative to genes with known roles in cardiac function or stress responses. For instance, the new lncRNA MSTRG.1800 lies near Frmd8, which is linked to inflammatory signaling, a process well documented in CH, suggesting possible regulatory functions in inflammation in heart cells. Intronic lncRNAs such as MSTRG.6214 (within Lats2) may modulate pathways like Hippo signaling, which is involved in remodeling and cell growth processes. Although these lncRNAs are unannotated in major cardiac lncRNA databases, their expression patterns and genomic context point to biologically meaningful roles.
Downregulated differentially expressed (DE) miRNAs such as miR-708, miR-511, and miR-652 were negatively correlated (Pearson r ≤ -0.8, p < 0.05) with specific lncRNAs, supporting a competing endogenous RNA (ceRNA) model. These miRNAs are known to regulate cardiomyocyte proliferation, apoptosis, fibrosis, and inflammation. To further investigate these interactions, we performed computational RNA–RNA interaction analyses to predict novel interactions of miRNAs binding to lncRNAs, supporting their potential sponge activity. Additionally, we retrieved known miRNA targets with a binding score above 85 from the miRDB web tool. The final ceRNA networks were visualized using Cytoscape, revealing clusters of co-expressed and co-regulated RNAs functionally connected to calcium signaling, apoptotic, and metabolic pathways. Collectively, our findings reveal a complex and dynamic ceRNA network operating in NE-induced PCH, offering a rich catalog of novel candidate ncRNA regulators in a rat model and providing a systems-level framework for future mechanistic studies in cardiovascular disease.
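The correlation screen described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the expression values and the names miRNA_x, lncRNA_a and lncRNA_b are made up for illustration, and only the r ≤ -0.8 threshold is taken from the abstract (the p-value filter is omitted to keep the example dependency-free).

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical normalized expression across six samples (illustrative only).
mirnas = {"miRNA_x": [9.1, 8.7, 8.9, 5.2, 4.8, 5.0]}
lncrnas = {
    "lncRNA_a": [2.0, 2.3, 2.1, 6.8, 7.1, 6.9],
    "lncRNA_b": [4.0, 5.5, 3.9, 5.1, 4.2, 5.3],
}

def anticorrelated_pairs(mirnas, lncrnas, r_max=-0.8):
    """Return (miRNA, lncRNA, r) for every pair with r <= r_max."""
    pairs = []
    for m_name, m_values in mirnas.items():
        for l_name, l_values in lncrnas.items():
            r = pearson(m_values, l_values)
            if r <= r_max:
                pairs.append((m_name, l_name, round(r, 3)))
    return pairs

pairs = anticorrelated_pairs(mirnas, lncrnas)
print(pairs)
```

Pairs that pass such a filter are only candidates for a ceRNA relationship; binding-site prediction, as described in the abstract, is still needed to support sponge activity.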
Palavras-chave: cardiac hypertrophy, lncRNAs, norepinephrine, transcriptome remodeling, ceRNA network
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125824

Graph Neural Networks in Precision Medicine: Predicting Clinical Outcomes of Infectious Diseases

Autores: Natalia Mansur
Apresentador: Natalia Mansur • nataliavsmansur@gmail.com
Resumo:
Understanding how gene expression shapes the immune response to infectious diseases is essential for uncovering mechanisms of pathogenesis, identifying therapeutic targets, and improving diagnostic strategies. Transcriptomic signatures identified through machine learning (ML) techniques are increasingly being explored to support precision medicine in this context. However, the high dimensionality and sparsity of transcriptomic data often lead to overfitting and reduced model stability. It has been suggested that considering gene relationships may improve prediction accuracy in heterogeneous biological data. This project explores the use of graph neural networks (GNNs) to model gene expression data within the context of gene-gene relationships, aiming to develop more robust and accurate classifiers for infectious disease outcomes. As a case study, we focus on Mycobacterium tuberculosis infection, a major global health challenge. Current tuberculosis (TB) diagnostics lack sensitivity and often fail to identify individuals with latent infection who are at high risk of progressing to active disease. RNA-seq data from individuals with varying TB statuses were analyzed, and differentially expressed genes were identified between progressors and non-progressors, as well as between latent and active TB cases. A GNN-based binary classifier is currently being trained to predict clinical outcomes based on both expression profiles and gene network topology. Preliminary results using progressor and non-progressor expression data and an initial protein–protein interaction (PPI) matrix reached 95.45% test accuracy, 93.70% validation accuracy, 95.43% test F1-score, and 93.36% validation F1-score. Next steps involve refining the biological network by integrating selected genes using publicly available PPI and co-expression data. Subsequently, model generalization will be improved, and layer-wise relevance propagation will be applied to interpret predictions.
The most predictive genes will be functionally analyzed and mapped onto interaction networks to identify key regulatory elements. This approach supports the development of precision diagnostics and deepens our understanding of host-pathogen interactions in TB and other infectious diseases.
Palavras-chave: Machine Learning, Graph Neural Networks, Systems Biology, Transcriptomics, Tuberculosis, Precision Medicine
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125833

Single-Cell RNA Transcriptomic Atlas of the Human Placenta Across Gestation

Autores: Rafaela Giraçol da Cruz,Victor Jardim Duque,João Victor Silva Nani,André de Souza Mecawi
Apresentador: Rafaela Giraçol da Cruz • rafaela.giracol@unifesp.br
Resumo:
The human placenta is a dynamic and heterogeneous organ essential for sustaining pregnancy and supporting fetal development. Its functions rely on tightly regulated gene expression programs, and disruptions in its cellular composition or transcriptional activity are linked to complications such as preeclampsia and gestational diabetes. This work aims to construct a comprehensive single-cell transcriptomic atlas of the human placenta under physiological and pathological conditions, focusing on RNA-level dynamics and leveraging recent advances in single-cell sequencing technologies.
To achieve this, a systematic review was conducted to identify relevant scRNA-seq and snRNA-seq studies. Seventeen datasets were curated and downloaded from public repositories, including NCBI GEO and the Human Cell Atlas. Following acquisition, raw data were preprocessed using standard bioinformatics pipelines. Quality control (QC) was performed using Seurat v5, in which cells with high mitochondrial RNA content or extreme transcript counts were filtered out. RNA expression matrices were normalized and integrated by gestational stage using Reciprocal Principal Component Analysis (RPCA), which minimizes batch effects while preserving biological variability. In total, 780,483 cells were processed across the first, second, and third trimesters.
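As a language-neutral illustration of the QC step described above, the following Python sketch filters cells by mitochondrial content and total transcript counts. It does not reproduce Seurat's API; the `Cell` class, the barcodes and all thresholds are assumptions chosen for illustration, since the abstract does not state the cutoffs used.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    barcode: str
    n_counts: int    # total transcript (UMI) counts in the cell
    pct_mito: float  # percentage of counts from mitochondrial genes

def passes_qc(cell, min_counts=500, max_counts=50_000, max_pct_mito=20.0):
    """Keep cells within the count range and below the mitochondrial cutoff.

    All three thresholds are hypothetical defaults, not values from the study.
    """
    in_range = min_counts <= cell.n_counts <= max_counts
    return in_range and cell.pct_mito <= max_pct_mito

cells = [
    Cell("AAAC", 3_200, 4.1),   # passes
    Cell("AAAG", 120, 2.0),     # too few counts
    Cell("AACT", 8_000, 35.0),  # high mitochondrial content
    Cell("AAGT", 90_000, 3.0),  # extreme count, possibly a doublet
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

In practice such cutoffs are usually chosen per dataset after inspecting the distributions, which matters when integrating studies produced with different protocols.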
Transcriptomic profiles were used to identify cell types through the ScType tool, which annotates clusters based on known RNA markers. Additional manual validation employed recent literature, particularly for placental-specific populations. A dynamic shift in cell composition was observed across trimesters, with 11, 12, and 13 distinct cell types annotated in each respective period.
Differential gene expression analysis was performed using the Wilcoxon rank-sum test implemented in Seurat. Tens of thousands of Differentially Expressed Genes (DEGs) were identified, capturing gestational stage-specific transcriptional changes across cell types. These genes were further characterized using functional annotation tools. Classification via the IUPHAR database highlighted differentially expressed GPCRs, enzymes, and transcription factors, emphasizing their relevance in hormone signaling and immune modulation. Enrichment analysis using Gene Ontology (GO) and KEGG databases revealed critical pathways involved in vascularization, immune response, and endocrine regulation, all derived from RNA-level signatures.
Gene co-expression networks were examined using Spearman correlation analyses, with a focus on top DEGs in key placental cell types. For instance, strong positive correlations were found between ADGRL4 and PECAM1 in vascular endothelial cells, and between PSG1 and CSH1 in fusing syncytiotrophoblasts, indicating tightly coordinated transcriptional programs that govern placental function.
In summary, this study presents a robust RNA transcriptomic atlas of the human placenta across gestation. The use of rigorous RNA-centric bioinformatic analyses enables a detailed view of gene expression dynamics and cellular diversity, offering foundational insights for understanding placental biology and its perturbations in disease. This work contributes valuable resources to the field of transcriptomics and maternal-fetal medicine.
Palavras-chave: Placenta, Single-cell RNA-seq, Transcriptomics, scRNA-seq integration, Transcriptomic atlas
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125855

Cytokine profiling-based machine learning for predictive symptomatic disease for seasonal infections and pandemic preparedness: a public health imperative

Autores: Marina Vieira Agostinho Galvani,Karina Da Silva Oliveira,Felipe Henrique da Cunha Xavier,Itauá Leston Araujo,Paulo Emílio Corrêa Leite,Gabriela Corrêa e Castro,Maria Celia Chaves Zuma,Roberta Soares Faccion,Suelen Martins Perobelli,Joanna Reis Santos de Oliveira,ADRIANO GOMES DA SILVA,Alda Maria Da Cruz,Hugo Caire,Wilson Savino,Zilton Farias Meira De Vasconcelos,Patrícia Brasil,Adriana Bonomo,Rômulo Gonçalves Agostinho Galvani
Apresentador: Rômulo Gonçalves Agostinho Galvani • rgalvani@lncc.br
Resumo:
Mass vaccination is crucial for public health. However, challenges remain for infections caused by mutating viruses like Influenza and SARS-CoV-2. Influenza vaccination reduces hospitalizations and deaths, as does vaccination for COVID-19. Studies show that factors like advanced age and comorbidities increase the risk of severe disease in infected individuals, with cytokine dysregulation playing a key role. The "cytokine storm" exacerbates inflammation, worsening outcomes. Here, we employed innovative approaches using inflammatory biomarkers and T and B cell dynamics to enable predictive machine learning models to identify individuals at higher risk before infection. We used a pooled cross-sectional study that included two macroprojects conducted in Rio de Janeiro, Brazil. The first explores the natural history of SARS-CoV-2 infection across diverse populations. The second investigates healthcare workers’ clinical and immunological responses. Both studies identified infected patients by PCR, and the outcome evaluated was whether the infection was symptomatic or asymptomatic. Cytokine levels were quantified with a multiplex immunoassay for 48 cytokines, while T and B cell neogenesis was assessed through molecular quantification of TRECs and KRECs. Our aim was to distinguish asymptomatic from symptomatic cases by using immune patterns present up to 180 days before infection. Selected samples (105) were subsequently screened for cytokine detection and TREC/KREC quantification to evaluate lymphoid output dynamics. Correlation network modeling revealed distinct immune coordination. Asymptomatic individuals displayed tightly linked cytokine networks associated with leukocyte activation, chemotaxis, and lymphocyte proliferation, while symptomatic cases exhibited more complex networks, including hematopoiesis regulation. IL-8 served as a central hub in symptomatic pathways. The cohort was divided into 75% for training and 25% for testing.
Training was performed using repeated cross-validation with 5 folds and 10 repeats. A random search was performed for the mtry, ntree, nodesize and maxnodes parameters, and the optimal model was selected based on the F1-score. The machine learning model identified key cytokines as predictors of disease severity, achieving an AUC of 0.878, along with an accuracy of 1 for symptomatic cases, a log-loss of 0.84 and an F1-score of 0.6 for symptomatic risk prediction. Thymic output biomarkers (βTREC, sjTREC) were comparable across groups. However, when B cell dynamics were assessed, symptomatic individuals showed elevated recent B-cell production (sjKREC levels), while asymptomatic cases exhibited previous clonal B-cell expansions. These findings suggest that the pre-existing immune profile, which precedes the infection, can predict whether the disease will be asymptomatic or symptomatic. This research underscores the potential of cytokine profiling and machine learning for early risk stratification, helping in vaccine prioritization in future pandemics or seasonal outbreaks.
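The resampling scheme described above (a 75%/25% split followed by 5-fold cross-validation repeated 10 times) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names and seeds are hypothetical, the cohort size of 105 is assumed from the number of selected samples, and no stratification by outcome is applied.

```python
import random

def train_test_split(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def repeated_kfold(indices, k=5, repeats=10, seed=0):
    """Yield (train, validation) index lists for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Deal indices round-robin into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for i, validation in enumerate(folds):
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            yield train, validation

train_idx, test_idx = train_test_split(105)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))
```

Each candidate hyperparameter set from the random search would be scored on all 50 validation folds, and the held-out 25% would be used only once, for the final estimate.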
Palavras-chave: Vaccination, Artificial Intelligence, Machine Learning, Cytokines, Pandemics, Seasonal, Immunology
#1125862

Comparative Analysis of Strategies for Plasmid Reconstruction: Insights into blaKPC-Carrying Resistance Vectors

Autores: Melise Chaves Silveira,Giovanna de Oliveira Santos,Bruna Ribeiro Sued Karam,Ana Paula D'Alincourt Carvalho Assef
Apresentador: Melise Chaves Silveira • melise@lncc.br
Resumo:
Antimicrobial resistance is a global health problem, and plasmids play a crucial role in the dissemination of associated genes among bacteria. The blaKPC gene is a significant contributor to carbapenem resistance in Gram-negative bacteria (GNB), particularly in Klebsiella pneumoniae, and is primarily spread via plasmids. The genetic diversity of plasmids has inspired different classification methods, such as assigning them to incompatibility (Inc) groups. Plasmid molecular size also varies substantially, and Inc groups are sometimes related to average plasmid size. To better classify and study these resistance vectors, the reconstruction of plasmids from whole genome sequencing (WGS) data is crucial. However, despite advances in short-read sequencing analysis, reconstructing individual plasmids with this technology remains challenging. Strategies to address this include plasmid binning tools for short-read assemblies, such as GplasCC and MobTyper, the former making use of assembly graphs and the latter relying only on comparison with curated databases. Long-read sequencing technologies, such as Oxford Nanopore, are another strategy, although they are more costly and, when not combined with short reads, prone to assembly errors. This study aimed to evaluate different methods for reconstructing plasmids carrying the blaKPC gene from short-read sequencing data from GNB, mainly K. pneumoniae. The gold standard was the result of wet-lab techniques: Pulsed Field Gel Electrophoresis (PFGE) with the S1 restriction enzyme combined with Southern blot hybridization with a blaKPC probe, with band size predicted by Bionumerics software. We compared this with three bioinformatic approaches: (i) combined assembly of long and short reads, (ii) short-read assembly followed by MobTyper analysis, and (iii) short-read assembly followed by GplasCC analysis.
Statistical analyses were conducted using R libraries to compare the plasmid size predictions from each method against the Southern blot results. A total of 17 short-read WGS datasets from five bacterial species were explored. Overall, the combined long- and short-read approach yielded size estimates that most closely matched those from Southern blot. GplasCC tended to overestimate plasmid sizes and showed the poorest agreement with Southern blot. MobTyper demonstrated good agreement in predicting plasmid sizes, although results varied more for IncFIB/FII plasmids, which have higher molecular size. Plasmids from the IncU/X3 group were consistently easier to predict across all methods. Small plasmids, such as those from the IncQ1 group, had more homogeneous predicted sizes across different samples from the same Inc group. These findings suggest that hybrid assembly is the most reliable sequencing strategy for plasmid reconstruction, despite its higher cost and limited use. Among bioinformatic tools that use only short reads, MobTyper offers the most reliable results for solving this problem without the need for extra wet-lab techniques. Considering that short reads are broadly used, improving and evaluating binning tools is crucial.
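One simple way to quantify agreement between predicted and reference plasmid sizes is the mean absolute error per method, sketched below in Python. This is an assumed, simplified stand-in for the statistical comparison the authors ran in R; the sizes and the labels method_a, method_b and method_c are invented for illustration and do not correspond to results in the abstract.

```python
def mean_abs_error(predicted, reference):
    """Average absolute difference between predicted and reference sizes."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

# Hypothetical plasmid sizes in kb for three isolates (illustrative only).
reference_kb = [50, 110, 8]
predicted_kb = {
    "method_a": [51, 108, 8],
    "method_b": [55, 130, 9],
    "method_c": [70, 160, 12],
}

errors = {name: mean_abs_error(sizes, reference_kb)
          for name, sizes in predicted_kb.items()}
best = min(errors, key=errors.get)
print(errors, best)
```

Because absolute errors scale with plasmid size, a relative error or a Bland-Altman style analysis may be more informative when small and large plasmids are compared together.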
Palavras-chave: Plasmid, Whole-genome sequencing, Antimicrobial Resistance
★ Running for the Qiagen Digital Insights Excellence Awards
#1125866

Gene Expression Analysis in 15 Ewing Sarcoma Cell Lines Following EWSR1::FLI1 Silencing: Identification of New EWSR1::FLI1-Regulated Genes

Autores: Marialva Sinigaglia,Júlia Wiederkehr,Alexsandro Vargas de Ávila,Bruno Léo Hammes,Monique Banik Siqueira,André Tesainer Brunetto,Daner Acunha Silveira
Apresentador: Daner Acunha Silveira • d.silveira.bioinfo@ici.ong
Resumo:
Ewing sarcoma (ES) is an aggressive malignancy of the bone and soft tissues that predominantly affects children and young adults. In the majority of cases, tumor development is driven by the fusion oncogene EWSR1::FLI1, a chimeric transcription factor that reprograms gene expression to promote proliferation, metastasis, and resistance to treatment. Despite extensive studies, the complete network of genes and pathways regulated by EWSR1::FLI1 remains to be fully elucidated. In this study, we investigated transcriptome data from the Gene Expression Omnibus (GEO) database, focusing on 15 ES cell lines subjected to EWSR1::FLI1 silencing. Differential expression analysis revealed six genes consistently modulated across all cell lines when comparing wild-type and silenced conditions. Heatmap visualization revealed a pattern of upregulation or downregulation associated with EWSR1::FLI1 status, highlighting these genes as potential direct targets or downstream effectors of the fusion oncogene. Among the six genes, we identified two transcription factors, FEZF1 and PAX7, which are already well characterized in ES for their roles in neural differentiation and tumor progression. Notably, we also identified FEZF1-AS1, a long non-coding RNA, as consistently downregulated upon EWSR1::FLI1 silencing. Although FEZF1-AS1 has been implicated in the progression of several cancers, including gastric, colorectal, and non-small cell lung carcinomas (where it promotes proliferation, invasion, and poor prognosis), its function in Ewing sarcoma remains completely unknown. The other three genes (CDON, MYOM2, and SRGAP1) also lack characterization in ES. As a next step, we aim to perform functional enrichment and gene correlation analyses to explore potential pathways and biological processes involving these genes. Our results uncover possible novel targets of EWSR1::FLI1 and provide a basis for future investigation into their potential roles in ES aggressiveness.
Palavras-chave: Ewing Sarcoma, EWSR1::FLI1, Transcriptomics Analysis
#1125875

Hi-C genome assembly and synteny analysis in Meliponini: A window into bee chromosomal evolution

Autores: Felipe Cordeiro Dias,Maria Cristina Arias
Apresentador: Felipe Cordeiro Dias • felipecordeiro210@gmail.com
Resumo:
Bees are essential pollinators in both natural and agricultural ecosystems, yet most species remain genomically understudied, especially regarding chromosomal architecture. Hi-C sequencing has enabled chromosome-level assemblies, even in non-model taxa, advancing evolutionary and functional genomics. Understanding chromosomal structure is crucial for tracing evolutionary changes across bee clades, as genome rearrangements may underlie important genetic and molecular patterns with potential impacts on morphology and behavior. In this context, we present Hi-C-based chromosome-level assemblies for two Meliponini species (Lestrimelitta limao and Tetragonisca angustula), offering novel insights into the genetic and evolutionary dynamics of this diverse and ecologically important tribe. Hi-C libraries were prepared from whole-body samples, sequenced and cleaned, while long-read PacBio HiFi data provided draft assemblies. We aligned the Hi-C reads to the draft assembly using BWA-MEM, used HiCExplorer to identify restriction sites and valid read pairs for matrix building, and scaffolded with YaHS (with Juicer implementations) to generate chromosome-level assemblies, which were evaluated using QUAST and BUSCO. Hi-C contact maps were visualized with Juicebox. Gene prediction was performed with BRAKER3, integrating RNA-seq and Arthropoda protein data (OrthoDB), followed by functional annotation with InterProScan and eggNOG-mapper. To assess synteny, we compared the annotated protein sets of T. angustula, L. limao, and Melipona quadrifasciata (available in the literature) using BLASTp, followed by MCScanX to identify collinear blocks. Based on cytogenetic evidence suggesting n=17 as the ancestral karyotype, T. angustula was used as the reference. Additional comparisons were made with Bombus terrestris, Apis mellifera, Nomada fabriciana, and Xylocopa dejeanii to explore broader Apidae relationships.
We obtained high-quality chromosome-level assemblies using Hi-C data, with ~97% of the genomes anchored to the expected chromosome number (14 for L. limao, 17 for T. angustula), and strong BUSCO scores. Synteny results revealed strong gene content conservation across Meliponini species, despite substantial differences in chromosomal arrangement between M. quadrifasciata and L. limao, indicating independent rearrangement pathways upon a shared genomic framework. When comparing T. angustula to B. terrestris (from a sister tribe), we observed a less defined syntenic pattern, which became even more fragmented in the comparison with Apis. The pattern was further diluted in comparisons with more distantly related Apidae bees such as Xylocopa (Xylocopinae) and Nomada (Nomadinae), highlighting increased synteny breakdown with greater phylogenetic distance. These results underscore a Meliponini-specific pattern of chromosomal conservation that is progressively lost across deeper Apidae splits. These findings reinforce the value of Hi-C for uncovering chromosomal architecture, enabling the detection of lineage-specific evolutionary signatures that might remain obscured in fragmented assemblies. The contrasting rearrangement patterns within Meliponini, coupled with the progressive loss of synteny in more distantly related taxa, suggest that chromosomal evolution in bees follows both conserved and divergent trajectories, shaped by phylogenetic distance. By expanding high-resolution genomic resources beyond traditional model species, this study contributes to a deeper understanding of genome evolution in Apidae, highlighting the potential of Hi-C data to inform phylogenomics, functional genomics and the evolutionary basis of ecological adaptations.
Palavras-chave: Hi-C, Chromosome-level assembly, Comparative genomics, Synteny, Genome evolution, Bees
★ Running for the Qiagen Digital Insights Excellence Awards
#1125889

Modulation of Inflammatory Responses by Pesticides in Parkinson’s Disease

Autores: Juliana Paiva dos Santos Diniz,Gustavo Barra Matos,Tatiane Piedade de Souza,ANDREA KELY CAMPOS RIBEIRO DOS SANTOS,Bruno Lopes dos Santos Lobato,Gilderlanio Santana de Araujo
Apresentador: Juliana Paiva dos Santos Diniz • juliana.diniz@itec.ufpa.br
Resumo:
Parkinson's Disease (PD) is the second most prevalent neurodegenerative disorder, characterized by motor symptoms such as tremors, rigidity, and bradykinesia, as well as non-motor manifestations. Levodopa remains the standard treatment; however, its prolonged use can lead to levodopa-induced dyskinesia (LID). Genetic and environmental factors, such as pesticide exposure, contribute to the progression of PD and the development of LID, but the genetic and molecular mechanisms involved in this interaction are still poorly understood, especially in genetically diverse populations, such as those from Northern Brazil, which historically have a higher Indigenous contribution in their genetic architecture compared to other regions of the country. Gene expression studies using bulk RNA-seq approaches have the potential to reveal molecular profiles associated with the disease and its complications. Therefore, this study aims to analyze gene expression profiles in peripheral blood from PD patients, with and without LID, and assess associations with pesticide exposure, using bulk RNA-seq. For this, bulk RNA-seq data from peripheral blood of 46 individuals were used, distributed into three groups: controls (CT, n = 20), PD patients with LID (LID, n = 15), and PD patients without LID (NLID, n = 11). Sequence quality was assessed using FastQC and MultiQC, trimming was performed with fastp, alignment with STAR, and transcript quantification with HTSeq. Differential expression analysis was conducted using the edgeR package, considering age, sex, experimental groups, and pesticide exposure as covariates. Principal component analysis (PCA) was performed using the factoextra package, and functional enrichment analysis of pathways and biological processes was performed with clusterProfiler. From these methods, 94 differentially expressed genes (DEGs) were identified (p-value ≤ 0.05 and -0.5 ≤ FC ≤ 0), with 81 associated with the comparison between CT and LID, and 13 between CT and NLID.
We correlated PC1, PC2, and PC3 with pesticide exposure and found a negative and significant correlation between PC3 and pesticide exposure (Spearman, R = -0.30; p = 0.04). In PC3, we identified 838 genes with contribution to gene expression variability, of which 37 were DEGs from the CT vs. LID comparison. These genes were subjected to functional enrichment analysis. The results indicated activation of pathways related to cell migration and immune response, including the genes SLC11A1, HBG1, BLMH, HBG2, and HBB, with emphasis on leukocyte chemotaxis and neutrophil migration. At the molecular level, functions associated with immune receptor activity and serine kinase activity were observed, which are essential for the activation of inflammatory responses. Regarding cellular components, the enriched pathways suggest the involvement of secretory granules, particularly ficolin-1-rich granules, which are responsible for the storage and release of inflammatory mediators. The analyses indicated the activation of immune and inflammatory pathways in peripheral blood, and reinforced the involvement of systemic immunological mechanisms, as well as the importance of investigating environmental and genetic mechanisms in genetically diverse and underrepresented populations in molecular studies.
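The Spearman correlation between a principal component and pesticide exposure can be illustrated with a small dependency-free Python sketch. Spearman's rho is the Pearson correlation of the ranks, with tied values receiving their average rank. The PC3 scores and the binary exposure coding below are hypothetical and do not reproduce the reported R = -0.30.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical PC3 scores and exposure status (1 = exposed) for six people.
pc3_scores = [0.9, 0.4, -0.2, -0.7, 0.1, -1.1]
exposed = [0, 0, 1, 1, 0, 1]
rho = spearman(pc3_scores, exposed)
print(round(rho, 3))
```

A negative rho here means exposed individuals tend to have lower PC3 scores; a significance test (as reported in the abstract) would additionally require a p-value, which this sketch does not compute.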
Palavras-chave: Transcriptomics, Differential Gene Expression, Parkinson's Disease, Pesticide Exposure, Immune Response
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125904

Reconstruction of the Evolutionary Landscape of Biological Processes Involved in the Early Stages of the Metastatic Cascade

Autores: Gleison Medeiros De Azevedo,Epitácio Dantas de Farias Filho,João Vitor Ferreira Cavalcante,Rafaella Sousa Ferraz,Bruno William Fernandes Silva,Diego Marques Coelho,Rodrigo Juliani Siqueira Dalmolin
Apresentador: Gleison Medeiros De Azevedo • gmazevedo05@gmail.com
Resumo:
Metastasis is a complex process characterized by the spread of tumor cells and the formation of secondary tumor sites. Understanding this evolutionary state remains a significant challenge in cancer prognosis and treatment. Despite advances in understanding its molecular mechanisms, the evolutionary origin of the genes involved in this process still needs to be explored. This study addresses this gap by investigating the evolutionary history of genes associated with the early stages of the metastatic cascade, focusing on biological processes such as cell adhesion, extracellular matrix organization, regulation of metalloproteinase activity, organization of cell junctions, cell extravasation, and epithelial-mesenchymal transition. By analyzing protein-protein interaction (PPI) networks and performing phylogenetic reconstruction with the GeneBridge package, we identified the last common ancestors (LCAs) in which these genes are rooted. Our findings indicate that the complexity of metastatic mechanisms emerged progressively in different species throughout evolution. The Human-Metamonada LCA, one of the earliest eukaryotes, showed an enrichment of genes related to cell motility mediated by cytoskeletal modulation, while the Human-Choanoflagellata LCA, marking an important step toward multicellularity, demonstrated a significant increase in genes regulating cell-cell and cell-extracellular matrix adhesion, including integrins, cadherins, and catenins. The Human-Actinopterygii LCA, characterized by the development of a circulatory system, showed an enrichment of genes involved in cell extravasation and the major histocompatibility complex, suggesting evolutionary adaptations associated with tumor cell survival and dissemination. Our study reveals that ancestral genes and processes, originally essential for cellular homeostasis, have also been co-opted in cancer processes, promoting disease progression.
Understanding this evolutionary basis may open pathways for developing more effective therapies, focused on targets conserved throughout evolution.
Palavras-chave: metastasis, evolution, systems biology, bioinformatics, protein-protein interaction network
★ Running for the Qiagen Digital Insights Excellence Awards
#1125915

Comparative Metagenomic Analysis of Nitrogen Cycling in Brazilian Mangrove Soils

Autores: Paulo Miguel Vieira de Sousa,Tallita Cruz Lopes Tavares,Aristóteles Góes Neto
Apresentador: Paulo Miguel Vieira de Sousa • miguel.v9045@gmail.com
Resumo:
Mangroves are unique ecosystems located between the continents and oceans. They are essential for marine species, nutrient cycling, and human communities, but have been under stress over the years due to the combined influences of climate change, pollution, and direct conversion and loss. This kind of degradation directly impacts the soil microbiome, which is essential for nutrient cycling, including the nitrogen, phosphorus, sulfur, and carbon biogeochemical cycles, and for pollutant removal. The advent of metagenomics has contributed to many studies along the Brazilian coast that investigated the mangrove soil microbiome with different bioinformatic approaches, from the MG-RAST pipeline to more recent software and pipelines. This work aims to analyze and compare the shotgun metagenomic data available in the NCBI and MG-RAST databases under the same modern pipeline, to identify taxonomic and functional patterns in the nitrogen cycle along the Brazilian coast, and to infer how microbiomes react to different forms of impact. After analyzing 34 samples with the SqueezeMeta pipeline in co-assembly mode, patterns were identified in gene abundance among various environmental conditions. Samples clustered according to impact type, with pristine mangroves displaying increased abundance of nitrogen fixation genes. Regarding nitrogen removal, urban-influenced and shrimp-farming-influenced mangroves, which tend to receive a larger nitrogen input, as well as oil refinery-influenced mangroves, tended to be enriched in genes related to anaerobic processes, such as anammox and DNRA. Taxonomic profiles revealed the dominance of Pseudomonadota across all conditions, with significant contributions from Bacteroidota and Actinomycetota in impacted sites. These taxonomic and functional distinctions underline how anthropogenic pressures shape not only the microbial composition but also the functional capacities of mangrove soils.
Understanding these shifts is critical for ecological restoration and the sustainable management of mangrove ecosystems in Brazil and other vulnerable tropical coasts.
Palavras-chave: metagenomics, mangroves, nitrogen cycle
#1125921

Comparative Analysis of Clustering-Based and Curated Model Approaches for Transposable Elements Characterization in Cenostigma pyramidale [Tul.]

Autores: Amanda Pedrosa da Câmara,João Pacifico Bezerra Neto,Ana Luíza Trajano Mangueira de Melo,José Ribamar Costa Ferreira Neto,Ana Maria Benko Iseppon,Valesca Pandolfi
Apresentador: Amanda Pedrosa da Câmara • amanda.apc@ufpe.br
Resumo:
Transposable elements (TEs) constitute a substantial fraction of plant genomes and play critical roles in genome evolution, structural organization, and gene expression regulation. A thorough identification and characterization of TEs within the genome is essential for understanding these dynamics. Despite their prevalence and functional importance, the accurate identification and classification of TEs remain challenging due to their high sequence diversity, nested insertions, and frequent degeneration over evolutionary time. To address these challenges, various bioinformatic tools have been developed, each with distinct methodologies and analytical frameworks. The objective of this study was to characterize the transposable element landscape of Cenostigma pyramidale [Tul.] and evaluate the relative effectiveness and complementarity of clustering-based and curated model approaches in TE analysis. To investigate the repetitive content, we first analyzed paired-end Illumina genomic reads with the RepeatExplorer pipeline on the Galaxy platform. Sequences were clustered based on similarity using graph-based clustering algorithms, and the resulting clusters were classified by comparison to the Viridiplantae 3.0 TE database. The clusters were further examined to assess the abundance and classification of TE families. Complementarily, the Dfam TE Tools pipeline was applied to a genome assembly in scaffold format. Prior to annotation, a custom transposable element library was generated using RepeatModeler, which identified consensus TE sequences de novo from the Cenostigma pyramidale genome and additional Fabaceae genomic data. Using this custom library alongside the Dfam database, which is based on profile hidden Markov models (HMMs), RepeatMasker was then employed to annotate and classify transposable elements.
A set of output files was obtained, including detailed annotation reports (.out), summary tables (.tbl), genomic coordinates (.gff), and divergence metrics indicating the relative age of TE insertions. The RepeatExplorer analysis revealed 194 clusters corresponding to transposable elements, with Gypsy-like LTR retrotransposons accounting for 5.58% of the C. pyramidale genome. A substantial proportion of clusters (551 out of 867) could not be classified, potentially representing novel or highly degenerated TEs. This approach enabled the detection of abundant elements without the need for a genome assembly, although the classification resolution was limited. In contrast, the Dfam-based approach identified ten distinct TE superfamilies, encompassing elements from the LTR, LINE, and DNA transposon groups. The classification was more refined, allowing for divergence estimates between TE copies and precise mapping within scaffolds. The abundance profile also revealed LTR retrotransposons as the predominant group (15.3%), with low divergence peaks suggesting recent transpositional activity. The two approaches proved to be complementary. RepeatExplorer was well suited to identifying repetitive elements in species without an assembled genome, demonstrating high efficiency in identifying abundant and recent TEs. Dfam, on the other hand, offered more reliable and detailed classification, enabling deeper insight into the genomic distribution and evolutionary dynamics of TEs. The combination of clustering-based and curated model approaches provides a more comprehensive view of the mobilome in C. pyramidale. While RepeatExplorer is well suited for initial discovery and exploratory studies, Dfam excels in refinement, classification, and detailed genomic analysis.
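As an illustration of how a class-level abundance profile can be derived from a RepeatMasker annotation report (.out), the following is a minimal sketch rather than the authors' procedure. It assumes the standard whitespace-separated .out layout, in which fields 6 and 7 hold the query start and end and field 11 holds the repeat class/family. The function name `summarize_out`, the sample rows, and the genome size are hypothetical and made up for the example. Overlapping annotations are counted twice in this simplified version.

```python
from collections import defaultdict


def summarize_out(lines, genome_size):
    """Return the percentage of the genome annotated per top-level TE class.

    `lines` are rows of a RepeatMasker .out report; header rows are skipped
    because their first field is not an integer score.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Data rows start with the integer Smith-Waterman score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        start, end = int(fields[5]), int(fields[6])
        # "LTR/Gypsy" -> "LTR"; keep only the top-level class.
        top_class = fields[10].split("/")[0]
        totals[top_class] += end - start + 1
    return {cls: 100.0 * bp / genome_size for cls, bp in totals.items()}


# Made-up rows in .out layout, for illustration only.
SAMPLE = [
    "   SW   perc perc perc  query      position in query    matching  repeat",
    "score   div. del. ins.  sequence   begin end   (left)   repeat    class/family",
    "",
    " 1320 15.6  6.2  0.0 scaffold_1  1001  2000 (8000) + Gypsy-1  LTR/Gypsy  1 1000 (0) 1",
    "  980 10.1  0.0  1.2 scaffold_1  3001  3500 (6500) C Copia-2  LTR/Copia  (0) 500 1 2",
    "  450 22.3  3.0  0.5 scaffold_1  5001  5250 (4750) + hAT-3    DNA/hAT    1 250 (0) 3",
]

profile = summarize_out(SAMPLE, genome_size=10_000)
```

In practice the summary table (.tbl) already reports comparable totals; a parser like this is mainly useful when custom groupings or filters are needed.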
Palavras-chave: Mobilome annotation, Retrotransposons, RepeatMasker, RepeatModeler, Genome evolution, Bioinformatics.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125927

BR-FDP-EYE: Brazilian Forensic DNA Eye Phenotyping

Autores: Luiza Marques Prates Behrens,Guilherme da Silva Fernandes,Mateus Boiani,Marcio Dorn
Apresentador: Luiza Marques Prates Behrens • behrens.luiza@gmail.com
Resumo:
Forensic DNA Phenotyping (FDP) is a technique used to predict physical appearance traits, such as eye color, from genetic data. Eye color, a highly heritable genetic trait, is one of the most obvious and distinguishable externally visible characteristics (EVCs) employed in human identification. Previous genotype-phenotype association studies have identified several single nucleotide polymorphisms (SNPs), located in or near genes directly or indirectly involved in pigment synthesis, that contribute to eye color variation. In this study, we evaluated the predictive potential of 66 pigment-related SNPs in an admixed population (n=438) from Rio Grande do Sul, the southernmost state in Brazil. Our approach combined bioinformatics and machine learning techniques: we applied Tomek Links undersampling to address class imbalance and used VariantSpark for feature selection. Three classifiers, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP), were trained with incremental feature addition to identify the optimal SNP subset for eye color prediction. We optimized each model using grid search for hyperparameter tuning, Leave-One-Out Cross-Validation (LOOCV), and 30 replicate runs per experiment. The outcome is BR-FDP-EYE, a population-specific web-based system for predicting eye color from DNA data. Using only 5 to 7 SNPs, BR-FDP-EYE achieves up to 85% overall accuracy and offers four classification approaches. The first option is a five-class model, distinguishing between Blue, Green, Hazel, Light Brown, and Dark Brown colors, with high sensitivity for Blue and Dark Brown (81-87%), though intermediate shades show reduced accuracy (up to 33%).
Alternatively, users can choose between three simplified three-class models for improved performance, depending on their specific needs: (1) Blue, Intermediate (Green + Hazel + Light Brown), and Dark Brown; (2) Light (Blue + Green), Hazel, and Brown (Light Brown + Dark Brown); and (3) Blue, Intermediate (Green + Hazel), and Brown (Light Brown + Dark Brown). Among all evaluated markers, HERC2 rs12913832 and rs1129038 consistently emerged as the most informative SNPs across models. Additional SNPs, such as HERC2 rs7494942 and rs11636232, further improved prediction accuracy. The central role of HERC2 was previously identified as critical to eye color determination, and this was also confirmed in the Southern Brazilian admixed population. BR-FDP-EYE is specifically optimized for this population, where standard FDP systems often underperform, highlighting the need for population-specific predictors in both research and forensic contexts. The platform features a simple and user-friendly interface, making it suitable for applications in genetics research, anthropology, education, and potentially in law enforcement investigations.
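The three simplified schemes described above can be written as explicit label mappings. The sketch below is illustrative only and is not taken from the BR-FDP-EYE code base; the names `SCHEMES` and `collapse` are hypothetical. It shows how five-class labels would be collapsed into each three-class grouping.

```python
# The five classes of the full model, as listed in the abstract.
FIVE_CLASSES = ("Blue", "Green", "Hazel", "Light Brown", "Dark Brown")

# Hypothetical encoding of the three simplified schemes.
SCHEMES = {
    1: {"Blue": "Blue", "Green": "Intermediate", "Hazel": "Intermediate",
        "Light Brown": "Intermediate", "Dark Brown": "Dark Brown"},
    2: {"Blue": "Light", "Green": "Light", "Hazel": "Hazel",
        "Light Brown": "Brown", "Dark Brown": "Brown"},
    3: {"Blue": "Blue", "Green": "Intermediate", "Hazel": "Intermediate",
        "Light Brown": "Brown", "Dark Brown": "Brown"},
}


def collapse(labels, scheme):
    """Map five-class eye color labels onto one of the three-class schemes."""
    mapping = SCHEMES[scheme]
    return [mapping[label] for label in labels]


observed = ["Blue", "Hazel", "Light Brown", "Green"]
collapsed = {scheme: collapse(observed, scheme) for scheme in SCHEMES}
```

Writing the schemes as data makes it easy to check that every five-class label is covered and that each scheme yields exactly three output classes.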
Palavras-chave: BR-FDP-EYE, Human pigmentation, Eye color, Forensic DNA Phenotyping, Single Nucleotide Polymorphisms
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125930

Assessment of Metagenomic Assembly Strategies for Studying the Gut Microbiota in Hamsters Infected with SARS-CoV-2

Autores: Patrícia Brito Rodrigues,Douglas Terra Machado,Ana Tereza Ribeiro de Vasconcelos,Prof. Dr. Marco Aurélio Vinolo,François Trottein,Vinícius de Rezende Rodovalho
Apresentador: Vinícius de Rezende Rodovalho • vrr@lncc.br
Resumo:
Aging is associated with increased vulnerability to respiratory viruses, including SARS-CoV-2. Concomitantly, age-related gut microbiota dysbiosis may influence susceptibility to viral infections. Using golden hamsters (Mesocricetus auratus) as a preclinical aging model, we previously examined the impact of SARS-CoV-2 infection (strain BetaCov/France/IDF-0372/2020) on gut microbiota composition. Employing multi-omics analyses (including metagenomic profiling of cecal samples, plasma metabolomics, lung tissue transcriptomics, and clinical infection data), we identified age-dependent responses to COVID-19 in hamsters. Our initial metagenomic analysis relied on a reference-based approach with the integrated Mouse Gut Metagenomic Catalog (iMGMC) database. However, given the novelty of shotgun metagenomics for hamster gut microbiota studies, here we present a complementary assembly-based analysis to enhance insights into microbial dynamics in response to infection. We analyzed metagenomic data from aged (22-month-old) hamsters in two experimental groups: control (n=6) and SARS-CoV-2-infected (7 days post-infection, n=6), using modules in the MetaWRAP pipeline. First, host-derived sequences were removed, and raw reads underwent quality control (READ_QC module). Next, assembly was performed using either MEGAHIT or metaSPAdes (ASSEMBLY module). Assemblies were executed individually for each sample or through a co-assembly approach, in which all samples within each experimental group were pooled before assembly. The resulting contigs were binned using the BINNING module with MaxBin2, MetaBAT2, and CONCOCT. The bins underwent quality assessment via CheckM, retaining only those with ≥50% completeness and ≤10% contamination. The BIN_REFINEMENT module aggregated bins from individual assemblies and co-assemblies, while REASSEMBLE_BINS improved bin quality through read alignment and iterative reassembly.
Final bins were subjected to taxonomic classification (CLASSIFY_BINS) and functional annotation (ANNOTATE_BINS). Comparative analyses revealed that assemblies generated by metaSPAdes, including both individual assemblies and co-assemblies, showed improved contiguity, as indicated by higher N50 and lower L50 values. Notably, co-assemblies produced a greater total number of contigs, while both metaSPAdes assemblies and co-assemblies consistently yielded the largest individual contigs across datasets. Analysis of the metagenome-assembled genomes (MAGs) revealed a predominance of bacteria from the Bacteroidota (formerly Bacteroidetes) and Bacillota (formerly Firmicutes) phyla. The infected group exhibited a marked proportional decrease in the abundance of Bacillota, accompanied by a corresponding expansion of Bacteroidota, compared to the non-infected group. Bacteria present in lower abundance, including the phyla Pseudomonadota (formerly Proteobacteria) and Actinomycetota (formerly Actinobacteria), exhibited similar relative abundances in infected and non-infected groups at the phylum level. Our reanalysis supports the utility of assembly-based approaches as potentially effective tools for investigating hamster gut microbiota metagenomic data and for characterizing host-viral interactions. These methods enable enhanced gene-centric functional annotation and strain-level resolution, though they are constrained by computational demands and by limitations in detecting low-abundance species. Collectively, these findings provide critical insights into infection-driven microbiota dynamics, advancing our understanding of host–microbiota interactions in the hamster gut ecosystem.
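For readers unfamiliar with the contiguity metrics mentioned above, the sketch below shows how N50 and L50 are commonly computed from a list of contig lengths, together with the bin quality filter stated in the abstract (≥50% completeness, ≤10% contamination). It is a minimal illustration, not the MetaWRAP or CheckM implementation; the function names and the example lengths are hypothetical.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least half of the assembly; L50 is the
    number of contigs in that set.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no contigs given")


def passes_quality(completeness, contamination):
    """Apply the thresholds from the abstract: >=50% complete, <=10% contaminated."""
    return completeness >= 50.0 and contamination <= 10.0


# Made-up contig lengths, for illustration only.
n50, l50 = n50_l50([2, 2, 2, 3, 3, 4, 8, 8])
```

A higher N50 together with a lower L50 means that fewer, longer contigs account for half of the assembly, which is why the two metrics are read together as a sign of improved contiguity.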
Palavras-chave: SARS-CoV-2, coronavirus, metagenomics, shotgun, assembly
#1125934

Development of an Integrated Biodiversity Data System for the Araguaia Vivo 2030 Program: Supporting Research, Public Engagement, and Decision-Making in the Araguaia Basin

Autores: Cleon Xavier Pereira Júnior,Heder Filho Silva Santos,Higor Koakovski Pereira,Kevyn Menezes Carvalho,Mateus Barros Macedo,Rhewter Nunes
Apresentador: Rhewter Nunes • rhewter@ueg.br
Resumo:
The Araguaia Vivo 2030 program, an initiative of the Tropical Water Research Alliance (TWRA) and financially supported by the Research Support Foundation of the State of Goiás (FAPEG), is producing extensive data on biodiversity and hydrological resources across the Araguaia River Basin through collaborative, multidisciplinary research. To address the challenges of managing this information, a dedicated bioinformatics initiative was launched to create a centralized system for storing, organizing, and sharing the program’s diverse datasets. The system aims to facilitate access for researchers, policymakers, and the general public through an open, secure, and scalable infrastructure. A relational database model was designed to support the integration of data from various sources, including species occurrence records, genetic information, and environmental metadata. The backend was developed using NestJS and PostgreSQL, while the frontend interface was built with React to ensure usability across devices. The system includes authentication protocols and access level hierarchies to preserve data integrity while enabling collaborative data contribution and analysis. Data upload and download functionalities are already operational, and a public web interface with interactive maps has been made available as a prototype. Cloud deployment has been implemented via AWS services for testing and scalability. To ensure consistency and reusability of data, standard formats were defined in collaboration with biodiversity teams working on flora, fauna, and eDNA. The system also supports batch uploads using CSV templates, allowing efficient population of the database with large volumes of ecological data. Ongoing improvements include cross-referencing tools, mobile app development, and enhanced visualization modules for synthesized outputs. 
This system represents an important step in consolidating environmental information in a region with high ecological relevance and limited data infrastructure. It addresses practical challenges in environmental informatics and contributes to the open science movement in Brazil. Moreover, the platform plays a key role in advancing biodiversity conservation and sustainable resource management in the Araguaia Basin, while also fostering scientific capacity in underrepresented regions. The development of this system exemplifies the role of bioinformatics and computational biology in bridging data gaps between science and society. It not only supports ongoing research activities but also enables long-term data stewardship for the Araguaia Vivo program. A prototype of the Araguaia Vivo Data Management System (v.1) is available at: https://araguaiavivo.github.io/.
Palavras-chave: Biodiversity Informatics, Biological Databases, Data Management System, Environmental Monitoring, Decision Support
#1125959

GenRefBR: a web-portal for genetic information on the Brazilian biodiversity

Autores: Romildo Oliveira Souza Júnior,Bruno M. Silva,Alexandre Aleixo,Renato Renison Moreira Oliveira,Gisele Lopes Nunes
Apresentador: Romildo Oliveira Souza Júnior • mildo@mildo.dev
Resumo:
Brazil is globally recognized for its extraordinary biodiversity, encompassing a wide range of ecosystems and species with high levels of endemism. However, despite substantial documentation of the country's biodiversity, significant gaps remain in the genetic data of many native species. The lack of comprehensive genomic information limits our understanding of their evolutionary history, adaptive strategies, and conservation needs—an issue of growing concern as Brazil’s ecosystems face escalating threats from deforestation, illegal hunting, and climate change.

To address this gap, we developed GenRefBR (Genetic References of Brazilian species), a web-based platform designed to provide access to genetic data from Brazilian vertebrate species. The platform consolidates and continuously updates information from mitochondrial and nuclear genomes, including assembly metrics and original data accessions. Built using Python and Dash, GenRefBR enables dynamic visualization of genetic data based on specific research interests, such as taxonomic group, biome, or type of genetic information. Multiple filters can be combined to refine searches.

GenRefBR is powered by publicly available data from repositories such as GenBank, BOLD Systems, and Genomes on a Tree (GoaT). Data retrieval is based on a curated species list assessed by ICMBio’s System of Extinction Risk Assessment (SALVE). Currently, GenRefBR provides data on 9,777 vertebrate species, including detailed taxonomic, genetic, ecological, and conservation status information. A key feature of the platform is its ability to highlight significant gaps in genomic coverage. For example, only 10.81% of Brazilian species in the database have complete mitochondrial genomes, and just 4.42% have available nuclear genome data. The platform also allows users to generate customized datasets based on specific organellar markers and/or complete mitogenomes, which can be valuable for phylogenetic research and environmental monitoring using eDNA (DNA metabarcoding).

Beyond consolidating existing data and enabling personalized database creation, GenRefBR plays a strategic role in supporting decision-making. It facilitates the identification of species or taxonomic groups with insufficient genetic data, guiding future sequencing efforts to expand genomic coverage. This is essential not only for advancing our understanding of genetic diversity and evolutionary dynamics but also for informing adaptive conservation strategies in the face of environmental change.

As future perspectives, we plan to expand GenRefBR to include data from other taxonomic groups, such as plants and invertebrates, and to make this information publicly accessible in the near future. This expansion will further enhance the platform’s utility for biodiversity research and conservation planning, strengthening its role as a comprehensive genomic reference for Brazil’s biodiversity.
Palavras-chave: genomic databases, genomic monitoring, biodiversity conservation, genomic coverage
#1125964

The Role of the Enzyme FBXW2 as a Potential Modulator of NF-κB Activity and Its Influence on the Transition from the HER2+ to Triple-Negative Subtype

Autores: Karina De Menezes Leitão,Mariana Teixeira de Freitas,Patrícia Neves,Ana Paula Dinis Ano Bom,Francisco Jose Pereira Lopes
Apresentador: Karina De Menezes Leitão • karinademenezes159@gmail.com
Resumo:
Breast cancer is the second most common neoplasm worldwide. In Brazil, it is the leading cause of cancer-related death among women (INCA, 2024). Due to its clinical urgency—given the high resistance and toxicity of current therapies—it is extensively studied. It is known that certain molecules play essential roles in the development and progression of carcinoma, being differentially expressed among the disease subtypes. One such molecule is NF-κB (Nuclear Factor kappa-light-chain-enhancer of activated B cells), which is fundamental for various biological processes in healthy cells. However, in cancerous cells, its altered expression contributes to the epithelial-mesenchymal transition (EMT), a critical process for metastasis (PIRES et al., 2017). Our gene network model shows that HER2+ cells—a breast cancer subtype with high expression of the human epidermal growth factor receptor 2 (HER2)—can undergo a transition to triple-negative (TN), the most aggressive breast cancer subtype (LOPES et al., 2024). This model suggests that the subtypes behave as stable and attractive steady states, interspersed by an unstable and repulsive steady state. The possibility that one breast cancer subtype can transform into another is an innovative idea that could help answer important questions regarding heterogeneity, recurrence, and resistance to therapies. Recently, the protein FBXW2 (F-box and WD repeat domain-containing 2), a member of the F-box family and part of the SCF complex, has been described as targeting NF-κB for proteasomal degradation (REN et al., 2021). Through database analyses using UALCAN and The Human Protein Atlas, we observed that FBXW2 does not correlate with breast cancer tumorigenesis, showing gene expression levels in tumor lines equivalent to non-tumor ones. This may indicate that its alteration does not directly contribute to pro-tumoral behavior. 
We also noted high NF-κB expression in tumor cell lines compared to normal lines, highlighting its role in breast cancer development, along with the differential expression of EMT-related genes and breast cancer biomarkers between HER2+ and TN subtypes. This will enable the investigation of the transition process through gene expression. Our simulations showed that changes in the NF-κB degradation rate affect the system’s dynamic behavior, causing either an approach to or distancing from the unstable steady state that marks the threshold between HER2+ and TN basins. This demonstrates that the NF-κB-dependent transition may be either facilitated or hindered. Based on these findings, we propose increasing NF-κB concentrations by silencing FBXW2 in HER2+ cells to investigate the transition between the HER2+ and TN subtypes. In addition to validating our gene network model, we aim to explore FBXW2 as a potential strategic therapeutic target, modulating NF-κB levels to prevent progression to a more aggressive subtype.
Palavras-chave: Breast cancer, NF-κB, FBXW2, Transition, Gene network model
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125967

An intimate view of Leishmania infantum chromosome ends shows less conserved subtelomeric regions and variations in the telomeric repeat

Autores: Habtye Bisetegn Endalamaw,Beatriz Cristina Dias de Oliveira,Arthur de Oliveira Passos,Dra. Cristiane de Santis Alves,Evan Ernst,Maria Isabel Nogueira Cano
Apresentador: Habtye Bisetegn Endalamaw • habtye.bisetegn@unesp.br
Resumo:
Leishmania infantum is an intracellular parasite that causes visceral leishmaniasis, the fatal form of the disease, primarily affecting marginalized segments of the population. The Leishmania infantum genome consists of 36 chromosomes, whose ends have not been characterized. In most eukaryotes, chromosome termini are capped by telomeres, nucleoprotein structures that maintain genome stability and prevent the ends from being mistaken for broken DNA sites. Telomeres comprise repetitive G-rich DNA replenished by telomerase. Characterizing the L. infantum telomeres is critical to understanding the parasite's biology and to the search for new therapeutic targets or vaccine candidates to combat leishmaniasis. Here, we provide an in-depth view of the 72 chromosome termini of Leishmania infantum using Southern blotting and Oxford Nanopore Technologies (ONT) whole-genome sequencing. L. infantum telomeres contain hexamer variants alongside the canonical hexameric repeat. The subtelomeres include highly frequent octameric repeats intercalated with telomeric hexamers, the 62 bp Leishmania conserved telomere-associated sequence (LCTAS) CSB2 element, and other non-conserved sequences. Double digestion of the DNA with frequently cutting restriction enzymes helped to separate the telomere termini from the rest of the chromosome, to estimate the L. infantum TRF (terminal restriction fragment) length at 100-500 bp, and to confirm the subtelomeric localization of the octameric repeats. The ONT data provided a comprehensive view of the L. infantum chromosome termini, showing clusters of high gene density located further inward and allowing the telomere size to be determined for all chromosome arms. Our results showed that the organization of L. infantum subtelomeres and telomeres is species-specific compared to other Leishmania spp., adding valuable information about the genome structure of an important pathogen.
Palavras-chave: Leishmania infantum telomere composition; telomeres’ size; subtelomeric region organization.
★ Running for the Qiagen Digital Insights Excellence Awards
#1125970

Rumen microbiomes from a new perspective: taxonomic limits of ciliated protozoa based on computational species delimitation methods in the Big Data era

Autores: Mylena Barros de Lima,Julliane Dutra Medeiros,Roberto Júnio Pedroso Dias,Mariana Fonseca Rossi
Apresentador: Mylena Barros de Lima • barros.lima@estudante.ufjf.br
Resumo:
Next-generation sequencing techniques associated with metabarcoding have led to the large-scale generation of molecular data, allowing access to deep levels of biodiversity knowledge. However, the processing of these data faces methodological limitations, such as the lack of reference data in databases and limited computing power for analysis, leading to their underutilization. This is accentuated in complex microbiomes, such as those of herbivorous mammals. The subclass Trichostomatia is composed of endosymbiont protozoa that play a central role in the rumen metabolism of their hosts. Despite this, their molecular diversity in unconventional hosts such as camelids remains poorly explored. Alignment programs such as PaPaRa, which align metabarcoding sequences to curated reference datasets, combined with computational methods for species delimitation, could be an alternative for dealing with this accumulated data. This study proposes a workflow based on the PaPaRa alignment algorithm and computational species delimitation methods for the taxonomic analysis of short reads from endosymbiont ciliate protozoa. A reference phylogeny containing all representative Trichostomatia sequences (>1000 bp) present in the GenBank database was constructed: sequences were aligned with MAFFT, the alignment was edited in GBLOCKS, and the topology was inferred with IQ-TREE. Rumen microbiomes of Bos taurus (bovine) and Camelus dromedarius (camelid), available in public databases, were processed with DADA2 for quality control and inference of ASVs. The ASVs were aligned to the reference phylogeny using PaPaRa, and a consensus phylogeny was obtained using IQ-TREE. Three species delimitation methods were then applied to the consensus phylogeny: two phylogenetic (PTP and mPTP) and one distance-based (ASAP).
The results were categorized into four profiles: match (a single taxon per evolutionarily significant unit, ESU), lumper (several taxa per ESU), splitter (a taxon divided into multiple ESUs), and unidentified (ASVs with no known taxonomy). After the screening stage, 20 bovine and 45 camelid ASVs were assigned to the subclass Trichostomatia. The bovine ASVs were taxonomically classified by DADA2 to the genus level, whereas camelid ASVs could be classified only to the subclass level. In the consensus tree, all the bovine ASVs were positioned within a single family, Ophryoscolecidae, while 16 of the camelid ASVs were positioned within the families Buetschliidae, Cycloposthiidae, and Ophryoscolecidae. The rest were phylogenetically positioned in a cluster not associated with any known taxon. The computational species delimitation algorithms produced considerably different results. For the bovine microbiome, mPTP had the best performance (more match-profile ESUs). For the camelid dataset, no algorithm reached the species level. However, the methods were highly effective in recovering families and genera, especially in the camelid microbiome. The application of the delimitation methods demonstrated the potential to refine the taxonomy of ASVs beyond what traditional pipelines such as DADA2 achieve. The superior performance of PTP indicates its applicability in big data contexts for well-represented microbiomes, such as cattle. The approach adopted revealed an underestimated taxonomic diversity in camelids and brought greater clarity to the community structure of these protists in cattle, as well as highlighting the need for greater representation of these groups in genetic databases.
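The four profiles defined above can be expressed as a simple decision rule over the taxon labels found in each ESU. The sketch below is one possible reading of those definitions, not the authors' code. It assumes that each ESU is given as a list of taxon labels for its members, with `None` marking ASVs without known taxonomy, and that the lumper profile takes precedence when an ESU both mixes taxa and shares a taxon with another ESU. The function name and the example labels are illustrative only.

```python
from collections import defaultdict


def classify_esus(esus):
    """Assign a match/lumper/splitter/unidentified profile to each ESU.

    `esus` maps an ESU identifier to the taxon labels of its members;
    None stands for an ASV with no known taxonomy.
    """
    # Record in which ESUs each known taxon appears.
    taxon_to_esus = defaultdict(set)
    for esu_id, labels in esus.items():
        for taxon in labels:
            if taxon is not None:
                taxon_to_esus[taxon].add(esu_id)

    profiles = {}
    for esu_id, labels in esus.items():
        taxa = {taxon for taxon in labels if taxon is not None}
        if not taxa:
            profiles[esu_id] = "unidentified"   # no labeled member
        elif len(taxa) > 1:
            profiles[esu_id] = "lumper"         # several taxa in one ESU
        elif len(taxon_to_esus[next(iter(taxa))]) > 1:
            profiles[esu_id] = "splitter"       # taxon spread over several ESUs
        else:
            profiles[esu_id] = "match"          # one taxon, one ESU
    return profiles


# Illustrative input; the groupings are made up.
example = {
    "ESU1": ["Entodinium caudatum", None],
    "ESU2": ["Diplodinium dentatum", "Eudiplodinium maggii"],
    "ESU3": ["Isotricha prostoma"],
    "ESU4": ["Isotricha prostoma"],
    "ESU5": [None, None],
}
profiles = classify_esus(example)
```

Counting the number of "match" ESUs per method then gives a direct way to compare PTP, mPTP, and ASAP on the same consensus tree.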
Palavras-chave: Taxonomic biodiversity, Metabarcoding, Trichostomatia, Bioinformatic
#1125972

Dynamics and Stability of LILRB2 ITIM3: Impact of Polymorphisms on SHP-2 Modulation

Autores: Laura Maria de Araújo Pereira,Silvana Giuliatti
Apresentador: Laura Maria de Araújo Pereira • lauramariabioinformatica@usp.br
Resumo:
The immune system comprises a complex network of molecules, cells, and signaling pathways that ensure defense against pathogens and maintenance of homeostasis. Among its regulators, the receptor LILRB2 negatively modulates the immune response via immunoreceptor tyrosine-based inhibitory motifs (ITIMs), which, upon phosphorylation, recruit phosphatases such as SHP-2 to dephosphorylate signaling proteins and thereby inhibit cellular activation. Single-nucleotide polymorphisms (SNPs) in the LILRB2 ITIM3 region may perturb this interaction, undermining the balance of immune signaling. This study aims to analyze the structural and conformational impact of SNPs in the ITIM3 region of LILRB2 and to evaluate their effects on its interaction with the phosphatase SHP-2. The full three-dimensional structure of LILRB2 was modeled using Modeller, I-TASSER, and Rosetta. The tertiary structure of SHP-2 was retrieved from the Protein Data Bank (PDB ID: 2SHP). Protein–protein docking analyses were performed with HADDOCK 2.4. Complementarily, ENCoM was employed to conduct Normal Mode Analysis (NMA), and changes in Gibbs free energy (ΔΔG, in kcal·mol⁻¹) were calculated via FoldX4, which also provided vibrational entropy changes (ΔS, in kcal·mol⁻¹·K⁻¹). The polymorphisms S589N (rs1430091303), A592T (rs745850573), and T593P (rs201073905) were identified and introduced into the LILRB2 sequence (UniProt Q8N423). All computational analyses and figure generation were carried out in Python. The S589N mutation markedly reduced correlated motions within ITIM3, with an average ΔS of –7.65 ± 7.70 kcal·mol⁻¹·K⁻¹, indicating increased local flexibility and dynamic disruption relative to wild type. In contrast, A592T produced negligible changes in the global vibrational pattern (ΔS = –0.07 ± 0.11 kcal·mol⁻¹·K⁻¹), suggesting that this site tolerates substitution without impacting overall dynamics.
T593P, on the other hand, induced a moderate increase in motion correlations (ΔS = 2.37 ± 3.08 kcal·mol⁻¹·K⁻¹). The mean absolute change across all mutants was 3.37 ± 5.74 kcal·mol⁻¹·K⁻¹. In terms of ΔΔG, S589N was significantly destabilizing (+0.99 kcal·mol⁻¹), whereas A592T and T593P remained essentially neutral (0.13 and –0.16 kcal·mol⁻¹, respectively). Thus, the pronounced flexibility induced by S589N may impair SHP-2 recruitment to ITIM3, where precise phosphotyrosine engagement is critical, whereas the preserved stability of A592T could facilitate a more stable interface with SHP-2. Polymorphisms in LILRB2-ITIM3 therefore affect the SHP-2 interaction differentially. S589N increases regional flexibility and destabilizes the motif, likely hindering phosphotyrosine recognition. A592T maintains structural and functional stability, potentially enhancing SHP-2 binding, while T593P confers moderate rigidity. Future work will extend coarse-grained simulations to the entire LILRB2 cytoplasmic region to map long-range conformational effects and allosteric mechanisms arising from these mutations.
Palavras-chave: LILRB2, coarse-grained, protein stability.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125978

Proteomics shotgun analysis of mice brain tissues after injection of extracellular vesicles from Cryptococcus neoformans

Autores: Vinicius da Silva Coutinho Parreira,Alan Péricles Rodrigues Lorenzetti,Marlon Dias Mariano Santos,Juliana de Saldanha da Gama Fischer Carvalho,Tatiana Maron Gutierrez,Flávia C. G. dos Reis,Paulo Costa Carvalho,Marcio Rodrigues,Fabio Passetti
Apresentador: Vinicius da Silva Coutinho Parreira • PARREIRA.VSC@GMAIL.COM
Resumo:
Few pathogens can cross the blood-brain barrier (BBB) and infect the central nervous system (CNS). Such infection may affect inflammatory pathways, leading to neurotoxic and neurodegenerative processes. Even when the CNS is not the initial site of infection, chemical mediators of the inflammatory response and many antigens may be carried to the brain by extracellular vesicles and cross the BBB. Extracellular vesicles (EVs) are double-membrane vesicles that can carry different molecules, including nucleic acids, proteins, polysaccharides, and lipids. In addition to human cells, some microorganisms also release EVs. These EVs may play a role in pathogen-host interplay: Cryptococcus neoformans EVs are known to promote adhesion to and invasion of the BBB by fungal cells. However, the role of isolated EVs in the CNS remains poorly evaluated. We analyzed shotgun proteomics data from mouse brains injected with C. neoformans EVs, at 5 and 15 days after injection. We found 179 proteotypic peptides exclusive to the 5-day condition. We performed an over-representation analysis (ORA) using Enrichr. For the genes encoding the corresponding proteins, several KEGG 2021 pathways were enriched (p-adj < 0.001), including Pathways of neurodegeneration, Alzheimer disease, Synaptic vesicle cycle, Parkinson disease, Oxidative phosphorylation, and Glycolysis / Gluconeogenesis. We did not find any peptides exclusive to the 15-day condition, but we found 288 peptides in the 5-day and 15-day conditions together (EVs-injected condition) in comparison to controls, which show the same enrichment results observed for the 5-day condition. Additionally, in the EVs-injected condition our results show a co-chaperone that has already been associated with the inhibition of TAU aggregation and whose upregulation has been related to Alzheimer’s disease (AD) and mild cognitive impairment. 
This result, together with the ORA results, led us to compare our data with five public datasets from mouse brains of an AD model (ADm group) and controls, obtained from the PRIDE database. In this analysis we found 12 peptides in the EVs-injected group that are also present in the ADm group. The ORA shows enrichment for some pathways (p-adj < 0.05), such as Spinocerebellar ataxia, Huntington disease, and Alzheimer disease, and a strong enrichment (p-adj < 0.0001) for beta-Alanine metabolism. Moreover, we found 35 peptides exclusive to the control groups. Our enrichment analysis highlighted pathways (p-adj < 0.01) such as Citrate cycle (TCA cycle), Glyoxylate and dicarboxylate metabolism, and Pathways of neurodegeneration. This last pathway is linked to the presence of a peptide from a protein that is associated with an inhibitory effect that decreases Aβ production. These results suggest an imbalance of neurological pathways caused by the injection of C. neoformans EVs into mouse brains, indicating possible targets to be studied in the context of brain infection by this pathogen, as well as in the context of several neurological diseases with different causes. We gratefully acknowledge the financial support of CNPq and Fiocruz for this project.
Palavras-chave: proteomics, neuropathology, neurodegenerative pathways, neuroinflammation, Cryptococcus neoformans, mouse brain
★ This work is running for the Next Generation Bioinfo Award
#1125992

mtDNA-Network: Visual Analytics of Mitochondrial Variants in Complex Diseases

Autores: Letícia Cota Cavaleiro De Macêdo,Gustavo Barra Matos,Helber Gonzales Almeida Palheta,Giovanna Chaves Cavalcante,Gilderlanio Santana de Araujo
Apresentador: Letícia Cota Cavaleiro De Macêdo • leticia.cavaleiro.macedo@itec.ufpa.br
Resumo:
Mitochondria play a crucial role in maintaining the health and function of eukaryotic cells. Mutations or alterations in mitochondrial DNA (mtDNA) are increasingly recognized as contributors to a range of diseases, including neurodegenerative disorders and various types of cancer. To better understand these associations, we present the mtDNA-Network (https://apps.lghm.ufpa.br/mtdna), an interactive web platform designed to facilitate the exploration of mtDNA variants and their potential links to diseases such as gastric cancer, Parkinson’s disease, and leprosy. A key feature of the mtDNA-Network is its underlying database, which includes complete mitochondrial genome sequences from 180 individuals from the North of Brazil - a population historically underrepresented in genomic studies with high contribution of indigenous people. By offering access to this region-specific dataset, the platform aims to support more inclusive and comprehensive research into mtDNA-related diseases. The database includes 65 heteroplasmic variants in gastric cancer, 40 INDELs and 717 SNPs in Parkinson's data distributed across macro-haplogroups (A, B, and C), and 26 SNPs shared between polar lepromatous form, borderline lepromatous form, and borderline tuberculoid form, distributed across macro-haplogroups (A, B, C, and D). The mtDNA-Network includes powerful visualization tools and interactive tables to analyze mtDNA variants. Features include filtering, sorting, full-text search, pagination, data export, and similarity measurement using the Jaccard index. The platform uses bipartite networks to illustrate relationships between genetic variants and specific phenotypes. Examples include networks for gastric cancer, Parkinson’s disease, and leprosy, each highlighting variant types and associations. It offers dashboards with key metrics like variant frequency, transition/transversion (TS/TV) ratios, heteroplasmy distribution, and cross-dataset comparisons. 
Additionally, interactive genomic maps display variant locations with detailed annotations. Users can upload their own data in .csv format to generate custom bipartite networks. This feature supports integration of external datasets and allows for tailored analysis, enhancing the platform’s research capabilities. The mtDNA-Network is maintained and updated periodically with new visualization components. The ability to interactively visualize complex networks of genes and genetic variants provides a deeper understanding of the different genetic elements and how their relationships can influence disease processes. Because it represents associative data interactively, the platform is a valuable tool for advancing research into complex diseases and could be adapted to analyze other pathologies in the future.
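As a minimal illustration of the Jaccard-based similarity measurement mentioned in this abstract, the sketch below compares two sets of variants. The function name and the variant identifiers are hypothetical and are not taken from the mtDNA-Network code or data.

```python
def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of variant IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets have similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical variant sets for two phenotypes (illustrative only).
phenotype_1 = {"m.73A>G", "m.263A>G", "m.16519T>C"}
phenotype_2 = {"m.73A>G", "m.16519T>C", "m.750A>G", "m.1438A>G"}

# 2 shared variants out of 5 distinct variants overall.
similarity = jaccard_index(phenotype_1, phenotype_2)
print(f"Jaccard index: {similarity:.2f}")
```

A value of 1 would mean the two phenotypes share exactly the same variants, and 0 would mean they share none.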
Palavras-chave: mtdna, complex diseases, genetic variants, database, visual analytics, networks
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1125995

Association of somatic variants with molecular subtypes in endometrial cancer

Autores: Rafaela de Barros Vieira Santos,Alessandra Serain,Michelle Marcela Paredes Escobar,Andreia C. de Melo,Cláudia Bessa Pereira Chaves,Aretha Brito Nobre,Nayara Gusmão Tessarollo,Mariana Boroni
Apresentador: Rafaela de Barros Vieira Santos • rafavidigaluemg@gmail.com
Resumo:
Endometrial cancer (EC) is among the most prevalent gynecological malignancies, ranking sixth in incidence among women globally. In 2022, it was estimated that there were 420,368 new cases and 97,723 deaths due to EC worldwide. In Brazil, it also holds the sixth position in cancer incidence among women, with the Brazilian National Cancer Institute (INCA) estimating 7,840 new cases of uterine corpus cancer annually between 2023 and 2025. By 2040, global incidence is expected to rise by more than 50% compared to 2021, emphasizing the need for enhanced molecular characterization and targeted management strategies. In the context of molecular risk characterization of EC, one of the most relevant approaches is the Proactive Molecular Risk Classifier for Endometrial Cancer (ProMisE), which categorizes tumors into four prognostically relevant subtypes: i) mismatch repair deficient (MMR-D) tumors, responsive to immunotherapy; ii) POLE-ultramutated tumors, with mutations in POLE’s exonuclease domain, associated with excellent prognosis; iii) TP53-mutated tumors, generally linked to poorer outcomes; and iv) No Specific Molecular Profile (NSMP) tumors, which lack POLE mutations, MSI, or TP53 abnormalities and represent a heterogeneous group. While this classification is clinically informative, its routine implementation remains limited due to the need for specialized molecular testing and complexities in data interpretation. Moreover, tumors with overlapping features from different subgroups pose additional challenges for accurate stratification and treatment planning. To contribute to a better understanding of EC's molecular landscape, our study aimed to explore the mutational profiles of each ProMisE subgroup using publicly available data. To this end, a total of 530 EC samples from The Cancer Genome Atlas (TCGA) mutation dataset were analyzed. The initial phase of the study focused on refining the variant identification pipeline. 
Filters were applied for read depth (DP ≥ 20) and allele frequency (AF ≥ 2%) to improve the reliability of the variant calls. Missense, splicing, and nonsense mutations were re-annotated to enrich for variants with moderate or high predicted impact, as assessed by the Variant Effect Predictor (VEP). Kaplan-Meier survival analyses were performed using a five-year overall survival as the endpoint, and group comparisons were performed with the log-rank test through the Survival package in R, whereupon the TP53 group had the worst overall survival, as expected. However, the POLE group did not show the best overall survival, which led to further investigation of our classification and analytical approach. The mutational profiles were assessed using the Maftools package, enabling the generation of oncoplots to visualize the most frequently mutated genes within each subgroup. The POLE subgroup showed high mutation frequencies in POLE, PIK3CA, ARID1A, and PTEN (79%–50%). In MMR-D tumors, the predominant mutations were in PTEN, PIK3CA, SYNE1, and ARID1A (82%–64%). TP53-mutated tumors exhibited alterations in TP53, PIK3CA, PPP2R1A, and PTEN (55%–21%). NSMP tumors were most frequently mutated in PTEN, ARID1A, PIK3CA, and CTNNB1 (68%–26%). Thus, this study contributes to the exploration and understanding of the genes and potential biomarkers associated with each molecular subgroup in endometrial cancer that could enhance future therapeutic strategies and prognosis assessment.
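The read-depth and allele-frequency filters described in this abstract (DP ≥ 20, AF ≥ 2%) can be sketched as follows. This is an illustrative sketch only: the record type, field names, and example calls are hypothetical, and the authors' actual pipeline is not described at this level of detail.

```python
from dataclasses import dataclass

MIN_DEPTH = 20              # DP >= 20, as stated in the abstract
MIN_ALLELE_FRACTION = 0.02  # AF >= 2%, expressed as a fraction


@dataclass
class VariantCall:
    """Hypothetical minimal record for one somatic variant call."""
    gene: str
    depth: int
    allele_fraction: float


def passes_filters(call: VariantCall) -> bool:
    """Keep a call only if it meets both thresholds (inclusive)."""
    return call.depth >= MIN_DEPTH and call.allele_fraction >= MIN_ALLELE_FRACTION


# Made-up example calls, not TCGA data.
calls = [
    VariantCall("PTEN", 85, 0.31),    # passes both filters
    VariantCall("POLE", 12, 0.40),    # fails: depth too low
    VariantCall("TP53", 60, 0.01),    # fails: allele fraction too low
    VariantCall("ARID1A", 20, 0.02),  # passes: thresholds are inclusive
]
kept = [c.gene for c in calls if passes_filters(c)]
print(kept)
```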
Palavras-chave: Endometrial Cancer; Biomarkers; Mutation Profile
★ Running for the Qiagen Digital Insights Excellence Awards
#1126008

Genome-Wide Association Study Reveals Isocitrate Dehydrogenase as a Potential Key Factor in Azithromycin-Resistant Neisseria gonorrhoeae

Autores: Hector Hugo Furini,Helisson Faoro
Apresentador: Hector Hugo Furini • hectorfurini@outlook.com
Resumo:
Neisseria gonorrhoeae (Ngo) is the pathological agent of the sexually transmitted infection known as gonorrhea. In Brazil, the Ministry of Health estimates that 500,000 individuals are infected annually. Treatment of the disease involves a combination of azithromycin and broad-spectrum cephalosporins. However, the global spread of multidrug-resistant Ngo strains and the lack of therapeutic alternatives have led the World Health Organization to include this bacterium on the list of priority pathogens for research and development of new drugs. Therefore, the aim of this study was to identify genetic variations associated with azithromycin resistance in Ngo through a genome-wide association study (GWAS). For this purpose, complete genomes from the public database BV-BRC (v.3.50.5i) were used and annotated with Prokka (v1.14.5). The GWAS analysis was based on k-mer counting, conducted using Pyseer (v1.3.10), and included core genome definition (Roary, v3.13.0), phylogenetic tree construction (FastTree, v2.1.11), k-mer counting (fsm-lite, v1.0), and statistical evaluation of the variants. Identified k-mers were annotated for gene association and mapped using the reference genome ATCC19424. A total of 1,568 Ngo genomes were retrieved from the database, with 236 classified as resistant to azithromycin (MIC > 1.0 mg/L) and 1,332 as susceptible (MIC < 0.5 mg/L). The k-mer-based GWAS revealed genetic variations associated with azithromycin resistance in 303 regions (158 genic and 145 intergenic). In the analysis based on the reference genome, polymorphisms were identified in 22 genes and 28 intergenic regions. Most polymorphisms (73.7%) were of the SNP type, with 42.8% classified as missense, 56.3% as synonymous, and only one (1.4%) as a frameshift mutation. Both analytical approaches identified genetic variants in hypothetical proteins, proteins involved in cellular metabolism, and transporters not previously associated with azithromycin resistance. 
In the reference genome-based GWAS, hypothetical proteins encoded by loci MNNJMMFF_02179 and MNNJMMFF_01163 were identified with polymorphisms potentially associated with azithromycin resistance, each containing two missense and two synonymous SNPs, suggesting the presence of a potential resistance site not yet characterized. Additionally, variation in the isocitrate dehydrogenase (icd) gene was observed, including one missense and seven synonymous SNPs, 361 hits in the k-mer-based GWAS analysis, and variation in a hypothetical protein at a nearby locus (MNNJMMFF_00500) with two missense SNPs. The protein isocitrate dehydrogenase (IDH) has been previously characterized in bacteria involved in resistant pulmonary infections, such as Klebsiella pneumoniae and Legionella pneumophila, due to its central role in carbon metabolism, being essential for energy generation and the production of substrates required for bacterial proliferation. Furthermore, IDH is associated with the management of metabolic stress in bacterial cells, which may be induced by antimicrobials, thereby justifying the observed alterations that could contribute to bacterial survival under drug pressure. We identified multiple genetic variants potentially associated with azithromycin resistance in Ngo. Moreover, significant polymorphisms were detected in the icd gene—its protein previously studied in drug resistant bacteria—suggesting an involvement in resistance mechanisms and highlighting it as a potential target for future investigations.
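The phenotype labels used for the GWAS follow the MIC thresholds stated in this abstract (resistant: MIC > 1.0 mg/L; susceptible: MIC < 0.5 mg/L). A minimal sketch of that labelling is shown below. The function name is hypothetical, and returning no label for MIC values between the two thresholds is an assumption, since the abstract does not say how such isolates were handled.

```python
from typing import Optional


def classify_azithromycin_mic(mic_mg_per_l: float) -> Optional[str]:
    """Label an isolate from its azithromycin MIC (mg/L).

    Thresholds follow the abstract. Returning None for values in the
    0.5-1.0 mg/L range is an assumption made for this sketch.
    """
    if mic_mg_per_l > 1.0:
        return "resistant"
    if mic_mg_per_l < 0.5:
        return "susceptible"
    return None


# Made-up MIC values for illustration.
example_mics = [0.125, 0.25, 0.75, 2.0, 16.0]
labels = [classify_azithromycin_mic(m) for m in example_mics]
print(labels)
```

The group sizes reported in the abstract are consistent with the total number of genomes: 236 + 1,332 = 1,568.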
Palavras-chave: Neisseria gonorrhoeae, Azithromycin resistance, Genome-wide association study, GWAS, Isocitrate dehydrogenase, Antimicrobial resistance, Genomic variants, SNP
★ Running for the Qiagen Digital Insights Excellence Awards
#1126015

Evolutionary analysis of the establishment of the immune system

Autores: Bruno William Fernandes Silva,Gleison Medeiros De Azevedo,Rodrigo Juliani Siqueira Dalmolin
Apresentador: Bruno William Fernandes Silva • brunowilliam130@gmail.com
Resumo:
The human immune system consists of multiple functional modules that interact to maintain body homeostasis. Previous studies on its evolution highlight key mechanisms driving its molecular and functional diversification, including selective pressures from host-pathogen interactions, duplication of ancestral genes, and recombination of transposable elements (TEs). However, analyses of the evolutionary context behind the emergence of these genes and the consolidation of this complex biological system remain scarce. To address these gaps, this study analyzed genes associated with immune system metabolic pathways annotated in the KEGG database (Kyoto Encyclopedia of Genes and Genomes). Gene origin inference was performed using the GeneBridge package (R language), using orthology data mapped via the STRING database. Results revealed that 51% of the immune genes analyzed emerged before the rise of Choanoflagellata (the sister group of Metazoa), with 79% of T-cell receptor signaling pathway proteins originating in the Metamonada clade. In Choanoflagellata, approximately 9% of all genes arose, primarily linked to the chemokine signaling pathway—essential for cell migration in multicellular organisms. The diversification period of invertebrates showed limited new gene acquisition. Despite their extreme morphological diversity and varying structural complexity, this was not mirrored in immune system evolution. The highest peak of immune gene emergence occurred in the Actinopterygii clade, the first vertebrates represented in our data. This group exhibits marked structural complexity, including vertebral columns, robust tissue organization, and closed circulatory systems. At this stage, critical genes emerged for antigen processing/presentation and hematopoietic cell lineage, fundamental for cellular specialization and functional diversity observed in the human immune system. 
Our findings suggest that nearly half of the immune proteins analyzed have ancient evolutionary origins, rooted in basal clades. Conversely, vertebrate structural complexity appears to have demanded more elaborate immune adaptations, reflected in a significant increase in genetic innovation.
Palavras-chave: Biological Evolution, Systems Biology, Molecular Evolution
★ Running for the Qiagen Digital Insights Excellence Awards
#1126018

Integrative Cross-Study Analysis of Extracellular Vesicle-Derived Small Non-Coding RNAs Highlights Potential Stable Biomarkers for Colorectal Cancer

Autores: Daniel Carvalho da Fonseca Nigro,ERIKA MARTINS DE CARVALHO,Rodrigo Jardim
Apresentador: Daniel Carvalho da Fonseca Nigro • daniel.nigro@hotmail.com
Resumo:
Small non-coding RNAs derived from extracellular vesicles (ex-sncRNAs) hold promise as minimally invasive prognostic biomarkers for colorectal cancer (CRC), the third most prevalent cancer worldwide, due to their availability and stability in biofluids such as plasma. However, library preparation still introduces significant biases, which affect the identification and quantification of certain ex-sncRNA biotypes. In addition, most studies have focused predominantly on miRNAs, whereas other small RNAs, such as piRNAs and YRNAs, are increasingly recognized for their involvement in key biological processes during tumorigenesis. This study presents an in silico cross-analysis of publicly available datasets from the GEO repository, aiming to identify small non-coding RNAs (sncRNAs) that are differentially expressed despite variations in library preparation methodologies. Their potential relationship with CRC staging will also be evaluated. Quality control was conducted in two steps. First, adapter trimming was performed with Cutadapt, with the Phred quality threshold set to 30, and reads shorter than 15 bp or longer than 40 bp were discarded. Then, FastQC and MultiQC were used for visualization. The following steps were performed using the exceRpt pipeline, which removed residual adapters and filtered out homopolymers, contaminants from the UniVec database, and annotated rRNA from the Ribosomal Database Project. Reads were aligned with STAR to the reference genome and transcriptome (hg38) and then to RNAcentral databases (miRBase, piRBase, GENCODE, etc.) in the default order (miRNA > tRNA > piRNA > GENCODE > circRNA). Differential expression analysis was conducted using the Bioconductor package DESeq2. Ex-sncRNAs with |log2FC| > 2 and FDR < 0.05 were considered differentially expressed. A total of 100 samples were analyzed, including 50 from healthy individuals and 50 from colorectal cancer patients. 
On average, 65% of raw reads passed quality control, resulting in a mean input of 11,378,381 reads. Of these input reads, an average of 53% mapped to the human genome, with a concordance rate of 90.78% between the transcriptome and reference genome mapping. The read length distribution showed two distinct peaks at 22–23 bp and 33 bp. The pipeline runtime per sample was on average 86.19 minutes. Overall, 244 ex-sncRNAs were identified as differentially expressed, with 238 upregulated and 6 downregulated in colorectal cancer samples. Many of these RNAs are associated with biological processes relevant to tumorigenesis, including cell proliferation, immune evasion, and therapy resistance. Therefore, the differentially expressed ex-sncRNAs were consistently identified across multiple independent studies, establishing them as strong biomarker candidates for CRC.
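The differential-expression criterion stated in this abstract (|log2FC| > 2 and FDR < 0.05) can be illustrated with a short sketch. The function name, RNA identifiers, and values below are hypothetical. In the study itself the statistics come from DESeq2 in R, so this Python sketch only shows how the thresholds would be applied to an exported results table.

```python
from typing import Optional


def classify_expression(log2fc: float, fdr: float,
                        lfc_cutoff: float = 2.0,
                        fdr_cutoff: float = 0.05) -> Optional[str]:
    """Return 'up' or 'down' if both thresholds are met, otherwise None."""
    if fdr >= fdr_cutoff or abs(log2fc) <= lfc_cutoff:
        return None
    return "up" if log2fc > 0 else "down"


# Made-up (name, log2FC, FDR) rows standing in for a results table.
results = [
    ("sncRNA_A", 3.1, 0.001),   # up in CRC
    ("sncRNA_B", -2.4, 0.020),  # down in CRC
    ("sncRNA_C", 2.0, 0.010),   # fails: |log2FC| must be strictly > 2
    ("sncRNA_D", 4.5, 0.200),   # fails: FDR too high
]
calls = {name: classify_expression(lfc, fdr) for name, lfc, fdr in results}
n_up = sum(1 for c in calls.values() if c == "up")
n_down = sum(1 for c in calls.values() if c == "down")
print(calls, n_up, n_down)
```

The counts reported in the abstract are consistent with each other: 238 upregulated + 6 downregulated = 244 differentially expressed ex-sncRNAs.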
Palavras-chave: small non-coding RNA, colorectal cancer, extracellular vesicles
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126020

Elucidating the Mechanism of an Anti-Leishmanial Benzothiazole Hit with Network Pharmacology

Autores: Caio de Jesus de Oliveira,JOÃO VICTOR SILVA-SILVA,Leonardo Luiz Gomes Ferreira
Apresentador: Caio de Jesus de Oliveira • caio.dejesusoliveira@usp.br
Resumo:
The World Health Organization established, as one of its sustainable development goals, the eradication of neglected tropical diseases, which affect more than a billion people worldwide, especially in impoverished communities. Among such diseases, Chagas disease and leishmaniasis are both caused by parasites of the family Trypanosomatidae and are responsible for millions of deaths globally. Available treatments have limited efficacy, high toxicity, and high costs, making the search for new drugs urgent. Studies indicate that benzothiazoles show in vitro activity against strains of Leishmania and Trypanosoma, and the compound N-(benzo[d]thiazol-2-yl)cyclopropanecarboxamide (B1) stands out for its selectivity toward Leishmania infantum. However, there are few studies of its mechanism of action. This work uses computational tools to investigate the molecular targets of B1, focusing on the identification of targets common and exclusive to the parasites and on understanding the mechanism that leads to cell death. To predict the molecular targets, seven servers based on compound structure and five servers dedicated to the identification of human targets related to the diseases were used. The hubs of the protein-protein interaction (PPI) networks, built from the intersections between compound and disease target predictions, were selected with Cytoscape. Functional enrichment analysis using GO and KEGG was performed with DAVID. There were 580 targets with plausible interaction with B1, 502 targets associated with leishmaniasis, and 598 with Chagas disease. The intersection between predicted targets for the compound and Chagas disease resulted in a PPI network with 49 nodes and 260 edges, with hub genes including AKT1, CASP3, and HIF1A. 
The enrichment analysis showed the participation of these proteins in 166 biological processes (BP), such as inflammatory response and regulation of gene expression; 29 cellular components (CC), including plasma membrane, cytosol, and nucleoplasm; 46 molecular functions (MF), including protein homodimerization and binding to kinases and other specific proteins; and 92 KEGG pathways, including cancer- and infection-related pathways and PI3K-Akt signaling. The intersection between predicted targets for the compound and leishmaniasis resulted in a PPI network with 47 nodes and 316 edges, with hub genes such as NFKB1, HSP90AA1, and STAT1. The enrichment analysis showed the participation of these proteins in 170 biological processes, including positive regulation of gene expression and regulation of transcription by RNA polymerase II; 25 cellular components, such as cytoplasm, nucleus, and cytosol; 47 molecular functions, such as enzyme and histone deacetylase binding and regulation of nitric-oxide synthase activity; and 104 KEGG pathways, including IL-17 signaling, prolactin signaling, and cancer-related pathways. The results suggest that compound B1 acts by inhibiting targets common to both diseases, such as AKT1, CASP3, HSP90AA1, and CCR2, indicating a mechanism of action shared among the Trypanosomatidae. The inhibition of targets specific to leishmaniasis, such as NFKB1, STAT1, and GZMB, reinforces its selectivity toward Leishmania infantum. These findings offer new insights into the mechanism of action of B1, which is being investigated in complementary structural studies to confirm the targets and elucidate the molecular mechanisms involved.
Palavras-chave: Network analysis, PPI, Drug Discovery, Chagas disease, Leishmaniasis
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126022

"Bioinformatics on Instagram: An Analysis of Scientific Dissemination and Teaching Strategies on Social Media"

Autores: Natália Lima Quintanilha
Apresentador: Natália Lima Quintanilha • nlqprof@gmail.com
Resumo:
This study aimed to investigate the use of the Instagram platform as a tool for scientific dissemination and teaching of bioinformatics, a crucial interdisciplinary field for the analysis of biological data. Given the increasing relevance of social media in scientific communication, the study sought to analyze how Instagram has been used to popularize and educate about bioinformatics. The adopted methodology, based on Bardin's (2011) content analysis, involved the selection of Instagram profiles focused on bioinformatics, identified through searches using the terms "bioinformática" and "bioinformatics". After applying inclusion criteria (profile with more than 1,000 followers, regular content, posts in Portuguese, profile active for at least 12 months, and a declared focus on bioinformatics), 10 profiles were selected for in-depth analysis. Each profile was classified according to the nature of the creator/maintainer (personal or institutional) and the thematic predominance of the posted content, divided into four categories: educational/didactic, general scientific dissemination, career and opportunities, and mixed. Additionally, an analysis of the engagement of the last 10 posts of each profile was carried out, measuring the total and average number of likes. The results revealed a significant diversity in the communication strategies adopted. Personal profiles, such as those of individual disseminators, stood out for the predominance of content with a didactic and educational character, which highlights the growing role of autonomous professionals as teaching agents in digital environments. Institutional profiles, on the other hand, presented more diversified approaches, with an emphasis on general scientific dissemination and varied content, reflecting a broader intention of reach and institutional representativeness. Regarding engagement, a large variation was observed among the profiles. 
The profile with the highest average number of likes per post was that of ISCB-SC (284), followed by Naila Soler (111) and Revista Bioinfo (102.5), suggesting that visually appealing, up-to-date content relevant to the scientific community tends to generate greater interaction. On the other hand, profiles of academic leagues and associations showed lower levels of engagement, which may be related to limited reach or the absence of specific communication strategies. The discussion of the results suggests that individuals play an important role in the dissemination of knowledge in bioinformatics on Instagram through educational content. Institutional profiles, on the other hand, may have broader communication objectives, seeking both the dissemination of the field and the engagement of the scientific community. In summary, Instagram has proven to be a platform with potential for the scientific dissemination and teaching of bioinformatics, although the effectiveness of strategies varies significantly between different types of profiles and the content shared. The research showed the coexistence of approaches focused on education, general dissemination, and the communication of specific information in the field. The engagement analysis suggests that high-impact content and relevance to the community can be key factors in generating greater interaction. The insights obtained can contribute to the development of more effective communication and teaching strategies for bioinformatics in digital environments, leveraging the reach and interactivity of Instagram.
Palavras-chave: Bioinformatics, Scientific Communication, Instagram, Content Analysis
#1126026

Synthetic thiosemicarbazones as potential inhibitors of Staphylococcus aureus protein tyrosine phosphatases

Autores: Hewelyn Cunha,Mayara dos Santos Maia,Angela Camila Orbem Menegatti
Apresentador: Hewelyn Cunha • hewelyn100@gmail.com
Resumo:
Staphylococcus aureus is an opportunistic pathogen associated with persistent and difficult-to-treat infections, particularly in hospitalized patients. Its high virulence is linked to immune evasion mechanisms, including the secretion of proteins that interfere with host cell signaling pathways, promoting bacterial survival and replication. Among these proteins, protein tyrosine phosphatases (PTPs) A and B (PtpA and PtpB) stand out, as they dephosphorylate tyrosine residues on key host proteins, modulating essential intracellular processes. These enzymes contain conserved catalytic motifs CX₄CR (P-loop) and DPY, characteristic of the low molecular weight protein tyrosine phosphatase (LMW-PTP) family, making them promising targets for enzyme inhibitor development. Synthetic thiosemicarbazone derivatives have been investigated as potential PTP inhibitors and have demonstrated antitumor, antifungal, and antimicrobial activities. This study aimed to evaluate, through in silico molecular docking, the inhibitory potential of 37 thiosemicarbazone derivatives against the PtpA and PtpB proteins from S. aureus. The 3D structure of PtpA was retrieved from the Protein Data Bank (PDB ID: 3ROF), and the 3D structure of PtpB was predicted using AlphaFold (P0C5D3). The stereochemical quality of the AlphaFold model was assessed using a Ramachandran plot, showing over 95% of residues in favored regions. Compound structures were drawn using ChemSketch (version 2024.2.0) based on the library described by Sens et al. (2018) and converted to the .SDF format. The compound PTP Inhibitor IV (PubChem CID 6420094), previously described as a selective PTP inhibitor, was used as a positive control. Docking simulations were performed with GOLD software (version 2024.3.0) using the GoldScore scoring function with default parameters. The most promising complexes were analyzed using Discovery Studio (version 24.1.0.23298). 
The best fitness values obtained for the positive control were 51.08 for PtpA and 52.61 for PtpB. Twenty-two compounds surpassed the reference fitness value for PtpA, while only one exceeded this threshold for PtpB. The most promising ligand for PtpA ((E)-2-(benzo[d][1,3]dioxol-5-ylmethylene)-N-(4-methoxyphenyl) hydrazinecarbothioamide, CID 6875628) achieved a fitness value of 73.43 and interacted with catalytic residues such as Arg14, Cys8, Cys13, and Asp120 via pi-alkyl, pi-anion, and hydrogen bond interactions. For PtpB, the most promising compound ((E)-2-([1,1′-biphenyl]-4-ylmethylene)-N-(4-methoxyphenyl)hydrazinecarbothioamide, CID 9715006) with a fitness of 53.67 exhibited interactions including pi-sulfur (Cys7, Cys12), pi-alkyl (Thr8, Arg13), pi-anion (Asp111), pi-pi (Tyr113), and conventional hydrogen bonds with Thr11 and His82. Notably, the top-performing ligands formed hydrogen bonds with the nucleophilic cysteine, adjacent polar residues of the P-loop, and the catalytic aspartate from the DPY-loop. These results suggest the tested thiosemicarbazone derivatives exhibit greater selectivity towards PtpA than PtpB, emerging as promising prototypes for future studies and encouraging further in vitro assays to confirm their inhibitory activity and potential as anti-virulence agents.
Palavras-chave: Virtual Screening, Bacterial protein tyrosine phosphatases, Thiosemicarbazones
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126030

TaxoScan: From Sanger Chromatograms to Species Identification in a Single Tool

Autores: Rômulo Lucio Vale de Moraes,Luiza Marques Prates Behrens,Karine Lena Meneghetti,Sthéfani da Cunha
Apresentador: Rômulo Lucio Vale de Moraes • romulo.moraes@sc.senai.br
Resumo:
Although next-generation sequencing (NGS) technologies have become a standard in molecular biology, Sanger sequencing remains highly relevant for species identification, especially in laboratories working with low sample volumes, requiring cost-efficiency and high accuracy. Despite its routine use, many bioinformatics workflows for analyzing Sanger chromatograms are still fragmented, manual, or dependent on commercial software.
To address these limitations, we present TaxoScan, an open-source and web-based platform that delivers a fully automated pipeline for species identification based on Sanger sequencing data. The tool performs three core steps: (i) trimming of chromatograms based on user-defined Phred quality scores; (ii) consensus sequence generation from forward and reverse reads using Clustal Omega; and (iii) taxonomic identification via real-time querying of the BOLD Systems database. Developed with Python, Django, and Biopython, and deployed with Docker, the platform is cross-platform and reproducible.
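Step (i) above can be illustrated with a minimal Python sketch. The abstract does not describe TaxoScan's actual trimming algorithm, so this is only one simple strategy (keep the span between the first and last base that meets the threshold); the function names and the default threshold of 20 are assumptions made for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred score Q to the base-call error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_by_phred(seq, quals, min_q=20):
    """Trim low-quality bases from both ends of a read.

    Keeps the span between the first and last base whose Phred score
    is >= min_q (a hypothetical default). Returns (trimmed_seq, trimmed_quals).
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    # Positions that satisfy the user-defined quality threshold.
    keep = [i for i, q in enumerate(quals) if q >= min_q]
    if not keep:
        return "", []
    start, end = keep[0], keep[-1] + 1
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    read = "ACGTACGT"
    quals = [5, 10, 30, 35, 40, 30, 12, 8]
    print(trim_by_phred(read, quals, min_q=20))
```

In practice the per-base scores would come from the chromatogram file itself; sliding-window trimming is a common alternative when isolated low-quality bases occur inside the read.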
TaxoScan is designed to be accessible to non-specialists, offering both a graphical web interface and command-line access. The platform also includes features for user authentication, visualization of results, and automatic report generation. A comparative analysis with existing tools—such as Geneious, BioEdit, and SeqTrace—shows that TaxoScan is the only fully automated solution integrating all steps, from chromatogram processing to identification.
The tool is already being applied in a real-world context to support the identification of fish species in authenticity testing, contributing to fraud prevention and improved traceability. In this practical application, TaxoScan has demonstrated tangible impact and reliability.
Given its modular design and open-source availability, TaxoScan is also suitable for expansion to other molecular markers, as well as integration with NGS data and cloud pipelines, making it a versatile tool for research, education, and industrial applications.
Palavras-chave: automation, sanger sequencing, species identification, bioinformatics, open-source tools
#1126034

Epitope prediction against Klebsiella pneumoniae with K1 and K2 capsule as a response for the prevention of hospital associated infections through vaccination

Autores: Amalia R F Lobato,Marcelo Cleyton da Silva Viera,Rafael Azevedo Baraúna,Danielle Murici Brasiliense Seligmann
Apresentador: Amalia R F Lobato • amalialobato007@gmail.com
Resumo:
Hypervirulent Klebsiella pneumoniae (KpHv) is one of the most challenging pathogens to treat in Healthcare-Associated Infections (HAI). The emergence of KpHv with resistance to last-resort antibiotics is a major threat to global health systems. As one of the most prevalent pathogens in hospital infections, with a high mortality rate, it also leads to significant healthcare costs and poorer patient outcomes. Given the narrow time windows for treatment and the limited antibiotic options against antibiotic-resistant Gram-negative pathogens, alternative approaches to these infections, such as a new vaccine, are necessary. To identify possible new immune targets, we selected the neuB gene from strains with capsule types K1 and K2, which are associated with higher virulence. The neuB sequences were obtained from UniProt; their localization on the bacterial surface was confirmed with PSORTb, and they were classified as allergenic by AlgPred2. A molecular structure predicted with SWISS-MODEL had a QMEANDisCo Global score of 0.91 for both K1 and K2. B-cell epitopes were predicted with ABCpred, and the affinity of the epitopes for the alleles HLA-A01:01, HLA-A02:01 and HLA-B35:01 was predicted with NetMHCpan v. 4.1. From the predicted epitopes, four were selected for the K1 capsule and five for the K2 capsule, with scores of 0.93-0.92 and 0.95-0.93, respectively. Prediction of peptide binding to MHC class I yielded one strong binder for K1 and five for K2, none of which showed toxicity according to ToxinPred, and one K2 epitope showed strong binding affinity to both HLA-A01:01 and HLA-B35:01. We conclude that the predicted epitopes are a good option for further studies aiming at vaccine design. Preventing HAI can improve hospital stays, reduce costs and prevent hospital outbreaks.
Vaccination before hospitalization, particularly for scheduled procedures like surgery or small invasive procedures, or for immunocompromised patients, can aid in patient recovery and reduce the length of hospital stay until discharge.
Palavras-chave: 1792614
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126049

Evaluation of Epigenetic Prognostic Biomarkers in Breast Cancer of Black and Brown Women

Autores: André Luiz da Silva Christianes,Monique de Souza Almeida Lopes,Diego Gomes,Jennifer Vieira Gomes,Tatiana de Almeida Simão,Leonor Gusmão,Cynthia Chester Cardoso,Lissette Delgado-Cruzata,Sheila Coelho Soares Lima
Apresentador: André Luiz da Silva Christianes • andre.christianes@ufvjm.edu.br
Resumo:
Breast cancer is the most common and deadliest cancer among women, with incidence and mortality rates varying by race and ethnicity. While White women have a higher incidence, Black women face higher mortality and are often diagnosed with the most aggressive subtype, triple-negative breast cancer (TNBC). This study evaluated prognostic epigenetic biomarkers in Brazilian black and brown women to understand disease behavior and prognosis. Black and brown women diagnosed with breast cancer and enrolled at the National Cancer Institute (INCA) between 2010 and 2017 were included. Ancestry and high-throughput DNA methylation profiles of 48 patients were analysed using the Sequenom iPLEX platform and the Illumina MethylationEPIC BeadChip kit, respectively. The ChAMP package was used for differential methylation analyses. Differentially methylated positions (DMPs) with Benjamini-Hochberg adjusted p-value < 0.0001 were identified by comparing tumor and adjacent non-tumor tissues. DMP proportions in gene regions, CpG islands, enhancers, and chromosome enrichments were evaluated by the chi-square test (p<0.05). Gene Set Enrichment Analysis (GSEA) was performed using the GOMETH methodology and the KEGG database, with significantly enriched pathways identified at a False Discovery Rate (FDR) < 0.05. Differentially methylated regions (DMRs) with FDR < 0.001 and at least seven CpG sites were determined. Tumor clusters were constructed from the 1,000 most variable positions, evaluated by k-means and hierarchical clustering in the ConsensusClustering package. One thousand bootstrap resamplings were performed, each containing 80% of the samples. The optimum number of clusters (k) was chosen based on the cumulative distribution function plot and clinical meaning. Impact on overall survival was evaluated by the log-rank test. Associations between clusters and molecular subtypes, color/race and stage were tested with Fisher's exact test (p<0.05). Ancestry and age associations were analyzed using the t-test or the Wilcoxon test after normality testing with the Anderson-Darling test.
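The Benjamini-Hochberg adjustment used to call DMPs can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard step-up procedure, not the code path used inside ChAMP; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Each p-value of rank r (1 = smallest) among m tests is scaled to
    p * m / r, and a running minimum taken from the largest rank downward
    keeps the adjusted values monotonic.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # made-up p-values, one per CpG site
    adj = benjamini_hochberg(raw)
    # Indices of sites passing a chosen adjusted threshold.
    print([i for i, p in enumerate(adj) if p < 0.03])
```

With the cutoff stated above, a position would be called a DMP when its adjusted value falls below 0.0001.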
A total of 268,153 DMPs between tumor and adjacent non-tumor tissues were identified, with 56,679 showing an absolute methylation difference (deltaBeta) > 0.2. Most sites (72.73%) were hypomethylated in tumors. Hypermethylated DMPs were enriched in CpG islands and promoter regions, possibly silencing tumor suppressor genes, while hypomethylated DMPs were enriched in intergenic and open sea regions, potentially linked to global hypomethylation. GSEA revealed “Pathways in Cancer,” “cAMP,” and “Ras” signaling among the most frequently altered pathways. A total of 11,025 DMRs were identified, with 294 showing absolute mean differences > 0.2; of these, 178 (59%) exhibited lower methylation in tumors. Hierarchical clustering with k=2 revealed a cluster with higher methylation (0.8 vs. 0.5, p=9.889e-05) associated with worse survival (p=0.0088). This cluster contained all TNBC samples and was enriched for advanced-stage cases (2B, 3A and 3B). Among luminal molecular subtype cases, survival was also worse in this group (p=0.038). A total of 3,248 DMPs distinguished the two clusters, 1,566 (48.21%) hypermethylated in cluster 1 and 2,706 with absolute deltaBeta > 0.2. Our results show that DNA methylation alterations can predict breast cancer prognosis among black and brown women, seemingly independently of molecular subtype. Understanding these molecular alterations may improve prognostic evaluation and support personalized treatment strategies for vulnerable women with breast cancer.
Palavras-chave: Breast Cancer, DNA methylation, Ancestry, Biomarkers, Black and Brown Women.
#1126060

At the Genome’s Edge: Telomeric Diversity in Leishmania major Revealed by Nanopore Sequencing

Autores: Arthur de Oliveira Passos,Habtye Bisetegn Endalamaw,Ana Beatriz Rodrigues,Matheus Rodrigues Sauda,Guilherme Targino Valente,Maria Isabel Nogueira Cano
Apresentador: Arthur de Oliveira Passos • arthur.passos@unesp.br
Resumo:
Protozoan parasites of the Leishmania genus are responsible for causing leishmaniasis, a neglected tropical disease that lacks effective treatment and control, underscoring the urgent need for the development of alternative therapeutic strategies. The genome of Leishmania major reference strains has a haploid content of 32.8 Mb, comprising 36 chromosomes. These genomes, available in GenBank and regularly curated by TriTrypDB, lack crucial information on certain chromosomal elements, such as telomeres. Third-generation sequencing technologies, such as Oxford Nanopore Technologies (ONT), have shown that telomeres are much more heterogeneous than previously thought. In addition to canonical repeats (TTAGGG), telomere variant sequences (TVSs) are commonly found at the chromosomal ends of a wide range of organisms. TVSs play an important role, potentially altering the local chromatin composition and interaction network. This study proposes investigating the length and heterogeneity of Leishmania major telomeres/subtelomeres (reference strain Friedlin) and searching for unresolved gaps in the available genomes. We used long-read sequencing on the ONT PromethION 2 Solo platform. Sequence quality was assessed using NanoPack and QUAST. De novo assembly was done using Flye and Canu, followed by genome annotation with Companion. We filtered telomeric reads containing the canonical sequence using grep and identified TVSs using a Python script. Our preliminary results show that we assembled the 36 chromosomes de novo. We also obtained better coverage of the chromosome termini and recovered 58 telomeres out of 72, with a median size of 3378 bp. Of the total number of repeats in the telomeric region, at least 5% are TVSs, showing heterogeneity at the parasite's telomeres. Our work expands knowledge about the organization of Leishmania major telomeres.
Our ongoing studies aim to provide gapless sequences and uncover hidden regions important for understanding parasite biology and evolution.
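The read filtering and variant-repeat counting described in this abstract can be sketched as follows. The authors' script is not shown, so this is only a hypothetical illustration: the function names, the minimum of three tandem copies, and the in-frame splitting of the tract into six-base units are all assumptions (insertions or deletions would shift the frame and need a more careful approach).

```python
import re
from collections import Counter

CANONICAL = "TTAGGG"  # canonical telomeric repeat named in the abstract


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_telomeric(read, min_copies=3):
    """grep-like filter: True if either strand holds a tandem run of the canonical repeat."""
    pattern = re.compile("(?:%s){%d,}" % (CANONICAL, min_copies))
    return bool(pattern.search(read) or pattern.search(revcomp(read)))


def count_repeat_units(tract, unit_len=6):
    """Split a telomeric tract into in-frame units; return (canonical_count, variant_counter)."""
    units = [tract[i:i + unit_len] for i in range(0, len(tract) - unit_len + 1, unit_len)]
    counts = Counter(units)
    canonical = counts.pop(CANONICAL, 0)
    return canonical, counts  # counts now holds only the variant units


if __name__ == "__main__":
    tract = "TTAGGG" * 3 + "TTGGGG" + "TTAGGG" * 2  # made-up example tract
    canonical, variants = count_repeat_units(tract)
    total = canonical + sum(variants.values())
    print(canonical, dict(variants), sum(variants.values()) / total)
```

The last printed value is the fraction of variant units in the tract, the same kind of quantity as the "at least 5%" figure reported above.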
Palavras-chave: Leishmania major, Nanopore sequencing, Genome, Telomere
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126061

Characterization of Transcripts from High-Risk HPV Non-16/18 in cervical tumoral samples infected by multiple viral genotypes

Autores: Larissa Freitas Artiles Gonçalves,Nicole Scherer,Mariana Boroni,Miguel Moreira
Apresentador: Larissa Freitas Artiles Gonçalves • artileslarissa@gmail.com
Resumo:
Human Papillomavirus (HPV) is a virus whose persistent infection is primarily associated with cervical cancer. Its small, circular, double-stranded DNA genome is transcribed into polycistronic mRNAs that undergo splicing, generating transcripts and alternative isoforms that play a crucial role in infection and carcinogenesis. Among the fifteen high-risk HPV subtypes, HPV16 and HPV18 are the best characterized. Given the carcinogenic potential and the limited understanding of other high-risk genotypes, this study aims to identify transcripts from non-HPV16/18 types in nineteen cervical cancer biopsy samples with multiple HPV infections. RNA-seq data were obtained from GEO (accession number GSE144293). DNA from HPV subtypes 31, 33, 35, 39, 42, 45, 52, 54, 58, 61, and 68 was identified in fifteen of these samples using reverse hybridization. Reads from high-throughput sequencing were assembled de novo into transcripts using Trinity (version 2.15.1). Assembled transcripts were aligned to the human genome GRCh38, to high-risk HPV genomes, and to HPV6 and HPV11 using BLAT (version 2.5.1). HPV-aligned sequences were further characterized using the PaVE PV-specific BLAST and Human BLAT Search tools. The abundance of transcripts and alternative isoforms from non-16/18 HPV subtypes was evaluated using the Trinity transcript quantification script with an alignment-based approach, employing RSEM and Bowtie2. Of these fifteen samples, six had transcript assemblies that aligned to other high-risk HPV subtypes. All but one of these samples also had assemblies aligning to HPV16. Considering the other genotypes, assembled transcripts were found for HPV31 (two samples, with eight and three assemblies), HPV33 (one sample with eight assemblies), HPV39 (one sample with four assemblies), HPV45 (two samples, with seven and one assemblies) and HPV58 (one sample with one assembly).
LCR, E6, E7, E1, E2 and E5α were found in assemblies from HPV45, HPV31, HPV39 and HPV33. E2 was found in the single assembly from HPV58. Most of these assembled transcripts also aligned to the human genome, mainly in intronic regions of distinct human genes, suggesting the presence of chimeric (human/HPV) transcripts and the integration of the viral genome into the tumor cell genome.
Palavras-chave: Non-HPV16/18 genotypes, Cervical Cancer, Viral Transcripts
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126062

Exploring Hidden Layers of Gene Expression in Campylobacter jejuni: A Focus on Stress-Activated smORFs

Autores: Érica Mendonça Ziehe,Alex Sanchez Yumbo,Guadalupe del Rosario Quispe Saji,Marisa Fabiana Nicolás
Apresentador: Érica Mendonça Ziehe • eziehe@posgrad.lncc.br
Resumo:
Campylobacter jejuni is a major etiological agent of bacterial gastroenteritis worldwide, frequently associated with the consumption of contaminated food. Although its genome was sequenced several years ago, critical aspects related to its physiology, virulence mechanisms, and environmental adaptation remain poorly understood. Recently, small open reading frames (smORFs), which encode peptides of up to approximately 100 amino acids, have emerged as important components of bacterial biology, involved in regulatory processes, membrane interactions, and virulence.
The identification of smORFs is challenged by limitations in genome annotations and the sensitivity of traditional proteomic techniques. However, approaches such as ribosome profiling have enabled the precise mapping of these regions. In C. jejuni, a recent study applied Ribo-seq to refine the translatome of the bacterium, leading to the discovery and experimental validation of numerous smORFs, as well as the creation of a functional catalog of these small proteins. The availability of these validated smORFs now opens new avenues for investigating their transcriptional behavior under a broader range of environmental stress conditions, expanding our understanding of their potential regulatory and adaptive roles.
In the present study, we analyzed public RNA-seq datasets of C. jejuni cultivated under stress conditions, including exposure to deepoxy-deoxynivalenol, deoxynivalenol, hypochlorite, and metabolic stress induced by serine as the sole nitrogen source. We aimed to investigate the transcriptional expression patterns of smORFs under these stress conditions.
Sequencing data were quality-checked using FastQC and MultiQC, trimmed with fastp, and filtered for ribosomal RNA using SortMeRNA. Reads were aligned to the reference genome with Bowtie2, and gene quantification was performed using featureCounts. Differential expression analysis was performed using both edgeR and DESeq2, with a focus on smORFs previously described in the literature.
We identified 22 novel smORFs with detectable transcription across stress and control conditions. Notably, CJsORF23 and CJsORF24 were found within annotated sRNA regions, suggesting potential non-coding regulatory functions. Additional smORFs, including CJsORF3, CJsORF12 (TrpL), and CJsORF41 (LeuL), were located in regulatory regions such as 5′ untranslated regions (5′UTRs), which are commonly associated with translational control. These include leader peptides that may modulate gene expression in response to nutrient availability.
CJsORF3 was associated with the cioY gene, which encodes a putative chlorine oxidoreductase involved in oxidative stress response, further supporting the functional relevance of smORF–gene associations. Several smORFs were also distributed across 3′UTRs, intergenic, and pseudogenic regions, reinforcing the hypothesis that these elements may influence translation or mRNA stability.
While transcriptional evidence alone does not confirm functional activity, the consistent expression patterns observed under stress conditions indicate that smORFs may represent previously uncharacterized elements of C. jejuni's regulatory network. These results provide a robust basis for future functional validation and highlight smORFs as potential targets for novel antimicrobial strategies.
Palavras-chave: Campylobacter jejuni, gene expression, smORFs
#1126063

Molecular Reactivity in Drug Discovery: Applying Fukui Reactivity Indices in the Selection of Promising SARS-CoV-2 Main Protease (MPro) Inhibitors

Autores: Igor Camilo Ferreira,Pedro Henrique Monteiro Torres
Apresentador: Igor Camilo Ferreira • igorcf@biof.ufrj.br
Resumo:
The SARS-CoV-2 Main Protease (MPro) is a key therapeutic target in the fight against COVID-19 because it is essential in the processing of viral polyproteins and, consequently, in the replication of the virus. In this context, the identification of effective inhibitors against MPro is a key strategy in the development of new antivirals. Many candidate molecules for new drugs have been discovered through Virtual Screening and Computer-aided Drug Design techniques, which are rapid and cost-effective methods to identify promising therapeutics. The ligands evaluated in this study were selected from a previous Virtual Screening of the Enamine database with a filter for glutamine-analogous structures, resulting in approximately 200 thousand compounds. Then, Molecular Docking of these compounds was performed in the catalytic site of MPro. 104 compounds with the best binding affinity were subjected to in silico predictions of pharmacokinetic properties, resulting in 11 ligands with a favorable profile regarding criteria such as oral bioavailability, toxicity and metabolism. This work proposes the use of Fukui reactivity indices, derived from Density Functional Theory (DFT), as a complementary tool in the screening and selection of promising ligands in the early stages of Computer-aided Drug Design. Fukui indices (f⁺, f⁻ and f⁰) allow mapping electrophilic and nucleophilic regions in candidate molecules, aiding in the identification of reactive centers with the potential to interact specifically with critical residues of the MPro catalytic site, such as Cys145 and His41. Preliminary results obtained for 3 ligands, here termed LMDM-CoV-1, LMDM-CoV-2 and LMDM-CoV-3, indicated an increase in catalytic reactivity in the complex with LMDM-CoV-1, while in the complexes with LMDM-CoV-2 and LMDM-CoV-3, a reactivity profile similar to that of the APO form was observed. 
Furthermore, the presence of a strongly electrophilic group at the P1 position of the substrate is important for the protease inhibition mechanism since it interacts directly with Cys145, which has a nucleophilic character, suggesting that these inhibitors may act in the inhibition of the target in a covalent manner. The results presented here reinforce the potential of Fukui reactivity descriptors as allies in the rational design of drugs and in the reduction of the chemical space during in silico screening, in addition to allowing for a more detailed description of the inhibition mechanism of the ligands on MPro.
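For readers unfamiliar with the descriptors, condensed (per-atom) Fukui indices are commonly obtained by finite differences of atomic partial charges computed for the N, N+1 and N-1 electron systems at a fixed geometry. The Python sketch below illustrates that arithmetic only; the abstract does not state which charge scheme or software was used, and the function name and charge values are hypothetical.

```python
def condensed_fukui(q_neutral, q_anion, q_cation):
    """Condensed Fukui indices per atom from atomic partial charges.

    q_neutral: charges of the N-electron system
    q_anion:   charges of the (N+1)-electron system
    q_cation:  charges of the (N-1)-electron system
    Returns a list of (f_plus, f_minus, f_zero) tuples, one per atom.
    """
    if not (len(q_neutral) == len(q_anion) == len(q_cation)):
        raise ValueError("charge lists must have the same length")
    indices = []
    for qn, qa, qc in zip(q_neutral, q_anion, q_cation):
        f_plus = qn - qa          # f+: site prone to nucleophilic attack
        f_minus = qc - qn         # f-: site prone to electrophilic attack
        f_zero = (qc - qa) / 2.0  # f0: site prone to radical attack
        indices.append((f_plus, f_minus, f_zero))
    return indices


if __name__ == "__main__":
    # Made-up charges for a three-atom fragment (not from the study).
    q_n = [0.10, -0.30, 0.20]
    q_a = [-0.40, -0.50, -0.10]
    q_c = [0.30, 0.10, 0.60]
    fukui = condensed_fukui(q_n, q_a, q_c)
    most_electrophilic = max(range(len(fukui)), key=lambda i: fukui[i][0])
    print(fukui, most_electrophilic)
```

The atom with the largest f⁺ is the most electrophilic site, which is the kind of center expected to react with a nucleophile such as Cys145.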
Palavras-chave: SARS-CoV-2 Main Protease (MPro), Inhibitors, Computer-aided Drug Design, Fukui reactivity indices
#1126067

Genomic and Transcriptomic Analyses Reveal Trypanothione Reductase (TPR) as Strategic Drug Targets in Trypanosomatids

Autores: Julia Geyziana Oliveira Costa Araújo,Aila Maria Melo Correia,Maria Angelina Da Silva Medeiros,Antonio Edson Rocha Oliveira
Apresentador: Antonio Edson Rocha Oliveira • oliveiraaer@unifor.br
Resumo:
Trypanosoma cruzi and Leishmania spp. are eukaryotic unicellular organisms belonging to the Trypanosomatidae family and the Kinetoplastida order, and are the etiological agents of Chagas disease and leishmaniasis, respectively, both of which can affect humans and animals. These parasites share complex life cycles and rely on insect vectors for transmission: Triatoma infestans in the case of T. cruzi and phlebotomine sandflies for Leishmania. Both diseases can be fatal if left untreated, and current treatments are not ideal due to toxicity and the emergence of drug resistance. Therefore, there is an urgent need to develop new, effective, and safe therapeutic strategies. Trypanothione reductase (TPR) is an enzyme involved in redox metabolism in trypanosomatids, including both T. cruzi and Leishmania, and is essential for parasite survival. This enzyme protects against oxidative damage mediated by the host immune system, maintaining a reducing intracellular environment. It is analogous to human glutathione reductase, which makes it a promising drug target due to its similar biochemical pathway. Although several studies have explored TPR as a drug target, few have performed genomic and transcriptomic characterization among different strains of T. cruzi and Leishmania, or assessed how inter-strain variability may impact its potential as a pharmacological target. This project aims to conduct genomic and transcriptomic characterization of the TPR gene across different strains of T. cruzi and Leishmania to identify sequence or expression differences that may influence its application as a drug target. Genomic analysis of 30 different Trypanosoma cruzi strains representing the six DTUs (1 Tcbat, 9 TcI, 14 TcII, 1 TcIII, 1 TcV, and 5 TcVI), along with 21 different Leishmania strains available in the TriTrypDB database, revealed that the TPR gene is present as a single copy in all T. cruzi DTUs, except for DTU VI strains, which carry two copies due to their hybrid genomic nature, and as a single copy in Leishmania as well. Regarding genomic organization, analysis of 14 T. cruzi and 11 Leishmania strains showed that TPR genes are syntenic, flanked by genes encoding an ATP-dependent RNA helicase and a DNA replication licensing factor. Multiple sequence alignment revealed high conservation of the TPR protein among different strains, supporting its candidacy as a therapeutic target. RNA sequencing datasets corresponding to different life stages (epimastigote, amastigote, and trypomastigote) of T. cruzi (Dm28c, Y, CL Brener) indicated that TPR expression is highest in the epimastigote stage, followed by the amastigote and trypomastigote forms. Transcriptomic analysis of RNA-seq data from the amastigote, procyclic promastigote, and metacyclic stages of the Leishmania braziliensis strain revealed that TPR expression is highest during the metacyclic stage. These findings contribute to a comparative understanding of TPR function in both parasites and strengthen its potential as a cross-species drug target. In conclusion, our analyses enabled the genomic and transcriptomic characterization of the TPR gene across multiple strains of T. cruzi and Leishmania, providing a foundation for the future development of new therapeutic interventions targeting these neglected tropical diseases.
Palavras-chave: Trypanosoma cruzi, Trypanothione reductase, drug target
★ This work is running for the Next Generation Bioinfo Award
#1126081

Lowering Barriers to Bioinformatics Pipeline Development: Insights from Hackathon nf-core 2025

Autores: Mateus Lucas Falco,Camila Rickli,Lidiane Aparecida Fernandes,Christopher William Lee,Carolina Paula de Almeida,Sandra Mara Guse Scós Venske,David Figueiredo
Apresentador: Camila Rickli • lilarickli@yahoo.com.br
Resumo:
The Hackathon nf-core 2025, held in Guarapuava, Brazil, aimed to promote collaborative development in bioinformatics workflows, engaging researchers, developers, and computational biologists in a hands-on environment. The event centered on nf-core, a community-driven repository of best-practice bioinformatics pipelines built using Nextflow, a workflow system for creating scalable, portable, and reproducible analyses. The primary objective of the hackathon was to promote reproducibility, scalability, and modularity in the analysis of high-throughput sequencing data.
In this work, we describe the implementation of a structured, collaborative framework to guide participants through developing and optimizing bioinformatics workflows using Nextflow DSL2 (Domain-Specific Language). The hackathon was designed not only as a contribution-driven event but also as an educational experience, enabling participants to engage with real-world projects while acquiring essential skills in workflow development, containerization, and version control.
The event was conducted in a hybrid format, both in-person and remotely (Gather), ensuring broad accessibility. Participants were organized into teams, each assigned a mentor who facilitated onboarding and technical support. GitHub was used as the central platform for issue tracking, version control, and pull request submissions. GitHub Codespaces served as a development environment, offering an integrated and reproducible cloud-based workspace, allowing participants to engage directly with the core teams in official nf-core Slack channels.
Before the hackathon, a training session covered Nextflow syntax, pipeline structure, and configuration customization. Everyone was invited to answer a pre- and post-event survey. A total of 23 individuals participated in the training, primarily undergraduate (73.9%) computer science students (78.3% from Exact Sciences) aged between 18 and 25 (69.6%). Notably, 52.2% of attendees had no prior exposure to genetics or molecular biology, and 91.3% reported this as their first encounter with the Nextflow DSL. In addition, 82.6% had prior knowledge of programming languages, with Python the most common (69.6%). Moreover, 95.7% were already familiar with GitHub, though 60.9% had not previously used containers such as Docker or Singularity.
Over the three-day event, 12 participants completed the full hackathon experience, including nine computer science students. Only one participant reported no prior experience with programming. Post-event feedback revealed significant improvements in participants' technical proficiency, particularly with tools such as GitHub and Nextflow DSL2. Beyond technical skills, a key outcome was effective collaboration and communication strategies. Participants engaged in problem-solving, interacting directly with maintainers and peers. This strengthened the sense of community and underscored open-source software as a powerful tool for collaborative learning in bioinformatics.
A key observation was that prior knowledge in molecular biology was not a limiting factor for participation. The tasks emphasized computational problem-solving and code-based contributions rather than biological insights. This highlights the accessibility of workflow development to individuals with informatics backgrounds and the importance of interdisciplinary training environments.
In conclusion, the Hackathon successfully combined collaborative development with hands-on education, lowering the barrier for newcomers to bioinformatics and enabling meaningful contributions to open-source projects. The model implemented can serve as a blueprint for future events aiming to engage diverse technical audiences in bioinformatics software development.
Palavras-chave: Nextflow; Bioinformatics; Hackathon; Collaborative development; Reproducibility.
#1126089

Prediction of possible new Beta-Lactamases from Metagenome Assembled Genomes from Brazil’s North swine metagenomic data through the One-Health GUARANI Network

Autores: Amalia R F Lobato,Mikhail Javier Santos de Souza,Rodrigo Cayô da Silva,Ana Cristina Gales,Rafael Azevedo Baraúna,Danielle Murici Brasiliense Seligmann
Apresentador: Amalia R F Lobato • amalialobato007@gmail.com
Resumo:
Beta-lactams are among the most prescribed antimicrobial agents for clinical and animal use, but rising antibiotic resistance in bacteria poses a serious threat to the treatment of future infections, primarily those caused by bacteria that produce Beta-lactamases, enzymes capable of degrading Beta-lactams. One fundamental approach is the One Health perspective, which, combined with new sequencing technologies, can help understand and monitor the spread of antibiotic resistance between the environment, animals and humans. For that purpose, 80 metagenome-assembled genomes (MAGs) were retrieved from NCBI, specifically from the Gut Microbiome of Food-Producing Animals and Humans data produced by the GUARANI network. The assembled MAGs originated from swine farms in Castanhal, Pará, Brazil. The data were searched with BLAST against ResFinder with an 80% identity threshold for acquired antimicrobial resistance genes, and against CARD using the loose algorithm. The results were then confirmed on the Beta-Lactamase DB. SWISS-MODEL was used to generate a possible 3D conformation of the Beta-Lactamases. A potential new Extended Spectrum Beta-Lactamase (ESBL) was predicted from an Ochrobactrum genus MAG, featuring 387 amino acids and 83.78% identity to blaOCH-3, a chromosomal class C ESBL described in 2001. In SWISS-MODEL, the closest match had 71.83% identity to a constructed model and a GMQE of 0.94. The second ESBL was predicted from a Desulfovibrio desulfuricans MAG, also from a Castanhal farm, with 315 amino acids and 86.28% identity to blaDES-1, first described in 2002 as an Ambler class A Beta-Lactamase. It showed 99.37% similarity to an already constructed model and a GMQE of 0.89 in the SWISS-MODEL conformational prediction. The two proposed new Beta-Lactamases have less than 90% identity with other ESBLs, but further studies are necessary to fully describe the enzymes and their potential impact on environmental, animal and clinical spread.
New approaches to analyzing already public data can uncover valuable insights into the complexities of human, animal, and environmental relationships, as well as help monitor potential new antibiotic resistance mechanisms.
Palavras-chave: Beta-Lactamase, Bacteria, Metagenomics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126090

Conformational characterization of TMEM176B ion channel for different oligomeric states and under distinct pH conditions

Autores: Yolanda Maria Barros Marcello,Simone Queiroz Pantaleao,Roberto Carlos Navarro Quiroz,Angelo J. Magro,Ignacio J. General,Marcelo Hill,Ana Ligia Scott
Apresentador: Yolanda Maria Barros Marcello • yolanda.marcello@ufabc.edu.br
Resumo:
This study investigates the structural dynamics and metastable states of the transmembrane protein TMEM176B, with a focus on its role as a non-selective monovalent cation channel. TMEM176B is known to inhibit the NLRP3 inflammasome by regulating cytosolic ionic concentrations, and silencing of the Tmem176b gene has been shown to enhance caspase-1/IL-1β–mediated antitumor immunity. Starting from structural models predicted by AlphaFold, we employed a combination of methodologies—including Normal Mode Analysis (NMA) and Molecular Dynamics (MD) simulations—to explore the protein’s flexibility and conformational transitions across different oligomeric states and pH conditions, with the aim of proposing a plausible native model. NMA results indicated that the tetrameric form of TMEM176B exhibits higher global structural flexibility, with increased RMSF values particularly at pH 4.0. This trend was also observed in the dimeric and trimeric forms, suggesting a consistent influence of acidic environments on conformational mobility. Energy minimization profiles along normal modes revealed regions of enhanced dynamic coupling. Principal Component Analysis (PCA) demonstrated a clear separation between initial and more open conformational states, especially under low pH conditions. Interatomic distances computed from visually selected residues further corroborated the conformational shifts induced by vibrational modes. Based on these findings, vibrational modes with the most significant conformational impact were selected for MD simulations with excited normal modes (MDeNM). The simulations preliminarily revealed structural behaviors consistent with ion channel activity for the tetrameric and trimeric forms, such as coordinated helical twisting. 
Furthermore, in ongoing Steered Molecular Dynamics (SMD) simulations with constant-velocity pulling, initial results suggest that, for the tetramer, the conduction pathway favors sodium ions over potassium ions in the extracellular-to-intracellular direction. This research provides novel insights into the mechanistic role of TMEM176B in immune regulation and highlights promising directions for future investigation.
Palavras-chave: TMEM176B. Ion channel. Conformational heterogeneity. Oligomeric states. pH conditions.
★ Running for the Qiagen Digital Insights Excellence Awards
#1126092

Comprehensive In Silico Evaluation of MUC16 Variants in Gastric Cancer: Comparative Analysis Between a Brazilian Cohort and TCGA-STAD Cohort

Autores: Arthur Felipe Vasconcelos Ferreira Reis,João Ricardo Guerreiro Duarte,Ana Katarina Campos Nunes,Gilderlanio Santana de Araujo,Andrea Kely Campos Ribeiro dos Santos,Sidney Emanuel B dos Santos
Apresentador: Arthur Felipe Vasconcelos Ferreira Reis • arthurreis.ufpa@gmail.com
Resumo:
Gastric adenocarcinoma is a malignant disease that originates in the stomach epithelium and frequently progresses through local invasion and distant metastasis. It comprises two major histological types (intestinal and diffuse), each with distinct clinical and molecular profiles. Mucins, particularly transmembrane mucins, are increasingly recognized as relevant biomarkers in gastric cancer. Among them, MUC16 is a large and heavily glycosylated mucin commonly studied in ovarian cancer, but emerging evidence suggests its importance in gastric tumorigenesis and immune modulation. In this study, we conducted a comparative in silico analysis of MUC16 missense variants observed in two gastric cancer cohorts: one from the Hospital Universitário João de Barros Barreto (HUJBB; 158 missense variants in 95 individuals) and the other from TCGA-STAD (213 variants in 137 individuals). Notably, MUC16 ranked as the fourth most frequently mutated gene in the TCGA cohort, with a mutation frequency of approximately 33.5%.
To assess the potential pathogenicity of these variants, we employed the AlphaMissense deep-learning predictor and the MetaRNN meta-predictor. Most variants of unknown significance (VUS) from both cohorts were classified as likely benign (190 in TCGA and 123 in HUJBB). Only four missense mutations in TCGA (p.Leu13043Pro, p.Gly12815Val, p.Phe12546Val, and p.Val12344Asp) were predicted as pathogenic by both tools. We also used a scatter plot to assess how the variants from both cohorts are distributed among the 275,633 predictions made by AlphaMissense for the MUC16 gene. In this distribution, 16.4% of the predicted MUC16 variants are possibly pathogenic, 61.8% are possibly benign, and 21.8% are ambiguous. These values reinforce MUC16 as a mutational hotspot with potential clinical relevance. We further explored the molecular mechanisms potentially affected by these mutations using the MutPred2 tool. In the HUJBB cohort, 18 variants were associated with molecular mechanisms, 83.3% of which involved gain or loss of post-translational modification (PTM) sites, such as glycosylation, phosphorylation and ubiquitination. Similarly, in the TCGA cohort, 181 variants showed an association with molecular mechanisms, 66.3% of which were also related to PTMs. Other mechanisms identified included changes in structural dynamics, catalytic activity and macromolecule-binding domains. Notably, all four variants predicted to be pathogenic (p.Leu13043Pro, p.Gly12815Val, p.Phe12546Val and p.Val12344Asp) were associated with relevant structural changes, such as gain or loss of intrinsic disorder and stability. This suggests a possible functional relevance of these mutations at the protein level, reinforcing their potential biological impact.
These findings demonstrate that MUC16 is frequently mutated in gastric cancer across diverse populations, with a significant portion of mutations potentially impacting protein function through PTM alterations and structural changes. The integration of structural bioinformatics and population-based variant analysis provides valuable insights into the functional roles of MUC16 mutations in gastric tumor biology, especially in the context of tumor mutational burden.
Palavras-chave: MUC16, Gastric Cancer, missense variants, structural analysis, molecular mechanisms.
#1126099

Transcriptomic screening strategies for non-target sequences: A case study of a cleptoparasitic bee and its host

Autores: Heraldo Mauch,Paulo Cseri Ricardo,Maria Cristina Arias
Apresentador: Heraldo Mauch • heraldomauch1@gmail.com
Resumo:
Transcriptomics and genomics have become pivotal in understanding molecular and evolutionary issues of bees. Sequences from organisms associated with the target species, such as bacteria, fungi, viruses, or even plants visited by bees, are often inadvertently captured. The standard practice has been to discard these “contaminant” sequences. However, recent studies have highlighted the profound impact that these associated organisms can have on the target species, potentially offering valuable insights into ecological relationships. In this context, we aimed to establish an effective pipeline for identifying non-target sequences in bees using RNA-Seq data. For this purpose, we analyzed publicly available datasets of the cleptoparasitic bee Coelioxoides waltheriae and its host, Tetrapedia diversipes, comprising three pools of samples per species. Two distinct screening strategies were compared. In the first, referred to as Total Assembly (TA), all trimmed reads from each pool were de novo assembled using Trinity. The resulting contigs were then subjected to BLASTN alignments against custom-built sequence databases representing NCBI taxonomic groups considered potential contaminants (Acari, Archaea, Bacteria, Fungi, “SAR supergroup”, Viridiplantae and Viruses). Sequences with significant alignments were further validated for homology by BLASTN alignment against the NCBI nucleotide (nt) database. The second approach, Filtered Assembly (FA), consisted of mapping reads from each sample against the same custom-built databases using Magic-BLAST. Only the mapped reads were assembled with Trinity, and the resulting transcripts were subsequently aligned to the nt database. Taxonomic annotation for both approaches was performed using TaxonKit. Finally, we compared the diversity of organisms found between the two bee species.
The TA approach recovered 1,297 sequences identified as non-target (“contaminant”) from the C. waltheriae pools (mean percentage identity [PID] = 90.78%) and 7,561 from the T. diversipes pools (mean PID = 86.17%). The FA approach, in turn, yielded 680 transcripts from C. waltheriae (mean PID = 94.91%) and 1,605 from T. diversipes (mean PID = 90.5%). TA recovered a greater diversity of plant families, 77 for T. diversipes and 26 for C. waltheriae, than FA, which recovered 37 and 16 families, respectively. Similarly, for microorganisms, TA outperformed FA by retrieving a higher number of families, 105 for T. diversipes and 111 for C. waltheriae, compared to 12 and 77 recovered by FA. These patterns of diversity between host and parasite are consistent with previous studies. The TA approach proved to be the more effective method for broad screening of non-target sequences; however, the FA method provided greater resolution in the taxonomic identification of sequences. Ultimately, this study demonstrates that sequences traditionally dismissed as contaminants can unveil ecological networks, offering valuable new insights into species biology.
Palavras-chave: bee, cleptoparasitism, microorganisms, transcriptomics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126104

Development of a reliable methodology for calling variants on the X chromosome

Autores: Giovanne de Aguiar Barra,Giovanna Câmara Giudicelli,Eduardo Martin Tarazona Santos,Rafael Pereira Tou
Apresentador: Giovanne de Aguiar Barra • giodeaguiar7@gmail.com
Resumo:
Accurate variant calling on the X and Y chromosomes using Next-Generation Sequencing (NGS) data remains a major challenge in human studies. Consequently, most current studies prioritize the autosomes and exclude sex chromosome data. Unlike autosomes, the sex chromosomes have features that complicate analysis, such as differences in ploidy between XX and XY individuals and the presence of regions that are highly similar between X and Y, like the Pseudoautosomal Regions (PARs) and the X-Transposed Region (XTR). These factors can lead to mapping artifacts, misclassification of variants, and an inflated number of false heterozygous calls, especially when the non-pseudoautosomal region of the X chromosome is treated as diploid, rather than haploid, in XY individuals. In this study, we propose to develop a reliable and reproducible methodology for variant calling on the X chromosome that mitigates the problems mentioned above. Preliminary results include a literature review aimed at identifying available tools that address the challenges of sex chromosome variant calling. This search revealed that most existing tools do not fulfill our specific requirements, with XYAlign emerging as the only one aligned with our goals. We tested XYAlign and found no improvements in read mapping or depth over the sex chromosomes; the issue of inflated heterozygosity in XY samples also remained unresolved. Additional steps included running the complete variant calling pipeline commonly used in the laboratory (based on GATK’s best practices) to compare outcomes with and without XYAlign preprocessing. Since XYAlign did not give satisfactory results for the intended analysis, we decided to implement a customized pipeline and execute it manually to ensure greater control and accuracy in data processing.
We begin by preparing the reference genome, applying hard masking to the PARs on the Y chromosome to prevent duplicated alignments and improve mapping accuracy. Next, we use a dedicated software tool to infer the chromosomal sex of each sample, so that variant calling can be adjusted automatically according to whether the individual is XX or XY. We tested two complementary approaches to deal with distinct regions: (1) adapting ploidy settings within the variant calling software to treat the non-PAR region of the X chromosome as haploid in XY individuals, and (2) removing redundant reads by identifying sequences that align equally well to both the X and Y chromosomes, particularly in the XTR. Furthermore, we generated VCF files using both the default ploidy configuration and a customized ploidy setting for the X chromosome, allowing us to assess the impact of this adjustment on heterozygosity rates and variant detection. As next steps, to validate our results, variant calls derived from whole-genome sequencing will be compared against SNP array data from matched samples generated in laboratory projects. Additionally, the hap.py tool will be used to benchmark our results against a well-established truth set.
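The two core ideas described above, hard masking of Y-chromosome PARs and sex-aware ploidy on the X chromosome, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `hard_mask` and `ploidy_for`, the chromosome labels, and the toy coordinates are assumptions made for illustration, and real PAR coordinates depend on the reference build in use.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # 1-based, inclusive (start, end); hypothetical convention


def hard_mask(sequence: str, regions: List[Region]) -> str:
    """Replace every base inside the given regions with 'N' (hard masking)."""
    bases = list(sequence)
    for start, end in regions:
        # Clamp the region to the sequence and convert to 0-based indices.
        for i in range(max(start, 1) - 1, min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


def ploidy_for(chrom: str, pos: int, sex: str, x_pars: List[Region]) -> int:
    """Return the ploidy to assume at a position for an XX or XY sample."""
    if sex not in ("XX", "XY"):
        raise ValueError("sex must be 'XX' or 'XY'")
    if chrom == "chrX":
        in_par = any(start <= pos <= end for start, end in x_pars)
        # XX samples are diploid everywhere on X; XY samples are diploid
        # only inside the PARs (the Y copies are masked and map to X).
        return 2 if sex == "XX" or in_par else 1
    if chrom == "chrY":
        # With Y PARs masked, the remaining Y is haploid in XY, absent in XX.
        return 1 if sex == "XY" else 0
    return 2  # autosomes


# Toy coordinates only; not real PAR boundaries.
TOY_X_PARS = [(1, 100), (900, 1000)]
masked = hard_mask("ACGTACGTAC", [(3, 5)])
```

In this sketch, a position in the non-PAR region of the X chromosome gets ploidy 1 for an XY sample and ploidy 2 for an XX sample, which is the adjustment whose effect on heterozygosity the abstract sets out to measure.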
Palavras-chave: Sex chromosomes, Next-Generation Sequencing, Pseudoautosomal Regions, Variant Calling
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126105

Computational Decoding of lncRNA Architecture in Highly Repetitive Hybrid Genome

Autores: Joel Nunes Leite Junior,Helba Cirino de Souza Barbosa,Marie-Anne van Sluys
Apresentador: Joel Nunes Leite Junior • joelnunes@usp.br
Resumo:
Long non-coding RNAs (lncRNAs) interact with different types of molecules and play important regulatory roles in gene expression. However, characterizing them in the complex genome of sugarcane poses significant challenges due to: (1) high levels of chromosome ploidy, (2) a hybrid nature incorporating chromosomes from both parental lineages, (3) the presence of recombinant homeologous chromosomes, and (4) significant gene copy number variation and repetitive sequences. These genomic complexities create substantial obstacles to accurate transcript assembly and annotation. In this study, we performed de novo assembly of RNA-Seq data from a sugarcane cultivar under biotic stress, predicting 148,803 lncRNAs through strict computational filtering. When mapped to the R570 reference genome, 44,502 predicted lncRNAs exhibit 100% coverage and widespread chromosomal distribution. From this subset, 43,017 lncRNAs were found in 1 to 50 copies across the genome. Although the R570 genome contains approximately 12 hom(e)ologous chromosomes, 16,285 lncRNAs were mapped exclusively to a single chromosome. Accordingly, we observed a non-standard presence of lncRNAs among hom(e)ologous chromosomes, indicating that, despite their shared ancestry, these chromosomes possess distinct sets of functional transcripts. These findings offer insights into lncRNA organization in complex genomes and establish a foundation for future functional studies in polyploid genomes.
Palavras-chave: Non-model, Poaceae
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126118

Predicting climate change impacts on soybean using machine learning

Autores: Janaina da Silva Fortirer,Adriana Grandis,Carmen Palacios Jara,Débora Pagliuso,Leandro Francisco de Oliveira,Eveline Queiroz de Pinho Tavares,Lauana Pereira De Oliveira,Plínio Barbosa de Camargo,Eny Iochevet Segal Floh,Cibele Maria Russo Novelli,Marcos Buckeridge
Apresentador: Janaina da Silva Fortirer • psjana@usp.br
Resumo:
Soybean (Glycine max) is a globally significant crop due to its contributions to food security, human and animal nutrition, and renewable resources. However, its production and grain quality are increasingly threatened by climate change. This study investigates the isolated and combined effects of elevated atmospheric CO₂ (eCO₂), high temperature, and drought on soybean productivity and grain composition. Experimental data were obtained from Open-Top Chamber (OTC) experiments designed to simulate these stressors in various combinations. To model soybean responses under climate change scenarios, we employed statistical and machine learning techniques, including Generalized Linear Models (GLMs) with a gamma distribution (implemented in R), as well as the Extreme Gradient Boosting algorithm (XGBoost) and CatBoost (implemented in Python using the xgboost and catboost libraries, respectively). Continuous variables were normalized using min–max scaling to ensure consistent feature ranges. Our results show that XGBoost achieved the best overall predictive performance for both soybean production and grain quality. Under the combined stress treatment (eCO₂ + temperature + drought), CatBoost achieved the lowest root mean square error (RMSE = 2.33) for grain production, followed by XGBoost (RMSE = 4.06), while GLM showed lower accuracy (RMSE = 6.78). For grain quality, XGBoost outperformed other models with the lowest RMSE (0.28), particularly under the triple stress scenario, whereas GLM again performed the worst (RMSE = 0.39). All models exhibited reduced performance when temperature was evaluated in isolation. However, the interaction between eCO₂ and temperature improved model accuracy, emphasizing the importance of accounting for synergistic environmental effects. XGBoost proved especially effective in capturing these complex interactions between environmental stressors and plant responses.
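For readers unfamiliar with the two quantities used above, the following minimal Python sketch shows min–max scaling and the root mean square error. It uses only the standard library; the function names and the example values are illustrative assumptions, not the authors' code or data.

```python
import math
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical feature values, for illustration only.
scaled = min_max_scale([10.0, 20.0, 30.0])
```

RMSE is expressed in the units of the response variable, so the values reported for grain production and grain quality are not directly comparable with each other.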
Palavras-chave: Modeling, Experimental Data, Grain production, Grain quality
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126131

Draft genome assembly of Fusarium oxysporum f. sp. cubense subtropical race 4 (STR4) and race 1 (R1) isolates from Brazil, causal agents of Fusarium wilt of bananas

Autores: Erica de Castro Costa,Lucas Santos Bastos,Rutiane Moreira de Jesus Costa,Marco Pessoa Filho,Roberto Coiti Togawa,Robert Neil Gerard Miller
Apresentador: Erica de Castro Costa • ericaccosta@outlook.com
Resumo:
Fusarium oxysporum f. sp. cubense (Foc), the causal agent of Fusarium wilt in banana (Musa spp.), is present in approximately 47% of producing regions worldwide and responsible for substantial yield losses. Foc is classified into four races: Race 1 (R1), which is pathogenic to the cultivars ‘Gros Michel’ (AAA), ‘Silk’, and ‘Pome’ (AAB); Race 2 (R2), which affects ‘Bluggoe’ (ABB) cultivars; and Race 4 (R4), which infects all cultivars including the ‘Cavendish’ (AAA) subgroup. Race 4 is further subdivided into Tropical Race 4 (TR4) and Subtropical Race 4 (STR4), with the latter being less aggressive and causing disease in ‘Cavendish’ only under subtropical conditions. A total of 24 Vegetative Compatibility Groups (VCGs) have been identified among Foc isolates. In Brazil, R1, R2, and STR4 are present, while TR4 remains absent and is classified as a quarantine pest. Foc STR4 VCG 0120 is the most widely distributed in Brazilian banana-producing areas. Genomic data for Brazilian Foc isolates, including STR4, remains limited. Given the current distribution of Foc races in Brazil, genome assemblies were generated for the Foc STR4 isolate 218A and Foc R1 isolate 0801 CNPMF. Total DNA was extracted from mycelial tissue using a phenol/chloroform protocol. High molecular weight DNA (>20 Kb) was used to construct PacBio HiFi libraries, and whole-genome sequencing was conducted using the PacBio Revio platform, yielding 4.5 GB of HiFi data per sample. A total of 1,176,886 and 1,731,801 reads were obtained for isolates 218A and 0801, respectively, with average read lengths of 8,925 bp and 8,574 bp, and mean HiFi read quality values of Q32. Genome assemblies were generated using HiFiasm v.0.25.0 [command: hifiasm -o file.asm -t 10 -l0]. Assembly quality was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO v.5.8.2), employing the fungi_odb12 dataset, euk_genome_min mode, and miniprot as the gene predictor. 
The final genome assembly of STR4 isolate 218A comprised 234 contigs with a total size of 50.4 Mb, an average contig length of 215,500 bp, an N50 of 4,648,718 bp, and a GC content of 47.12%. The genome of R1 isolate 0801 was 58.1 Mb in size, with 299 contigs, an average contig length of 194,336 bp, an N50 of 4,467,021 bp, and a GC content of 46.79%. BUSCO analysis indicated genome completeness scores of 99.4% and 99.1% for isolates 218A and 0801, respectively, suggesting highly complete assemblies. The 12 largest contigs ranged from 1.1 to 6.22 Mb in 218A and from 2.5 to 6.7 Mb in 0801. The remaining contigs ranged from 11.7 to 86.9 Kb in 218A and from 10.8 to 880 Kb in 0801. The smallest contigs likely correspond to non-nuclear DNA fragments, such as mitochondrial or ribosomal sequences; some may represent contaminants. Further analyses will be undertaken to refine the genome assemblies, facilitating comparative studies with reference genomes available in the NCBI database. These data will contribute to the identification and characterisation of candidate effector genes, with a focus on their specificity and differential expression profiles across Foc races during host–pathogen interactions in banana.
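The N50 values reported above follow the usual definition: the length of the contig at which the cumulative length, counted from the largest contig down, first reaches half of the total assembly size. A minimal Python sketch of that definition is given below; the function name and the toy contig lengths are illustrative and are not taken from the assemblies.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the largest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig lengths, for illustration only.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because N50 weights contigs by length, the many small contigs in each assembly have little influence on it, which is why both assemblies show N50 values above 4 Mb despite containing hundreds of contigs.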
Palavras-chave: Musa, WGS, long-read sequencing, bioinformatics.
★ Running for the Qiagen Digital Insights Excellence Awards
#1126134

Genetic Analysis of Rare and Common Variants Associated with Type 2 Diabetes in an Admixed Brazilian Cohort

Autores: Diogo Maciel Duarte da Mota,Raphael Montanari,Alexander Augusto de Lima Jorge,João Carlos Setubal
Apresentador: Diogo Maciel Duarte da Mota • diogomdmota@gmail.com
Resumo:
Type 2 diabetes (T2D) is a multifactorial metabolic disorder with a strong genetic component and growing prevalence worldwide, especially in low- and middle-income countries. In Brazil, the genetic admixture of the population presents a valuable opportunity to investigate the genomic architecture of T2D beyond the traditional Eurocentric perspective. This study focused on identifying and characterizing both rare and common genetic variants associated with T2D in a subset of the SABE cohort, part of the ABraOM database, comprising 1,171 unrelated elderly individuals from São Paulo, Brazil's most populous city.

We analyzed single nucleotide polymorphisms (SNPs) in three gene groups: those identified in genome-wide association studies (GWAS_GENES), genes linked to monogenic diabetes (MODY_GENES), and genes associated with insulin resistance and lipodystrophy (IR_GENES). Functional annotations were performed using in silico prediction tools, prioritizing nonsynonymous variants, loss-of-function (LOF) mutations, and variants with high deleteriousness scores (CADD > 20). Altogether, 6,044 potentially damaging variants were identified across 503 GWAS genes, 48 MODY genes, and 11 IR genes.

Genotypic comparisons between individuals with and without T2D revealed a nominal enrichment of rare variants in specific genes—14 in the GWAS group and 3 in the MODY group. Regarding common SNPs, 63 showed statistically significant odds ratios, with 82.5% aligned with the direction of effect reported in original GWAS. Interestingly, 11 SNPs exhibited reversed associations, potentially reflecting ancestry-related differences or gene–environment interactions specific to the Brazilian population. Notably, the SNPs rs184509201 (in TCF7L2) and rs6723108 (in TMEM163) showed the strongest associations, each with ORs approaching 2.8.
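As a reminder of how an odds ratio and its direction of effect are read, here is a minimal Python sketch that computes an odds ratio with a Wald 95% confidence interval from a 2×2 table. The abstract does not state which test or interval the authors used, so the Wald interval, the function names, and the example counts are assumptions for illustration only.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


def same_direction(or_cohort: float, or_reference: float) -> bool:
    """True when both odds ratios fall on the same side of 1."""
    return (or_cohort > 1) == (or_reference > 1)


# Hypothetical counts, not taken from the SABE cohort.
estimate, lower, upper = odds_ratio(20, 10, 10, 20)
```

The figures in the paragraph above are internally consistent: 63 significant SNPs minus 11 with reversed associations leaves 52 concordant SNPs, and 52 of 63 is 82.5%.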

To evaluate the reliability of these findings, a power analysis using the epiR package was conducted. Results indicated that the cohort is underpowered for most rare variant analyses, with only one SNP (rs6026382 in APCDD1L) reaching the conventional power threshold of 80%. Based on this, we estimate that the ideal sample size should be at least 100 times larger to robustly detect rare variant associations.

This study highlights the importance of including genetically diverse populations in genomic research. Our findings contribute to a more inclusive understanding of T2D risk and reinforce the urgency of expanding multiethnic studies to advance precision medicine in diabetes care.
Palavras-chave: Type 2 Diabetes (T2D), Genetic Variants, Genome-Wide Association Studies (GWAS), Single Nucleotide Polymorphisms (SNP), Brazilian Population
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126135

Overcoming AMR Surveillance Limitations: An Explainable Deep Learning Model for Resistance Protein Annotation

Autores: Tiago Cabral Borelli,Ricardo Roberto da Silva
Apresentador: Tiago Cabral Borelli • tiago.borelli@usp.br
Resumo:
Antimicrobial resistance (AMR) is one of the most concerning modern threats, as it places a greater burden on health systems than HIV and malaria combined. Current surveillance strategies for tracking AMR work through genomic comparisons and depend on sequence alignment with strict similarity cutoffs (>95%). As a result, these methods have high false negative rates, because the available reference sequences do not cover the diversity of AMR proteins. Deep learning (DL) has been used as an alternative to sequence alignment, as artificial neural networks (ANNs) can extract abstract features from data and therefore limit the need for sequence comparisons. Here, a convolutional neural network (CNN) was trained to differentiate antimicrobial resistance proteins from non-resistance proteins and to functionally annotate them into nine resistance classes. Our model demonstrated higher recall values (>0.9) than the alignment-based approach for all protein classes tested. Additionally, our CNN architecture allowed us to investigate its neurons' firing patterns and explain the model's working principles regarding the importance of protein domain features related to antimicrobial molecule inactivation. Finally, we built an open-source bioinformatic tool that can be used to annotate antimicrobial resistance proteins and provide information on protein domains without sequence alignment.
Palavras-chave: Antimicrobial resistance, Protein annotation, Deep learning, Convolutional neural network, Explainable AI
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126143

NGRDB: The Nelore Genomic Resource Database for Data Integration and Cattle Genomics Research

Autores: Gabriel Alexander Colmenarez Pena,Mauricio de Alvarenga Mudadu,Juliana Afonso,Jennifer Jessica Bruscadin,Tainã Figueiredo Cardoso,Aline Silva Mello Cesar,Luiz Lehmann Coutinho,Gerson Barreto Mourão,Luciana Correia De Almeida Regitano,Adhemar Zerlotini Neto
Apresentador: Gabriel Alexander Colmenarez Pena • gabrielalexandercolmenarezp@gmail.com
Resumo:
Brazil is one of the world's leading beef exporters. In this context, the Nelore breed (Bos indicus) plays a crucial role in the Brazilian livestock industry, accounting for almost 80% of the national cattle herd. Genomic characterization of Nelore cattle has become a key area of economic interest. However, the fragmented publication of datasets has limited comprehensive exploration of the data. To address this, we present the Nelore Genomic Resource Database (NGRDB), a platform developed to promote integrative genomic research specific to the Nelore breed. The database is based on two Bos taurus reference genomes, ARS-UCD 1.3 and UMD 3.1, with 23,842 and 19,994 annotated coding genes, respectively. Structural and functional annotations are represented, including more than 785,000 single-nucleotide variants (SNVs), along with 94,356 (ARS-UCD 1.3) and 104,326 (UMD 3.1) quantitative trait loci (QTLs). In addition, the resource includes information on ~4,000 copy number variants (CNVs) and on non-coding RNAs, including 18,800 long non-coding RNAs (lncRNAs), 1,014 microRNAs (miRNAs), and more than 11,000 other ncRNA genes. Protein-coding regions were annotated through DIAMOND BLAST searches against NCBI and UniProt (Swiss-Prot and TrEMBL), with structural domains identified through InterProScan. The NGRDB, version 1.1.0, is the first centralized repository tailored to Nelore cattle, bringing together general cattle annotation, tissue expression profiles, differentially expressed mRNAs and miRNAs, SNVs and QTL regions associated with different phenotypes of agricultural interest. The database integrates 15 curated transcriptomic datasets derived from Nelore cattle tissues (e.g. muscle, liver), allowing users to explore links between genomic features and gene expression profiles associated with traits such as feed efficiency, muscle growth, mineral composition, and intramuscular fat deposition.
Built on the open-source Machado framework, the NGRDB supports advanced search capabilities, user-friendly navigation, and programmatic access to data via a dedicated API. A literature module links each dataset to its corresponding article's DOI, promoting traceability and contextual understanding. Feature visualization works through an embedded JBrowse, which allows interactive exploration of the entire genome, including genes, QTLs, SNVs, and CNVs. Users can browse annotated regions, access metadata, and download sequence-level data in FASTA format. Designed for continuous updates and expert curation, the NGRDB aims to serve as a collaborative repository for the Nelore research community. Organizing Nelore genomic data facilitates comparisons between studies and improves data reuse in functional genomics. Finally, the NGRDB provides a fundamental tool for accelerating precision breeding and genomic selection in Nelore cattle. The NGRDB version 1.1.0 is still under development, but version 1.0.0 is available at https://www.machado.cnptia.embrapa.br/nelore/
Palavras-chave: Bos indicus, database, regulatory annotation, genome browser.
#1126144

Differential Expression Analysis in Clear Cell Ovarian Cancer Data

Autores: Douglas Nunes da Silva,Epitácio Dantas de Farias Filho,Rodrigo Juliani Siqueira Dalmolin
Apresentador: Douglas Nunes da Silva • douglas.nunes.132@ufrn.edu.br
Resumo:
Clear cell ovarian carcinoma (OCCC) is a rare and aggressive histological subtype of epithelial ovarian cancer, accounting for approximately 5 to 10% of cases of this neoplasm. It is characterized by a highly chemoresistant phenotype, highlighting the urgent need to identify more effective therapeutic approaches. Unlike other ovarian cancer subtypes, OCCC presents a distinct molecular profile, frequently associated with mutations in the ARID1A, PIK3CA, and PTEN genes, as well as alterations in the Hippo pathway and HIF-1α signaling, suggesting specific mechanisms of oncogenesis and tumor progression. The scarcity of effective therapies for this subtype underscores the need for a deeper understanding of its molecular biology to enable the development of new therapeutic strategies. In this study, we performed a differential expression analysis using RNA-seq data from the Genotype-Tissue Expression (GTEx) project as controls and gene expression profiles from patients with epithelial ovarian cancer (PRJNA783540), both available in public databases. Data processing included normalization, filtering, and statistical analysis using the DESeq2 package, with an adjusted p-value threshold of 0.05. Preliminary results showed that differentially expressed mRNAs are linked to pathways involved in hypoxia response, chromosome segregation, xenobiotic response, and tissue metabolism. Differentially expressed lncRNAs, in turn, were mainly associated with the regulation of lipase and phospholipase activity. These findings suggest a complex molecular profile in OCCC, where hypoxia promotes cell survival and treatment resistance, xenobiotic response emerges as a key mechanism of chemotherapy resistance, and lncRNAs regulating lipases and phospholipases indicate a dependence on lipid metabolism for tumor maintenance.
These combined factors make OCCC a particularly challenging cancer, but also open avenues for new therapeutic strategies, including the modulation of lipid metabolism and resistance mechanisms. Experimental validation and correlation with clinical data will be essential to elucidate the impact of these molecular signatures on disease progression and the development of personalized treatments for patients with OCCC.
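The adjusted p-value cutoff mentioned above can be illustrated with a small sketch. It assumes the Benjamini-Hochberg procedure (the default adjustment reported by DESeq2 as `padj`); the gene names and raw p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only).
raw = {"GENE_A": 0.001, "GENE_B": 0.008, "GENE_C": 0.039,
       "GENE_D": 0.041, "GENE_E": 0.60}
padj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Keep genes below the adjusted threshold used in the abstract (< 0.05).
significant = [gene for gene, p in padj.items() if p < 0.05]
print(significant)
```

In this toy input, GENE_C and GENE_D have raw p-values below 0.05 but fail the adjusted threshold, which is the reason for filtering on the adjusted value.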
Palavras-chave: Ovarian Cancer, Clear Cell Carcinoma, Gene Expression, RNA-seq, Biomarkers
#1126155

IMMUNOINFORMATICS ANALYSIS APPLIED TO SURFACE PROTEINS OF Leishmania infantum

Autores: Davi Salles Xavier,João Marcelo Pereira Alves
Apresentador: Davi Salles Xavier • davisalles528@gmail.com
Resumo:
Because surface proteins are central to parasite development and interact directly with the host organism, several approaches seek to use these components as vaccine candidates for diseases of global importance. In silico screening strategies can facilitate the selection of these targets and circumvent the challenge of antigenic complexity by selecting targets that are more easily recognized by the immune system (such as externalized proteins) and by constructing multi-epitope compounds. Leishmaniasis is a neglected pathology responsible for nearly two thousand new cases and between 20 and 30 thousand deaths annually. The visceral manifestation is a more severe form of the disease and is caused mainly by Leishmania infantum, for which there is still no well-established therapeutic or vaccination model. This project therefore presents a reverse vaccinology pipeline for the initial screening of protein targets with vaccine potential and the subsequent submission of these proteins to epitope prediction for the construction of a chimeric protein. To this end, the proteome of L. infantum was retrieved from the UniProt database (UP000008153) and submitted to a succession of tools to select proteins that (i) are surface-exposed or extracellular (DeepLoc), (ii) have widely distributed orthologs (OrthoFinder), (iii) show low similarity with host proteins (BLASTp), (iv) are predicted to be non-allergenic (AlgPred), and (v) are predicted to be antigenic (VaxiJen). Seven proteins were identified. According to the literature, some of them have already been experimentally associated with functions essential to parasite survival and to establishing the relationship with host cells. Others are hypothetical proteins that may have novel vaccine applicability.
The chimeric construct was built from these proteins by first removing signal peptides (SignalP) and then predicting CD4+ T-cell (netMHCII), CD8+ T-cell (netCTL), and B-cell (BepiPred) epitopes. The epitopes were joined with GDGDG linkers, and an EAAAK linker was used to add an adjuvant (resuscitation-promoting factor B, RpfB, from M. tuberculosis) at the N-terminal region. Structural analyses were then performed, including modeling (I-TASSER), conformational epitope prediction (ElliPro), docking with a toll-like receptor (ClusPro), and molecular dynamics (GROMACS). The chimeric protein was predicted to be capable of eliciting both cellular and humoral immune responses and to have a high probability of forming a stable interaction with the toll-like receptor. These features are consistent with a potential novel vaccine against Leishmania infantum.
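The assembly of the chimeric sequence described above (adjuvant at the N-terminus, an EAAAK linker, then epitopes joined by GDGDG linkers) can be sketched as follows. The adjuvant and epitope strings are placeholders invented for illustration; they are not the real RpfB sequence or the predicted epitopes, and the function name is hypothetical.

```python
# Placeholder sequences for illustration only; none of these are real
# RpfB or L. infantum sequences.
ADJUVANT = "MKTAYIAKQR"
EPITOPES = ["KLVDEFGHI", "YPQRSTVWA", "ACDEFGHIK"]


def build_chimera(adjuvant, epitopes,
                  adjuvant_linker="EAAAK", epitope_linker="GDGDG"):
    """Place the adjuvant at the N-terminus, then the linked epitopes."""
    return adjuvant + adjuvant_linker + epitope_linker.join(epitopes)


construct = build_chimera(ADJUVANT, EPITOPES)
print(construct)
print(len(construct))
```

The order of epitopes within the construct is an assumption here; the abstract does not state how the epitopes were ordered.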
Palavras-chave: reverse vaccinology, immunoinformatics, Leishmania infantum
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126157

Beyond the immune microenvironment: a preliminary exploration of malignant cells and tumor antigens in scRNA-seq data

Autores: Thayana Vieira Tavares,Jorge Estefano de Santana Souza,André Luis Fonseca Faustino
Apresentador: Thayana Vieira Tavares • thayana.vt@gmail.com
Resumo:
In contrast to bulk transcriptomics, single-cell RNA sequencing (scRNA-seq) enables high-resolution characterization of cellular heterogeneity across complex tissues. In cancer research, this technology is commonly applied to study the immune microenvironment; however, malignant cells — despite being present in the same datasets — are often underexplored. This gap limits antigen discovery efforts and the understanding of tumor-intrinsic programs with immunotherapeutic relevance. In this study, we propose a reanalysis of publicly available scRNA-seq datasets to characterize tumor cell populations and explore their potential role in shaping immune responses within the tumor microenvironment. Thus, we leverage data from the Yost et al. (2019) cohort, which comprises 46 tumor samples from 15 patients diagnosed with either basal cell carcinoma (BCC) or squamous cell carcinoma (SCC). To ensure data standardization, we processed raw sequencing files using Cell Ranger (v9.0.0) to generate gene expression matrices. Quality control was conducted at both the sample and cell levels. At the sample level, libraries were filtered based on transcript abundance (mean reads per cell ≥25,000), molecular complexity (median Unique Molecular Identifiers ≥1,000), gene coverage (median of genes per cell ≥900), and cell throughput (≥300 cells per sample). At the cell level, we retained only cells with 300–7,500 detected genes and <25% mitochondrial gene content. After filtering, normalization, and scaling using Seurat (v5.2.1), 67,633 high-quality cells across 31 samples were retained for downstream analysis. Dimensionality reduction and clustering were performed using standard workflows in the Seurat package. Cell annotation was conducted using Celltypist with the “Immune_All_High” pre-trained model, identifying 11 cell types. 
In summary: T cells (n=39,639), B cells (n=7,247), Epithelial cells (n=6,689), Plasma cells (n=3,434), Macrophages (n=2,545), Fibroblasts (n=1,905), dendritic cells (DC, n=1,911), Innate lymphoid cells (ILC, n=1,736), Plasmacytoid dendritic cells (pDC, n=1,108), Endothelial cells (n=989), and Mast cells (n=430). Among the annotated populations, epithelial cells formed two transcriptionally distinct clusters consistent with a CD45⁻ / CD3⁻ phenotype and expression markers (MART-1, BCAM, and TP63), suggesting their malignant origin. Our next aim is to investigate expression signatures derived from known and potential tumor-associated antigens. This analysis is underway, integrating data from our prior pan-cancer catalog of cancer-testis antigens (Silva & Fonseca, 2017) with established in-house expertise in tumor antigen prioritization. Our preliminary results have revealed the expression of PRAME and SPAG9 within distinct malignant clusters. However, SPAG9 was also expressed across multiple non-malignant cell types, suggesting a broad expression pattern that may limit its clinical applicability. In conclusion, our analysis underscores the importance of reevaluating scRNA-seq datasets beyond immune profiling, with a focus on tumor-intrinsic programs and antigenic features. By leveraging single-cell resolution, this approach circumvents limitations of bulk transcriptomics and offers a refined strategy for identifying tumor antigens with potential clinical relevance.
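The sample-level and cell-level quality-control thresholds reported above can be written as two simple predicates. This is a minimal sketch in plain Python rather than the Seurat workflow the authors used; the `Cell` record, the function names, and the example cells are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str
    n_genes: int      # number of detected genes
    pct_mito: float   # percentage of mitochondrial gene content


def passes_cell_qc(cell, min_genes=300, max_genes=7500, max_pct_mito=25.0):
    """Cell-level filter: 300 to 7,500 detected genes and <25% mitochondrial."""
    return min_genes <= cell.n_genes <= max_genes and cell.pct_mito < max_pct_mito


def passes_sample_qc(mean_reads_per_cell, median_umis, median_genes, n_cells):
    """Sample-level filter using the thresholds stated in the abstract."""
    return (mean_reads_per_cell >= 25_000 and median_umis >= 1_000
            and median_genes >= 900 and n_cells >= 300)


# Hypothetical cells: too few genes, good, too much mito, too many genes.
cells = [Cell("AAAC", 250, 5.0), Cell("AAAG", 2000, 10.0),
         Cell("AAAT", 3000, 30.0), Cell("AACA", 8000, 3.0)]
kept = [c.barcode for c in cells if passes_cell_qc(c)]
print(kept)
```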
Palavras-chave: scRNA-seq, malignant cells, basal cell carcinoma, squamous cell carcinoma
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126180

The Use of Machine Learning Algorithms for the Early Detection of Spasticity in HTLV-1 Using Clinical-Laboratory Data and IL-6 Quantification

Autores: Laryssa Bandeira de Melo Silva,Matheus Azevedo Bomfim,Gabriel Freitas Araújo,Patricia Muniz Mendes Freire de Moura,Paula M. Magalhães,José Anchieta de Brito,João Pacifico Bezerra Neto
Apresentador: Laryssa Bandeira de Melo Silva • laryssa.bandeira@upe.br
Resumo:
Infection by the human T-lymphotropic virus type 1 (HTLV-1) can progress to HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP), characterized by progressive spasticity and motor impairment. Although clinically relevant, the progression of spasticity is poorly assessed. The Functional Independence Measure (FIM) is used to evaluate functionality in these patients but lacks effective predictive capabilities. Furthermore, some studies have shown that interleukin-6 (IL-6), a pro-inflammatory cytokine associated with neuroinflammation, may serve as a biomarker for HAM/TSP progression. Thus, machine learning (ML) algorithms may be a promising tool to identify individuals at high risk of developing spasticity. This study aims to use clinical-laboratory data and serum IL-6 levels to anticipate the occurrence of spasticity in HTLV-1-infected patients. A total of 120 patients followed at the Oswaldo Cruz University Hospital (HUOC) were evaluated, including 69 with spasticity and 51 asymptomatic. Data collection included a clinical questionnaire and laboratory tests, including IL-6 quantification. The information was organized into a dataframe and underwent cleaning, normalization, and variable transformation. Statistical tests (Spearman correlation, t-test, linear regression) and the SelectKBest method were used for feature selection, followed by calculation of the Variance Inflation Factor (VIF) to eliminate multicollinearity. Data splitting was performed using Leave-One-Out cross-validation. To address class imbalance, random undersampling was adopted, chosen after testing with the imbalanced-learn library. After preprocessing and Bayesian hyperparameter optimization, nine ML models were trained. Performance was assessed using metrics such as accuracy, precision, recall, F1-score, ROC-AUC, and G-Mean, as well as learning curves and confusion matrices.
Additionally, the SHAP (SHapley Additive exPlanations) method was applied to identify the relative importance of variables for each model. The best-performing models were the Multilayer Perceptron (MLP) and Logistic Regression (LR), both with high recall values, the metric prioritized to minimize false negatives. MLP achieved a recall of 0.812, with accuracy, precision, and F1-score of 0.783, 0.812, and 0.812, respectively, and a ROC-AUC of 0.767. Its confusion matrix showed 13 false positives (FP) and only 10 false negatives (FN). Its learning curves indicated potential for improvement with more data, despite slight initial overfitting. LR showed a recall of 0.725, accuracy of 0.742, precision of 0.806, and F1-score of 0.763, as well as the highest ROC-AUC (0.799). It registered 15 FP and 14 FN in the confusion matrix, with more stable curves suggesting good generalization. For MLP, the key predictors were hematocrit (+0.29), pain (+0.12), and urological symptoms (+0.10). IL-6 showed a modest contribution (+0.05), indicating a subtle inflammatory influence. For LR, urological symptoms (+0.30), gastrointestinal symptoms (+0.05), and hematocrit (+0.04) stood out, with IL-6 having a negligible impact. MLP relied more on laboratory markers, whereas LR prioritized clinical symptoms. The results demonstrate that ML models can bridge existing gaps in clinical practice by aiding in the early identification of at-risk individuals and promoting more effective and personalized interventions.
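The random undersampling step mentioned in this abstract can be sketched with the standard library alone. The authors report using the imbalanced-learn library; the function below is a hypothetical stand-in that only illustrates the idea, applied to labels with the same class sizes as the cohort (69 versus 51).

```python
import random


def random_undersample(labels, seed=0):
    """Return sorted indices that keep an equal number of samples per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Every class is reduced to the size of the smallest class.
    n_min = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    return sorted(kept)


# Labels mirroring the cohort sizes: 69 with spasticity (1), 51 without (0).
labels = [1] * 69 + [0] * 51
kept = random_undersample(labels)
balanced = [labels[i] for i in kept]
print(balanced.count(1), balanced.count(0))
```

In a cross-validated setting, resampling is normally applied to the training portion of each split only; whether the study did so is not stated in the abstract.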
Palavras-chave: Models, variables, progression, interventions
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126181

Predictive Detection of Hemorrhagic Dengue Using Supervised Algorithms and Clinical-Laboratory Profiles

Autores: Gabriel Freitas Araújo,Laryssa Bandeira de Melo Silva,Matheus Azevedo Bomfim,Marília Gabriela Barbosa da Silva,ELIANE CAMPOS COIMBRA,João Pacifico Bezerra Neto
Apresentador: Gabriel Freitas Araújo • gabriel.freitasaraujo@upe.br
Resumo:
Dengue virus remains one of the major global public health concerns today, causing approximately 390 million infections and 20,000 deaths annually worldwide, with a particularly heavy burden in the Americas. The severe form, known as hemorrhagic dengue, involves plasma leakage, hemorrhages, and organ dysfunction, requiring immediate clinical intervention. Thus, early detection of patients at risk of worsening is essential to reduce morbidity and mortality. Recent studies have shown that machine learning algorithms can accurately predict the progression to severe dengue using clinical and laboratory data collected in the early stages, outperforming the conventional WHO criteria. Therefore, this study aims to evaluate the use of a predictive model based on machine learning to identify patients prone to the severe form of the disease. The collected data were structured into a dataframe, serving as the basis for preprocessing, model training, and validation. After preprocessing, variables such as retro-orbital pain, nausea, pleural effusion, petechiae, ecchymosis, leukocytes, and platelets were submitted to nine machine learning algorithms for training. Performance evaluation was carried out using metrics such as accuracy, precision, recall, F1-score, area under the ROC curve (ROC-AUC), and G-Mean. Among the nine classifiers tested, Decision Tree (DT) and Logistic Regression (LR) performed best. DT presented the highest recall value (0.789), indicating a good ability to correctly identify severe cases of the disease, an essential factor in clinical settings. Furthermore, it showed a good F1-score (0.750) and G-Mean (0.720), reinforcing its effectiveness both in correctly identifying cases and in maintaining class balance. LR also showed consistent performance, with the highest ROC-AUC value (0.752), suggesting a high discriminative capacity between severe and non-severe cases.
Although its recall (0.658) was not as high as that of DT, it maintained good precision (0.725) and F1-score (0.690), highlighting its robustness as a predictive tool. In contrast, Naive Bayes (NB) showed the most unbalanced performance among the tested algorithms. Despite achieving the highest precision (0.964), its extremely low recall (0.355) indicates that it failed to identify most severe cases, making it unsuitable for the study’s main objective, which is the early detection of hemorrhagic dengue. This imbalance is further reflected in the low F1-score (0.519) and the lowest G-Mean (0.592) among all models. The results demonstrate that ML algorithms, especially Decision Tree and Logistic Regression, have great potential to aid in the early detection of hemorrhagic dengue, outperforming conventional approaches when dealing with clinical and laboratory data, enabling faster and more effective medical interventions and significantly reducing the rates of complications and mortality associated with the disease.
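The F1-scores quoted above follow from the reported precision and recall, since F1 is their harmonic mean. The short check below recomputes them for the two models whose precision and recall are both given in the abstract; the function name is chosen here for illustration.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
reported = {
    "Logistic Regression": (0.725, 0.658, 0.690),
    "Naive Bayes": (0.964, 0.355, 0.519),
}

recomputed = {model: round(f1_score(p, r), 3)
              for model, (p, r, _) in reported.items()}
for model, (_, _, f1_reported) in reported.items():
    print(model, recomputed[model], f1_reported)
```

The Naive Bayes case shows why a very high precision cannot compensate for low recall: the harmonic mean is pulled toward the smaller of the two values.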
Palavras-chave: Health, Prediction, Models, Disease, Variables
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126186

Mathematical Modeling of Angiogenesis: Differentiation Between Tip and Stalk Cells

Autores: Júlia Vitória Ribeiro,José Carlos Merino Mombach,Pedro Ravalha Lorenzoni,Luciana Renata de Oliveira
Apresentador: Júlia Vitória Ribeiro • julia.vitoria@acad.ufsm.br
Resumo:
This study aims to construct a logical-mathematical model to investigate the molecular mechanisms regulating angiogenesis, with a focus on the differentiation between tip and stalk cells. Angiogenesis, the process of forming new blood vessels from pre-existing ones, is essential for wound healing and is also involved in pathological conditions such as cancer and cardiovascular diseases. During this process, endothelial cells respond to molecular stimuli and differentiate into tip cells (highly migratory cells located at the leading edge of the sprout that guide new vessel formation) and stalk cells, which form the tubular structure and promote its stability through intense proliferation. An open problem in this area is how a cell commits to the tip or the stalk fate. In this work, we propose a logical model of the molecular interactions involved in this decision process. Cellular differentiation is primarily triggered by the binding of vascular endothelial growth factor A (VEGF-A) to its receptor VEGFR2, which activates tip cells. These tip cells, in turn, express the protein DLL4 (Delta-like ligand 4), which binds to the NOTCH1 receptor on adjacent cells, activating the Notch signaling pathway. This mechanism inhibits VEGFR2 expression in neighboring cells, preventing them from adopting the tip cell phenotype and promoting their differentiation into stalk cells. Other key proteins involved include VEGFR1, a negative regulator that sequesters VEGF-A, and JAG1 (Jagged1), which competes with DLL4 in activating the Notch pathway and contributes to the maintenance of the stalk cell phenotype. In addition to protein-mediated regulation, microRNAs play an important role in modulating endothelial cell fate. In particular, miR-221 has been shown to inhibit tip cell behavior by downregulating VEGFR2 expression and targeting other components of the VEGF signaling pathway.
By attenuating VEGF responsiveness, miR-221 contributes to the repression of tip cell features and facilitates stalk cell differentiation, adding another layer of post-transcriptional regulation to the angiogenic switch. To represent these complex interactions, a logical network was developed based on literature data and enriched with information extracted from curated databases containing experimental results. The modeling was performed using binary activation and inhibition rules (Boolean logic), with support from GINsim, used for the analysis of the network’s stable states, and MaBoSS, which enables probabilistic and time-resolved simulations based on the Gillespie algorithm. The resulting model identified multiple possible final states for endothelial cells—such as tip, stalk, or undifferentiated—and allowed the estimation of the probability of each cell fate based on the initial molecular signals. The outcomes were consistent with experimental findings reported in the literature and provide a foundation for more advanced analyses that incorporate the temporal dynamics of molecular interactions. Thus, this work offers a significant contribution to the understanding of the biological mechanisms controlling angiogenesis, with potential applications in the development of therapies aimed at modulating blood vessel formation in clinical contexts such as oncology, tissue regeneration, and biomedical engineering.
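The lateral-inhibition logic described in this abstract (VEGF activates VEGFR2, VEGFR2 induces DLL4, and DLL4 activates Notch in the neighboring cell, which represses VEGFR2 there) can be illustrated with a toy two-cell Boolean network. This is a deliberately reduced sketch with rules written for illustration; it is not the authors' GINsim/MaBoSS model and omits VEGFR1, JAG1, and miR-221.

```python
from itertools import product

NODES = ("VEGFR2_A", "DLL4_A", "NOTCH_A", "VEGFR2_B", "DLL4_B", "NOTCH_B")


def update(state, vegf):
    """Synchronous update of all nodes for two adjacent cells, A and B."""
    s = dict(zip(NODES, state))
    nxt = {
        "VEGFR2_A": vegf and not s["NOTCH_A"],  # Notch represses VEGFR2
        "DLL4_A": s["VEGFR2_A"],                # VEGFR2 signaling induces DLL4
        "NOTCH_A": s["DLL4_B"],                 # neighbor's DLL4 activates Notch
        "VEGFR2_B": vegf and not s["NOTCH_B"],
        "DLL4_B": s["VEGFR2_B"],
        "NOTCH_B": s["DLL4_A"],
    }
    return tuple(bool(nxt[n]) for n in NODES)


def stable_states(vegf):
    """Enumerate all states and keep those that map to themselves."""
    return [s for s in product((False, True), repeat=len(NODES))
            if update(s, vegf) == s]


def fate(state, cell):
    s = dict(zip(NODES, state))
    if s[f"VEGFR2_{cell}"]:
        return "tip"
    if s[f"NOTCH_{cell}"]:
        return "stalk"
    return "undifferentiated"


fates_with_vegf = sorted((fate(s, "A"), fate(s, "B")) for s in stable_states(True))
fates_without_vegf = [(fate(s, "A"), fate(s, "B")) for s in stable_states(False)]
print(fates_with_vegf)
print(fates_without_vegf)
```

Under these toy rules, VEGF stimulation yields two mirror-image stable states (one tip cell next to one stalk cell), and the absence of VEGF leaves both cells undifferentiated.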
Palavras-chave: Angiogenesis, Tip cells, Stalk cells, VEGF signaling, Logical modeling, Cell fate
#1126199

Unveiling Pathogens on the Frozen Continent: Metagenomic Insights from Gentoo Penguin Feces in Antarctica

Autores: Maithê Magalhães,Daniel Andrade Moreira,Marília Alves Figueira Melo,Audrien Andrade,Aline dos Santos Moreira,Luciana Trilles,Lucas Machado Moreira,Wim Maurits Sylvain Degrave,Thiago Estevam Parente
Apresentador: Maithê Magalhães • magalhaes.maithe@gmail.com
Resumo:
Antarctica, the world's largest desert, remains largely unexplored, harboring a wealth of unidentified microorganisms with potential biological activities and biotechnological applications. Monitoring wildlife microbiomes offers valuable insights into environmental health and the spread of infectious agents across ecosystems. The FioAntar program (Fiocruz in Antarctica) aims to enhance health surveillance in the Antarctic region, contributing to the global understanding of microbial circulation.
In this study, we performed shotgun metagenomic sequencing on three fecal samples of Pygoscelis papua (Gentoo penguins) and metatranscriptomic analysis on a pooled sample collected during the 41st Brazilian Antarctic Operation (OPERANTAR XLI). Sequencing depths ranged from 20 to 45 million paired-end reads. Taxonomic classification, using Kraken2/Bracken and validated with Diamond BLAST comparisons against RefSeq, revealed a diverse microbial community.
We detected several high-priority pathogens listed by the World Health Organization, including Klebsiella pneumoniae, Escherichia coli, Acinetobacter baumannii, Mycobacterium tuberculosis, Enterococcus faecium, Pseudomonas aeruginosa, Staphylococcus epidermidis, Staphylococcus capitis, Neisseria meningitidis, Enterobacter hormaechei, Proteus mirabilis, and Salmonella enterica. These findings underscore the potential role of Antarctic wildlife as reservoirs or sentinels for the global circulation of pathogenic bacteria, reinforcing the importance of continuous microbial surveillance in extreme environments.
Our work demonstrates the utility of metagenomic approaches for pathogen detection in polar regions and contributes to integrated health strategies under the One Health perspective.
Palavras-chave: Antarctica, metagenomics, Pygoscelis papua, One Health
#1126202

Mapping Microbial Species Involved in Mucin Degradation in Neurodegenerative Diseases: A Metagenomic Analysis

Autores: Gabriela Castilho Martins,Lívia Soares Zaramela
Apresentador: Gabriela Castilho Martins • gabriela.castilho.martins@usp.br
Resumo:
The degradation of mucin by the intestinal microbiota plays a crucial role in maintaining the integrity of the gut barrier and, consequently, systemic health. Recent studies have shown that patients with neurodegenerative diseases, such as Alzheimer’s disease (AD) and Parkinson’s disease (PD), exhibit significant alterations in gut microbiota composition. However, the contribution of mucin-degrading bacteria in the context of these diseases remains poorly understood. This study aims to explore the involvement of such bacteria in the pathogenesis of neurodegenerative disorders by identifying microbial species and quantifying genes associated with mucin degradation. For this, we analyzed publicly available shotgun metagenomic sequencing data from AD and PD patients and matched healthy controls. Datasets were selected through extensive searches across multiple public repositories. Enzymes involved in mucin degradation were identified based on prior literature, and their amino acid sequences were retrieved from the NCBI Protein database. We developed a bioinformatics pipeline to process and analyze these datasets. Initial quality control of raw reads was performed using fastp, followed by removal of human-derived sequences through alignment against the human genome using Bowtie2 and samtools. Overlapping reads were merged using FLASH v1.2.11, and these were then used to guide the assembly of paired reads with MEGAHIT. Gene prediction was conducted using Prodigal, and binning was performed with MetaBAT 2 (bin quality was assessed with CheckM). To quantify gene abundance, the original reads were aligned back to the predicted coding DNA sequences (CDS) using Bowtie2, and read counts per CDS were extracted with samtools idxstats. Functional annotation was achieved by aligning the predicted proteins against a curated database of Carbohydrate-Active enZymes (CAZymes) and Polysaccharide Utilization Loci (PULs) using DIAMOND v0.8.24.
For each CDS, the best hit was selected, and relative abundance was calculated using the RPKM (Reads Per Kilobase of transcript per Million mapped reads) normalization method. Data organization and statistical comparisons between disease and control groups were conducted using the pandas and SciPy (scipy.stats) libraries in Python. Our preliminary analysis of Alzheimer’s datasets revealed over 150 mucin-degrading enzymes or PULs with differential relative abundance between AD patients and healthy controls. Notably, the majority of these hits corresponded to proteins within PULs, suggesting a potential shift in glycan degradation capacity in the AD-associated gut microbiome. These findings provide initial evidence that mucin degradation may be altered in neurodegenerative diseases and highlight specific microbial functions that warrant further investigation as potential modulators or biomarkers of disease progression.
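The RPKM normalization used above can be sketched directly from per-CDS read counts. The example parses text in the tab-separated layout produced by `samtools idxstats` (reference name, sequence length, mapped reads, unmapped reads); the CDS names and counts are invented, and taking the total of mapped reads across all CDS as the denominator is an assumption of this sketch.

```python
def rpkm(read_count, cds_length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads."""
    return read_count * 1e9 / (cds_length_bp * total_mapped_reads)


# Toy idxstats-style table: name, length (bp), mapped reads, unmapped reads.
IDXSTATS = "cds_1\t1000\t600\t0\ncds_2\t2000\t400\t0\n*\t0\t0\t10\n"

rows = []
for line in IDXSTATS.strip().splitlines():
    name, length, mapped, _unmapped = line.split("\t")
    if name != "*":  # the "*" row collects reads with no reference
        rows.append((name, int(length), int(mapped)))

total_mapped = sum(mapped for _, _, mapped in rows)
values = {name: rpkm(mapped, length, total_mapped)
          for name, length, mapped in rows}
print(values)
```

Although cds_2 is twice as long as cds_1, it receives fewer reads, so its length-normalized abundance is one third of that of cds_1.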
Palavras-chave: mucin degradation; gut microbiota; metagenomics; neurodegenerative diseases
★ Running for the Qiagen Digital Insights Excellence Awards
#1126208

LOGICAL MODELING OF EPITHELIAL MESENCHYMAL TRANSITION AND ITS RELATIONSHIP WITH PLURIPOTENCY

Autores: Gabriel Vitorello,Daner Acunha Silveira,José Carlos Merino Mombach,Pedro Lorenzoni
Apresentador: Gabriel Vitorello • gabrielvitorello.s@gmail.com
Resumo:
Cancer metastasis can be understood through the processes of epithelial-mesenchymal transition (EMT) and mesenchymal-epithelial transition (MET), which are essential for the migration and invasion capabilities of cancer cells. These processes are governed by intricate molecular signaling pathways that enable cancer cells to adapt and survive in various microenvironments. The role of pluripotency factors and microRNAs (miRNAs) in regulating EMT and MET remains an open question, as these factors could potentially influence the progression and reversibility of these transitions. A core regulatory network drives both EMT and MET, centered around transcription factors like ZEB1 (zinc finger E-box binding homeobox 1) and SNAI1 (zinc finger protein SNAI1). These factors orchestrate the expression of genes associated with mesenchymal and epithelial characteristics. EMT and MET are dynamic processes that are intrinsically linked to cellular reprogramming, as they influence the efficiency of inducing pluripotency through Yamanaka factors (OCT4, SOX2, KLF4, c-MYC). Notably, MET stabilizes the reprogrammed state by restoring epithelial traits necessary for maintaining pluripotency marker expression. This dual relationship between EMT and MET modulates both the efficacy and stability of cellular reprogramming. Despite the description of a linear progression from EMT to MET, key aspects of reversibility and feedback mechanisms remain unclear. To address this, we propose a Boolean gene regulatory network model that incorporates miRNAs such as miR145 and miR302-367, which play a crucial role in linking EMT/MET subnetworks. Our model employs asynchronous update dynamics to capture the reversibility of transitions induced by the overexpression of Yamanaka factors, using Monte Carlo simulations to evaluate the stability of attractors in the system.
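Asynchronous updating combined with Monte Carlo sampling, as described above, can be illustrated on a toy two-node switch. The nodes `EPI` and `MES` and the input `oskm` (standing in for Yamanaka factor overexpression) are hypothetical simplifications written for this sketch; the rules are not the authors' regulatory network.

```python
import random
from collections import Counter

# Toy rules: epithelial and mesenchymal programs repress each other,
# and the hypothetical OSKM input favors the epithelial program.
RULES = {
    "EPI": lambda s, oskm: (not s["MES"]) or oskm,
    "MES": lambda s, oskm: (not s["EPI"]) and not oskm,
}


def is_fixed_point(state, oskm):
    return all(rule(state, oskm) == state[node] for node, rule in RULES.items())


def run_async(state, oskm, rng, max_steps=1000):
    """Update one randomly chosen node per step until a fixed point is reached."""
    state = dict(state)
    for _ in range(max_steps):
        if is_fixed_point(state, oskm):
            return tuple(sorted(state.items()))
        node = rng.choice(sorted(RULES))
        state[node] = RULES[node](state, oskm)
    return None  # no fixed point found within max_steps


def attractor_frequencies(oskm, n_runs=1000, seed=0):
    """Monte Carlo estimate of how often each fixed point is reached."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        start = {node: rng.random() < 0.5 for node in RULES}
        counts[run_async(start, oskm, rng)] += 1
    return counts


without_oskm = attractor_frequencies(oskm=False)
with_oskm = attractor_frequencies(oskm=True)
print(without_oskm)
print(with_oskm)
```

Without the input, random initial states settle into either the epithelial or the mesenchymal fixed point; with the input switched on, only the epithelial state remains stable, which mimics a forced mesenchymal-to-epithelial transition in this toy setting.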
Palavras-chave: EMT, MET, YAMANAKA, Boolean, Regulatory, Network, Dynamics
#1126218

Deep Learning for Antimicrobial Resistance Modeling: Calibrating Kinetic Parameter through Physics-Informed Neural Networks (PINNs)

Autores: Thaís Madruga de Oliveira Mendonça,Emanuelle Arantes,Gustavo Barbosa Libotte,Marisa Fabiana Nicolás
Apresentador: Thaís Madruga de Oliveira Mendonça • thaismm@posgrad.lncc.br
Resumo:
Mathematical and computational modeling is widely used to simulate real systems, contributing to a greater understanding of phenomena and to the prediction of their behavior. This approach is essential for tasks such as forecasting epidemics and understanding ecological dynamics. It typically involves developing a mathematical model to describe the phenomenon and, in some cases, calibrating the model’s parameters. However, parameter estimation poses significant challenges in modeling biological systems, mainly due to the scarcity of experimental data.
The rise of machine learning over the past decade has introduced powerful new tools for mathematical modeling, particularly in scenarios where the underlying physical laws are unknown, too complex to describe analytically, or when data is too noisy for traditional methods. Among these tools, Physics-Informed Neural Networks (PINNs) stand out by embedding the governing equations of a system directly into the structure or loss function of a neural network, enabling it to approximate numerical solutions while enforcing physical consistency.
In this work, we employ PINNs to address the parameter estimation challenge by calibrating a key parameter of an ordinary differential equation (ODE) that models typical behavior in an antimicrobial resistance scenario over time. Our approach combines PINNs with an uncertainty quantification method. The PINN framework learns from artificial data while respecting the physical laws encoded by the ODE, whose parameter represents our target rate. This dual approach ensures that our model maintains physical plausibility while fitting the observed data. To implement this framework, we developed a neural network training pipeline comprising the following steps: (i) data preprocessing, including normalization and train/validation splitting; (ii) network architecture definition, embedding dense layers and physical constraints; (iii) hybrid loss function, combining data-driven loss and physics-based loss; (iv) optimization, using gradient-based methods; (v) uncertainty quantification, via Monte Carlo dropout; and (vi) model evaluation, using appropriate performance metrics.
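The hybrid loss in step (iii) can be written in a generic form. The notation below is an assumption introduced for illustration, since the abstract does not give the ODE or the loss: $\hat{u}_\theta(t)$ is the network output with weights $\theta$, $k$ is the kinetic parameter being calibrated, $f$ is the right-hand side of the ODE $du/dt = f(u, t; k)$, $(t_i, u_i)$ are the $N_d$ data points, $t_j$ are the $N_c$ collocation points, and $\lambda$ is a weighting factor.

```latex
\mathcal{L}(\theta, k)
  = \underbrace{\frac{1}{N_d} \sum_{i=1}^{N_d}
      \bigl( \hat{u}_\theta(t_i) - u_i \bigr)^2}_{\text{data-driven loss}}
  + \lambda \,
    \underbrace{\frac{1}{N_c} \sum_{j=1}^{N_c}
      \Bigl( \frac{d\hat{u}_\theta}{dt}(t_j)
             - f\bigl(\hat{u}_\theta(t_j), t_j; k\bigr) \Bigr)^2}_{\text{physics-based loss}}
```

Minimizing this loss jointly over $\theta$ and $k$ is what allows the parameter to be calibrated: the first term pulls the network toward the data, and the second penalizes any departure from the ODE for the current value of $k$.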
The proposed framework has significant implications for pharmacokinetic modeling and antibiotic treatment optimization, where accurate estimation is critical for designing effective dosing regimens.
Palavras-chave: antimicrobial resistance, mathematical modeling, deep learning, calibration, uncertainty quantification
#1126219

Machine Learning as a Tool to Anticipate Gastrointestinal Outcomes in HTLV-1 Patients

Autores: Laryssa Bandeira de Melo Silva,Matheus Azevedo Bomfim,Gabriel Freitas Araújo,Marília Gabriela Barbosa da Silva,Patricia Muniz Mendes Freire de Moura,Paula M. Magalhães,José Anchieta de Brito,João Pacifico Bezerra Neto
Apresentador: Laryssa Bandeira de Melo Silva • laryssa.bandeira@upe.br
Resumo:
Human T-cell lymphotropic virus type 1 (HTLV-1) causes a neglected infection associated with various clinical manifestations, including gastrointestinal disorders such as constipation, fecal incontinence, and abdominal pain. These symptoms are more common in patients with HTLV-1-associated myelopathy (HAM/TSP) and are related to the severity of the neurological condition, with individuals with HAM/TSP being more likely to present constipation than seronegative individuals. Thus, the early identification of patients at high risk of developing gastrointestinal complications is crucial for better clinical management. In this context, the present study aims to evaluate the use of machine learning techniques to predict gastrointestinal complications in HTLV-1 patients. A total of 125 patients diagnosed with HTLV-1 and treated at the Infectious and Parasitic Diseases Service of the Oswaldo Cruz University Hospital (DIP-HUOC) were analyzed, including 80 with gastrointestinal symptoms and 45 without symptoms. Data collection included clinical information from medical records (neurological symptoms, gastritis, and dysphagia) and biological samples to assess the hematological profile of the patients, including platelets, MCH, and basophils. The dataset was initially structured in a dataframe and then subjected to preprocessing steps, including data cleaning, normalization, and variable transformation. To identify relevant predictive features, statistical analyses such as Spearman correlation, Student's t-test, and linear regression were applied, alongside the SelectKBest technique. Subsequently, the Variance Inflation Factor (VIF) was computed to detect and reduce multicollinearity. The dataset was partitioned using the Leave-One-Out cross-validation approach. To mitigate issues related to class imbalance, random undersampling was applied.
After preprocessing, nine machine learning models were trained. Model performance was evaluated using metrics such as accuracy, precision, recall, F1-score, ROC-AUC, and G-Mean. The Random Forest (RF) and K-Nearest Neighbors (KNN) models showed the most balanced and reliable performance, with high accuracy, precision, recall, and specificity. RF stood out with an F1-score of 0.969, recall of 0.963, and G-Mean of 0.888, while KNN obtained a recall of 0.938 and ROC-AUC of 0.900, demonstrating robustness and good generalization capacity, which are crucial aspects in clinical contexts. Although the Gradient Boosting (GB) model achieved a value of 1.0 in all metrics, this result is atypical and suggests possible overfitting, so its performance should be interpreted with caution. In conclusion, this study demonstrates the potential of machine learning techniques in predicting gastrointestinal complications in patients infected with HTLV-1, highlighting their applicability in the presented clinical context. In clinical practice, the implementation of predictive models could aid in the early identification of patients at higher risk of developing gastrointestinal symptoms, even before the overt manifestation of these signs. Based on these insights, it becomes possible to adopt early therapeutic interventions, make nutritional adjustments, and implement appropriate supportive strategies, which may significantly improve patients’ quality of life.
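Leave-One-Out cross-validation, the partitioning scheme named in this abstract, holds out each sample once and trains on all the others. The sketch below pairs it with a one-nearest-neighbor classifier on invented two-feature data; the data, the classifier choice, and the function names are assumptions made for illustration and do not reproduce the study's models.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """Predict the label of the closest training point (squared Euclidean)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def leave_one_out_predictions(X, y):
    """Hold out each sample in turn and predict it from all the others."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(nearest_neighbor_predict(train_X, train_y, X[i]))
    return preds


# Hypothetical data: the last positive sample lies closer to the negatives.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (4, 4)]
y = [0, 0, 0, 1, 1, 1, 1]

preds = leave_one_out_predictions(X, y)
tp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 1)
fn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 1)
recall = tp / (tp + fn)
print(preds, recall)
```

Because every prediction is made for a sample the model never saw, the pooled predictions give one confusion matrix for the whole cohort, from which recall and the other reported metrics can be computed.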
Palavras-chave: Complications, techniques, performance, symptoms
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126236

Multi-Omic Autophagy Signatures Reveal Prognostic Targets in Cancer

Autores: Victor Dos Santos Lopes,Emanuell Rodrigues de Sozua,Higor Almeida Cordeiro Nogueira,Enrique Medina-Acosta
Apresentador: Victor Dos Santos Lopes • victor94santos.lopes@gmail.com
Resumo:
Integrating multi-omic data is essential for elucidating the molecular mechanisms underlying oncogenesis, enabling the identification of genotypic and phenotypic patterns that support tumor progression, immune evasion, and functional heterogeneity. In this study, we investigated autophagy as a regulated cellular process with a direct impact on tumor progression and the dynamics of the immune microenvironment, aiming to identify gene signatures with prognostic and functional relevance in cancer. We adopted an integrative approach based on seven molecular layers, focusing on the construction of autophagy-related multi-omic signatures. Initially, 2,739 autophagy-related genes were compiled from specialized public databases. For integrative analysis, we used UCSCXenaShiny within the RStudio environment, accessing data from 33 TCGA cancer types and matching normal tissues from GTEx. The multi-omic analysis encompassed seven genotypic layers: copy number variation (CNV), DNA methylation, mRNA expression, miRNA expression, somatic mutations, protein abundance, and transcript levels. These variables were correlated with five major phenotypic profiles: tumor mutational burden (TMB), microsatellite instability (MSI), stemness scores, hazard ratio-based survival risk, and immune characteristics, including tumor-infiltrating immune cells (TILs) and tumor microenvironment composition. 
The analyses were organized into four stages: (i) Spearman correlation between genotypic and phenotypic variables, with Holm adjustment for multiple testing, and differential expression analysis via Wilcoxon tests between tumor and normal tissues; (ii) Cox regressions to estimate hazard ratios for four survival metrics: overall survival (OS), disease-specific survival (DSS), progression-free interval (PFI), and disease-free interval (DFI); (iii) immune infiltration analysis based on correlations between gene expression and expression profiles of immune cells, with a functional categorization of genes as antitumoral, protumoral, or dual-role, and tumor classification into “hot,” “cold,” or “variable” immune phenotypes; (iv) Kaplan-Meier analysis stratifying patients by molecular alterations using the log-rank test. Following individual analyses, genes were grouped into composite signatures specific to each cancer type based on expression patterns, prognostic impact, and immune profile. Signatures prioritized genes with robust genotype-phenotype correlations, significant differential expression, and consistent effects across survival metrics. Each signature was further refined according to the immune functional profile of its genes and re-evaluated in a new round of multi-omic analysis. Final prioritization was conducted using a ranking system that integrated prognostic relevance, association with favorable tumor microenvironments, and cross-omic consistency. In total, 20,171 multi-omic signatures were generated, of which 6,192 showed significant clinical associations with at least one survival endpoint. Among them, 239 signatures were clinically consistent, showing significance across all four survival metrics.
Of these, 23 were classified as risky in seven cancer types: cervical squamous cell carcinoma (CESC), kidney renal papillary cell carcinoma (KIRP), lower grade glioma (LGG), lung adenocarcinoma (LUAD), pancreatic adenocarcinoma (PAAD), prostate adenocarcinoma (PRAD), and stomach adenocarcinoma (STAD). Another 19 exhibited a protective profile in five types: breast invasive carcinoma (BRCA), KIRP, LGG, liver hepatocellular carcinoma (LIHC), and PAAD. These 42 signatures correlated with immunologically active (“hot”) tumor microenvironments, marked by infiltration of effector T cells and elevated expression of immunomodulatory markers. Such features reinforce their translational relevance as promising candidates for prognostic stratification and personalized immunotherapy in oncology.
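The Holm adjustment mentioned in stage (i) can be sketched as follows. This is a generic illustration of the step-down procedure, not the authors' code; the function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values from three correlation tests.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

Holm controls the family-wise error rate like Bonferroni but is uniformly less conservative, which helps when thousands of gene-phenotype correlations are tested.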
Palavras-chave: Autophagy, Cancer bioinformatics, Multi-omics integration, Prognostic gene signatures, Tumor immune microenvironment
★ Running for the Qiagen Digital Insights Excellence Awards
#1126241

Unveiling Omicron’s Evolution in Espirito Santo through Bioinformatics and Phylogenetics in the Context of Genomic Surveillance

Autores: Juliana Santa Ardisson,Aura Marcela Corredor Vargas,Renata Torezani da Silva,Sandra Ventorin von Zeidle,Teodiano Freire Bastos Filho,Greiciane Gaburro Paneto
Apresentador: Juliana Santa Ardisson • julianasardisson@gmail.com
Resumo:
Since the emergence of SARS-CoV-2, genomic surveillance has been pivotal in tracking viral spread and identifying novel variants. This study aims to evaluate the genomic diversity of the Omicron variant in Espírito Santo, Brazil, using publicly available sequences from the GISAID database collected between November 2021 and March 2024. A total of 2,609 high-quality genomes were analyzed. Phylogenetic analyses were conducted using MAFFT for multiple sequence alignment and IQ-TREE for maximum likelihood inference with 1,000 bootstrap replicates. Genomic data were cross-referenced with patient metadata, including age and sex. Lineage identification was performed using Nextclade, and sublineages with fewer than 10 samples were grouped for clarity. A total of 21 major Omicron lineages were detected, including BA.1, BA.2, BA.5, BQ.1, and emerging lineages such as FE.1, JD.1, and XBB.1. The phylogenetic tree revealed significant genomic divergence among sublineages, suggesting independent evolutionary paths and selective pressures. The dominance of Omicron over Delta occurred rapidly after the initial detection of BA.1 in late November 2021. BA.1 remained predominant until April 2022, followed by BA.5 and subsequently BQ.1. From early 2023 onward, multiple lineages circulated concurrently, reflecting a complex viral landscape. Epidemiological profiling revealed a higher detection rate in female patients (59%) and a predominant age range between 20 and 59 years. Notably, several sublineages showed increased prevalence among elderly individuals, emphasizing potential immunological heterogeneity and highlighting the importance of continuous surveillance. These findings underscore the critical role of bioinformatics tools in real-time monitoring of SARS-CoV-2 evolution and support their application in shaping effective public health strategies.
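The grouping of sublineages with fewer than 10 samples can be sketched as below. This is a minimal illustration assuming lineage labels are already assigned per sample; the function name, the `"Other"` label, and the example counts are hypothetical.

```python
from collections import Counter


def group_rare_lineages(lineages, min_count=10, other_label="Other"):
    """Replace lineage labels seen fewer than min_count times with other_label."""
    counts = Counter(lineages)
    return [lin if counts[lin] >= min_count else other_label for lin in lineages]


# Hypothetical per-sample lineage assignments.
samples = ["BA.1"] * 12 + ["BA.5"] * 10 + ["FE.1"] * 3 + ["JD.1"] * 2
grouped = Counter(group_rare_lineages(samples))
print(grouped)
```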
Palavras-chave: SARS-CoV-2, Omicron variant, Genomic surveillance, Phylogenetic analysis, Lineage diversity
★ Running for the Qiagen Digital Insights Excellence Awards
#1126250

Identification of Distinct Structural Conformations of PPARγ from Cryo-EM Maps Using a Shotgun Molecular Dynamics Flexible Fitting and Normal Mode Analysis (MDFF_NM) Approach

Autores: Layla Alves Rodrigues da Silva,Roberto Carlos Navarro Quiroz,Simone Queiroz Pantaleao,David Perahia,Yolanda M. B. Marcello,Ana Ligia Scott
Apresentador: Layla Alves Rodrigues da Silva • layla.laah@gmail.com
Resumo:
Peroxisome proliferator-activated receptors, also known as PPARs, are responsible for regulating gene expression and participating in activities related to inflammation, lipids, and glucose. In this work, we apply a shotgun molecular dynamics flexible fitting and normal mode analysis methodology (MDFF_NM) to explore the conformational landscape of peroxisome proliferator-activated receptor γ (PPARγ) based on information from the cryo-electron microscopy (cryo-EM) density maps. MDFF_NM is a multi-replica strategy in which independent simulations sample diverse transition pathways from an initial to a target conformation by combining the restraining potential of MDFF with intrinsic harmonic motions derived from normal modes. This dual-constraint framework not only yields high-quality flexible fits into experimental densities but also generates a plausible ensemble of intermediate meta-states. We have implemented clustering of replica trajectories to identify predominant conformational basins and used the mean density of each cluster to refine fitting performance. Preliminary results demonstrate that (i) clustering increases convergence of flexible fitting and reduces overfitting in highly mobile regions, (ii) fitting to mean maps of top-ranking clusters produces atomic models with improved cross-validation scores, and (iii) combining meta-states with elevated Q-scores into consensus density maps enables de novo reconstruction of flexible loops that are otherwise unresolved. The success of MDFF_NM in recovering multiple accessible meta-states of PPARγ highlights its utility not only for structural refinement but also for mapping dynamic ensembles that underpin receptor function and ligand specificity. This approach promises to advance our understanding of allosteric regulation in nuclear receptors and to facilitate structure-based drug design targeting conformationally heterogeneous proteins.
Palavras-chave: PPARγ, Cryo-EM, Molecular Dynamics Flexible Fitting (MDFF), Normal Mode Analysis (NMA), MDFF_NM, Structural conformation, Meta-states, Nuclear receptors, Flexible fitting, Conformational dynamics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126262

Genomic Surveillance of SARS-CoV-2 BQ.1.1 in Brazil Using Nanopore Sequencing and Bayesian Phylogeography

Autores: Juliana Santa Ardisson,Mariane Monfardini,Brena Ramos Athaydes,Renata Torezani da Silva,Aura Marcela Corredor Vargas,Liliana Spano,Edson Oliveira Delatorre,Greiciane Gaburro Paneto,Sandra Ventorin von Zeidle,Teodiano Freire Bastos Filho
Apresentador: Juliana Santa Ardisson • julianasardisson@gmail.com
Resumo:
The post-Omicron period marked a shift in SARS-CoV-2 transmission dynamics, driven by the highly transmissible BQ.1.1 lineage. This study investigated the phylogenetic structure and genomic dispersion of BQ.1.1 in Espírito Santo, Brazil, using high-throughput sequencing and computational methods. Twenty-one local samples with ≥90% genome coverage, collected between November and December 2022, were sequenced using Oxford Nanopore Technology following the ARTIC protocol with customized primers to enhance Omicron genome recovery. Raw reads were processed using Guppy for basecalling, followed by consensus assembly and quality filtering through the ARTIC bioinformatics pipeline. Lineage assignment was performed using Pangolin and Nextclade, while sequences with high similarity to local genomes were retrieved from GISAID using the Audacity tool, resulting in a representative dataset of 4,627 sequences. Multiple sequence alignment was conducted with MUSCLE, and maximum likelihood trees were built in IQ-TREE (TIM+F+R2 model) with 1,000 bootstrap and SH-aLRT replicates. Bayesian phylogeographic reconstruction was carried out using BEAST v1.10 with a relaxed log-normal molecular clock and non-parametric skyline coalescent model. Posterior distributions were validated using Tracer, with ESS > 200. The resulting trees revealed low genetic diversity among BQ.1.1 sequences, forming star-like topologies consistent with rapid global spread and limited evolutionary divergence. Key transmission clusters presented recent tMRCAs, concentrated between early October and mid-November 2022. Despite limited phylogenetic resolution in some branches, high overall bootstrap support confirmed the robustness of major clades. This study highlights the utility of integrated bioinformatics workflows for real-time variant monitoring and demonstrates how rapid lineage turnover can challenge cluster resolution. 
It reinforces the need for high-quality sequence data and adaptive surveillance frameworks during fast-evolving pandemic phases.
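The ≥90% genome coverage criterion can be illustrated with a short sketch. It assumes coverage is measured as the fraction of unambiguous bases (A, C, G, T) in a consensus sequence where uncovered positions are masked as `N`; this definition, the function name, and the toy sequence are assumptions rather than details from the abstract.

```python
def genome_coverage(consensus, reference_length=None):
    """Fraction of positions called as A, C, G or T in a consensus sequence."""
    seq = consensus.upper()
    length = reference_length if reference_length is not None else len(seq)
    if length == 0:
        return 0.0
    called = sum(1 for base in seq if base in "ACGT")
    return called / length


def passes_coverage(consensus, threshold=0.90):
    """True when the consensus meets the minimum coverage threshold."""
    return genome_coverage(consensus) >= threshold


# Toy consensus: 8 called bases out of 10 positions.
toy = "ACGTNNACGT"
print(genome_coverage(toy), passes_coverage(toy))
```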
Palavras-chave: SARS-CoV-2, Omicron variant, BQ.1.1, Genomic surveillance, Phylogenetic analysis, Lineage diversity
★ Running for the Qiagen Digital Insights Excellence Awards
#1126274

Incorporating Receptor Flexibility in DockThor: Development and Evaluation of Ensemble Docking Approaches

Autores: Ana Luiza Martins Karl,Aaron Leao,Dr. Laurent Emmanuel Dardenne
Apresentador: Ana Luiza Martins Karl • almkarl@posgrad.lncc.br
Resumo:
Molecular recognition is a fundamental process in biological systems, primarily governed by induced fit and conformational selection theories. These theories describe how ligands interact with dynamic receptor structures to achieve specificity and affinity. Molecular docking is a computational technique that predicts the preferred orientation and binding affinity of a ligand to its biological target. It plays a crucial role in understanding and modeling molecular recognition processes. However, accounting for receptor flexibility remains a major challenge in docking methodologies, even among the most advanced approaches in the field. Based on the conformational selection theory, the Ensemble Docking technique offers an effective solution by efficiently representing protein structural flexibility, especially when experimental data on conformational changes are available.
DockThor is a protein-ligand docking software developed by Brazilian students and researchers associated with Grupo de Modelagem Molecular de Sistemas Biológicos at the Laboratório Nacional de Computação Científica (GMMSB, LNCC/MCTI). It is freely available via a web server (www.dockthor.lncc.br) and has established itself as a competitive tool in protein-ligand docking studies, especially for highly flexible ligands. DockThor demonstrates robust performance in re-docking, peptide docking, and cross-docking experiments, with results comparable to the most widely used tools in the literature. However, DockThor has traditionally treated the receptor as a rigid structure.
Our previous studies have shown that applying the ensemble docking strategy improves the accuracy of DockThor in cases where protein flexibility is crucial for the molecular recognition between ligand and receptor. This work presents recent advancements in DockThor that integrate the ensemble docking technique through two different approaches, with the goal of enhancing docking accuracy in scenarios where protein flexibility is critical. We evaluate two strategies: (i) Combined Grid (CG), which generates a unified potential grid by linearly combining grids from multiple protein conformations, and (ii) Multiple Grids (MG), which docks ligands independently against grids from different conformations. Validation experiments demonstrate that both strategies maintain or improve docking accuracy while significantly reducing computational costs. These new features improve pose generation and the applicability of DockThor to more realistic molecular recognition experiments, reinforcing its value as a tool for drug discovery and computational biology.
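The difference between the two strategies can be sketched conceptually. This is not DockThor code: grids are represented as flat lists of values, equal weights are assumed for the linear combination, lower scores are assumed to be better, and all names and numbers are hypothetical.

```python
def combine_grids(grids, weights=None):
    """Combined Grid (CG) idea: one grid as a linear combination of several."""
    if not grids:
        raise ValueError("at least one grid is required")
    size = len(grids[0])
    if any(len(g) != size for g in grids):
        raise ValueError("all grids must have the same number of points")
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = [1.0 / len(grids)] * len(grids)
    return [sum(w * g[i] for w, g in zip(weights, grids)) for i in range(size)]


def best_over_grids(scores):
    """Multiple Grids (MG) idea: dock against each grid, keep the best score.

    Assumes lower scores are better (energy-like scoring).
    """
    best = min(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Two hypothetical conformations, each with a two-point potential grid.
grid_a = [0.0, 2.0]
grid_b = [2.0, 4.0]
print(combine_grids([grid_a, grid_b]))

# Hypothetical docking scores of one ligand against three conformations.
print(best_over_grids([-7.1, -8.4, -6.0]))
```

CG needs a single docking run per ligand on the merged grid, whereas MG needs one run per conformation but keeps each conformation's potential intact.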
Palavras-chave: Molecular docking, Receptor flexibility, Ensemble Docking, DockThor
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126283

Genetic Biomarkers in the Exome of Brazilian COVID-19 Patients and Their Correlation with Disease Severity

Autores: Vitor Gregorio,Caroline Viana Bastos,Robert de Souza Costa,Kamila Peronni,Isabela Medeiros de Oliveira,Adriane Feijó Evangelista,Luis Gustavo Morello,David Figueiredo
Apresentador: Vitor Gregorio • vitor-gregorio@hotmail.com
Resumo:
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has generated significant concern due to its rapid transmission and the increasing number of cases and deaths. In response, there has been a major scientific advancement in the search for solutions for the development of therapies and vaccines. Recent studies indicate that human genetic variability can partly explain the differences in susceptibility to and progression of the disease. Through whole-exome sequencing (WES), it is possible to identify rare genetic variants associated with COVID-19. In a study by Benetti et al. (2020), it was concluded that the identification of rare and common variants could aid in determining the clinical evolution of COVID-19. In 2021, the Guarapuava Cancer Research Institute (IPEC), in partnership with UNICENTRO, UEL, and UFPR, performed whole-exome sequencing (WES) on 158 participants with COVID-19 to investigate the association of variants in the inflammasome pathway and how these polymorphisms might influence the course of COVID-19. All patients answered a questionnaire related to their clinical status to record their medical history and were divided into three groups according to COVID-19 severity (mild/asymptomatic, moderate, or severe). The exome analysis of a Brazilian patient cohort aims to elucidate the underlying genetic determinants of COVID-19 pathogenesis, with the goal of providing relevant information for the development of precision medicine strategies potentially applicable to other viral diseases. Sequencing data in FASTQ format were used. Sequencing quality was checked using the MultiQC tool, and subsequently, the data were aligned to the reference genome using the BWA-MEM software. With the aligned exomes, variant calling was performed using the GATK tool, specifically the HaplotypeCaller algorithm, which identifies SNPs and indels. 
Based on the identified genetic variants and the previously filtered clinical history spreadsheet with consistent sociodemographic data, neural network approaches were employed to predict each patient's group according to the response to infection. The aims were to identify high-risk patients and to detect genetic biomarkers associated with a higher risk of exacerbated responses. Finally, we developed a pipeline that makes the research reproducible and can be replicated for the study of other diseases, enabling early diagnoses. The data obtained allowed for the evaluation and classification of patients according to the evaluated criteria.
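The alignment and variant-calling steps described above can be sketched as a list of commands. This is an illustration under assumptions, not the authors' pipeline: the samtools sorting and indexing steps, the read-group string, the file naming scheme, and the function name are hypothetical additions, and steps such as duplicate marking are omitted. The function only builds the commands; running them would require the tools to be installed.

```python
def build_variant_calling_commands(sample, reference, fastq_r1, fastq_r2, threads=4):
    """Return the commands for alignment (BWA-MEM) and calling (GATK HaplotypeCaller)."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    # GATK needs a sample name in the read group; the ID/SM values are placeholders.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        # Align paired-end reads to the reference genome.
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         "-o", sam, reference, fastq_r1, fastq_r2],
        # Sort and index the alignments (assumed intermediate steps).
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Call SNPs and indels.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", vcf],
    ]


# Hypothetical sample and file names.
commands = build_variant_calling_commands(
    "patient001", "reference.fa", "patient001_R1.fastq.gz", "patient001_R2.fastq.gz"
)
for cmd in commands:
    print(" ".join(cmd))
```

Each command is a list of arguments, so it could be passed to `subprocess.run(cmd, check=True)` without shell quoting issues.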
Palavras-chave: COVID-19, Whole-Exome Sequencing, Genetic Biomarkers, Neural Networks
★ Running for the Qiagen Digital Insights Excellence Awards
#1126284

Evaluation of Clustering Algorithms for Metagenomic Binning

Autores: Pedro Barría Valdebenito,Marco Mora,Sara Cuadros Orellana
Apresentador: Pedro Barría Valdebenito • pedro.barria.valdebenito@gmail.com
Resumo:
Binning is a fundamental step in workflows for metagenomic data analysis, aiming to group DNA sequences—typically assembled contigs—into representative taxonomic units known as bins. These bins attempt to reconstruct, as accurately as possible, individual genomes or coherent genomic groups, enabling a more precise characterization of the organisms present in complex microbial communities.
One of the most popular tools for performing metagenomic binning is MaxBin2, which has been widely adopted due to its strong performance across a variety of datasets. MaxBin2 operates based on the Expectation–Maximization (EM) algorithm, a probabilistic method capable of modeling complex data distributions using features such as GC content and contig coverage across multiple samples. This capability allows MaxBin2 to effectively assign sequences to bins, which has established its use in numerous studies. However, despite its accuracy, the EM algorithm used by MaxBin2 entails a high computational cost.
In this work, we propose a comparative evaluation of different clustering algorithms specifically applied to the problem of metagenomic binning. In particular, we analyze the performance of Self-Organizing Maps (SOM), k-means, EM, and K-Medoids. These methods were selected due to their methodological diversity, ranging from centroid-based approaches to techniques inspired by artificial neural networks. The analysis focuses on two key aspects: the accuracy of the resulting bins and the computational time required by each algorithm under controlled experimental conditions.
To perform this evaluation, we used standardized synthetic datasets that follow the benchmarking specifications proposed by CAMI (Critical Assessment of Metagenome Interpretation), which enables objective comparisons across algorithms. Our results suggest that some lighter-weight clustering methods can achieve performance levels comparable to EM, while requiring significantly fewer computational resources. This represents a viable alternative for optimizing large-scale metagenomic workflows.
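As a rough illustration of a centroid-based alternative to EM, the sketch below clusters contigs by GC content and coverage with plain k-means (Lloyd's algorithm). It is not the evaluated implementation: the feature choice follows the description of MaxBin2's inputs, while the function names, the fixed initial centroids, and the toy data are hypothetical.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases of a contig."""
    seq = sequence.upper()
    total = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / total if total else 0.0


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's k-means with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(pt, centroids[k]))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Toy contig features: (GC fraction, mean coverage).
features = [(0.30, 10.0), (0.32, 12.0), (0.65, 40.0), (0.67, 38.0)]
labels, centroids = kmeans(features, initial_centroids=[(0.30, 10.0), (0.65, 40.0)])
print(labels)
```

In practice the features would need scaling, since raw coverage values dominate the distance over GC fractions.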
Palavras-chave: Metagenomic binning, Clustering algorithms, Bins
#1126289

Prediction and Characterization of Cancer Stem Cells in PDAC Through Multiomics Analysis

Autores: Renan de Lima Santos Simões,Murilo Henrique Anzolini Cassiano,Tathiane Maistro Malta
Apresentador: Renan de Lima Santos Simões • renan.lima.simoes@gmail.com
Resumo:
Cancer stem cells (CSCs) are a subpopulation of tumor cells with self-renewal and differentiation capabilities, strongly implicated in metastasis, therapy resistance, and disease progression. Here, we aimed to identify CSCs in pancreatic ductal adenocarcinoma (PDAC) and develop a predictive model to evaluate tumor stemness across different datasets and platforms. We analyzed publicly available single-cell RNA-seq (scRNA-seq) data from primary PDAC samples. Using a combination of cell type identification algorithms, differential gene expression analyses, and curated CSC markers (e.g., EPCAM, PROM1, DCLK1, ABCB1, CD24), we annotated a subpopulation of tumor cells with a CSC-like profile. Based on the transcriptional signature of these cells, we trained a machine learning model using One-Class Logistic Regression (OCLR) to compute a Cancer Stem Cell Stemness Index (CSCsi), representing the similarity of samples to the CSC transcriptional state. Applying the CSCsi model to bulk RNA-seq data from pancreatic tumors in The Cancer Genome Atlas (TCGA-PAAD), we found that CSCsi levels stratified tumor samples by histological grade, with higher CSCsi values associated with more aggressive tumors. Kaplan–Meier survival analysis revealed that patients with high CSCsi had significantly worse overall survival. Furthermore, continuous CSCsi values showed a negative correlation with survival outcomes, supporting the prognostic relevance of tumor stemness. Gene expression correlation analysis identified 165 genes positively associated with CSCsi in the TCGA cohort. Among these, eight genes were significantly linked to overall and/or progression-free survival, suggesting their potential as novel biomarkers or therapeutic targets in stemness-driven PDAC. We further validated our CSCsi model using scRNA-seq data from both primary and metastatic PDAC samples. 
Notably, metastatic tumor cells (liver lesions) exhibited significantly higher CSCsi values compared to primary tumors, consistent with the well-established role of CSCs in metastasis. Within these datasets, CSCsi scores were significantly enriched in ductal tumor cells compared to adjacent normal or immune cells, underscoring the model’s ability to distinguish malignant CSCs from the tumor microenvironment. Finally, we applied the CSCsi model to spatial transcriptomics data from PDAC samples. Regions with elevated CSCsi overlapped with areas enriched for canonical cancer-associated pathways. This spatial association reinforces the biological relevance of stemness within the tumor ecosystem and highlights the utility of our model in spatially resolving aggressive tumor niches. Altogether, our integrative approach combining single-cell, bulk, and spatial transcriptomic data enabled the robust identification and characterization of CSCs in PDAC. The CSCsi model provides a valuable tool for assessing tumor aggressiveness and may inform patient stratification, prognosis, and the development of stemness-targeted therapies.
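One way a trained signature could be applied to new samples is sketched below. It assumes a scoring scheme often used with OCLR-derived stemness signatures (Spearman correlation between signature weights and sample expression, rescaled to the range 0 to 1); whether the CSCsi is computed this way is not stated in the abstract, and all names, weights, and expression values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def stemness_index(weights, samples):
    """Score each sample against the signature and min-max scale to [0, 1]."""
    genes = sorted(weights)
    raw = {
        name: spearman([weights[g] for g in genes], [expr[g] for g in genes])
        for name, expr in samples.items()
    }
    low, high = min(raw.values()), max(raw.values())
    span = high - low
    return {name: (value - low) / span if span else 0.0 for name, value in raw.items()}


# Hypothetical signature weights and expression profiles.
weights = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0}
samples = {
    "stem_like": {"GENE_A": 1.0, "GENE_B": 5.0, "GENE_C": 9.0},
    "differentiated": {"GENE_A": 9.0, "GENE_B": 5.0, "GENE_C": 1.0},
}
print(stemness_index(weights, samples))
```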
Palavras-chave: Cancer stem cells, Pancreatic ductal adenocarcinoma, Stemness, Single-cell RNA-seq, Machine learning
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126309

Evaluating the influence of hydrophobicity and electrostatic forces at the interaction interface of apolipoprotein E and the beta-amyloid peptide

Autores: Mateus Veiga de Araújo,Mychael Vinícius da Costa Lourenço,Pedro Henrique Monteiro Torres
Apresentador: Mateus Veiga de Araújo • mateusveiga@biof.ufrj.br
Resumo:
The apolipoprotein ε4 (ApoE4) allele is a major risk factor for sporadic Alzheimer's disease (AD) and was shown to promote amyloid-β (Aβ) accumulation and mediate pathophysiological processes in AD. Although the molecular interaction between Aβ and ApoE has been acknowledged, the precise nature of this interaction remains unclear. This study aims to explore the biophysical and biochemical nature of the interaction between Aβ and the ε3 and ε4 isoforms of ApoE. We initially reverted 5 point mutations to generate the original ApoE3 structure from its full-length structure (PDB: 2L7B) using the Swiss-Model software. We then inserted the C112R mutation to generate ApoE4. We further used a previously published open-state structure of ApoE4. Three distinct Aβ42 structures (PDB: 1IYT, 6SZF, 1Z0Q) were used. Docking preparations were performed using ChimeraX with PDB2PQR and APBS used to generate electrostatic surfaces. ClusPro and Haddock software were used to perform rigid and flexible molecular docking, respectively. This study performed nine docking simulations in total, selecting six suitable complexes for 55 μs coarse-grained molecular dynamics in triplicate using GROMACS. After evaluating stability, the coarse-grained structures will be converted to all-atoms for 1.5 μs molecular dynamics to investigate the hydrophobicity and electrostatic contributions for the protein-protein interaction. The Aβ42 structure that emulates a membrane-associated environment (1IYT) provided the best docking poses, followed by the pre-transition to β-sheet structure (1Z0Q). The “intermediate structure” (6SZF), a stage between the two other structures, resulted in unspecific docking. Coarse-grained molecular dynamics from the promising structures demonstrated two main interfaces of ApoE-Aβ42 interaction, with differences in stability between ApoE3 and ApoE4. All-atoms molecular dynamics are being carried out to elucidate the different behavior and stability at pH 7.4. 
We predict two key interaction interfaces between Aβ42 and ApoE isoforms. This interaction is influenced by the presence of phospholipids, which appear to compete with Aβ42 for interactions with ApoE. Further results are expected to elucidate molecular details of this interaction and hold the potential to consolidate our understanding of AD pathophysiology.
Palavras-chave: Alzheimer’s Disease, ApoE, Amyloid-beta, Molecular Docking, Molecular Dynamics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126311

Absence of Functional CRISPR-Cas Systems in Staphylococcus aureus Isolates from Brazil and Its Implications for Phage Therapy

Autores: Israel Santos da Silva,Anna Carolina Soares Almeida,João Pacifico Bezerra Neto
Apresentador: Israel Santos da Silva • israel.santoss@upe.br
Resumo:
Staphylococcus aureus is a Gram-positive bacterium of high relevance to both human and veterinary medicine. It is the main etiological agent of bovine mastitis, causing significant economic losses to the dairy industry. S. aureus has been classified by the World Health Organization as a high-priority pathogen and is part of the ESKAPE group, which includes the main etiological agents of infections caused by multidrug-resistant pathogens. Due to the growing inefficacy of conventional antimicrobials, new treatment strategies have been developed to combat this pathogen, among which phage therapy stands out. This approach uses bacteriophages as therapeutic or prophylactic agents. However, bacterial defense mechanisms specialized in antiviral protection pose a potential challenge to the development of this strategy. An example is the CRISPR system, which functions as an immune memory against viral infections, providing acquired immunity against foreign nucleic acids. The aim of this study was to assess the CRISPR systems of Staphylococcus aureus isolates from Brazil, based on data available in the NCBI database, and to evaluate the presence of prophages in their genomes. The online tool CRISPRCasFinder was used to identify CRISPR sequences, while PHIGARO v2.4.0 software was employed for prophage analysis. In total, 32 isolates from various sources (human, bovine, caprine, and swine) in ten Brazilian states were identified, belonging to distinct sequence types (STs) and clonal complexes (CCs). The results showed that none of the isolates possessed a complete and functional CRISPR system. The absence of spacers and of coding sequences for Cas proteins was observed. All identified CRISPR systems were classified as level 1 evidence, indicating a low probability of functional activity. Regarding the prophage analysis, only lysogenic viruses from the Siphoviridae family were detected.
Lytic viruses are preferred for phage therapy due to their higher infectivity and bacteriolytic capacity. Reduced interaction between these viruses and the bacterial host may indicate greater susceptibility of S. aureus to phage action. Nevertheless, the challenge of identifying effective bacteriophages against S. aureus remains. Despite the high genetic diversity observed—evidenced by the different STs and CCs and their diverse evolutionary backgrounds—the absence of functional CRISPR elements was a consistent feature across all isolates. These findings suggest that, throughout the evolution of this species, alternative defense mechanisms against bacteriophages may have been selected over the CRISPR-Cas system. Among these alternatives, molecular changes in core metabolic genes that result in structural modifications of the cell wall—thus preventing viral adsorption—have been highlighted in previous studies. A deeper investigation into these alternative mechanisms, including their mode of action and degree of conservation, could provide valuable insights into phage resistance in S. aureus, and contribute to the development of phage therapy as a viable treatment strategy against infections caused by this pathogen.
Palavras-chave: bacterial immunity, defense systems, bacteria
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126317

RNAseq evidence support for gene prediction: a case study in a non-model bee species

Autores: Francisco do Rego,Felipe Cordeiro Dias,Maria Cristina Arias
Apresentador: Francisco do Rego • dorego.francisco@usp.br
Resumo:
High-quality genome annotations are essential for understanding the genetic basis of traits related to behavior and ecological adaptation. This is particularly critical for non-model species, where limited genomic resources hinder the discovery of lineage-specific gene functions. Integrating RNA-seq data into annotation pipelines has been shown to improve gene model accuracy. Oxytrigona tataira, a stingless bee from the tribe Meliponini, exhibits a unique defensive behavior: spitting a toxic secretion when threatened. Investigating the genetic basis of this behavior requires precise gene predictions that capture both conserved and species-specific genes. In this study, we compare genome annotations generated with and without RNA-seq data to assess the impact of transcriptomic evidence on gene prediction accuracy. By focusing on O. tataira, we emphasize the importance of incorporating expression data into annotations to enable downstream analyses of ecological and behaviorally relevant genes in non-model bees. We generated two genome annotations for O. tataira, one integrating RNA-seq data and the other without transcriptomic support. The genome used was a purged assembly generated from long-read sequencing. For the RNA-seq-supported annotation, paired-end RNA-seq libraries were aligned to the genome using HISAT2 (v2.2.1) with default settings. The aligned reads were processed into BAM format and used as input for BRAKER3 (v2.1.6), run in ETP mode, which integrates protein homology (via Arthropoda proteins from OrthoDB) and RNA-seq from O. tataira samples as evidence for gene prediction. The second annotation was generated by running BRAKER in EP mode, using only the protein database without RNA-seq data. Gene model quality was assessed using BUSCO (v5.4.6) with the hymenoptera_odb10 dataset. 
We compared the number of predicted genes, completeness scores, and gene structure features (e.g., exon number, exon length distribution, and isoform counts) between the two annotations. The RNA-seq-supported annotation (ETP mode) predicted 14,091 genes, consistent with bee genome expectations (~12,000–13,000), while the annotation without RNA-seq (EP mode) predicted 27,894, suggesting overprediction due to pseudogenes or artefacts. Gene model completeness improved from 94.0% in EP to 98.0% in ETP. Structurally, the RNA-seq-supported annotation had a higher average number of exons per gene (8.3 vs. 4.2), slightly longer exons (337 bp vs. 298 bp), a modest increase in isoforms per gene (1.29 vs. 1.07), and a 1.5-fold increase in mean gene size, supporting the generation of more complete and realistic models. These improvements indicate that RNA-seq data enhance annotation accuracy, reducing gene fragmentation and overprediction while capturing splicing complexity. Oxytrigona tataira displays a highly specific and poorly studied behavior, and as a non-model organism, high-quality genome annotations are essential to uncover the genetic and molecular underpinnings of such distinctive traits. These annotations can ultimately help reveal the evolutionary pathways linked to this unique behavior, highlighting the broader value of integrating transcriptomic data in studies of understudied species.
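The structural comparison described here (exon counts and exon lengths) can be illustrated with a short sketch. This is not the authors' pipeline: it assumes a GFF3-style annotation in which every exon record carries a `Parent=` attribute, it counts exons per parent transcript rather than per gene, and the records are made-up toy data. The name `exon_stats` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def exon_stats(gff_lines):
    """Return (mean exons per parent transcript, mean exon length in bp)."""
    exons_per_parent = defaultdict(int)
    exon_lengths = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # keep only well-formed exon records
        start, end = int(cols[3]), int(cols[4])
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(
            field.strip().split("=", 1)
            for field in cols[8].split(";")
            if "=" in field
        )
        exons_per_parent[attrs.get("Parent", "unknown")] += 1
        exon_lengths.append(end - start + 1)  # GFF coordinates are inclusive
    return mean(list(exons_per_parent.values())), mean(exon_lengths)


# Toy records (illustrative only, not data from the study).
TOY_GFF = [
    "chr1\ttoy\texon\t1\t100\t.\t+\t.\tID=e1;Parent=tx1",
    "chr1\ttoy\texon\t201\t500\t.\t+\t.\tID=e2;Parent=tx1",
    "chr1\ttoy\texon\t1001\t1200\t.\t-\t.\tID=e3;Parent=tx2",
]

if __name__ == "__main__":
    print(exon_stats(TOY_GFF))
```

Running the same function on two annotation files would give one pair of summary values per annotation, which is the kind of side-by-side figure reported in the abstract.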
Palavras-chave: RNAseq, Genome Annotation, Stingless bees, Oxytrigona tataira
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126318

Tumor Microenvironment Heterogeneity in High-Grade Serous Ovarian Cancer and Its Relationship with Treatment Resistance

Autores: Gabriela Rapozo Guimarães,Nayara Gusmão Tessarollo,Alessandra Serain,Diego Gomes,Luciana de Castro Moreeuw,Cláudia Bessa,Andreia Melo,João Viola,Mariana Boroni
Apresentador: Gabriela Rapozo Guimarães • gabrielarapozog@gmail.com
Resumo:
Ovarian cancer is the deadliest gynecological malignancy among women, with high-grade serous ovarian carcinoma (HGSOC) being its most frequent and aggressive histological subtype. In Brazil, it leads to over 3,000 deaths annually, primarily due to late diagnosis and a lack of effective early detection strategies. Given HGSOC’s pronounced molecular and cellular heterogeneity, understanding the tumor microenvironment (TME) is essential for uncovering mechanisms of therapy resistance and guiding personalized treatment approaches. In this study, we performed single-nucleus RNA sequencing (snRNA-Seq) on fresh-frozen tumor samples from six HGSOC patients undergoing adjuvant chemotherapy, three with favorable and three with unfavorable treatment responses. Nuclei were isolated, fixed, and barcoded using the 10x Genomics Chromium Next GEM Single Cell 3’ v3.1 platform, followed by sequencing on the Illumina NovaSeq 6000 system. Raw sequencing reads were mapped with CellRanger (GRCh38 reference genome), and count matrices were analyzed in R using the Seurat package. Pre-processing steps included mitochondrial content filtering and doublet removal using the scDblFinder package. To address batch effects and maximize integration accuracy, we benchmarked state-of-the-art data integration tools, including Harmony, Scanorama, scVI, and scANVI, with the scib package. Based on biological conservation scores and clustering metrics, scANVI outperformed the other methods, preserving cell identity while minimizing batch artifacts. Thus, scANVI was selected for downstream integration and analysis. Clustering was performed using the Leiden algorithm (resolution = 0.3), followed by UMAP for dimensionality reduction and visualization. Manual cell-type annotation was based on canonical markers and differentially expressed genes identified using MAST. From ~60,000 input nuclei, 40,730 passed quality control filters and were retained for analysis. 
Integration identified seven major cellular compartments, including malignant epithelial cells, immune cells, fibroblasts, vascular and lymphatic endothelial cells, as well as ovarian and fallopian tube-derived populations. Notably, malignant cell subpopulations displayed distinct transcriptional signatures and copy number variation (CNV) profiles between responders and non-responders. CNV scores and Reactome pathway enrichment analysis further highlighted differential biological programs linked to therapeutic response. Our single-nucleus transcriptomic approach reveals key features of the HGSOC microenvironment associated with chemoresistance. Future directions include expanding the patient cohort and incorporating spatial transcriptomics (Xenium platform) to map spatial context and cell-cell interactions, aiming to inform more personalized and effective therapeutic strategies.
Palavras-chave: scRNA-Seq; Ovarian Cancer; Chemoresistance
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126320

Degradation of methylnaphthalenes by shrimp (Penaeus vannamei) laccase: insights from fungal laccase binding site comparison and molecular docking

Autores: FRANCISCO DANILO MORAIS DA SILVA (1985563),Pablo Andrei Nogara,Diego de Souza Buarque,Rogério de Aquino Saraiva
Apresentador: Rogério de Aquino Saraiva • rogerio.saraiva@ufca.edu.br
Resumo:
Polycyclic aromatic hydrocarbons (PAHs) are persistent pollutants that severely threaten ecosystems and human health due to their toxicity, mutagenicity, and ability to bioaccumulate. Overcoming the environmental persistence of PAHs remains a major scientific challenge. Among promising biotechnological solutions, laccases (multicopper oxidases found across fungi, plants, bacteria, and arthropods) stand out for their remarkable ability to degrade a wide range of organic pollutants. In this context, the shrimp (Penaeus vannamei) laccase (Pv-LAC) emerges as a promising candidate for environmental bioremediation, yet its molecular behavior against PAHs remains poorly understood. Here, we investigated the structural and functional properties of Pv-LAC in degrading 1-methylnaphthalene and 2-methylnaphthalene, using protein modeling, molecular docking, and comparative binding site analysis with fungal laccases. Protein structures were generated via Swiss-Model and AlphaFold3 pipelines. Comparative validation demonstrated the superior quality of the AlphaFold3 model, evidenced by a lower MolProbity score (1.34), reduced clash score (4.59), and a higher percentage of residues in favored regions of the Ramachandran plot (97.45%). This model was selected for docking studies. Docking simulations revealed that Pv-LAC binds methylnaphthalenes through a network of stabilizing interactions: π-cation contacts with Arg207, π-anion interactions with Asp214, π-π stacking with Phe401, and van der Waals contacts with His208, Asn264, Asp324, Val426, Asn427, Glu497, and Leu573. These interactions align the ligands close to His576, a residue coordinating the T1 copper, essential for catalytic oxidation. Comparison with fungal laccases (Trametes versicolor and Trametes trogii) showed conservation of critical catalytic residues, particularly histidines and cysteines involved in copper coordination. 
However, Pv-LAC exhibited significant structural variations compared to fungal enzymes, suggesting evolutionary adaptations that may influence substrate affinity and catalytic performance. Multiple sequence alignment supported these observations, revealing conserved, conservative, and semiconservative regions that could impact enzyme flexibility, substrate binding, and environmental stability. Interestingly, docking analysis indicated that 2-methylnaphthalene achieves a more stable binding orientation compared to 1-methylnaphthalene, supported by additional T-shaped π-π interactions involving Leu263 and Phe575. These findings align with previous experimental observations of higher degradation rates for 2-methylnaphthalene, highlighting the predictive power of the computational approach. Altogether, this study offers new molecular insights into the environmental potential of shrimp laccase for PAH degradation. By unveiling key structural features and binding mechanisms, we contribute to the broader understanding of arthropod-derived laccases as valuable tools for sustainable bioremediation strategies in contaminated marine environments.
Palavras-chave: Bioremediation, protein modeling, molecular ecotoxicology
★ This work is running for the Next Generation Bioinfo Award
#1126326

Metagenomic Analysis in Investigating the Role of Gut Microbiota Composition in Patients with Crohn's Disease and Ulcerative Colitis

Autores: Bianca Cristiane Ferreira Santiago,João Vitor Ferreira Cavalcante,Gleison Medeiros De Azevedo,Tibério Azevedo Pereira,JACINTA CEZAR,Epitácio Dantas de Farias Filho,Julia Apolonio de Amorim,Rodrigo Juliani Siqueira Dalmolin
Apresentador: Bianca Cristiane Ferreira Santiago • bianca.santiago72@gmail.com
Resumo:
It is widely accepted within the scientific community that the human body hosts more bacterial cells than human cells. However, only with the advent of next-generation sequencing technologies has it become possible to conduct large-scale investigations into the role of the microbiota in human health. Crohn’s Disease (CD) and Ulcerative Colitis (UC) are classified as Inflammatory Bowel Diseases (IBD), characterized by symptoms such as abdominal pain, diarrhea, weight loss, anemia, and fatigue. CD is distinguished by transmural inflammation of the gastrointestinal tract, affecting all layers of the intestinal wall from the mouth to the perianal region, whereas UC primarily affects the colonic mucosa. IBD diagnosis is often delayed and primarily relies on laboratory tests and imaging techniques. Most IBD patients exhibit chronic dysbiosis. The main objective of this study is to identify the key differences and compositional patterns of the microbiota in patients diagnosed with IBD and healthy individuals, through metagenomic analysis and machine learning techniques. It is expected that, based on this characterization, it will also be possible to identify potential disease subgroups and eliminate confounding factors that may interfere with diagnosis, such as age, sex, ethnicity, geography, and diet. Using publicly available shotgun sequencing data from the iHMP Project, analyses are being conducted focusing on the taxonomic classification and functional annotation of organisms identified in samples from individuals with and without IBD. These analyses are performed using the Euryale pipeline, which provides the necessary tools to identify and classify microbial communities. Through taxonomic classification, we investigated the abundance, diversity, richness, and interactions of microorganisms in each sample. 
Initial results showed that diversity (Shannon index) was highest in individuals with ulcerative colitis (UC), followed by the control group (nonIBD), and lowest in individuals with Crohn’s disease (CD). However, the Wilcoxon test indicated that the comparison between CD and nonIBD showed greater statistical significance than the comparison between CD and UC. This result may be explained by the lower variability and reduced overlap between the diversity values of CD and nonIBD, making these groups more distinct in terms of distribution. In contrast, greater data dispersion and overlap between CD and UC reduced the observed statistical significance. Thus, the partial results indicate no significant difference in Shannon diversity between UC and nonIBD, while comparisons between CD and UC, as well as CD and nonIBD, revealed statistically significant differences. These findings suggest that ulcerative colitis largely preserves the microbial diversity observed in healthy individuals, whereas Crohn’s disease is associated with a more pronounced reduction in microbiota diversity, indicating a potential microbial imbalance characteristic of this condition. Subsequently, machine learning methods such as Random Forest will be employed to explore patterns and associations between microbial compositions and IBD. This approach is expected to reveal specific taxonomic and functional shifts in the microbiota associated with the diseases, offering new insights into the role of these microorganisms in the onset and progression of Crohn’s Disease and Ulcerative Colitis.
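For readers unfamiliar with the diversity measure used here, the Shannon index is H = −Σ pᵢ ln pᵢ, where pᵢ is the relative abundance of taxon i in a sample. The following is a minimal sketch, assuming natural logarithms and raw per-taxon counts as input; it is not taken from the Euryale pipeline, and the counts are invented for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Illustrative per-taxon read counts (not data from the study).
even_community = [25, 25, 25, 25]    # four equally abundant taxa
skewed_community = [97, 1, 1, 1]     # one dominant taxon

if __name__ == "__main__":
    print(shannon_index(even_community))
    print(shannon_index(skewed_community))
```

An evenly distributed community reaches the maximum ln(S) for S taxa, while a community dominated by one taxon scores lower, which is the pattern a reduced Shannon index reflects.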
Palavras-chave: Shotgun Metagenomics, Inflammatory Bowel Diseases, Crohn’s Disease, Ulcerative Colitis, Machine Learning
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126347

Cuticular color variation in Solenopsis saevissima (Smith, 1855) (Hymenoptera, Formicidae) in distinct soil usage areas

Autores: Nathalia Sampaio da Silva,David Aciole Barbosa,Victor Hideki Nagatani,Vitória Pereira Piconi,Bruno de Faria Assis,Juliana Aparecida Calisto Vaz,Brenda Miranda de Moraes,Otávio Guilherme Morais Da Silva,Gabriela Procópio Camacho,Maria Santina de Castro Morini
Apresentador: Nathalia Sampaio da Silva • nathy-sam@hotmail.com
Resumo:
Cuticular coloration serves important biological functions in insects, such as camouflage, UV protection and thermoregulation. Although this subject has gained prominence in recent years, there are still gaps in knowledge, especially for ants. We therefore sought to extract and quantify the cuticular coloration of the species Solenopsis saevissima (Smith, 1855) in different soil usage areas (e.g., forest, crop and urban) in the state of São Paulo, Brazil. S. saevissima belongs to the fire ants, a group known for its impact on public health because its sting can cause allergic reactions. We built an R notebook workflow, based on the R and Python languages, to analyze high-resolution images (one per specimen) of the frontal view of the specimens' heads, obtained with a multifocus overlay system. Specimens belong to biological collections from different institutions (Hymenoptera Collection of the Zoology Museum, Padre Jesus Santiago Moure Entomological Collection, UFV Entomology Museum and Silvia Sayuri Suguituru Reference Collection). Previous studies have applied machine learning to analyze ant morphology and color, but on extremely homogeneous, selected datasets of well-preserved specimens in strictly standardized positions. Our images, in contrast, presented heterogeneous positions and therefore underwent prior treatment to select the region of interest (ROI): (i) background removal with the rembg Python module, which separates the background of an image with high precision; (ii) manual refinement to remove residual structures (e.g., legs, gaster and dorsal region) and non-target elements (e.g., scale bars) with GIMP software. Color-based image segmentation was conducted with the recolorize R package, using a single cluster to obtain the average color of ant heads in the RGB (Red, Green, Blue) color space, later converted to the HSV (Hue, Saturation, Value) system with the grDevices package. 
Principal component analysis (PCA) of HSV variation, performed with the stats and ggplot2 packages, explained most of the intraspecific color variation of S. saevissima collected in São Paulo State with the first two dimensions. The first dimension highlights H, S and V; ants’ heads range from a light, highly saturated caramel to a dark, low-saturation cocoa-brown tint. In the second dimension, H and S oppose V, with ants ranging from dark to light. Further approaches will investigate the possible causes of this variation, which may be associated with abiotic variables, i.e., temperature, canopy cover and UV in the different areas sampled. It is known that some ants have different coloration due to thermal melanism and/or photoprotection. Understanding if and how color shapes ant communities will potentially aid further studies of their biology, conservation and public policies.
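The color-extraction step (average RGB color of the region of interest, then conversion to HSV) can be sketched as follows. This is an illustration only: the study used the recolorize and grDevices R packages, whereas this sketch swaps in Python's standard `colorsys` module, assumes 8-bit RGB pixels already restricted to the ROI, and uses invented pixel values. The helper names are hypothetical.

```python
import colorsys


def mean_rgb(pixels):
    """Average an iterable of (R, G, B) tuples with 0-255 channels."""
    pixels = list(pixels)
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def rgb255_to_hsv(rgb):
    """Convert 0-255 RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # colorsys returns hue in [0, 1)
    return h * 360.0, s, v


# Invented ROI pixels standing in for a segmented ant head.
roi_pixels = [(100, 50, 0), (200, 150, 100)]

if __name__ == "__main__":
    average = mean_rgb(roi_pixels)
    print(average, rgb255_to_hsv(average))
```

One HSV triple per specimen is the kind of three-column table that a PCA of H, S and V would take as input.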
Palavras-chave: colorimetry analysis, colour pattern, coloration variability
#1126371

Identification of clinically relevant variants from Whole Exome Sequencing of Hereditary Breast and Ovarian Cancer syndrome probands

Autores: Manoela Laender Recife,Marcus Vinícius Gonçalves Antunes,Thalia Queiroz,Luciana Lara dos Santos,Eduardo Martin Tarazona Santos
Apresentador: Manoela Laender Recife • manu.recife@outlook.com
Resumo:
Hereditary Breast and Ovarian Cancer (HBOC) syndrome is a genetic condition in which individuals present a higher risk of developing breast and ovarian cancer due to the inheritance of germline mutations in genes such as BRCA1 and BRCA2. Whole Exome Sequencing (WES) has played a crucial role in cancer diagnosis due to its ability to identify mutations in genes directly associated with the disease, as well as to investigate additional genes across the genome. This approach enables a more comprehensive analysis of genetic data, providing deeper insight into an individual's predisposition to developing cancer. This study aimed to report clinically relevant variants in prioritized genes from WES of seven HBOC probands, patients of Associação de Combate ao Câncer do Centro-Oeste de Minas (ACOM). Sequencing data were obtained using the NovaSeq X platform (Illumina), and initial quality control and secondary analysis, such as read alignment and variant calling, were performed with the GATK (v4.6) pipeline. Analyzed samples presented the following quality metrics: number of reads (91M), percentage of reads with quality above 30 (93%), vertical coverage (36X) and horizontal coverage at 20X (58%). The VCF files were first filtered using a BED file containing the positions of the genes of interest, and then filtered by depth ≥ 20, mutated allele frequency between 0.3 and 0.7, and mutated-allele read count ≥ 5. ANNOVAR (v.20191024) was used for clinical annotation with the RefSeqGene (Oct 2021), ClinVar (Sep 2024) and gnomAD (v4.1 exome) databases. Although pathogenic and likely pathogenic variants were identified, none showed a clear association with the HBOC phenotype. Further analysis of variants, particularly those of uncertain significance, is necessary in the search for potentially significant findings. The findings of this study emphasize the importance of genetic screening as a preventive healthcare strategy. 
Genetic testing allows patients to adopt preventive measures such as increased surveillance, chemoprevention and prophylactic surgeries. As we increase our understanding of the genetic architecture of diseases, including HBOC, genetic screening remains a great tool for improving patient outcomes and guiding personalized medicine.
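The variant filters stated in the methods (depth ≥ 20, mutated allele frequency between 0.3 and 0.7, and at least 5 reads supporting the mutated allele) can be expressed as a small predicate. This is a sketch under assumptions, not the authors' code: it takes depth and alternate-allele read counts as plain integers (in practice these would come from VCF fields), treats the 0.3–0.7 bounds as inclusive, and uses invented example calls. The name `passes_filters` is hypothetical.

```python
def passes_filters(depth, alt_reads, min_depth=20, min_alt_reads=5,
                   vaf_range=(0.3, 0.7)):
    """Return True if a variant call satisfies all three thresholds."""
    if depth < min_depth or alt_reads < min_alt_reads:
        return False
    vaf = alt_reads / depth  # variant allele fraction
    low, high = vaf_range
    return low <= vaf <= high  # assumption: bounds are inclusive


# Invented calls as (depth, alt_reads); not data from the study.
example_calls = {
    "balanced_het": (40, 20),   # VAF 0.50, passes
    "low_depth": (15, 8),       # fails depth >= 20
    "low_vaf": (100, 10),       # VAF 0.10, fails the 0.3-0.7 window
    "near_hom": (30, 29),       # VAF ~0.97, fails the 0.3-0.7 window
}

if __name__ == "__main__":
    for name, (depth, alt) in example_calls.items():
        print(name, passes_filters(depth, alt))
```

The 0.3–0.7 window keeps calls whose allele balance is compatible with a heterozygous germline variant, so homozygous calls are excluded by construction.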
Palavras-chave: Next-Generation Sequencing, Whole Exome Sequencing, hereditary breast and ovarian cancer syndrome
#1126378

Differential Gene Expression Profile in Umbilical Cord Tissues from Pregnant Women Exposed to COVID-19

Autores: Jéssica Macedo Rafael de Arruda,Bianca Moreira,Pethersen José Moraes Dos Reis Bueno Rocha,Mariana da Silva Mendonça,PAULA MAGNELLI MANGIAVACCHI,Milton Masahiko Kanashiro,Álvaro Fabrício Lopes
Apresentador: Jéssica Macedo Rafael de Arruda • jessica.crei@gmail.com
Resumo:
Fetal development is a highly delicate and organized process involving gene programming mechanisms and cellular differentiation to ensure the proper formation of tissues and organs. Studies have associated adverse environmental factors during pregnancy — such as poor maternal nutrition, drug and alcohol use, exposure to pollutants and chemicals, inappropriate use of medication, and contact with pathogens — with congenital anomalies, especially during the first trimester, and/or functional defects from the second trimester onward. In this context, the SARS-CoV-2 virus, responsible for the COVID-19 pandemic, is considered a potential disruptive factor that may affect key molecular pathways involved in fetal development. The analysis of maternal-fetal interface tissues, such as the placenta and umbilical cord, may help clarify how SARS-CoV-2 infection during pregnancy influences these processes. In this study, we analyzed 68 RNA-Seq samples obtained from the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information (NCBI), including 20 from the control group (CTRL) and 48 from the COVID-19 group (37 symptomatic and 11 asymptomatic). Samples were pre-processed using TrimGalore and quality-checked with FastQC. The reads were aligned to the human reference genome (hg19) and quantified using SeqMonk v.1.48.0. Differential expression analysis was performed using DESeq2, and the results were visualized through a Venn diagram, identifying genes commonly differentially expressed in the umbilical cord of symptomatic and asymptomatic COVID-19 cases. A total of 11 genes and eight lncRNAs were found to be differentially expressed in both subgroups. Notably, SLC7A1 and FAT1, previously linked to immune and inflammatory responses in COVID-19, were among them, as well as SHANK3 and MAOA, which have been associated with psychiatric disorders such as Autism, Schizophrenia, and Attention Deficit Hyperactivity Disorder (ADHD). 
It is concluded that SARS-CoV-2 influences the regulation of genes related to immune response and neurodevelopment, as evidenced in maternal-fetal interface tissues. However, as the involved pathways are not yet fully understood, further studies — particularly experimental ones — are needed to validate these findings.
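The Venn-diagram step amounts to a set intersection between the differentially expressed genes of the two COVID-19 subgroups. A minimal sketch follows, assuming each subgroup's result has already been reduced to a collection of gene identifiers; the identifiers below are placeholders, not genes reported in the study, and `venn_regions` is a hypothetical helper name.

```python
def venn_regions(symptomatic, asymptomatic):
    """Split two gene collections into the three regions of a two-set Venn diagram."""
    sym, asym = set(symptomatic), set(asymptomatic)
    return {
        "symptomatic_only": sorted(sym - asym),
        "shared": sorted(sym & asym),
        "asymptomatic_only": sorted(asym - sym),
    }


# Placeholder identifiers (illustrative only).
de_symptomatic = ["geneA", "geneB", "geneC"]
de_asymptomatic = ["geneB", "geneC", "geneD"]

if __name__ == "__main__":
    print(venn_regions(de_symptomatic, de_asymptomatic))
```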
Palavras-chave: COVID-19, umbilical cord, fetal development, RNA-Seq, differential gene expression, SARS-CoV-2, immune response, neurodevelopment, lncRNA, DESeq2, maternal-fetal interface, TrimGalore, SeqMonk
★ Running for the Qiagen Digital Insights Excellence Awards
#1126388

Genomic Landscape of Mansonia titillans and Mansonia wilsoni: A closer look at repetitive elements

Autores: Adenilton José da Silva Júnior,Elverson Soares de Melo,Gabriel da Luz Wallau
Apresentador: Adenilton José da Silva Júnior • Adenilton.sjunior@gmail.com
Resumo:
Transposable elements (TEs), or transposons, are repetitive DNA sequences that can move or replicate within a genome via transposition, forming the mobilome. These elements are widespread in eukaryotic organisms and often represent a significant portion of their genomes. In Aedes aegypti, for example, TEs account for about 55% of the genome. Recently, TE studies have gained increasing attention. However, in mosquitoes, most research has focused on the Aedes and Anopheles genera, while mosquitoes of the Mansonia genus have yet to have their genome or mobilome characterized. Therefore, we aimed to gain insight into the genome composition, with a detailed focus on the mobilome, of two Mansonia species, Ma. titillans and Ma. wilsoni, using two different methods for mobilome characterization: similarity-based and de novo approaches. We conducted low-coverage genome sequencing of mosquitoes collected in the metropolitan region of Recife, using the Illumina MiSeq platform. The reads underwent a pre-processing step that included the removal of low-quality regions using the Trimmomatic software. For similarity-based characterization, processed reads were initially assembled using Megahit and subjected to redundancy removal with Redundans. Assembly completeness was then assessed using BUSCO. Subsequently, TE families were annotated with RepeatMasker. For de novo characterization, additional pre-processing steps were performed in which mitochondrial DNA was filtered out using Bowtie2. The resulting nuclear DNA reads were then processed with the DnaPipeTE pipeline. Based on the comparison between the two methods for characterizing repetitive elements in both Mansonia genomes, the de novo approach proved to be more sensitive than the similarity-based annotation. 
In Ma. wilsoni, the similarity-based method identified 11.8% LINEs, 3.1% LTRs, and 5.2% DNA transposons, while the de novo analysis detected higher proportions: 16.44% LINEs, 3.33% LTRs, and 8.24% DNA elements. These findings demonstrate that the de novo method identifies a greater number of transposon families, including elements that may be absent or highly divergent in reference databases used by similarity-based tools. Using the de novo approach, we observed that the genome of Ma. titillans is composed of approximately 55% single-copy DNA, while in Ma. wilsoni this value is only around 38%. Although the repetitive fraction of the genome is abundant, a large fraction of the genome (24% in Ma. titillans and 30% in Ma. wilsoni) remained unclassified, which could represent highly divergent or fragmented TEs. The genomes are also poor in satellite DNA (1.35% and 1.38%, respectively) and low-complexity regions (0.28% and 0.41%, respectively). Both species possess many young TE families, some of which are still potentially active. Class I elements are more abundant (11.1% in Ma. titillans and 19.9% in Ma. wilsoni) than Class II elements (7.51% in Ma. titillans and 8.43% in Ma. wilsoni). The higher TE content in Ma. wilsoni is mainly due to RTE superfamily expansion in this species. A comparative analysis of TE families shared between the species revealed significant variation in the proportion of the genome occupied by each family, suggesting that different TE families have followed distinct evolutionary trajectories in each species.
Palavras-chave: Transposable DNA elements. Retroelements. Culicidae. Insect Genome. Genomic Components.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126396

Unveiling tissue inhibitors of metalloproteinases gene regulation in dental caries through transcriptomic profiling

Autores: Juliana Soares,SIMONE GOMES DE OLIVEIRA,Flávio Henrique Baggio Aguiar,Rodrigo Jardim
Apresentador: Juliana Soares • jusooares123@gmail.com
Resumo:
The dentin extracellular matrix (ECM) is a dynamic structure essential for the integrity of dental tissues. Its remodeling is regulated by a balance between matrix metalloproteinases (MMPs) and their endogenous inhibitors, the tissue inhibitors of metalloproteinases (TIMPs). While the role of MMPs in ECM degradation is well established, the regulatory function of TIMPs in dental tissues, particularly in the context of dental caries, remains underexplored. This study aims to analyze the differential expression profile of the TIMP1, TIMP2, TIMP3, and TIMP4 genes in sound and carious human teeth using RNA-Seq data. Biological samples underwent total RNA extraction, rRNA depletion library preparation, and sequencing on the Illumina NovaSeq 2000 platform. Data processing followed a pipeline comprising FastQC for pre- and post-alignment quality control, Flexbar for adapter removal and trimming of reads with a Phred score below 20, HISAT2 for alignment to the human reference genome GRCh38/hg38, StringTie for transcript assembly and quantification, and Ballgown for differential expression analysis. In Ballgown, differential expression was considered significant for genes with an adjusted p-value (FDR) < 0.05 and an absolute log2 fold change ≥ 1. Only transcripts with FPKM > 1 were retained. Preliminary results indicate variations in TIMP gene expression between the analyzed groups, with a trend of increased TIMP1 expression in carious samples and reduced TIMP4 expression in the same group. TIMP2 and TIMP3 showed little or no apparent variation. These findings, although preliminary, suggest a possible compensatory role of TIMPs in ECM modulation in response to caries. This study highlights the potential of transcriptomic approaches combined with bioinformatics to unravel molecular mechanisms in dental tissues and provides novel insights into the role of TIMPs in human dentin.
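The significance criteria stated in the methods (FDR < 0.05, absolute log2 fold change ≥ 1, FPKM > 1) can be written as a single predicate. The sketch below is illustrative and does not reproduce Ballgown: it assumes the three values have already been computed per transcript, it does not specify how FPKM is summarized across samples, and the records are invented. The function name is hypothetical.

```python
def is_differentially_expressed(fdr, log2fc, fpkm,
                                max_fdr=0.05, min_abs_log2fc=1.0, min_fpkm=1.0):
    """Apply the three thresholds: FDR < 0.05, |log2FC| >= 1, FPKM > 1."""
    return fdr < max_fdr and abs(log2fc) >= min_abs_log2fc and fpkm > min_fpkm


# Invented records: (identifier, FDR, log2 fold change, FPKM).
records = [
    ("tx_up", 0.01, 1.8, 12.0),        # passes all three thresholds
    ("tx_down", 0.03, -1.2, 5.0),      # passes: the fold change is tested in absolute value
    ("tx_small_fc", 0.01, 0.4, 30.0),  # fails |log2FC| >= 1
    ("tx_low_expr", 0.001, 2.5, 0.6),  # fails FPKM > 1
    ("tx_not_sig", 0.20, 3.0, 8.0),    # fails FDR < 0.05
]

if __name__ == "__main__":
    kept = [name for name, fdr, lfc, fpkm in records
            if is_differentially_expressed(fdr, lfc, fpkm)]
    print(kept)
```

Testing the fold change in absolute value is what lets the same rule capture both up-regulated and down-regulated transcripts.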
Palavras-chave: TIMPs, dental caries, RNA-seq, differential gene expression, transcriptomics
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126399

DUF6160 and a Putative Hydrocarbon Degradation Pathway in Acinetobacter

Autores: Bruna Sayuri Cardoso Ogusku,Anacleto Silva de Souza,Robson Francisco de Souza
Apresentador: Bruna Sayuri Cardoso Ogusku • bscogusku@gmail.com
Resumo:
Bacteria of the Acinetobacter genus belong to the class Gammaproteobacteria and the order Moraxellales. They are Gram-negative, strictly aerobic coccobacilli. This genus is most famous for its nosocomial species (A. baumannii, A. calcoaceticus, A. nosocomialis, A. pittii and A. seifertii), which make up the Acinetobacter baumannii-calcoaceticus complex (ABC complex). The World Health Organization has included the ABC complex in the ESKAPEE priority group, which lists the main multiresistant bacteria responsible for the high mortality rates in nosocomial infections. This genus's status as an opportunistic pathogen is recent, with some species gaining antimicrobial resistance and virulence factors in the last five decades. These species are now responsible for infections that are hard to treat, extending hospital stays and elevating the mortality rates of already immunocompromised patients. Before its rise as a pathogen, Acinetobacter was being studied for use in the biotechnology field due to its high potential for genetic transformation, but that line of study has been set aside in favour of clinical research. While studying the secretion systems of the ABC complex, we came across an abundant domain of unknown function, DUF6160, with neighboring genes that indicate a possible hydrocarbon degradation pathway, opening up the possibility of studying Acinetobacter’s potential in biotechnology once more. DUF6160 presents a β-barrel structure with a hydrophobic and electrostatically neutral inner surface, thus being a possible entryway for hydrocarbon molecules and the start of a degradation pathway. After a structural homology search, we found DUF6160 present in many different species, such as Marinobacter lutaoensis, Alcanivorax sp. DP30 (aquatic) and Alkanindiges hydrocarboniclasticus (terrestrial). Our main objectives now are to explore the taxonomic distribution of DUF6160 and to infer its function through genomic context.
Palavras-chave: Acinetobacter, hydrocarbon, genomic context, domain of unknown function, Gammaproteobacteria
★ Running for the Qiagen Digital Insights Excellence Awards
#1126407

SpectraAnalysis - A rapid, low-cost and automatized program for spectrofluorometer analysis.

Autores: Pedro Felipe de Sousa Queiroz,Sophia Garcia de Resende,Thiago Rabelo,Ernesto Breciani dos Santos Marques Taveira,MIGUEL CUNHA NUNES,Priscila Grynberg,Roberto Togawa,Cintia Marques Coelho
Apresentador: Pedro Felipe de Sousa Queiroz • pedrofsqueiroz@gmail.com
Resumo:
Avian influenza, caused by viruses from the Orthomyxoviridae family, includes both high and low pathogenic subtypes. The highly pathogenic H5N1 subtype has affected poultry and wild birds with significant mortality rates. Swine and, more recently, other mammals, including humans, can also be infected. Brazil, the world's second-largest chicken producer, is on alert following confirmed cases. Additionally, in 2013 there was an outbreak in China, which led to major economic and social impacts. Therefore, it is crucial to monitor and diagnose the disease early to implement effective control measures. Current methods such as PCR are effective but expensive and require complex equipment. ELISA, another widely used method, is more affordable but can produce false positives and is typically only effective in later stages of infection. These limitations hinder mass testing, which is especially important in countries like Brazil, where there are large poultry populations and increasing concern about human infection. This project aims to develop a rapid, low-cost detection method that does not require enzymatic activity, based on simple molecules. Utilizing hammerhead ribozymes and DNA hairpins for FRET-HCR, transported as dry samples, this approach has the potential to eliminate the complexity of transport and storage, as it does not require refrigeration during these stages. To eliminate the need for highly specialized labor and ensure broader distribution, the development of a computational program was proposed. This program is designed to automatically analyze data generated by spectrophotometers and simplify the interpretation of results obtained through fluorimeters. This method has the potential to significantly improve avian flu diagnostics, enabling a quicker and more effective response to outbreaks. The data analysis and processing program was developed using the Python programming language, along with libraries such as Pandas, NumPy, and Flet. 
Its development has been structured into three distinct phases. The first phase involved an initial analysis of the format, volume, and quality of the tabular data produced by the fluorimeters. The main objective was to gain a comprehensive understanding of the data to be processed. In this stage, a graphical data output was also established to help the general public better understand the results. The second phase consisted of developing and analyzing the program’s modules. In this stage, various components of the software were designed, rigorously tested, and debugged using appropriate tools. Real fluorimeter data were also used to validate the results. Currently, the program is functional and features an intuitive graphical interface. Finally, the third phase focused on adapting the program for distribution across multiple platforms. The primary deployment method was through a Flet application, enabling compatibility with Linux and Windows, and we intend to develop a web version in the future so that the program works on any platform. The program can be accessed at the following link: http://lbi.cenargen.embrapa.br/SpectraRNA/
Palavras-chave: Diagnostic Methods, Avian Influenza, H5N1, FRET-HCR, Python, Flet, interface
★ Running for the Qiagen Digital Insights Excellence Awards
#1126410

Development and Application of Hidden Markov Model-Based Approaches for the Identification of mitoviruses in Fungi

Autores: Aijalon Junior,Lucas Yago Melo Ferreira,João Pedro Nunes Santos,Paulo Eduardo Ambrosio,Eric Roberto Guimarães Rocha Aguiar
Apresentador: Eric Roberto Guimarães Rocha Aguiar • ericgdp@gmail.com
Resumo:
Mycoviruses, which infect fungi, play a crucial yet poorly understood role in ecosystems. Although many mycoviral infections exhibit no apparent symptoms in their fungal hosts, studying them is essential due to their potential impact on fungal interactions. This research focuses on the Mitoviridae family, aiming to enhance the understanding of its structure, lifecycle, infection mechanisms, and diversity. Traditional methods for mycovirus identification and classification rely on sequence similarity analyses, such as the Basic Local Alignment Search Tool (BLAST). These approaches compare unknown sequences to databases to identify novel viruses based on homology. However, they have limitations when detecting new species with low similarity to known ones. To overcome this restriction, we propose the use of Hidden Markov Models (HMMs) as a robust and complementary alternative. HMMs analyze biological sequences through stochastic processes with hidden states, enabling the detection of underlying probabilistic patterns rather than relying solely on direct sequence similarity. This makes HMMs particularly effective in identifying highly diverse or unusually structured mycoviruses, such as multisegmented mitoviruses. For Mitoviridae identification, we employed HMMER, a tool built on profile HMMs. A total of 105 species in the Mitoviridae family, spanning the Unuamitovirus, Duamitovirus, and Triamitovirus genera, were selected for model creation. All unique deposited sequences assigned to the Mitoviridae family were collected from the NCBI nucleotide database, aligned using MAFFT, and refined with Aliview, resulting in HMM models for each genus and segment as well as a general Mitoviridae model. To validate our models, we assessed 41 recently published, previously characterized mycoviral sequences, including eight mitoviruses, only one of which was classified at the genus level. 
All mitoviral sequences were identified by our HMM-based approach, with five classified under the Unuamitovirus genus, including the one known Unuamitovirus, and the other three assigned to the Duamitovirus genus. Further comprehensive phylogenetic analysis reinforced our results. In conclusion, the HMM-based methodology proved highly effective in mitovirus identification and characterization. The pattern recognition capabilities of HMMs enabled differentiation between genera and facilitated the identification of conserved domains. This approach not only improved classification accuracy but also advanced our understanding of viral diversity and structure.
Palavras-chave: Mitoviridae, Narnaviridae, Hidden Markov Models, Bioinformatics, Viruses
#1126419

RNA-Seq-based gene expression profiling during dormancy in non-model native bees

Autores: Larissa Nunes do Prado,Priscila Karla Ferreira dos Santos,Paulo Cseri Ricardo,Maria Cristina Arias
Apresentador: Larissa Nunes do Prado • lnprado@usp.br
Resumo:
Understanding transcriptional changes during physiological states such as dormancy in non-model organisms requires both biological insight and robust bioinformatics. In this study, we implemented an RNA-Seq pipeline to investigate gene expression during dormancy in three stingless bee species (Melipona subnitida, Melipona marginata, and Plebeia remota), integrating quality control, redundancy reduction, contaminant filtering, and multi-database annotation to ensure high-confidence transcriptome data. The reads were de novo assembled using Trinity with preprocessing parameters that included adapter trimming, quality filtering (Phred < 30), read length filtering (<31 bp), and normalization. After assembly, redundancy was reduced using CD-HIT, which improved BUSCO scores by decreasing the number of duplicated genes and increasing the detection of single-copy orthologs. Functional annotation was performed using Trinotate with the SwissProt database and enhanced through local Diamond alignments against UniRef90. To ensure taxonomic specificity, we developed a custom script for contaminant removal based on taxonomy IDs from annotation results. Transcripts matching known contaminants (e.g., Bacteria, Fungi, Viridiplantae, Viruses) with an e-value ≤ 1e-5 were excluded, substantially improving downstream analyses. Quantification was performed with RSEM after mapping reads using Bowtie2, and expression values were filtered by a TPM ≥ 0.25 threshold to retain reliably expressed transcripts. The final datasets contained over 93% of transcripts retained post-filtering across all species. EdgeR was used for differential gene expression analysis, revealing 1,487 DEGs in M. subnitida, 1,165 in M. marginata, and 1,787 in P. remota. Differential expression analysis revealed distinct transcriptional responses to dormancy across species. 
To further explore expression dynamics, we applied K-means clustering, which produced groupings consistent with those observed in the heatmap analyses, reinforcing the validity of the observed patterns. In addition, we developed a custom script to compare enriched Gene Ontology (GO) terms across species, enabling systematic identification of shared biological processes associated with dormancy. These findings highlight the value of robust bioinformatic pipelines in uncovering conserved and species-specific transcriptional responses to dormancy, offering a reliable framework for comparative studies in non-model organisms.
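The contaminant and expression filters described in this abstract can be illustrated with a minimal sketch. The authors' script is not shown, so the record layout (a lineage list plus an e-value per best hit), the function names, and the decision to keep unannotated transcripts are all assumptions made for illustration. Only the contaminant groups and the two thresholds (e-value ≤ 1e-5, TPM ≥ 0.25) come from the text, and the two filters are combined in one pass here purely for brevity.

```python
# Hypothetical sketch: the real script and its input format are not described.
CONTAMINANT_GROUPS = {"Bacteria", "Fungi", "Viridiplantae", "Viruses"}
EVALUE_MAX = 1e-5  # e-value cutoff stated in the abstract
TPM_MIN = 0.25     # expression cutoff stated in the abstract


def is_contaminant(hit):
    """True when the best hit lies in a contaminant group with e-value <= cutoff."""
    if hit is None:
        return False  # assumption: unannotated transcripts are kept
    lineage, evalue = hit
    return evalue <= EVALUE_MAX and bool(CONTAMINANT_GROUPS & set(lineage))


def filter_transcripts(tpm_by_id, best_hit_by_id):
    """Keep transcripts that pass both the TPM and the contaminant filter."""
    kept = []
    for transcript_id, tpm in tpm_by_id.items():
        if tpm < TPM_MIN:
            continue  # too weakly expressed
        if is_contaminant(best_hit_by_id.get(transcript_id)):
            continue  # confident match to a contaminant lineage
        kept.append(transcript_id)
    return kept


# Toy data, invented for the example.
tpm = {"t1": 5.0, "t2": 0.1, "t3": 12.0, "t4": 3.0}
hits = {
    "t1": (["Eukaryota", "Metazoa", "Arthropoda"], 1e-50),
    "t3": (["Bacteria", "Proteobacteria"], 1e-20),
    "t4": (["Bacteria", "Firmicutes"], 1e-2),  # weak hit, above the e-value cutoff
}
kept_ids = filter_transcripts(tpm, hits)
print(kept_ids)  # ['t1', 't4']
```

In this toy run, `t2` is dropped for low expression and `t3` for a confident bacterial hit, while `t4` survives because its bacterial hit does not meet the e-value cutoff.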
Palavras-chave: RNA-Seq, transcriptome, dormancy
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126424

Integrating Deep Learning Models for Accelerated Antibody Design

Autores: Eduardo Menezes Gaieta,Matheus do Vale Almeida,Jean Vieira Sampaio,Diego da Silva Almeida,Andrielly Henriques dos Santos Costa,João Herminio Martins Da Silva,Geraldo Rodrigues Sartori
Apresentador: Geraldo Rodrigues Sartori • geraldo.sartori@ifsc.usp.br
Resumo:
Antibodies are versatile biomolecules of high pharmaceutical relevance, particularly due to their success in cancer immunotherapy. Their modular architecture enables diverse therapeutic formats, including monoclonal antibodies, nanobodies, bivalent constructs, CAR-T therapies, TCR-mimic antibodies, and antibody-drug conjugates (ADCs). A key feature of antibodies is their capacity to bind virtually any antigen with high specificity, a property rooted in the structural and chemical diversity of the paratope. This diversity is primarily driven by the variability of the complementarity-determining regions (CDRs) and the relative orientation between the VH and VL domains. While these features underpin their remarkable functional versatility, they also present substantial challenges for computational antibody design. Recent advances in deep learning, particularly diffusion models and protein language models, have enabled the development of novel tools for antibody engineering. These approaches are accelerating the rational design and optimization of antibody structures and sequences for targeted ligand binding, offering new opportunities in the development of next-generation immunotherapeutics. We aim to establish a robust antibody development pipeline by integrating state-of-the-art methods to efficiently address the inherent conformational and chemical space of antibodies. First, the RFdiffusion algorithm was integrated and evaluated to explore the relative positioning of the framework with respect to the epitope, the VH/VL orientation, and the conformational diversity of the CDRs. Sequence generation and optimization via inverse folding are being performed using protein and antibody language models, such as ESM2-IF and AntiFold. All generated sequences are subsequently evaluated for humanness using AbNatiV and OASis, and for structural plausibility using ABodyBuilder3. 
Despite limitations in handling multiple chains, we developed a routine capable of generating de novo antibody structures targeting specific epitopes, as well as optimizing the size and conformation of CDRs from existing antibodies. By combining cutting-edge deep learning models with robust structural evaluation tools, this pipeline holds great promise for accelerating the design of highly specific, optimized antibodies for therapeutic applications.
Palavras-chave: Protein engineering, antibody, deep learning, pipeline
#1126426

Uncovering Tissue-Specific Cachexia Mediators through the Cancer Cachexia Omics Database (CaCaOdb)

Autores: Amanda Piveta Schnepper,Victória Larissa Schimidt Camargo,Caio Fernando Ferreira Mussatto,Jeferson dos Santos Souza,Iago Diogo Silveira,Fernando de Souza Leite,Ana Luiza Labbate Bonaldo,Jakeline Santos Oliveira,Sarah Santiloni Cury,Robson Francisco Carvalho
Apresentador: Amanda Piveta Schnepper • amanda.schnepper@unesp.br
Resumo:
Cancer cachexia is a complex syndrome characterized by progressive loss of muscle mass and adipose tissue. It is a significant cause of morbidity and mortality in cancer patients. Cachexia mediator factors (CMFs) are tumor-derived genes that, through various secretion pathways or vesicle transport, reach other tissues and contribute to the progression and worsening of cachexia. Despite advances in the study of cancer-associated cachexia and the increased investigation of CMFs in the tumor microenvironment, their systemic effect across different tissues remains underexplored. To better understand the underlying mechanisms and mediators of cachexia, we developed the Cancer Cachexia Omics Database (CaCaOdb), which allows researchers to comprehensively explore transcriptomic data of cancer cachexia in humans and mice. It includes reanalyzed public transcriptomic data (cachectic compared to normal) from over 1,000 RNA-Seq and microarray samples covering skeletal muscles (gastrocnemius, quadriceps, rectus abdominis, diaphragm, and tibialis anterior), adipose tissue (visceral and epididymal), the heart, and the liver in humans and mice. In addition, CaCaOdb provides the expression landscape at the single-cell level of cachexia-associated factors in human cancers (~1 million cells, 240 samples, and 12 tumor types). To investigate the role of CMFs in cancer-associated cachexia, hierarchical clustering was performed on RNA-seq data using 122 CMFs previously identified in scRNA-seq. Analysis of these CMFs in 208 RNA-seq samples from mouse cachexia models revealed significant variability in gene expression profiles across tissues. Muscle tissue shows lower CMF gene expression than the adipose tissue, heart, and liver. Moreover, a subset of 7 CMFs (Alb, Hc, Crp, Azgp1, AI182371, Igfbp2, and Ttr) exhibits increased expression exclusively in the liver. The different cachexia models also seem to play a significant role in CMF expression. 
In total, 10 distinct models were analyzed, including eight primary and two metastatic ones. Understanding the role of CMFs in different tissues is crucial for uncovering the systemic mechanisms underlying cancer-associated cachexia. Variability in the expression profiles of these factors across tissues highlights the complexity of this syndrome and underscores the need for comprehensive approaches to fully understand its systemic impact. These findings suggest that cachexia may affect tissues in a specific manner, emphasizing the limitations of muscle-focused analyses. By developing an online and publicly accessible database, we aim to advance research into the mechanisms of cachexia, paving the way for strategies to improve the prognosis and quality of life of affected patients.
Palavras-chave: cancer cachexia, transcriptomics, database, systems biology
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126427

In silico development of a multi-epitope protein for Immunodiagnostics Application in Rhodococcus equi infection

Autores: Rodrigo Alex Henriquez Arancibia,Gustavo Pires Ramos Cerqueira dos Santos,Roberto Meyer,Nubia Seyffert,Sandeep Tiwari,Ricardo Wagner Dias Portela,Thiago Luiz de Paula Castro
Apresentador: Rodrigo Alex Henriquez Arancibia • rodrigo.h.arancibia@gmail.com
Resumo:
Rhodococcus equi is an opportunistic bacterium that is commonly found in soil and causes severe pyogranulomatous bronchopneumonia in foals, resulting in significant mortality and morbidity. Infection by R. equi poses a substantial threat to the equine industry both in Brazil and globally. Diagnosis of R. equi typically relies on culturing and molecular testing through tracheobronchial aspirate (TBA) sampling, which can be invasive for compromised animals. The development of immunodiagnostic methods for R. equi marks a significant advancement due to the ease of sample collection, high sensitivity, rapid response times, and efficient processing capabilities. Since a definitive immunodiagnostic test is still absent due to sensitivity and specificity limitations, this study utilized innovative in silico approaches, including comparative genomics, subtractive genomics, immunoinformatics, and structural bioinformatics, to design a multi-epitope protein for immunodiagnosis. We obtained 18 proteomes from the R. equi complete genomes in the NCBI database and performed core proteome analysis to select the proteins that are common to all strains considered. We found 803 proteins in the R. equi core proteome that are also non-homologous to the host Equus caballus, making these potential targets to achieve greater sensitivity and specificity in immunodiagnosis. We performed further analyses involving subcellular localization, role in cell function, virulence involvement, and antigenic properties to filter the four best protein candidates for B-cell epitope prediction. The best epitope candidates to design a chimeric protein for immunodiagnosis were selected based on parameters such as antigenicity, variability, stability, hydrophilicity, and conservation among bacterium and host. We designed a 22 kDa multi-epitope protein using specific residue linkers and adhering to strict parameters for greater solubility, stability, and antigenicity. 
We generated a high-quality three-dimensional structure for this protein that presented 90.8% of its residues in favored regions and only 1.4% in disallowed regions, according to the Ramachandran plot. This work represents an important step towards the development of a new immunodiagnostic assay for rhodococcosis. Future directions involve in vitro production of the recombinant protein and standardization of immunoenzymatic assays using sera from affected and non-affected equines.
Palavras-chave: Immunodiagnostics, Rhodococcus equi, Immunoinformatics, Equine Health
#1126440

Integrating Network Pharmacology and Molecular Docking to Evaluate the Therapeutic Potential of Tangeretin against Medulloblastoma

Autores: Nicolly Clemente de Melo,Lucas Miguel de Carvalho
Apresentador: Nicolly Clemente de Melo • nicolly.melo@mail.usf.edu.br
Resumo:
Tangeretin is an antioxidant flavone with anticancer effects capable of inhibiting the development and progression of cancer cells. Due to these properties and the statistical relevance of cancer in the central nervous system, with 11,490 cases per 100,000 inhabitants between 2023 and 2025, the study of natural compounds applied to brain tumors emerges as a promising approach. Due to the challenges in early diagnosis and treatment, with metastasis being the leading cause of mortality, medulloblastoma, primarily pediatric, requires further research focused on developing new therapies that could reduce metastasis cases and the side effects of conventional therapies. For this purpose, a network pharmacology approach was applied. Important genetic databases were used to search for the molecular targets of the flavone and of medulloblastoma. Also, Absorption, Distribution, Metabolism and Excretion (ADME) criteria were applied to the flavone. For the construction of the Protein-Protein Interaction (PPI) network, a reliable database that retrieved all known and predicted protein associations, including physical and functional associations, was consulted. Functional enrichment analysis was applied to search for biological processes, molecular functions, and cellular components that could explain the carcinogenesis process in this specific cancer, and a pathway analysis was employed to clarify these processes. PPI network analyses revealed therapeutic targets such as EGFR, AKT1, SRC, GSK3B, PARP1, MMP9, PTGS2, MCL1, and ABCB1. After clustering, molecular docking was performed with EGFR, AKT1, PARP1, and SRC. The best binding result was between the SRC protein and tangeretin, which presented a satisfactory binding energy of -6.33 kcal/mol and an RMSD of 0, indicating high affinity and therapeutic potential, and showing 72% similarity in intermolecular interactions between tangeretin and the original ligand from the PDB file. 
Functional enrichment of the signaling pathways indicated the relevance of the EGFR-TKI, PI3K-Akt, Chemical Carcinogenesis - ROS, Estrogen Signaling, Ras Signaling, MAPK Signaling, and FoxO Signaling pathways. The modulation of these pathways by tangeretin suggests a positive therapeutic approach in reducing carcinogenesis progression and improving the response to chemotherapy. Furthermore, the action of the hub genes in the PPI network was verified using public gene expression databases, evaluating their response under different biological conditions.
Palavras-chave: flavone, tangeretin, medulloblastoma, network pharmacology, bioinformatics
★ Running for the Qiagen Digital Insights Excellence Awards
#1126446

Arginase I Pathway-Associated microRNAs as Prognostic Biomarkers in Acute Myeloid Leukemia (AML)

Autores: Isabela Emilia da Silva Oliveira,Maria Alice Ferrari Souza,Jaqueline França Costa,Sandeep Tiwari
Apresentador: Isabela Emilia da Silva Oliveira • isabelaemilia@gmail.com
Resumo:
Acute myeloid leukemia (AML) is a hematologic malignancy that presents significant reductions in normal blood cells due to the accumulation of leukemic blasts in the bone marrow, blood, and other tissues. The low 5-year survival rate and the risk of relapse are among the main reasons for poor outcomes due to the biological diversity of the patient and intratumoral heterogeneity. Metabolic and cellular reprogramming are hallmarks of neoplastic initiation and progression in AML. L-arginine metabolism regulates the suppressive activity of AML blasts, which express and release arginase. Increased arginase enzymatic activity in the plasma of AML patients suppresses T-cell proliferation and increases the ability of blasts to create an immunosuppressive microenvironment. Increased arginase I expression and enzymatic activity in AML patients may be related to the expansion of myeloid blasts and suppression of the immune response. MicroRNAs (miRNAs), which control post-transcriptional gene expression, are relevant because their expression relates to aggressive disease variables and clinical prognosis, underscoring their role in tumorigenesis as oncogenes or tumor suppressors. The aim of the research was to identify miRNAs differentially expressed in serum, blood, and plasma samples as new biomarkers that regulate Arginase I activity in AML patients for application in prognosis. The search for dysregulated miRNAs in AML was performed using public databases, prioritizing studies between the years 2015 and 2025 that utilized samples from AML patients. In the later stages of the study, specialized software will be used to assign Gene Ontology (GO) terms to miRNAs using a reverse annotation strategy and to analyze the network interactions between miRNA-TF-miRNA or TF-miRNA-TF. 
Within the context of the Arginase pathway in AML, potential biomarkers will be identified through the analysis of these interaction networks. As partial results, a total of 126 articles were retrieved from the PubMed database using the keywords “miRNAs”, “acute myeloid leukemia”, “serum”, “peripheral blood”, and “plasma”, focusing on studies published between 2015 and 2025 that compared AML patients with control subjects. Patient ages ranged from 3 to 87 years. The most frequently used method to measure expression levels was qRT-PCR using the Applied Biosystems 7500 Fast platform (Applied Biosystems, CA, USA). Induction chemotherapy was the most commonly reported treatment among the studies. The majority of studies were conducted in China. From these, 28 dysregulated miRNAs were identified in serum samples, 17 downregulated and 11 upregulated. In peripheral blood samples, 19 miRNAs were identified, with 11 upregulated and 8 downregulated. In plasma samples, 31 miRNAs were identified, with 15 downregulated and 16 upregulated. In this analysis, using an integrated reverse-transcriptomics-based bioinformatics approach, we aim to identify key transcription factors that may contribute to the development of pathway-specific biomarkers within the Arginase I axis in acute myeloid leukemia (AML).
Palavras-chave: microRNAs, Acute Myeloid Leukemia (AML), Arginase I
★ Running for the Qiagen Digital Insights Excellence Awards
#1126462

Stratification and Multi-Tissue Transcriptomic Profiling Uncover Molecular Drivers of Cancer Cachexia

Autores: Sarah Santiloni Cury,Amanda Piveta Schnepper,Kaltinaitis Santos,Jakeline Santos Oliveira,Ana Luiza Labbate Bonaldo,Victória Larissa Schimidt Camargo,Caio Fernando Ferreira Mussatto,Patricia Pintor dos Reis,Geysson Javier Fernandez Garcia,Juan Camilo Calderon Velez,Robson Francisco Carvalho
Apresentador: Robson Francisco Carvalho • robson.carvalho@unesp.br
Resumo:
Cancer cachexia, a systemic wasting syndrome driven by tumor-host interactions, progresses through stages requiring precise molecular and phenotypic characterization. To dissect its progression, skeletal muscle, visceral adipose tissue, liver, heart, and tumor samples were analyzed from Lewis Lung Carcinoma tumor-bearing mice, stratified into four groups: control (n = 10), pre-cachexia (n = 12), cachexia (n = 11), and severe cachexia (n = 8). Stratification was based on skeletal muscle (gastrocnemius, tibialis anterior, extensor digitorum longus, soleus) and visceral adipose depots (mesenteric, epididymal, perirenal, subcutaneous), normalized by tibia length. Hierarchical clustering using Euclidean distance and complete linkage, combined with supervised machine learning (Random Forest, CART, SVM, LDA, KNN), achieved robust classification, with Random Forest and SVM reaching accuracies above 85% and strong agreement (Cohen’s Kappa >0.8). Mesenteric and epididymal fat depots emerged as key predictors of severity. Total RNA sequencing was performed on the NovaSeq X platform. The reads were processed through a bioinformatics pipeline, which included quality control with FastQC, alignment with STAR, quantification with Salmon, normalization, and differential expression modeling with DESeq2. Principal component analysis revealed a robust separation of groups, with PC1 and PC2 explaining approximately 46% of the transcriptomic variance in the gastrocnemius. TRADEseq revealed tissue-specific processes driving cachexia: proteasomal degradation, oxidative phosphorylation, and extracellular matrix remodeling in muscle; calcium ion signaling, transcriptional stress response, and T-cell migration in adipose tissue; inflammatory signaling, synapse organization, and cell surface protein localization in heart; TNF regulation, mitotic control, leukotriene transport, and collagen metabolism in liver; and actin filament polymerization and myeloid leukocyte differentiation in tumors. 
Cluster analysis identified 34 tumor-upregulated genes, with Cebpb, Map4k4, and IL-33 emerging as potential therapeutic candidates driving muscle atrophy (Cebpb and Map4k4) and adipose wasting (IL-33). Plasma proteomics via liquid chromatography-tandem mass spectrometry identified modulated proteins, including Hemopexin, Serotransferrin, and Alpha-1-antitrypsin isoforms, as biomarkers of cachexia severity. Gene set enrichment analysis using EnrichR and inter-tissue ligand–receptor mapping with the LIANA database supported these insights, suggesting therapeutic targets. This study pioneers cachexia stage stratification using anatomical parameters, with transcriptomic and proteomic data revealing therapeutic targets across syndrome progression.
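The agreement statistic reported above for the classifiers, Cohen's Kappa, corrects raw accuracy for the agreement expected by chance. The sketch below is a generic illustration and not the authors' code: the function name, the toy labels, and the convention of returning 1.0 when chance agreement is already total are assumptions.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # convention chosen for this sketch (single-class degenerate case)
    return (observed - expected) / (1.0 - expected)


# Toy labels for the four stages, invented for the example (one misclassification).
stages_true = ["control", "control", "pre", "pre",
               "cachexia", "cachexia", "severe", "severe"]
stages_pred = ["control", "control", "pre", "pre",
               "cachexia", "cachexia", "severe", "cachexia"]
kappa = cohens_kappa(stages_true, stages_pred)
print(round(kappa, 3))  # 0.833
```

Here the raw accuracy is 7/8 = 0.875 and the chance agreement is 16/64 = 0.25, so kappa = (0.875 - 0.25) / 0.75 ≈ 0.833.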
Palavras-chave: Transcriptomics, Proteomics, Multi-Omics, Cachexia, Therapeutic Targets
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126479

Binning of Metagenomic Data with Unsupervised Extreme Learning Machine

Autores: Jair José Herazo Álvarez,Sara Cuadros Orellana,Marco Mora,Karina Vilches,Kevin Esmeral
Apresentador: Jair José Herazo Álvarez • jair.herazo@alu.ucm.cl
Resumo:
Unsupervised metagenomic binning is a key task in bioinformatics aimed at clustering DNA fragments (reads or contigs) that originate from the same organism. This step is essential for studying complex microbial communities, enabling the identification and characterization of species, many of which are still uncultivable and not individually sequenced. To represent the sequences, descriptors such as k-mer frequencies and GC content are typically used, resulting in high-dimensional data with intrinsic non-linear geometric structures.
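As an illustration of the descriptors mentioned above, the following minimal sketch turns a DNA fragment into a vector of normalized k-mer frequencies followed by its GC content. The function names, the default k = 4, and the handling of ambiguous bases are assumptions for this example; the abstract does not specify the exact descriptor settings used.

```python
from itertools import product


def kmer_profile(seq, k=4):
    """Normalized frequencies of all 4**k DNA k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other codes are skipped
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def gc_content(seq):
    """Fraction of G and C among unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def describe(seq, k=4):
    """Feature vector: k-mer frequencies followed by GC content."""
    return kmer_profile(seq, k) + [gc_content(seq)]


features = describe("ACGTACGTGGCC", k=2)
print(len(features))  # 17 (16 dinucleotide frequencies + GC content)
```

With k = 4 the vector already has 257 entries, which shows how quickly the dimensionality grows (4^k) and why a lower-dimensional embedding is attractive before clustering.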

Since clustering algorithms like k-means or Expectation Maximization (EM) assume simple structures and Euclidean metrics, it becomes necessary to transform the original data into more suitable spaces for grouping. In this context, techniques are required that can extract low-dimensional embedded representations while preserving the topological structure of the data. One neural network capable of performing this task is the Unsupervised Extreme Learning Machine (US-ELM).

US-ELM is an unsupervised extension of the Extreme Learning Machine (ELM) model, a single-layer feedforward neural network known for its extremely fast training. Unlike the original ELM, which is used for classification, US-ELM integrates tools such as manifold regularization, spectral graph theory, and generalized eigenvalue decomposition to project the data into a latent space that better captures its similarity structure.
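The projection step can be sketched in NumPy under several assumptions. The sketch follows the commonly described US-ELM recipe: random hidden layer H, graph Laplacian L, generalized eigenproblem (I + λ HᵀLH) v = γ HᵀH v, with the first eigenvector discarded. The binary k-nearest-neighbor graph, the tanh activation, the small ridge term, every hyperparameter value, and all function names are illustrative choices, not details taken from this work.

```python
import numpy as np


def knn_laplacian(X, n_neighbors=5):
    """Unnormalized Laplacian of a symmetrized binary k-nearest-neighbor graph."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                          # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def us_elm_embed(X, n_hidden=20, n_out=2, lam=0.1, n_neighbors=5, seed=0):
    """Project X (n_samples x n_features) to n_out dimensions, US-ELM style."""
    rng = np.random.default_rng(seed)
    L = knn_laplacian(X, n_neighbors)
    # Random, untrained hidden layer, as in ELM (tanh is an assumption here).
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_in + b)
    A = np.eye(n_hidden) + lam * H.T @ L @ H
    B = H.T @ H + 1e-6 * np.eye(n_hidden)  # ridge added for stability (assumption)
    # Solve A v = gamma B v by Cholesky reduction to a symmetric eigenproblem.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    _, Y = np.linalg.eigh((M + M.T) / 2.0)  # eigenvalues in ascending order
    V = C_inv.T @ Y                         # generalized eigenvectors
    beta = V[:, 1:n_out + 1]                # skip the first, take the next n_out
    beta = beta / np.linalg.norm(H @ beta, axis=0)
    return H @ beta                         # low-dimensional embedding


# Toy data: two well-separated groups (invented for the example).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.1, (30, 4)), rng.normal(1.0, 0.1, (30, 4))])
E_demo = us_elm_embed(X_demo)
print(E_demo.shape)  # (60, 2)
```

The embedded rows would then be passed to a standard clustering method such as k-means. Training involves no iterative weight updates, only one eigen-decomposition of an n_hidden x n_hidden matrix.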

This work presents an implementation of US-ELM applied to real metagenomic datasets. Its performance is evaluated in comparison with classical clustering methods like k-means and Expectation–Maximization (EM), using metrics such as accuracy and computational time.

The results show that US-ELM generates more useful representations for binning, outperforming traditional methods in clustering precision while being significantly faster than EM and comparable in speed to k-means. The emerging role of unsupervised neural networks in metagenomic binning is highlighted, along with current challenges such as the design of biologically informed similarity functions and the scalability to large metagenomic datasets.
Palavras-chave: Metagenomic Binning, Dimensionality Reduction, Unsupervised Extreme Learning Machine
#1126528

HufflePlots and SlytheRINs: Webtools for dynamic protein structure analysis

Autores: Laura Shimohara Bradaschia,Matheus Epifane-de-Assunção,Dr. João Paulo Matos Santos Lima
Apresentador: Laura Shimohara Bradaschia • laura.shimohara.041@ufrn.edu.br
Resumo:
The intricate relationship between protein structure and its folding dynamics remains elusive, despite the considerable growth structural biology has experienced over the past years. This is due to the growing understanding of the non-static nature of proteins, whose functions often depend on conformational changes and complex interactions occurring over a conformational ensemble. Among the computational approaches seeking to understand these dynamic properties, residue interaction network (RIN) analysis has emerged as a powerful tool. By representing proteins as networks, where residues become nodes and their chemical and physical interactions with other elements of the structure become edges, one can rapidly deduce key residues influencing structure and function using network parameters such as degree, betweenness, and clustering coefficient. These analyses provide relevant insights into protein dynamics. However, conventional RIN analysis is usually limited by its reliance on single, static protein structures, failing to capture the inherent flexibility and dynamic transitions that proteins undergo during folding. To address such matters, we present two bioinformatics web tools that provide insights into protein dynamics.
HufflePlots is a Python-based web tool built to accept multiple protein structure files from molecular dynamics (MD) simulation trajectories as input. It uses key metrics such as Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuation (RMSF) to track structural deviation over time and across the ensemble or per residue, indicating conformational stability and transitions along the trajectory. Because it accepts multiple input files, HufflePlots is a powerful tool for comparative analysis, allowing users to visualize differences in stability and flexibility between different proteins and providing direct insights into the functional impact of sequence changes or ligand binding on overall protein dynamics.
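For readers unfamiliar with the two metrics, the sketch below shows how RMSD per frame and RMSF per atom are commonly computed from a coordinate array. It is a generic NumPy illustration, not HufflePlots code: the function names and array layout are assumptions, and the frames are assumed to be already superimposed on the reference (no alignment step is performed).

```python
import numpy as np


def rmsd_per_frame(traj, ref):
    """RMSD of each frame against a reference structure.

    traj: (n_frames, n_atoms, 3); ref: (n_atoms, 3). Frames are assumed aligned.
    """
    diff = traj - ref
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))


def rmsf_per_atom(traj):
    """Fluctuation of each atom around its mean position over the trajectory."""
    mean_pos = traj.mean(axis=0)
    diff = traj - mean_pos
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=0))


# Toy trajectory: 2 frames, 2 atoms; only atom 0 moves, by 2 units along x.
toy_traj = np.zeros((2, 2, 3))
toy_traj[1, 0, 0] = 2.0
toy_rmsd = rmsd_per_frame(toy_traj, toy_traj[0])
toy_rmsf = rmsf_per_atom(toy_traj)
print(toy_rmsd)  # approximately [0, 1.414]
print(toy_rmsf)  # [1. 0.]
```

RMSD summarizes one frame with a single number (how far the whole structure has drifted), whereas RMSF summarizes one atom or residue across all frames (how mobile that position is).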
While HufflePlots provides an overview of conformational changes and flexibility across proteins, it does not detail changes at the residue interaction level or how they affect these dynamics. To address this, the second tool, SlytheRINs, is built upon the RIN concept, enabling the analysis of dynamic ensembles by breaking down interaction network data generated by the RING 3.0 tool from multiple conformations of a single protein. Integrating and comparing multiple conformations of a single protein extends the analysis from static to dynamic residue-residue interactions, and SlytheRINs provides a detailed mapping through comparative plots of residue interactions across conformational changes.
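One simple way to compare interaction networks across conformations is to track a node-level parameter such as degree for each residue in every conformation. The sketch below is only an illustration of that idea: the edge-list input format and residue identifiers are hypothetical and do not reflect the actual RING 3.0 output or the SlytheRINs implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def degrees(edges):
    """Degree (number of distinct interaction partners) of each residue."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {res: len(partners) for res, partners in neighbors.items()}


def degree_variation(edge_lists):
    """Mean and standard deviation of each residue's degree across conformations."""
    per_conf = [degrees(edges) for edges in edge_lists]
    residues = sorted(set().union(*per_conf))
    summary = {}
    for res in residues:
        values = [d.get(res, 0) for d in per_conf]  # absent residue = degree 0
        summary[res] = (mean(values), pstdev(values))
    return summary


# Three toy conformations with hypothetical residue identifiers.
conformations = [
    [("A:10", "A:45"), ("A:10", "A:72")],
    [("A:10", "A:45")],
    [("A:10", "A:45"), ("A:45", "A:72")],
]
degree_summary = degree_variation(conformations)
for residue, (avg, sd) in degree_summary.items():
    print(residue, round(avg, 2), round(sd, 2))
```

Residues whose degree changes strongly between conformations (a high standard deviation) are natural candidates for closer inspection, since their contacts form and break along the ensemble.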
Together, HufflePlots and SlytheRINs offer a harmonious workflow for protein dynamics analysis. While HufflePlots provides initial comparative insights of conformational and flexibility changes of given proteins, SlytheRINs enables deeper insight utilizing a dynamic approach for residue interaction analysis of varied conformations from a single protein. The combination of global dynamic visualization with detailed RINs of these tools seeks to confer further comprehension of protein dynamics and their biological roles.
Palavras-chave: protein-folding; residue interaction network (RIN); molecular dynamics (MD); Bioinformatics Webtools
★ This work is running for the Next Generation Bioinfo Award
#1126540

The complete genome sequence of the world’s 3rd smallest deer: the Southern Pudu (Pudu puda)

Autores: Maria Bárbara Borges de Santana,Eduardo Javier Pizarro Gonzalez,Juan Pablo Silva,Francisco Pinilla,Domenica Marchese,Vinicius Maracaja Coutinho,Juliana Vianna
Apresentador: Maria Bárbara Borges de Santana • mbarborgess@gmail.com
Resumo:
The Southern Pudu (Pudu puda) has a diploid genome of 70 chromosomes and is the world’s third smallest deer (35-45 cm tall and 6.4-13.4 kg), slightly larger than its two related South American species: the Peruvian Yungas Pudu (Pudella carlae) and the Northern Pudu (Pudella mephistophiles). It is a reclusive, crepuscular herbivore dwelling in the temperate woodlands of southern Argentina and Chile, categorized as Near Threatened by the IUCN. Recent escalations in vehicular collisions, canid predation, and wildfires have markedly affected the species, necessitating intervention from firefighters and wildlife rehabilitation facilities. DNA was extracted from the striated quadriceps muscle of a deceased Pudu puda and sequenced on a DNBSeq G400 sequencer (MGI Tech) and a PromethION P2 Solo (ONT) to obtain paired-end short-reads and single-end long-reads of the genome. Our raw dataset comprises ~288.8 million short-reads and ~23.7 million long-reads. For quality control, we used FastQC, NanoComp, Fastp, Fastplong and MultiQC. Short-reads were filtered with a minimum quality of Q20 and trimmed of MGI adapters, followed by 21-mer counting with Meryl. Long-reads were filtered with a minimum quality of Q10 and trimmed of ONT ligation adapters, followed by 21-mer counting with Jellyfish. We then implemented the following genome assembly strategy: (1) long-read assemblies, to be corrected and polished with high-quality, high-depth short-reads, were generated with the assemblers Flye and Hifiasm for comparison; (2) short-reads were then mapped to these long-read assemblies with Minimap2 for subsequent polishing with Pilon. Gene prediction and assembly evaluation were performed using Augustus, BUSCO, seqtk and gfastats. Before polishing, the Flye assembly comprised 5,110 primary contigs, of which 106 are telomeric, with an N50 of ~8 Mb, a total length of 2,528,222,429 bases, and 23X coverage. 
On the other hand, Hifiasm assembled 12,429 primary contigs, of which 175 are telomeric, with an N50 of ~678 kb, a total length of 3,720,188,236 bases and 22X coverage. BUSCO analysis considering the Artiodactyla order revealed that the assemblies generated with Flye and Hifiasm achieved 97.3% and 93.8% completeness, respectively. Additionally, Miniprot predicted 13,819 and 13,827 putative protein-coding sequences from these assemblies. The initial assemblies indicate that Flye yielded superior contiguity and BUSCO completeness. Short-read polishing of these assemblies with Pilon is currently underway. Subsequent scaffolding will be performed to enhance contiguity, targeting near chromosome-level resolution. It is expected that these refinement steps will improve overall genome assembly coverage, quality and completeness, leading to more precise gene and protein annotations. Finally, this high-quality genomic resource is anticipated to substantially support future conservation genetics, population studies, and evolutionary analyses of the near-threatened Southern Pudu.
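The N50 values quoted for the two assemblies follow a simple definition: the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch follows; the function name is hypothetical and the contig lengths are toy numbers, not data from this work.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")

# Toy assembly: total length 54, so the threshold is 27 (10 + 9 + 8 reaches it).
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about correctness, which is why it is reported alongside BUSCO completeness.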
Palavras-chave: Pudu. Genome assembly. Long-reads. Short-reads.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126574

Nirmatrelvir-oxazolidinone derivative as a potential covalent inhibitor of SARS-CoV-2 Mpro

Autores: Pablo Andrei Nogara,Kaylane V.F de Freitas,Rodrigo Chibiaque,Henry Gabriel de Belo Barcellos,Leandro da Silva Camargo,Rogério de Aquino Saraiva
Apresentador: Pablo Andrei Nogara • pablonogara@ifsul.edu.br
Resumo:
The COVID-19 pandemic, caused by SARS-CoV-2, has required a global effort to combat it efficiently. In this context, the present study aims to find new potential drugs capable of inhibiting virus replication through virtual screening, molecular docking simulations and semi-empirical quantum chemistry calculations. The main protease (Mpro) of SARS-CoV-2 was selected as the macromolecular target. Ligands were designed based on structural similarities to well-known Mpro inhibitors (such as nirmatrelvir, ebselen, and disulfiram) and peptide-based substrates of this protease. First, the molecules were designed and built virtually to evaluate their ADMET properties (absorption, distribution, metabolism, excretion, and toxicity) with the SwissADME and pkCSM tools. Only the molecules without apparent toxicity and in compliance with Lipinski’s and Veber’s rules were selected for the next step, molecular docking simulations using AutoDock Vina. From over 800 molecules, only 20 were docked with Mpro. Among these 20 molecules, 4 presented docking results similar to and/or better than those of the reference drug nirmatrelvir. For example, the molecule Nir-oxazolidinone, which has an ester group in place of the nitrile group of nirmatrelvir, exhibited a binding pose and interaction pattern similar to those of the standard drug, interacting through hydrogen bonds and hydrophobic interactions with the His41, Met49, Phe140, Gly143, Ser144, Cys145, His164, Glu166, Leu167, Thr190, and Gln192 residues of the active site. In addition, the predicted binding free energy of Nir-oxazolidinone (-9.6 kcal/mol) was slightly more favorable than that of nirmatrelvir (-9.3 kcal/mol). Considering that the nitrile group of nirmatrelvir forms a covalent bond with the thiol group of the Cys145 residue in the Mpro active site, the ester moiety of Nir-oxazolidinone may also have this potential, supported by a short S···C=O distance of 4.5 Å. 
To explore this, semi-empirical calculations were performed to evaluate covalent bond formation. The reaction energies for nirmatrelvir and Nir-oxazolidinone were -8.9 and -8.5 kcal/mol, respectively, suggesting that both compounds could inhibit Mpro via covalent interaction. Thus, this methodology proved effective in identifying potential covalent inhibitors of Mpro with possible antiviral activity. However, further in vitro and in vivo studies are necessary to evaluate the real therapeutic potential of these designed molecules.
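The rule-based filter applied before docking (Lipinski's rule of five plus the rotatable-bond and polar-surface-area criteria usually attributed to Veber) can be sketched as below. This is only an illustration under stated assumptions: the descriptor values are taken as precomputed inputs (for example, exported from a tool such as SwissADME), the class and function names are hypothetical, zero Lipinski violations are required by default, and the two example molecules are invented, not compounds from this study.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # g/mol
    logp: float             # octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float             # topological polar surface area, in square angstroms

def lipinski_violations(d):
    """Count violations of Lipinski's rule of five."""
    return sum([d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])

def passes_veber(d):
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140

def passes_filters(d, max_lipinski_violations=0):
    # Assumption: strict compliance unless the caller relaxes the threshold.
    return lipinski_violations(d) <= max_lipinski_violations and passes_veber(d)

# Invented example molecules, for illustration only.
small = Descriptors(350.0, 2.1, 2, 5, 6, 90.0)
bulky = Descriptors(720.0, 6.3, 4, 12, 14, 180.0)
kept = [m for m in (small, bulky) if passes_filters(m)]
```

Many workflows tolerate a single Lipinski violation; passing `max_lipinski_violations=1` reproduces that looser convention.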
Palavras-chave: in silico, antiviral, docking, thiol
#1126644

Mapping the Expression Landscape of Cachexia Mediators through Single-cell Analysis of Human Cancers

Autores: Victória Larissa Schimidt Camargo,Amanda Piveta Schnepper,Jakeline Santos Oliveira,Ana Luiza Labbate Bonaldo,Caio Fernando Ferreira Mussatto,Afonso Martin Melazzo,Sarah Santiloni Cury,Robson Francisco Carvalho
Apresentador: Victória Larissa Schimidt Camargo • victoria.schimidt@unesp.br
Resumo:
Cancer cachexia is a severe syndrome characterized by progressive weight loss and muscle wasting, significantly impairing the quality of life, response to treatment, and survival of cancer patients. Its prevalence varies across tumor types, affecting up to 80% of advanced cancer patients and accounting for 20% of cancer-related deaths. The syndrome is driven by cachexia mediators (CM) secreted by cells within the tumor microenvironment (TME). These mediators, including proinflammatory cytokines and growth factors, correlate with cachexia prevalence and weight loss rates. Single-cell RNA sequencing (scRNA-seq) enables high-resolution transcriptional profiling of TME cells; however, no studies have systematically mapped CM gene expression at the single-cell level across cancers with varying cachexia prevalence. To address this gap, we aimed to establish a comprehensive atlas of CM gene expression to uncover cell-type-specific expression patterns and TME contributions in human cancers with different cachexia prevalence. By integrating large-scale scRNA-seq data, our study overcomes limitations of bulk RNA-seq, providing unprecedented resolution of cellular CM transcriptional profiles. First, we selected CM genes using the text-mining tools Geneshot and Open Targets. Geneshot, with the term “cancer cachexia,” returned 202 genes, and Open Targets, with “cachexia,” returned 332 genes. After combining, removing duplicates, and filtering by the Human Protein Atlas secretome list, we identified 123 secreted CM genes. Then, we reanalyzed scRNA-seq data from 12 tumor types publicly available in the Curated Cancer Cell Atlas (3CA). Using the Seurat v.4.0.6 package in R, we processed raw counts, normalized data, clustered TME cells, and quantified the expression of the 123 CM genes per cell type. Datasets from the same tumor type were integrated using the IntegrateData function from Seurat. In total, we analyzed 245 tumor samples and 829,984 unique cells. 
Our findings revealed that CM gene expression is cell-type-specific, independent of tumor type. In malignant cells, genes such as ITLN1, IHH, and CCK showed positive correlations with cachexia prevalence, suggesting a role in promoting tissue wasting. In endothelial cells, ANGPTL4, IL15, and MSTN were similarly upregulated, while IL2 and CSF2 were prominent in B cells. Conversely, genes like HCRT in mast cells, IGFBP5 and FST in malignant cells, and TGFB1 in endothelial cells exhibited negative correlations, indicating potential protective effects against the syndrome. These patterns were consistent across tumor types, highlighting universal TME-driven mechanisms in cachexia. This atlas represents a novel bioinformatics resource that advances our understanding of the TME contribution to cachexia and provides a robust framework for hypothesis generation. It will be publicly available at Cancer Cachexia Omics Database (www.cacaodb.com.br), allowing researchers to explore CM expression across cell and tumor types. This work underscores the power of bioinformatics in unraveling complex disease mechanisms, paving the way for precision therapies targeting specific CMs or cell types.
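The gene-selection step described in this abstract (combine the two text-mining result lists, remove duplicates, keep only genes present in a secretome list) amounts to a set union followed by a set intersection. The sketch below uses placeholder gene names and a hypothetical function name; it does not reproduce the actual Geneshot, Open Targets, or Human Protein Atlas lists.

```python
def secreted_candidates(geneshot_hits, open_targets_hits, secretome):
    """Union of the two hit lists (duplicates removed), filtered by the secretome."""
    combined = set(geneshot_hits) | set(open_targets_hits)
    return sorted(combined & set(secretome))

# Placeholder symbols standing in for real query results.
geneshot = ["GENE_A", "GENE_B", "GENE_C"]
open_targets = ["GENE_B", "GENE_D", "GENE_E"]
secretome = ["GENE_A", "GENE_D", "GENE_Z"]

selected = secreted_candidates(geneshot, open_targets, secretome)
```

Sorting the result keeps the output deterministic, which helps when the list is later used to subset an expression matrix.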
Palavras-chave: cancer cachexia, scRNA-seq, muscle loss, pan-cancer, cachexia mediators
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126666

DECODING THE APOPTOTIC microRNA SIGNATURE ACROSS AGE GROUPS IN LEUKEMIA PATIENTS

Autores: Maria Alice Ferrari Souza,Isabela Emilia da Silva Oliveira,Sandeep Tiwari,JAQUELINE FRANÇA COSTA
Apresentador: Maria Alice Ferrari Souza • alice.ferrari152@gmail.com
Resumo:
Leukemias are a group of hematologic malignancies characterized by evasion of apoptosis. The subtypes of the disease have distinct age ranges, with acute lymphoblastic leukemia (ALL) predominating in individuals under 15 years of age and acute myeloid leukemia (AML) typically affecting individuals with an average age of 68 years. This age-related shift may be associated with a decline in apoptotic activity, which can impact the response to treatment. MicroRNAs (miRNAs) are characterized by their regulatory role in the apoptotic pathway and by their distinct expression profiles, which are associated with age. These biomolecules are an important tool for understanding the modulation of apoptosis in leukemia. This project aims to evaluate, in silico, how the expression profile of miRNAs is involved in apoptosis in acute lymphoblastic leukemia and acute myeloid leukemia, as well as the potential relationship between chronological age and the modulation of apoptosis. Data were collected from the PubMed database between December 2024 and January 2025, using as criteria samples of bone marrow, blood, serum, and plasma from adult (18 to 91 years old) and pediatric (1 to 16 years old) patients with acute myeloid leukemia. We used the keywords "miRNAs" and "acute myeloid leukemia". The miRNAs will be classified by reverse annotation according to the target genes involved in the apoptotic pathway. Furthermore, gene enrichment analysis will be performed to prioritize candidate genes, and the miRNAs will be grouped based on functional annotation. Subsequently, interaction networks between transcription factors and miRNAs will be constructed and key nodes will be identified; those that appear consistently will be considered significant. In our preliminary results, we found 147 dysregulated miRNAs in acute myeloid leukemia, 138 of them in samples from adult patients; of these, 80 were upregulated and 58 downregulated. 
Moreover, 5 miRNAs were found in blood samples (4 downregulated and 1 upregulated), 16 in serum samples (8 downregulated and 8 upregulated), 18 in plasma samples (10 downregulated and 8 upregulated), and 52 in bone marrow samples (18 downregulated and 35 upregulated). Some studies used two sample types: 1 miRNA was found upregulated in bone marrow and serum samples, and 54 were found in bone marrow and blood samples (24 downregulated and 30 upregulated). Using an integrated reverse-transcriptomics-based bioinformatics approach, this study decodes the apoptotic miRNA signature across age groups in leukemia patients to better understand leukemogenesis and open new perspectives for more efficient treatment.
Palavras-chave: microRNAs, apoptosis, acute myeloid leukemia, acute lymphoblastic leukemia
★ Running for the Qiagen Digital Insights Excellence Awards
#1126667

Assessment of Kaistella jeonii esterase conformational dynamics bound to poly(ethylene terephthalate)

Autores: Arthur Tonietto Mangini,Éderson Sales Moreira Pinto,Lorenzo Chaves Costa Novo,Fernando Guimarães Cavatão,Mathias J. Krause,Marcio Dorn
Apresentador: Arthur Tonietto Mangini • arthurttmm@gmail.com
Resumo:
Plastic pollution poses a significant threat to biodiversity and human health. Bioremediation presents an eco-friendly and cost-effective approach to mitigate this issue, particularly through enzymes capable of degrading plastic waste. In this study, we investigate PET30, a novel cold-active esterase from Kaistella jeonii, which exhibits activity against polyethylene terephthalate (PET) despite lower catalytic efficiency than Ideonella sakaiensis PETase. Using molecular dynamics simulations, we explored the binding interactions between PET30 and PET. The enzyme structure (PDB ID: 7PZJ) was refined via comparative modeling, and a PET trimer was modeled with force field parameters for molecular docking. Docking focused on key catalytic residues (Ser153, Asp198, His230), followed by ensemble docking to identify optimal protein-ligand complexes for microsecond-scale simulations. Two systems were analyzed: unbound PET30 (PET30F) and PET30 bound to PET (PET30B), each simulated for 1 μs across five replicas (totaling 10 μs). Simulations revealed that PET30 undergoes conformational adjustments upon PET binding without significant structural disruption. Notably, the absence of a disulfide bond near the catalytic pocket appears to impact coordination between Asp198 and His230, potentially affecting catalytic performance. Additionally, residues lining the binding site form a road-like structure, which in some cases facilitates PET movement. These results provide valuable insights into the structural dynamics and potential evolutionary adaptations of PET30 in plastic degradation.
Palavras-chave: Molecular dynamics simulation, PETase, Bioremediation, Biodegradation, PET
#1126702

Integrative Multi-Omic Modeling of Regulated Cell Death Reveals Clinically Actionable Biomarkers Across 33 Cancer Types

Autores: Emanuell Rodrigues de Sozua,Higor Almeida Cordeiro Nogueira,Enrique Medina-Acosta
Apresentador: Emanuell Rodrigues de Sozua • 202512120026@pq.uenf.br
Resumo:
Regulated cell death (RCD) is fundamental to tissue homeostasis and cancer progression, directly influencing therapeutic responses across tumor types. Despite the extensive individual characterization of each RCD modality, the field still lacks a unified methodological framework capable of integratively analyzing these pathways, thereby limiting the systematic identification of clinically actionable biomarkers. To address this gap, we developed the multi-optosis model, incorporating 25 distinct forms of RCD and integrating multi-omic and phenotypic data across 33 cancer types. Using data from 9,185 tumor samples from The Cancer Genome Atlas (TCGA) and 7,429 normal tissue samples from the Genotype-Tissue Expression (GTEx) database (via UCSCXena), we analyzed 5,913 RCD-associated genes, encompassing 62,090 transcript isoforms, 882 mature miRNAs, and 239 cancer-associated proteins. Seven omic layers—somatic mutations, copy number variations (CNVs), CpG methylation, protein abundance, mRNA expression, miRNA expression, and transcript isoform profiles—were systematically correlated with seven clinical and phenotypic features: tumor mutational burden (TMB), microsatellite instability (MSI), tumor stemness (TSM), hazard ratios, survival outcomes (OS, DSS, DFI, PFI), tumor microenvironment (TME), and tumor immune infiltration (TIL). Over 27 million pairwise correlations were computed, generating 44,641 multi-omic signatures. These signatures revealed both unique and overlapping RCD-associated patterns, establishing statistically significant correlations between molecular features and clinical or phenotypic endpoints. Many signatures demonstrated strong associations with survival metrics, underscoring their prognostic relevance across diverse cancer types. All results are accessible through the CancerRCDShiny platform, providing an interactive resource for biomarker exploration and therapeutic prioritization. 
To extend the prognostic analysis into a predictive modeling framework, we constructed a machine learning pipeline to evaluate the predictive capacity of clinically informative multi-omic signatures. Recognizing the presence of missing data across omic and clinical layers, we harmonized the datasets to preserve biological consistency and clinical validity. We then applied six imputation strategies—mean, random, k-nearest neighbors (kNN), missForest, XGBoost, and LightGBM—and benchmarked their performance through Cox regression, focusing on preservation of biological effects, p-value robustness, and concordance index (C-index) integrity. The missForest method exhibited the highest statistical fidelity and was selected for downstream modeling. Using the imputed datasets, RandomForest algorithms were employed independently across cancer types and omic layers, aiming to stratify patients based on survival outcomes (OS, DSS, DFI, PFI). In total, 2,960 predictive multi-omic signatures were identified, particularly enriched for transcript isoforms and mRNAs with limited prior clinical annotation. Among these, 418 signatures achieved C-index and AUC values above the upper quartile (Q3) across multiple tumor types, including ACC, BLCA, BRCA, CESC, COAD, ESCA, GBM, HNSC, KIRP, LGG, LIHC, LUAD, LUSC, PAAD, PRAD, READ, SKCM, STAD, THCA, THYM, and TGCT. These results show the robust prognostic capacity of the selected multi-omic signatures in predicting patient survival. By integrating multi-omic RCD signatures with clinical and phenotypic endpoints, we lay a robust foundation for precision oncology strategies that exploit RCD pathways as therapeutic targets. Future directions include external validation in independent cohorts, resolution of RCD signature cell-type specificity using single-cell transcriptomic data, and integration of pharmacogenomic profiles to speed up the clinical translation of RCD-based candidate biomarkers.
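The concordance index (C-index) used above to benchmark imputation strategies and rank signatures can be illustrated with a plain-Python version of Harrell's estimator. This is a sketch under stated assumptions, not the pipeline's code: the function name is hypothetical, a higher risk score is taken to mean shorter expected survival, a pair is treated as comparable only when the subject with the shorter follow-up time had an observed event, and pairs with tied times are skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risk_scores: model output where higher means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: three patients, the second one censored.
c_mixed = concordance_index([1, 2, 3], [1, 0, 1], [0.6, 0.9, 0.5])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing C-index values before and after imputation shows whether an imputation method distorts the survival signal.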
Palavras-chave: Multi-Omics Integration, Precision Oncology, Predictive Modeling, Prognostic Biomarkers, Regulated Cell Death
★ Running for the Qiagen Digital Insights Excellence Awards
#1126710

Metagenomic characterization of a Leptogium cyanolichen from Amazon biome in Maranhão state

Autores: José Isaias Pimentel Barros,Anna Letícia Silva Da Costa,Hivana Patricia Melo Barbosa Dall'Agnol,Jouko Rikkinen,Tania Keiko Shishido,Leonardo Teixeira DallAgnol
Apresentador: Leonardo Teixeira DallAgnol • leonardo.td@ufma.br
Resumo:
Lichenic ecosystems are complex symbiotic associations between fungi, algae, cyanobacteria, and bacteria, playing a fundamental role in nutrient cycling, especially in nutrient-poor environments. These organisms produce secondary metabolites with relevant ecological functions, and metagenomics has proven to be an essential tool for investigating their diversity and functionality. This study aimed to analyze the taxonomic and functional diversity of a cyanolichen from the Amazon biome with an emphasis on identifying symbionts and the biosynthetic potential of the associated microbiome. The sample was collected in the municipality of Anajatuba, in the state of Maranhão, Brazil, during the transition period between the rainy and dry seasons (May 2024). The morphological identification of the cyanolichen symbionts was based on integrative taxonomy analysis. For the metagenomic analysis, DNA from the ANA13 sample was extracted using the DNeasy Plant Pro Kit (QIAGEN) and subsequently sequenced on the AVITI platform. Reads were quality-checked with FastQC and trimmed with Trimmomatic, and the metagenome was assembled using MetaSPAdes. Taxonomic classification was conducted with Kaiju and GOTTCHA2 on the KBase server. Gene prediction was carried out with AUGUSTUS, while functional annotation was performed with Prokka and RASTtk on the Galaxy platform. Biosynthetic gene clusters were identified with antiSMASH and fungiSMASH. The QUAST, CheckM, BUSCO, and Assembled Contig Distributions tools were used to assess the quality of the metagenome and MAGs. Binning was conducted with MetaBAT2, MaxBin2 and CONCOCT, and the results were refined with the DAS Tool. The MAG corresponding to the main photobiont was classified with GTDB-Tk and TYGS and then subjected to genomic similarity analysis (ANI) and in silico hybridization (GGDC). Metagenomic analysis of the ANA13 sample revealed a highly diverse microbiome, with an extensive capacity for secondary metabolite production. 
A total of 102 biosynthetic clusters of bacterial origin and 116 of fungal origin were identified, with a predominance of NRPS, T1PKS, and terpene types. Taxonomic analyses confirmed the presence of the mycobiont genus Leptogium and the photobiont Nostoc and indicated a complex heterotrophic community. The binning process allowed the recovery of a MAG (Bin.001), related to the photobiont, with 99.56% completeness. Annotation with Prokka revealed a high-quality genome containing 7,461 predicted genes, while RASTtk identified 8,649 coding sequences associated with various metabolic pathways. Phylogenomic and similarity analyses confirmed that the MAG belongs to the genus Nostoc, but with divergences suggesting a possible new species. This study represents an unprecedented contribution to the knowledge of lichens from the Amazon and Brazil, being the first record of the occurrence of the genus Leptogium. The results highlight the rich metabolic diversity of these symbionts and their biotechnological potential, broadening our understanding of local biodiversity and offering new possibilities for the bioprospecting of secondary metabolites in lichen ecosystems that have yet to be explored.
Palavras-chave: Gene clusters, lichen ecosystem, lichenized fungi, MAGs
★ Running for the Qiagen Digital Insights Excellence Awards
#1126717

In silico investigation of selenazoil-peptides as SARS-CoV-2 Main protease inhibitors

Autores: Pablo Andrei Nogara,Rogério de Aquino Saraiva,João Batista Texeira da Rocha
Apresentador: Pablo Andrei Nogara • pablonogara@ifsul.edu.br
Resumo:
The development of antiviral drugs is important for the eradication of viral pathogens such as SARS-CoV-2. The main protease (Mpro) of SARS-CoV-2 plays a crucial role in the viral life cycle, as it performs post-translational modifications on viral proteins, making them functional, and thus represents a therapeutic target in the search for antivirals. Studies have shown that ebselen (an organoselenium compound) and disulfiram (an organosulfur compound) are effective Mpro inhibitors, as they can covalently bind to the catalytic cysteine residue (Cys145) via their selenium and sulfur atoms, respectively. This characteristic can be exploited in the search for new Mpro inhibitors. Accordingly, the aim of this study was to design and perform virtual screening of hybrid peptide-organoselenium inhibitors, as this approach combines in a single molecule the selectivity provided by similarity to substrate residues (peptides) with the chemical reactivity of certain chemical species. Initially, several amino acid sequences that are substrates for Mpro were tested by attaching reactive groups derived from ebselen and disulfiram, resulting in dozens of molecules. The screening was first performed by predicting toxicity and applying Lipinski’s and Veber’s rules, and the most promising molecules were tested in molecular docking simulations using the AutoDock Vina program. The best results were obtained using the Leu-Gln-Ser-Gly (LQSG) sequence, which contained the 1,2-selenazoil group. To improve the interaction between the thiol group of Cys145 and the electrophilic selenium atom, the selenazoil group was placed at different positions in the peptide chain. 
Docking simulations for the LQSG-Se-3 molecule demonstrated that it has a binding mode similar to that of the LQSG peptide, where the Leu residue forms hydrophobic interactions with Met49 and His41, the side chain of Gln forms hydrogen bonds with His163 and Glu166, and the carbonyl of the isoselenazole ring forms hydrogen bonds with the amino groups of Gly143 and Ser144 residues. It is important to highlight that the conformation adopted by the molecule could facilitate the attack of Cys145 on the Se atom, as the Se···S distance was 4.9 Å. Thus, the LQSG-Se-3 compound showed promising results, making it a good candidate as an Mpro inhibitor due to the Se···S interaction with Cys145. The synthesis and in vitro/in vivo evaluation of this molecule should be carried out to assess its true efficacy.
Palavras-chave: covid-19, ebselen, docking, selenium
#1126718

Metagenomic Analysis of the Cyanolichen Coccocarpia: Symbiotic Diversity And Biosynthetic Potential In A Brazilian Amazon-Cerrado Ecotone

Autores: Anna Letícia Silva Da Costa,José Isaias Pimentel Barros,Hivana Patricia Melo Barbosa Dall'Agnol,Tania Keiko Shishido,Jouko Rikkinen,Leonardo Teixeira DallAgnol
Apresentador: Leonardo Teixeira DallAgnol • leonardo.td@ufma.br
Resumo:
Lichens are symbiotic associations between fungi and other microorganisms, forming self-sustaining communities that are widely distributed and adapted to contrasting environments. The genus Coccocarpia, typically found in tropical ecosystems, shows potential for nitrogen fixation and the production of bioactive metabolites. This study presents the metagenomic analysis of Coccocarpia sp., a cyanobacterial lichen reported for the first time in São Luís Island, Maranhão, Brazil. The sample was collected in July 2024 on the campus of the Federal University of Maranhão (2°33'17.2"S 44°18'28.7"W), located in an ecotonal region between the Amazon and Cerrado biomes. The specimen was stored in a paper bag, cleaned, and subjected to DNA extraction using the DNeasy Plant Pro kit (Qiagen), following a protocol adapted for dry bead disruption. DNA quality was assessed using NanoDrop 1000, Qubit 4 Fluorometer, and the 5200 Fragment Analyzer. Metagenomic sequencing, conducted in partnership with the University of Helsinki, was performed using the AVITI platform (Element Biosciences). Raw data were pre-processed with FastQC v0.12 and Trimmomatic v0.39. Assembly was performed using MetaSPAdes v4.0.0 and evaluated with QUAST v4.4. Taxonomic classification was carried out using Kraken2, Kaiju (NCBI BLAST nr+euk), GOTTCHA2, and MetaPhlAn. MAGs were reconstructed with MetaBAT2, MaxBin2, and CONCOCT, refined with DASTool, filtered with CheckM, and extracted using BinUntil. Genome annotation was performed with RASTtk, taxonomic classification of MAGs with GTDB-Tk v1.7.0, and functional analysis with DRAM. Biosynthetic gene cluster (BGC) identification was carried out using AntiSMASH v6.1.1 for bacteria and FungiSMASH for fungi. All analyses were conducted on remote servers (KBase, Galaxy EU) and at the CSC server of the University of Helsinki. 
The metagenomic assembly yielded 155.7 Mbp across 89,942 contigs, with an N50 of 3,020 bp and GC content of 58.56%, including 60 contigs longer than 100 kbp. Kaiju demonstrated the highest accuracy in taxonomic classification, identifying the mycobiont as Coccocarpia palmicola (family Coccocarpiaceae) and the photobiont as Nostoc sp., consistent with results from Kraken2 and GOTTCHA2. The associated microbial community showed significant diversity, including fungi from the families Parmeliaceae and Cladoniaceae, bacteria from the genera Methylobacterium, Rhizobium, and Actinoplanes, cyanobacteria (Nostoc), and viruses from the Microviridae family. A total of 397 biosynthetic gene clusters were identified: 251 in bacteria (AntiSMASH) and 146 in fungi (FungiSMASH), with a predominance of NRPS (105), PKS (78), and terpenes (79). Notable metabolites included anabaenopeptin (8 clusters), rhizomide (4), and fusarin (3). MAG reconstruction highlighted three key genomes: a cyanobacterium (Iningainema) with 10.88 Mb and 99.52% completeness, containing nif genes for nitrogen fixation; a heterotrophic Bacteroidota (Cyclobacteriaceae, 4.03 Mb); and an Actinoplanes genome (8.86 Mb, 71.1% GC) with strong potential for antibiotic production. This study reveals the complex symbiotic community of the lichen, its ecological role, and potential as a source of bioactive compounds, emphasizing metagenomics as a strategic tool for bioprospecting.
Palavras-chave: Bioactive Metabolites, Bioindicators, Biosynthetic Gene Clusters, Lichenic Community, Nitrogen Fixation
★ Running for the Qiagen Digital Insights Excellence Awards
#1126783

Integrative Structural Dynamics: Enhanced Sampling of Protein Conformational Landscapes Using Experimental Data

Autores: Larissa de Oliveira Bastos,Mauricio Costa
Apresentador: Larissa de Oliveira Bastos • laribastos2212@gmail.com
Resumo:
Accurately sampling the conformational landscape of biomolecules remains a significant challenge in computational structural biology. Due to intrinsic time scale limitations, conventional molecular dynamics (MD) simulations often fail to capture large-scale motions essential for biological functions. To address this gap, we are developing an integrative framework that expands the Molecular Dynamics with Excited Normal Modes (MDeNM) methodology, enhancing sampling efficiency through the dynamic incorporation of experimental data.
Building on kinetically excited low-frequency collective motions, our approach integrates heterogeneous experimental observables, such as cryo-electron microscopy (Cryo-EM) and small-angle X-ray scattering (SAXS), to guide conformational exploration. This integration allows simulations to be steered toward experimentally relevant regions of the conformational space without modifying the underlying physical model.
The protocol is being implemented as an open and modular computational workflow compatible with widely used MD programs. Test cases involving different proteins of diverse sizes and structural properties were selected to validate the methodology. These proteins serve as model systems to assess the ability of the integrative strategy to reproduce experimentally observed structural variability and to explore conformational transitions inaccessible to standard MD.
Our preliminary results demonstrate that incorporating experimental restraints during kinetic excitation significantly improves sampling coverage and enhances the physical relevance of the generated ensembles in a short computational time. This method provides a promising solution for bridging the gap between molecular simulations and experimental structural biology. It offers a powerful tool for characterizing complex biomolecular dynamics across different scales and systems, instilling optimism for its potential impact in the field.
Palavras-chave: Integrative Structural Biology, Molecular Dynamics, Normal Mode Analysis, Enhanced Sampling, Cryo-EM, SAXS.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126807

De novo design of scaffold proteins for a peptide derived from the Non-Structural Protein 1 of Dengue virus serotype 1

Autores: Patrícia da Silva Antunes,Junior Olimpio Martins,Carlos Henrique Bezerra da Cruz,Luiz Mário Ramos Janini,Ricardo Durães-Carvalho
Apresentador: Patrícia da Silva Antunes • patriciabiotecnologia@gmail.com
Resumo:
A major challenge in distinguishing among the endemic and highly circulating Dengue virus (DENVs) serotypes 1-4 and other viruses from the same family in Brazil stems from their genomic similarity, which leads to controversial and imprecise outcomes when using commercially available diagnostic kits. This cross-reactivity phenomenon undermines the accuracy of epidemiological surveillance and may lead to ineffective decision-making, ultimately impacting clinical management and patient outcomes. To enhance diagnostic precision and address cross-reactivity challenges, we first retrieved representative viral sequences from specialized public databases and conducted deep-clustering analysis, followed by Bayesian phylodynamic modeling to reconstruct and map the evolutionary relationships and population dynamics of DENV-1 over time. Subsequently, we performed sequence alignments among different viruses within the same family, such as DENVs, Zika (ZIKV) and Yellow Fever (YFV). This approach enabled identification of virus-specific peptides from Non-Structural protein 1 (NS1) of DENV-1, which were subsequently evaluated for MHC binding affinity and glycosylation potential and subjected to selection analyses. The top-ranked candidate was incorporated into protein scaffolds designed computationally using RFdiffusion, generating 30,000 three-dimensional backbone models of the target-specific peptide using the high-performance computing (HPC) resources of the Santos Dumont Supercomputer (LNCC). The successfully designed scaffolds were then filtered based on the lowest radius of gyration and the highest number of contacts per residue. Primary sequences for the scaffold models were selected based on structural parameters and generated using ProteinMPNN. The resulting structures were then predicted and validated with AlphaFold3, and the designs were successfully cloned into the pET28a(+) plasmid and purified to ≥ 90%.
Finally, we implemented a computational pipeline to design RNA aptamer candidates capable of binding the peptide, which hold potential for further exploration as raw materials in diagnostic applications. Further experimental validation will involve ELISA assays using flavivirus recombinant proteins and serological samples from infected patients, with results benchmarked against commercially available diagnostic kits.
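The scaffold filtering criteria described above (lowest radius of gyration, highest number of contacts per residue) can be sketched as follows. This is a minimal example working on C-alpha coordinates only; the 8 Å contact cutoff, the minimum sequence separation of 3, and the function names are assumptions chosen for illustration and are not stated in the abstract.

```python
import numpy as np

def radius_of_gyration(ca):
    """Unweighted radius of gyration of an (N, 3) array of C-alpha coordinates."""
    ca = np.asarray(ca, dtype=float)
    centered = ca - ca.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

def contacts_per_residue(ca, cutoff=8.0, min_separation=3):
    """Number of residue pairs closer than `cutoff`, divided by chain length.

    Pairs closer than `min_separation` positions in sequence are ignored,
    since they are always near each other along the backbone.
    """
    ca = np.asarray(ca, dtype=float)
    n = len(ca)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=min_separation)
    return float((dist[i, j] < cutoff).sum() / n)

def rank_scaffolds(backbones, keep=10):
    """Sort named backbones by ascending Rg, then descending contacts."""
    scored = [
        (name, radius_of_gyration(xyz), contacts_per_residue(xyz))
        for name, xyz in backbones.items()
    ]
    scored.sort(key=lambda item: (item[1], -item[2]))
    return scored[:keep]
```

Sorting on the pair (Rg, negative contacts) is only one way to combine the two criteria; applying independent thresholds to each metric would be an equally plausible reading of the text.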
Palavras-chave: dengue, cross-reactivity, scaffold, aptamers, phylodynamic, RFdiffusion
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126830

Isoform Variability of Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH) in Trypanosoma cruzi and Its Potential as a Therapeutic Target in Chagas Disease

Autores: Aila Maria Melo Correia,Julia Geyziana Oliveira Costa Araújo,Miguel Victor Bringel Sales,Antonio Edson Rocha Oliveira
Apresentador: Miguel Victor Bringel Sales • mivibrisa@gmail.com
Resumo:
Trypanosoma cruzi is a unicellular eukaryotic protozoan from the family Trypanosomatidae, responsible for Chagas disease, which affects both humans and animals. Its complex life cycle includes three distinct forms: trypomastigotes, amastigotes, and epimastigotes. Transmission primarily occurs through the bite of the blood-feeding insect Triatoma infestans. If left untreated, Chagas disease can be fatal. Current treatments are limited by low efficacy and parasite resistance, highlighting the urgent need for more effective and targeted therapies. One promising target is the enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH), essential for T. cruzi metabolism. As the parasite depends heavily on glycolysis for energy, GAPDH plays a crucial role in sustaining its growth and virulence. Importantly, the GAPDH found in T. cruzi's glycosome has structurally distinct binding sites compared to its human counterpart, making it a strong candidate for selective drug development. Despite its therapeutic potential, no previous study has thoroughly investigated the genomic and transcriptomic variability of GAPDH across multiple T. cruzi strains. This project aims to perform a genomic and transcriptomic analysis of GAPDH in various T. cruzi strains to identify potential differences that could influence its use as a therapeutic target for Chagas disease. The analysis of the genomes of 20 different Trypanosoma cruzi strains, corresponding to the six distinct DTUs (1 Tcbat, 5 TcI, 8 TcII, 1 TcIII, 1 TcV, and 4 TcVI), available in the TriTrypDB database, revealed that the GAPDH gene encodes three different protein isoforms, named GAPDH I, II, and III (with 338, 347, and 359 amino acids, respectively), showing variation in copy number across strains. For GAPDH I, each strain contains between 1 and 3 copies, except for the Tcbat strain, which showed no copies. GAPDH II is present as a single copy in all strains, except for the Dm28c 2018 strain, which has two copies.
GAPDH III generally appears with two copies per strain, except for strain YC6, which has three copies, and strains Brazil A4 and G, which each have only one copy. Genomic organization analysis of 12 strains showed that GAPDH is syntenic across all analyzed strains. Among genes of GAPDH type I, two subgroups were identified based on their flanking genes: Group A, flanked by the short-chain 3-hydroxyacyl-CoA dehydrogenase and transportin-2-like protein genes, and Group B, flanked by the tyrosine aminotransferase gene. The genes of GAPDH type II are flanked by the regulator of chromosome condensation (RCC1) and cyclophilin genes, while the genes of GAPDH type III are flanked by the DNA topoisomerase III gene. Multiple sequence alignment revealed that GAPDH protein sequences are highly conserved within each type, reinforcing their potential as therapeutic targets. Transcriptomic analysis throughout the T. cruzi life cycle showed differential expression of the GAPDH gene: GAPDH I and II are more highly expressed in the epimastigote and trypomastigote stages, while GAPDH III is more highly expressed in the amastigote stage. In conclusion, the genomic and transcriptomic characterization of GAPDH in T. cruzi strains highlights its potential as a promising therapeutic target for more effective treatments against Chagas disease.
Palavras-chave: Trypanosoma cruzi, Glyceraldehyde-3-Phosphate Dehydrogenase, Therapeutic Targets
#1126853

Environmental Dissemination of Multidrug-Resistant Klebsiella pneumoniae in Brazil: Evidence of High-Risk Clone Circulation

Autores: Amanda Oliveira dos Santos Melo,Ludmila de Carvalho Correia,Matheus Sales,Joice Neves Reis Pedreira
Apresentador: Amanda Oliveira dos Santos Melo • amandaosm@icloud.com
Resumo:
The environmental dissemination of multidrug-resistant Klebsiella pneumoniae in Brazil was assessed by analyzing 33 genomes of isolates from environmental sources available in the NCBI database. Genomic characterization was performed using the Kleborate and ABRicate tools to detect resistance and virulence genes and multilocus sequence typing. The results revealed that most isolates originated from water samples (75.75%), followed by soil (9%), vegetables (9%), and hybrid environments (6%). Considerable genotypic diversity was observed, identifying 17 distinct sequence types (STs). Clonal Complex 258 (CC258) stood out, with ST11 being the most prevalent (33.33%), followed by ST340 (12.12%), ST437 and ST6326 (6% each). Overall, 78.8% of the genomes were classified as multidrug-resistant. CC258 clones exhibited the most critical resistance profiles, mainly four ST437 isolates and two ST11 isolates with a resistance score of 3, as well as four ST340 and two ST6326 genomes with a score of 2. Additionally, two ST307 strains, one ST101, and one ST5236, all non-CC258, also presented a resistance score of 2. No genome exhibited a hypervirulence genotype. Three genomes simultaneously harbored the blaNDM-1 and blaCTX-M-15 genes, all recovered from water samples and belonging to CC258 (one ST340 and two ST6326). Furthermore, two water-derived genomes (one ST437 and one ST11) concurrently carried mgrB mutations and blaCTX-M-15. One ST437 genome (resistance score 3), also from a water sample, harbored both a pmrB mutation and blaCTX-M-15. The blaNDM-4 and blaCTX-M-2 genes were each identified in one genome. Mutations in the chromosomal genes mgrB and pmrB, associated with colistin resistance, were detected in five genomes in total: four with mgrB mutations (three ST11 and one ST437) and one with pmrB mutation (ST437). All of these isolates were also obtained from water samples. 
The dataset revealed resistance determinants, including extended-spectrum β-lactamases, metallo-β-lactamases, aminoglycoside-modifying enzymes, fluoroquinolone, fosfomycin, and sulfonamide-trimethoprim resistance genes, and chromosomal loci conferring polymyxin resistance. The most prevalent resistance genes were oqxA and oqxB in 32 (97%) genomes and fosA6 in 27 (81.8%). Among extended-spectrum β-lactamase (ESBL) and carbapenemase genes, blaCTX-M (18 genomes) and blaNDM (4 genomes), respectively, were the most frequent. These findings highlight the presence of high-risk K. pneumoniae clones in the Brazilian environment, with the potential to disseminate genes conferring resistance to last-resort antibiotics. This poses a significant threat to public health and reinforces the need for integrated actions from a One Health perspective.
Palavras-chave: Klebsiella pneumoniae, Antimicrobial Resistance, One Health
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126901

Transcriptomic Risk Score Predicts Alzheimer’s Disease: A Multi-Cohort, Cross-Tissue Experimental Study.

Autores: MARCELLA VITORIA BELEM SOUZA,Gustavo Barra Matos,Gilderlanio Santana de Araujo
Apresentador: MARCELLA VITORIA BELEM SOUZA • marcellabelem12@gmail.com
Resumo:
Alzheimer’s disease (AD) is the most common form of dementia and has a strong genetic component, though its underlying mechanisms remain poorly understood. AD progressively affects multiple brain regions, including the hippocampus, entorhinal cortex, and cingulate cortex, each showing distinct gene expression profiles. The recently proposed Transcriptional Risk Score (TRS) quantifies disease risk by integrating the expression levels of genes associated with AD into a single score. Because it reflects gene activity, TRS can serve both as a predictor of disease risk and a marker of disease progression. By accounting for tissue-specific gene expression, TRS improves predictive accuracy and aids in identifying potential biomarkers for AD. Through a multi-cohort, integrative approach, we aim to validate TRS as a robust predictor of AD and uncover candidate genes for further functional and clinical investigation. We identified differentially expressed genes in four cohorts (ROSMAP, MAYO, MSBB, and GSE125283) using the edgeR package. To prioritize genes for TRS calculation, we integrated AD-related genome-wide association study (GWAS) summary statistics from Bellenguez et al. (2024) with 14 pre-built expression quantitative trait locus (eQTL) datasets from GTEx and BrainMeta through Summary data-based Mendelian Randomization (SMR) analysis, run on the SMR Portal. We then calculated TRS from brain transcriptome data across the four independent cohorts (N = 1,970), covering eleven brain regions. Gene prioritization revealed a total of 117 unique genes across nine tissues, with 19 genes shared between multiple tissues. TRS was tested for association with AD diagnosis, and we applied the XGBoost algorithm to construct a model for classifying cases and controls based on TRS values.
Our findings show that the ability to distinguish AD from controls varies across brain tissues. TRS was computed using expression data from the nine tissues that had prioritized genes. Of these, four tissues (the Posterior Cingulate Gyrus, Superior Temporal Gyrus, Parahippocampal Gyrus, and Cerebellum) showed significant differences in TRS between the AD and control (CT) groups (Wilcoxon test; p = 0.035, 1.3e-05, 1.3e-13, and 0.0011, respectively), indicating their potential for distinguishing disease status. To assess the predictive power of the TRS, we used the XGBoost algorithm with TRS, sex, and age as input features. Among the nine tissues analyzed, the model based on TRS from the Parahippocampal Gyrus achieved the best performance, with an AUC of 0.74. These findings suggest that TRS derived from the Parahippocampal Gyrus has moderate but promising potential as a diagnostic marker for Alzheimer’s disease. While the results are encouraging, further refinement is needed to enhance classification performance, such as incorporating additional clinical or imaging features or integrating multi-tissue TRS signals to capture complex patterns in the data.
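To make the scoring idea concrete, the sketch below computes a transcriptional risk score as a signed sum of standardized expression values and evaluates how well the score separates cases from controls with a rank-based AUC. The use of +1/-1 risk directions per gene and the function names are assumptions for illustration; the abstract does not give the exact TRS formula, and the authors' classifier (XGBoost with TRS, sex, and age) is not reproduced here.

```python
import numpy as np

def transcriptional_risk_score(expression, directions):
    """Signed sum of per-gene z-scores.

    expression : (samples, genes) array of expression values
    directions : (genes,) array of +1 (higher expression raises risk)
                 or -1 (higher expression lowers risk); assumed input
    Genes with zero variance must be removed beforehand.
    """
    expression = np.asarray(expression, dtype=float)
    z = (expression - expression.mean(axis=0)) / expression.std(axis=0)
    return z @ np.asarray(directions, dtype=float)

def auc(scores, is_case):
    """Probability that a random case scores higher than a random control."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    cases, controls = scores[is_case], scores[~is_case]
    diff = cases[:, None] - controls[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()  # ties count as half
    return float(wins / (len(cases) * len(controls)))
```

This AUC is equivalent to the Mann-Whitney U statistic divided by the number of case-control pairs, which ties it directly to the Wilcoxon comparison reported in the abstract.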
Palavras-chave: Alzheimer’s disease, Transcriptional Risk Score, Brain tissues, Differential expression, Transcriptomics.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126910

In Silico Development of a Synthetic Multi-Epitope Antigen Derived from the E5, E6, and E7 Oncoproteins of High-Risk HPVs for a Therapeutic Vaccine Against Cervical Cancer

Autores: Davi Emanuel Ribeiro,Maria da Conceição Viana Invenção,Matheus Gardini Amâncio Marques de Sena,Anna Jéssica Duarte Silva,Antônio Carlos de Freitas
Apresentador: Davi Emanuel Ribeiro • davi.ribeiro@ufpe.br
Resumo:
Persistent infection with high-risk types of Human Papillomavirus (HPV) constitutes the primary etiology of cervical cancer, with the greatest impact in developing countries where prevention programs remain insufficient. In 2022, approximately 660,000 new cases and 350,000 deaths were attributed to persistent infections, highlighting the urgent need for more effective vaccination strategies. In this context, multi-epitope therapeutic vaccines emerge as a promising approach for infected patients, as they can stimulate T-cell responses against highly immunogenic regions of the viral oncoproteins E5, E6, and E7 of HPV. These oncoproteins are present in tumor cells and are essential for viral replication, tumor maintenance, and apoptotic evasion. This study aimed to develop a synthetic antigen through the prediction of T-cell epitopes for Major Histocompatibility Complex (MHC) class I, targeting the HPV genotypes 16, 18, 31, 33, 45, 52, and 58, the seven most prevalent globally and in the Americas. Initial steps included predictions of CD8+ T-cell epitopes for MHC I, followed by immunogenicity analyses, sequence similarity clustering, conservation studies, and population coverage assessments. These processes employed diverse immunoinformatic tools to screen, analyze, and select epitopes with optimal performance in the described assays. Three epitopes per genotype, one for each oncoprotein (E5, E6, and E7), were selected, totaling 21 epitopes. The multi-epitope construct incorporated AAY linkers, widely used in multi-epitope vaccines due to their recognition and cleavage by the cellular proteasome, thereby enhancing epitope presentation. A TAT adjuvant domain was included to improve antigen internalization by antigen-presenting cells, amplifying immunogenicity. Additionally, restriction enzyme sites were integrated to enable recognition and precise cleavage for inserting the gene of interest into expression vectors.
The selected epitopes exhibited high immunogenicity, indicating a strong potential to activate CD8+ T lymphocytes. Each sequence demonstrated 70% conservation across at least two other target genotypes, achieved a global population coverage of 96.68%, and showed favorable docking interactions with MHC-I, with binding affinity values below -7 kcal/mol. Several sequences had also been previously validated in the literature. To confirm structural stability, molecular modeling tests, including Ramachandran plot analysis, revealed stable conformations, with 88.9% of residues located in the most favored regions. Physicochemical parameters, such as average molecular weight, stability indices, and aliphatic indices, aligned with ideal ranges for multi-epitope constructs when compared to previously synthesized and laboratory-tested designs. Safety assessments demonstrated no in silico toxicity for the modeled full-length protein, with suitable antigenic properties and non-allergenic predictions. In silico cloning into the pVAX vector with specific restriction enzymes produced a graphical representation of the pVAX-MultiepitopeE5/E6/E7 cloning strategy. Collectively, these results underscore the robustness and therapeutic potential of the proposed construct, positioning it as a methodological platform for developing peptide- or DNA-based vaccines against persistent HPV infections and associated cancers.
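The assembly of a multi-epitope construct with AAY linkers, as described above, amounts to joining the selected peptides with the linker sequence. The sketch below shows that step only. The epitope strings used in the usage line are placeholders, not the epitopes selected in the study, and the position of the adjuvant domain (here prepended through the hypothetical `n_terminal_domain` argument) is an assumption, since the abstract does not specify the construct order.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def assemble_construct(epitopes, linker="AAY", n_terminal_domain=""):
    """Join epitopes with a linker, optionally after an N-terminal domain."""
    if not epitopes:
        raise ValueError("at least one epitope is required")
    for peptide in list(epitopes) + [linker, n_terminal_domain]:
        unknown = set(peptide) - VALID_RESIDUES
        if unknown:
            raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return n_terminal_domain + linker.join(epitopes)

# Placeholder peptides for illustration only.
example = assemble_construct(["KLM", "QRS", "TVW"])
print(example)  # KLMAAYQRSAAYTVW
```

For 21 nine-residue epitopes, the epitope-plus-linker portion alone would span 21 × 9 + 20 × 3 = 249 residues, before any adjuvant domain is added.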
Palavras-chave: Human Papillomavirus (HPV), Immunoinformatics, Therapeutic vaccine, Multi-epitope vaccine, Oncoproteins E5, E6, E7
★ Running for the Qiagen Digital Insights Excellence Awards
#1126977

Pangenome and phylogenetic analysis of a new mercury-resistant Bacillus paramycoides strain

Autores: Nicolas Ferreira Polidorio,Gislayne Fernandes Lemes Trindade Vilas Boas,Laurival Antonio Vilas Boas
Apresentador: Nicolas Ferreira Polidorio • nikolaspolidorio@gmail.com
Resumo:
Bacillus paramycoides is a recently identified species within the Bacillus cereus group, which has been shown in many studies to have applications in bioremediation of heavy metals and other xenobiotics in contaminated environments. In this study, we analyzed the genome of a mercury-resistant strain, B. paramycoides H2-1, previously classified via average nucleotide identity (ANI) analysis. To establish phylogenetic relationships, reference multilocus sequence typing (MLST) loci for the B. cereus group, comprising the sequences of seven housekeeping genes, were first obtained from the PubMLST database. They were then used as the subject for a blastn search for the corresponding sequences from several strains of B. paramycoides, B. mycoides, B. nitratireducens, and B. proteolyticus. Subsequently, the sequences were concatenated, aligned using ClustalW, and used to build a Neighbor-Joining phylogenetic tree in MEGA (v. 11.0.13). Moreover, a pangenome analysis of all publicly available B. paramycoides genomes was performed using Anvi’o (v. 8) to assess genomic diversity. Lastly, genome synteny was evaluated with progressiveMauve in Mauve software (v. 2.4.0). The phylogenetic reconstruction revealed that B. paramycoides H2-1 forms a distinct group within the B. paramycoides cluster. Pangenome analysis corroborated this divergence, with H2-1 exhibiting the highest number of singletons. Strain-specific gene clusters in H2-1 included endonucleases, metabolic enzymes, and regulatory proteins. Notably, metal resistance genes were widely distributed across the analyzed strains, aligning with prior reports of this species’ efficacy in heavy metal biosorption. Among core gene clusters, transposases were the most abundant, suggesting extensive genomic plasticity and rearrangement potential, which was confirmed by the resulting synteny graph. Further investigation is warranted to determine whether this trend extends to other members of the B. cereus group.
Additionally, genes associated with exopolysaccharide biosynthesis, capsule formation, and cell wall modification were identified in H2-1. While these pathways may contribute to metal resistance, their mechanistic roles remain unclear and warrant deeper exploration. Future studies should prioritize functional characterization of these genes, alongside comparative genomic analyses across the B. cereus group to elucidate evolutionary and adaptive traits. The high number of transposable elements found also merits further analysis, particularly those neighboring the metal resistance genes that are plentiful in this species.
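The singleton and core gene cluster counts reported in the pangenome analysis can be derived from a presence/absence table. The sketch below assumes such a table has already been exported as a mapping from gene cluster to the set of genomes containing it; the data layout and function names are hypothetical and do not correspond to Anvi’o output formats.

```python
from collections import Counter

def singleton_counts(clusters):
    """Count, per genome, the gene clusters found in that genome only.

    clusters : dict mapping cluster id -> set of genome names
    """
    counts = Counter()
    for genomes in clusters.values():
        if len(genomes) == 1:
            counts[next(iter(genomes))] += 1
    return counts

def core_clusters(clusters, all_genomes):
    """Return ids of clusters present in every genome of `all_genomes`."""
    required = set(all_genomes)
    return sorted(cid for cid, genomes in clusters.items() if required <= set(genomes))

# Toy table with made-up cluster ids and two extra hypothetical strains.
toy = {
    "GC_1": {"H2-1", "strainA", "strainB"},
    "GC_2": {"H2-1"},
    "GC_3": {"H2-1"},
    "GC_4": {"strainA"},
}
print(singleton_counts(toy).most_common(1))  # [('H2-1', 2)]
```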
Palavras-chave: Pangenome, Metal resistance, Phylogenetics, Bacillus cereus group, Bioremediation
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1126997

BIOINFORMATIC ANALYSIS OF ZIKA VIRUS NS1 EPITOPES WITH POTENTIAL FOR NEUTRALIZING ANTIBODY INDUCTION IN VACCINE DESIGN

Autores: Matheus Gardini Amâncio Marques de Sena,Davi Emanuel Ribeiro,Suellen da Costa Figuerêdo,Maria da Conceição Viana Invenção,Anna Jéssica Duarte Silva,Antônio Carlos de Freitas
Apresentador: Matheus Gardini Amâncio Marques de Sena • matheus.gardini@ufpe.br
Resumo:
Zika virus (ZIKV) infection, associated with severe neurological complications such as Guillain-Barré syndrome and microcephaly, still lacks licensed treatments and vaccines. Vaccine targets such as structural proteins have been investigated; however, other antigens, such as non-structural protein 1 (NS1), have been identified as promising vaccine candidates. Several studies have characterized highly neutralizing NS1-specific monoclonal antibodies (mAbs), providing a rational basis for vaccine design. Moreover, advances in structural analysis tools have enabled a focus on the molecular architecture of antigens. The aim of this study was to employ protein three-dimensional structure analysis tools to characterize epitopes from a vaccine construct with the potential to elicit neutralizing antibodies targeting regions of the NS1 protein that are susceptible to mAb-mediated neutralization. Initially, a literature-based screening was performed to identify NS1-specific neutralizing mAbs isolated from convalescent patients. Five mAbs were selected: 3G2 and 4B8, which mediate complete steric blockade, and 4F10, 2E11, and 14G5, associated with partial neutralization. Subsequently, the three-dimensional structure of NS1 (PDB ID: 5K6K) was analyzed using ChimeraX software, with identification of structural domains and critical residues involved in mAb binding. These residues were then compared to short epitope sequences included in a vaccine construct (NS1-epi), which consists of six predicted T and B cell epitopes. Structural analysis revealed that the NS1-epi epitopes are distributed across the β-ladder (four epitopes) and wing (two epitopes) domains, which are the major antigenic regions of the protein. Among these, three epitopes overlapped with regions containing critical residues previously described as mAb interaction sites. 
Two epitopes located in the β-ladder domain exhibited partial overlap with critical residues (11–20%), with one of them recognized by more than one mAb. In contrast, the epitope within the wing domain showed high overlap (>55%) with critical residues bound by highly neutralizing mAbs. These findings suggest that epitopes within the NS1-epi construct may induce antibodies that target key neutralizing sites, particularly in the wing domain. Although the remaining epitopes are not located in regions of direct mAb interaction, they may still contribute to the stimulation of cellular immune responses. Altogether, the NS1-epi vaccine construct demonstrates potential to elicit robust antibody responses directed at critical regions involved in NS1 blockade. In vivo experimental assays using the vaccine epitopes confirmed the ability of NS1-derived peptides to elicit an immune response; however, additional experimental studies are necessary to validate the binding specificity and functional effectiveness of the induced antibodies.
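One way to quantify the overlap percentages mentioned above is the share of an epitope's residue positions that are also listed as critical for antibody binding. The sketch below uses that definition, which is an assumption: the abstract does not state how overlap was calculated, and the residue numbers in the usage line are invented for illustration.

```python
def overlap_percent(epitope_start, epitope_end, critical_residues):
    """Percentage of epitope positions (inclusive range) that are critical residues."""
    if epitope_end < epitope_start:
        raise ValueError("epitope_end must not be smaller than epitope_start")
    positions = set(range(epitope_start, epitope_end + 1))
    shared = positions & set(critical_residues)
    return 100.0 * len(shared) / len(positions)

# Hypothetical 10-residue epitope with two critical positions inside it.
print(overlap_percent(10, 19, {12, 15, 40}))  # 20.0
```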
Palavras-chave: immunoinformatics, vaccine, epitopes, zika virus, monoclonal antibodies
★ Running for the Qiagen Digital Insights Excellence Awards
#1127025

Integrated analysis of whole-exome sequencing and proteomics to investigate genetic and molecular factors associated with human sepsis

Autores: Angela Zanin Della Bianca,Michelle Orane Schemberger,Mariana Dallasterlla,Fernanda Do Carmo De Stefani,Lysangela Rornalte Alves,Igor Alexandre Cortes de Menezes,Luis Gustavo Morello - UEC,KARLA DANIELLE MORETTO,Beatriz Rosa de Azevedo,Bruno Janke do Nascimento,Marco Antonio Campanário,Hygor Trombeta,Bruna Cassia Dal Vesco,Vinícius Da Silva Coutinho Parreira,Alysson Henrique Urbanski,Tiago Minuzzi Freire da Fontoura Gomes,Helisson Faoro,Hellen Geremias Dos Santos,Fabio Passetti
Apresentador: Angela Zanin Della Bianca • ABIANCA@ALUNO.FIOCRUZ.BR
Resumo:
Human sepsis is characterized by an exacerbated immune response to pathogens and remains one of the leading causes of mortality in Intensive Care Units (ICUs). This study investigates whether genetic and molecular factors are associated with sepsis progression in adult patients, contributing to either death or recovery within 28 days of ICU admission. To this end, whole-exome sequencing (WES) data were integrated with shotgun proteomics data obtained from the plasma of septic patients admitted to ICUs. The genetic analysis focused on identifying single nucleotide polymorphisms (SNPs) and other genetic variations in coding regions. The proteomic analysis aimed to characterize circulating plasma proteins, with particular attention to single amino acid variations (SAAVs). Clinical characteristics were assessed within 24 hours of hospital admission, and the outcome (death or recovery) was monitored for up to 28 days after sepsis diagnosis. The study included 96 patients (54 women and 42 men) treated at the Hospital de Clínicas of the Federal University of Paraná (UFPR) and the Sugisawa Hospital, both located in Curitiba, Brazil. The main sites of infection were pulmonary, abdominal and urinary. Of the patients included in the study, 53 progressed to death and 43 to recovery. WES data were aligned to the HG38 reference genome using BWA-MEM and analyzed using the Genome Analysis Toolkit (GATK). The pipeline included quality control with FastQC, conversion of SAM to BAM files, mapping quality assessment with Picard, variant calling with HaplotypeCaller and GenotypeGVCFs, variant annotation using ANNOVAR, and variant flagging with the CEGH filter. Overall quality metrics were satisfactory, with an average of 317,803,480 reads per patient, only 0.07% duplicate reads, and a mean coverage of 120.40.
An initial proteomic test was conducted on three samples: one from a patient who recovered (SR), one from a patient who died from sepsis (SD), and one from a healthy control (HC). Each sample was processed with and without depletion of high-abundance proteins, followed by mass spectrometry analysis and identification using the PatternLab for Proteomics software with the SwissProt database. As this was a preliminary test with a small sample set, conclusive results are not yet possible. However, the test showed that the non-depleted samples presented a higher number of identified proteotypic peptides: SR (1330), SD (1443), and HC (1316), compared to their depleted versions: SR (1061), SD (1132), and HC (1253). The SD sample had the highest number of peptides, suggesting a potential link with a more intense inflammatory response. Additionally, a possible impact of long-term freezing on proteomic quality was observed, with fewer peptides identified in the SR sample, which had been frozen for a longer period, compared to the fresh HC sample. The proteins identified exclusively in non-depleted samples included several expected high-abundance targets, such as albumin and immunoglobulins, confirming that the depletion protocol was effective in selectively removing these proteins from plasma. This study was financially supported by CNPq and is part of a project funded by the Research Incentive Program of the Carlos Chagas Institute (ICC), Fiocruz.
Palavras-chave: Sepsis; whole-exome sequencing; genetic polymorphisms; human genome; HG38; Burrows-Wheeler Aligner; Genome Analysis Toolkit; Picard; proteomics; mass spectrometry; PatternLab for Proteomics; SwissProt; Spliceprot; shotgun proteomics; single amino acid variations.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1127070

Microbiome monitoring of São Paulo's main freshwater reservoirs via metagenomics

Autores: Samir Vargas da Fonseca Atum,CASSIUS VINICIUS STEVANI,Douglas Moraes Mendel Soares,RENATO SANCHES FREIRE,João Setubal,Lara Urban
Apresentador: Samir Vargas da Fonseca Atum • samir.atum@usp.br
Resumo:
The São Paulo metropolitan area is home to more than 20 million people, and 93% of its water consumption comes from four freshwater reservoir systems: Cantareira, Alto Tietê, Guarapiranga, and Rio Grande. We present an ongoing environmental monitoring study in which these four reservoirs are sampled twice a year, in the dry and wet seasons. The first year of the project started with 16S amplification and Illumina sequencing; we later moved on to whole genome sequencing (WGS) of native DNA using nanopore technology. For the former, the Qiime2 pipeline was used for data analysis. For the latter, we developed a bespoke bioinformatics pipeline to track and understand the seasonal differences between reservoirs; assemble metagenomes; recover metagenome-assembled genomes (MAGs); screen for taxa important for freshwater management, such as pathogens and cyanobacteria that cause harmful algal blooms; and screen for genes related to water quality and human health, such as those for cyanotoxins, antimicrobial resistance (AMR), and taste and odor compounds (such as geosmin).
To date, five sampling campaigns have been performed and sequenced, yielding 8 16S taxonomic profiles generated from Illumina reads and 12 metagenomes generated with nanopore reads. A total of 40 MAGs were recovered, many of them possibly from new species, alongside several antimicrobial resistance genes and other genes of interest. Furthermore, even though 16S and WGS results are not directly comparable, the study indicates that each reservoir has a particular taxonomic profile that varies throughout the year but remains distinct: the samples from each reservoir cluster together in a principal component analysis plot. We also detected a sharp rise of Cyanobacteria in Guarapiranga in the summer samples, which may be a warning sign that the reservoir is susceptible to algal blooms even with the current water management strategies in place to avoid them.
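The clustering of samples by reservoir in a principal component analysis can be illustrated with a small sketch that projects taxonomic count profiles onto their first principal components. It converts counts to relative abundances and uses a plain SVD; this is a simplification chosen for brevity (compositional transforms such as the centered log-ratio are often preferred for microbiome data), and the abstract does not describe the authors' actual procedure. The toy counts are invented.

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """Project samples (rows) onto the leading principal components.

    counts : (samples, taxa) array of non-negative read counts
    """
    counts = np.asarray(counts, dtype=float)
    relative = counts / counts.sum(axis=1, keepdims=True)  # per-sample proportions
    centered = relative - relative.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Two invented reservoirs, two samples each, three taxa.
toy_counts = [[90, 10, 0], [85, 15, 0], [10, 10, 80], [5, 15, 80]]
scores = pca_scores(toy_counts)
print(scores.shape)  # (4, 2)
```

In this toy example the first component separates the two groups of samples, which is the pattern the abstract describes for the real reservoirs.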
Palavras-chave: metagenomics, environmental monitoring, antimicrobial resistance, nanopore
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1127189

The Open Science Revolution in Complex Disease Genetics: An Integrated Pipeline from FASTQ to GWAS and Functional Pleiotropy

Autores: Kaira Cristina Peralis Tomaz,Felipe Eduardo Ciamponi,Rafaela Pacheco,Jennifer Stefani Ribeiro Dos Santos,Mariana Feitosa Cavalheiro,Fábio Malta de Sá Patroni,Julio Bernardi,Murilo Meneghetti,Lorena Maria Rudnik,Alexandre Rossi Paschoal,Marcelo Brandão
Apresentador: Kaira Cristina Peralis Tomaz • k192007@dac.unicamp.br
Resumo:
Polygenic diseases, such as Alzheimer's disease (AD) and type 2 diabetes (T2D), present complex challenges in medical genetics due to their non-Mendelian inheritance patterns, which involve multiple alleles and environmental factors. The global incidence of AD is increasing, and diabetes is contributing to a growing healthcare burden. Recently, it has been noted that cognitive dysfunction is a significant comorbidity of diabetes, indicating a potential link between AD and diabetes and suggesting possible genetic connections.
Advances in identifying specific genetic variants and understanding their interactions are paving the way for personalised medicine and thereby enhancing treatment effectiveness.
Bioinformatics analysis of genomic data offers valuable insights into the genetic foundations of Alzheimer's disease (AD) and diabetes, facilitating the development of targeted interventions. The GWAS (Genome-Wide Association Study) approach is an essential and well-established tool in bioinformatics for analysing genetic associations with phenotypes. However, despite the abundance of publicly available data, bioinformatics analyses remain a bottleneck in conducting biological studies.
Workflows also remain fragmented, limiting translational insights. Here, we present a fully open-source pipeline that streamlines polygenic disease analysis from raw sequencing data (FASTQ) to functional annotation. The pipeline addresses key challenges in reproducibility and scalability by combining validated methods for variant calling, GWAS, pleiotropy analysis, and regulatory annotation.
We also demonstrate how standardised, ethically curated datasets enable robust analyses of shared genetic mechanisms by leveraging the NIH’s Database of Genotypes and Phenotypes (dbGaP), a foundational open-science repository for genotype-phenotype studies. dbGaP’s dual-access model (open metadata vs. controlled individual-level data) allowed us to harmonise diverse cohorts while adhering to ethical guidelines, exemplifying how open data infrastructures can accelerate discoveries in comorbidities like AD-T2D.
Our pipeline's modular design enables researchers to bypass costly data generation phases and focus on hypothesis-driven exploration, democratising access to high-impact genomics. By aligning with open-science principles, this work mirrors transformative initiatives like the Human Genome Project, where shared data and tools spurred global collaboration. The integration of dbGaP datasets highlights the untapped potential of public repositories to fuel large-scale, reproducible studies — particularly in under-resourced settings. This system pushes forward genetic research and emphasises the need for open-source, community-focused science in genomics, encouraging collaboration across different fields and revealing the connections that contribute to complex disease studies.
Palavras-chave: Open Science Revolution, Complex Disease Genetics, GWAS, Functional Pleiotropy, Integrated Pipeline
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1127241

SPONTANEOUS MUTATIONAL SIGNATURES IN XERODERMA PIGMENTOSUM VARIANT CELLS

Autores: Maria Carolina Fernandes dos Santos,Camila Corradi,Carlos Frederico Martins Menck
Apresentador: Maria Carolina Fernandes dos Santos • mcarolinafs@usp.br
Resumo:
DNA is continuously exposed to the influence of endogenous and exogenous factors, leading to damage that interferes with transcription and replication processes. In addition to the repair pathways for DNA damage, cells can also employ mechanisms of lesion tolerance, such as Translesion Synthesis (TLS). During replication, TLS recruits specialized polymerases capable of bypassing bulky lesions that block replication progress, thus preventing replication fork collapse. Deficiencies in the POLH gene, which encodes DNA polymerase η (pol eta), a key TLS polymerase, lead to a rare autosomal recessive syndrome known as the variant form of xeroderma pigmentosum (XP-V). In this study, we performed whole-exome sequencing of XP-V fibroblasts and XP-V fibroblasts complemented with functional polymerase, cultured for six months under conditions free from direct exposure to visible light and ultraviolet radiation. With this approach, we investigated the spontaneous mutagenesis occurring in these cells. The exomes were evaluated with FastQC and mapped to GRCh38 with BWA-MEM, and variants were called following GATK best practices, using HaplotypeCaller and Mutect2 with different filtering strategies. We used the SomaticSignatures and MutationalPatterns packages to analyse the somatic variant spectrum and the contribution of mutational signatures. Preliminary results obtained with HaplotypeCaller showed that the XP-V lineage underwent higher mutagenesis than XP-Vcomp, with the highest contribution for XP-V observed in C>T transitions followed by C>A transversions. The results from Mutect2 are expected to enable more robust analyses.
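The substitution spectrum mentioned above (C>T, C>A, and so on) is conventionally reported with the reference base folded onto the pyrimidine strand, so that G>A is counted as C>T. The following is a minimal sketch of that folding in plain Python, not the SomaticSignatures or MutationalPatterns implementation; the function names and the example calls are hypothetical.

```python
from collections import Counter

# Watson-Crick complements, used to fold purine-reference changes onto the
# pyrimidine strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

SUBSTITUTION_CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Return the pyrimidine-normalised class of a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "AG":  # purine reference: report the change on the opposite strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs):
    """Count SNVs per class; `snvs` is an iterable of (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    return {cls: counts.get(cls, 0) for cls in SUBSTITUTION_CLASSES}


# Hypothetical calls for illustration only (not data from the study).
example_snvs = [("C", "T"), ("G", "A"), ("C", "A"), ("G", "T"), ("T", "C"), ("A", "G")]
example_spectrum = spectrum(example_snvs)
```

Signature extraction additionally uses the flanking bases (96 trinucleotide contexts); the six-class spectrum here is only the first level of that summary.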
Palavras-chave: Mutagenesis, Translesion Synthesis, Mutational Signatures
#1127377

Prioritization and structural analysis of Pseudomonas aeruginosa therapeutic targets

Autores: Maria Carolina Sisco,André Borges Farias,Miranda Clara Palumbo,Lucas Gabriel Garcia,Teca Calcagno Galvão,Dario Fernandes do Porto,Adrián Gustavo Turjanski,MARISA FABIANA NICOLÁS
Apresentador: Maria Carolina Sisco • carolinasisco@gmail.com
Resumo:
Pseudomonas aeruginosa is a widespread pathogen that causes severe nosocomial infections, especially in cystic fibrosis and severely burned patients. Biofilm formation, resistance-conferring β-lactamases and carbapenemases, and other virulence determinants make its treatment challenging, leading to adverse patient outcomes. The discovery of new drug targets and antimicrobial agents has been identified as a priority by the World Health Organization. Using a bioinformatic approach, we have prioritized 25 highly druggable potential targets. We performed Flux Balance Analysis (FBA) using COBRApy and the genome-scale metabolic model of P. aeruginosa PA14, iSD1509. Then, we performed in silico single-gene knockouts in LB and Synthetic Cystic Fibrosis Medium (SCFM), both in aerobiosis and anaerobiosis. Genes whose deletion was associated with a decreased Biomass Objective Function (i.e., growth rate) were pre-selected, the decrease being taken as evidence of their fitness cost. The proteins encoded by these genes were then filtered by off-target criteria: proteins with human homologs, and proteins matching species of the human gut microbiome with an e-value < 10⁻⁵, identity > 40%, and coverage > 30%, were excluded from the analysis. We identified fourteen potential targets associated with chokepoint reactions, i.e., reactions that exclusively produce or consume a specific metabolite. Graph topological metrics such as betweenness centrality, as well as cellular localization analysis, were also applied.
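The off-target thresholds and the chokepoint definition above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' pipeline: the `Hit` record, the function names, and the toy network are hypothetical, and reaction reversibility is ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment hit of a candidate protein against an off-target proteome."""
    query: str
    evalue: float
    identity_pct: float
    coverage_pct: float


def is_off_target(hit, max_evalue=1e-5, min_identity=40.0, min_coverage=30.0):
    """True when a hit meets all three exclusion thresholds quoted in the abstract."""
    return (hit.evalue < max_evalue
            and hit.identity_pct > min_identity
            and hit.coverage_pct > min_coverage)


def keep_candidates(candidates, hits):
    """Drop every candidate that has at least one qualifying off-target hit."""
    flagged = {h.query for h in hits if is_off_target(h)}
    return [c for c in candidates if c not in flagged]


def chokepoints(reactions):
    """Return reactions that are the sole producer or sole consumer of a metabolite.

    `reactions` maps reaction id -> {metabolite: stoichiometric coefficient},
    negative for consumed and positive for produced metabolites.
    """
    producers, consumers = {}, {}
    for rxn, stoich in reactions.items():
        for met, coeff in stoich.items():
            if coeff > 0:
                producers.setdefault(met, set()).add(rxn)
            elif coeff < 0:
                consumers.setdefault(met, set()).add(rxn)
    found = set()
    for table in (producers, consumers):
        for rxns in table.values():
            if len(rxns) == 1:
                found |= rxns
    return found


# Toy network (hypothetical): R1 and R2 both convert A to B; only R3 converts B to C.
toy_network = {
    "R1": {"A": -1, "B": 1},
    "R2": {"A": -1, "B": 1},
    "R3": {"B": -1, "C": 1},
}
toy_chokepoints = chokepoints(toy_network)
```

In the toy network only R3 is a chokepoint, because it is both the only consumer of B and the only producer of C.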
Three of the prioritized targets belong to the NADH:quinone oxidoreductase (respiratory complex I): NuoI, NuoJ, and NuoK, with the latter two localized in the cytoplasmic membrane. All three targets are predicted to be essential for aerobic growth in SCFM, suggesting that they may be essential for the establishment of CF pathogenesis. We therefore used AlphaFold-Multimer v3 to obtain the structure of the NADH dehydrogenase complex. We compared our model with the homologous structure in Escherichia coli (PDB 7NYR), revealing structural similarity (RMSD of 0.937784 Å over 904 residues). We also performed a systematic cavity search in the complex using CavityPlus and DoGSiteScorer to identify potential binding sites. We selected 11 highly druggable consensus cavities located at the interface of several subunits, including subunits I and J. These results point to novel binding sites that can be explored for drug discovery. We selected five small molecules, based on molecular similarity to the substrates FMN and ubiquinone, as probes for docking and molecular dynamics simulations, in order to evaluate potential binding affinity and molecular stability and to map the amino acids and specific interactions involved in binding these compounds. We demonstrate that a combined bioinformatic and modelling approach can identify new drug targets that meet several specificity criteria, and hence support new therapies, while significantly reducing discovery time.
Palavras-chave: Therapeutic target, Drug Discovery, Molecular Modelling, Metabolic network, Virtual Screening.
★ This work is running for the Next Generation Bioinfo Award
#1127632

Construction and Analysis of the Moniliophthora roreri pangenome

Autores: Isabella Gallego Rendón,Diego Mauricio Riaño Pachón
Apresentador: Isabella Gallego Rendón • isagallegor97@gmail.com
Resumo:
Moniliophthora roreri, the causal agent of frosty pod rot, is a devastating fungal pathogen affecting cacao production across Latin America. Its broad host range, ecological adaptability, and high pathogenicity underscore the need to understand its genomic diversity to inform disease management strategies. Here, we present a comprehensive pangenome analysis of 24 publicly available M. roreri genomes using two state-of-the-art graph-based methods: Minigraph-Cactus and PGGB. Graph-based approaches allow us to integrate structural variation and genome-wide sequence diversity into a unified representation. The resulting pangenomes were used to classify genes into core, accessory, and strain-specific categories, revealing genomic features likely associated with adaptation and pathogenicity. Functional annotation is performed with HMMER and PANNZER2, and enriched Gene Ontology terms are identified for each gene category using the topGO and REVIGO tools, offering insight into biological processes specific to different parts of the genome. The study includes a comparative analysis between the graph-based pangenomes and a previously constructed orthology-based version. This evaluation uses metrics such as genome completeness, representation of structural variants, core/accessory gene content, and computational performance. Our findings demonstrate the value of graph-based methods in capturing the genomic complexity of fungal pathogens and provide a foundation for future research into the molecular basis of virulence and host adaptation in M. roreri.
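The core / accessory / strain-specific split described above can be illustrated from a simple presence/absence table. This is a minimal sketch under stated assumptions: "core" is taken strictly as present in every genome and "strain-specific" as present in exactly one (the abstract does not state the thresholds used), and the gene and genome names are hypothetical.

```python
def classify_genes(presence, n_genomes):
    """Split gene families into core, accessory and strain-specific sets.

    `presence` maps gene family id -> set of genome ids that carry it.
    Assumption: core = present in all genomes, strain-specific = present in
    exactly one genome, accessory = everything in between.
    """
    categories = {"core": [], "accessory": [], "strain_specific": []}
    for gene, genomes in sorted(presence.items()):
        count = len(genomes)
        if count == 0 or count > n_genomes:
            raise ValueError(f"{gene}: {count} genomes is outside 1..{n_genomes}")
        if count == n_genomes:
            categories["core"].append(gene)
        elif count == 1:
            categories["strain_specific"].append(gene)
        else:
            categories["accessory"].append(gene)
    return categories


# Hypothetical three-genome example.
toy_presence = {
    "geneA": {"g1", "g2", "g3"},
    "geneB": {"g1", "g3"},
    "geneC": {"g2"},
}
toy_categories = classify_genes(toy_presence, n_genomes=3)
```

Many pangenome studies relax the core threshold (for example, present in at least 95% of genomes) to tolerate assembly gaps; that variant would only change the first comparison.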
Palavras-chave: Pangenome, Frosty Pod Rot, Moniliophthora roreri, Cacao
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1127786

DRUG REPOSITIONING USING MOLECULAR DOCKING TARGETING PNMA3 AND RASSF2 PROTEINS ASSOCIATED WITH CHEMOTHERAPY RESISTANCE IN CERVICAL CANCER

Autores: Larissa Paula De Carvalho Ferreira,Fábio Ribeiro Queiroz,Angelo Neto,Bruna Custódio Dias Duarte,Carolina Pereira de Souza Melo,Nicolau Junior,Matheus Gomes,Pedro Luiz Lima Bertarini,Paulo Guilherme de Oliveira Salles,Leticia da Conceição Braga,Laurence Rodrigues do Amaral
Apresentador: Larissa Paula De Carvalho Ferreira • larissa_carvalho1309@hotmail.com
Resumo:
Cervical cancer remains one of the leading causes of female mortality worldwide, with persistent infection by high-risk HPV types recognized as a major etiological factor. However, molecular alterations in apoptotic pathways and resistance mechanisms to therapy also play pivotal roles in disease progression. Among these, PNMA3 and RASSF2 genes have been identified as potential biomarkers of chemoradiotherapy resistance in cervical cancer patients, suggesting that the functional interaction between their proteins may contribute to therapeutic failure. Based on this hypothesis, molecular docking studies were conducted using the MOE software to identify small-molecule drugs capable of modulating the PNMA3-RASSF2 protein-protein interaction. Among the compounds evaluated, Risdiplam (CHEMBL4297528) and Avanafil (CHEMBL1963681) emerged as the most promising candidates. Risdiplam exhibited the strongest binding affinity for RASSF2 (-9.29 kcal/mol), forming key interactions with residues such as Gly61, Val268, and Met260. In comparison, Avanafil demonstrated the highest binding affinity for PNMA3 (-10.65 kcal/mol), interacting notably with His411, Asn448, and Ala443. These findings suggest that both compounds may effectively interfere with the functional association between PNMA3 and RASSF2. Protocol validation was performed through a redocking assay with the HIV-1 protease crystal structure (PDB ID: 1HVR) complexed with Amprenavir, achieving an RMSD of 1.5962 Å and confirming methodological reliability. Functional domains for PNMA3 and RASSF2 were defined based on UniProt annotations and Site Finder analysis, guiding the docking studies. Initial molecular dynamics simulations corroborated the docking results, demonstrating conformational stability of the protein-drug complexes, with RMSD values stabilizing below 2.5 Å during the 100 ps simulation.
These findings reinforce the potential of Risdiplam and Avanafil as candidates for drug repurposing strategies aimed at overcoming chemoradiotherapy resistance in cervical cancer. Future steps will focus on the experimental validation of these promising interactions.
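For readers unfamiliar with the redocking check mentioned above, the RMSD between a crystallographic ligand pose and its redocked pose is a root-mean-square of per-atom distances. The sketch below is a plain-Python illustration, not MOE's implementation: it assumes both poses list the same atoms in the same order, applies no superposition, and does not correct for ligand symmetry; the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """In-place RMSD (same units as the input) between two matched atom lists.

    Each argument is a sequence of (x, y, z) tuples; atom i in `coords_a`
    must correspond to atom i in `coords_b`.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical poses: the redocked pose is the reference shifted 1.5 Å along z.
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
redocked_pose = [(x, y, z + 1.5) for x, y, z in reference_pose]
pose_rmsd = rmsd(reference_pose, redocked_pose)
```

A redocking RMSD at or below roughly 2 Å is the threshold commonly used to call a docking protocol able to reproduce the experimental pose.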
Palavras-chave: Cervical cancer, Chemoresistance, PNMA3, RASSF2, Drug repurposing
#1127976

A prognostic model based on a six-gene signature related to Neutrophil Extracellular Traps (NETs) in cervical cancer using machine learning

Autores: Mario Jefferson Farrapo Sales,Ísis Salviano Soares de Amorim,Robson de Queiroz Monteiro
Apresentador: Mario Jefferson Farrapo Sales • mariofarrapo2016@gmail.com
Resumo:
Cervical cancer (CC) is the third most common type of cancer among women in Brazil. Although various prognostic biomarkers are currently used in clinical practice to predict disease progression and treatment response in patients, there is a need for more accurate biomarkers to improve the diagnostic and therapeutic management of CC. Growing evidence demonstrates that Neutrophil Extracellular Traps (NETs) play a crucial role in cancer progression, mediating key processes such as cell survival, metastasis, immune evasion, and therapy resistance.
Given this, the present study aims to develop a NET-related gene signature capable of predicting the prognosis of patients with CC, using machine learning methods and bioinformatics analyses.
Initially, we identified 231 NET-related genes from five independent studies. Transcriptomic and clinical data from CC patients available in The Cancer Genome Atlas (TCGA) were downloaded and used to form the training dataset. We performed an initial screening of the genes through differential expression analysis to identify differentially expressed genes (DEGs) between tumor and normal tissues, resulting in a total of 58 DEGs.
Next, LASSO Cox regression with 10-fold cross-validation was employed to identify genes associated with prognosis while limiting overfitting. Ultimately, a six-gene NET signature was constructed, and we developed a classification model based on the calculation of patients’ risk scores. The optimal cutoff point was determined using the Youden index, which maximizes the sum of the model's sensitivity and specificity, and was used to stratify patients into High-Risk and Low-Risk groups.
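The risk score and Youden-index cutoff described above can be sketched as follows. This is a simplified illustration, not the authors' model: it assumes the risk score is the usual linear combination of expression values weighted by the Cox coefficients, treats the outcome as a binary event at a fixed time horizon (ignoring censoring), and uses hypothetical gene names, coefficients, and patients.

```python
def risk_score(expression, coefficients):
    """Linear risk score: sum over signature genes of coefficient * expression."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def youden_cutoff(scores, events):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    `events[i]` is 1 if patient i had the event within the horizon, else 0.
    Patients with score >= cutoff are called high-risk.
    """
    positives = sum(events)
    negatives = len(events) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, e in zip(scores, events) if s >= cutoff and e == 1)
        true_neg = sum(1 for s, e in zip(scores, events) if s < cutoff and e == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical two-gene signature and four patients, for illustration only.
toy_coefficients = {"G1": 0.5, "G2": -0.25}
toy_score = risk_score({"G1": 2.0, "G2": 4.0}, toy_coefficients)
toy_cutoff, toy_j = youden_cutoff([0.2, 0.3, 0.6, 0.9], [0, 0, 1, 1])
```

With survival data, a time-dependent ROC analysis would normally replace the fixed-horizon binary outcome used here.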
Kaplan-Meier curves and univariate and multivariate Cox regression analyses revealed that the high-risk group is independently associated with worse prognosis (p < 0.0001). The model’s performance was evaluated using ROC curves and confusion matrices, demonstrating high sensitivity and specificity, as well as an accuracy > 80% for 1-, 3-, and 5-year survival predictions.
These results demonstrate that the NET-related gene signature developed is associated with a worse prognosis in CC and can predict patient outcome with high accuracy. These findings not only validate the crucial role of NETs in tumor progression but also open new frontiers for personalized medicine and pave the way for novel therapeutic strategies in cervical cancer.
Palavras-chave: Prognostic model, gene signature, NETs, cervical cancer
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1128158

Comparative genomics and in silico bioprospection of secondary metabolites from publicly available Nostoc genomes

Autores: Victor Emmanuel Cantanheide Barbosa Dos Santos,Elias Pereira Silva Neto,Anna Clara Vieira Pinto,Flávia Maria Dias Moraes,Thallyson Gabriel Martins Correia Fontenele,Luan Felipe Lindoso Pires,Hillari Fernanda Gracetto,Leonardo Teixeira DallAgnol
Apresentador: Victor Emmanuel Cantanheide Barbosa Dos Santos • victor15cant@gmail.com
Resumo:
Nostoc is a genus of heterocystous filamentous cyanobacteria found across all continents in a wide range of ecosystems. This genus is recognized as one of the most promising groups for the production of bioactive compounds due to its environmental and biotechnological applications. Thus, this study aimed to conduct an in silico prospection of biosynthetic gene clusters (BGCs) in four genomes deposited by the South China Normal University (GCA_014698505.1, GCF_014698475.1, GCF_014696625.1, and GCA_014698835.1). For this purpose, the National Center for Biotechnology Information (NCBI) GenBank was used to retrieve the sequences, which correspond to the following species, respectively: Nostoc foliaceum FACHB-393, Nostoc spongiaeforme FACHB-130, Nostoc parmelioides FACHB-3921, and Nostoc paludosum FACHB-159. After retrieval, the genomic sequences were analyzed using antiSMASH Bacterial Version 8.0 (beta) to identify BGCs with low similarity to the available database. Additionally, they were annotated using Rapid Annotation using Subsystem Technology (RAST) to classify gene products into metabolic pathways and cellular processes. Subsequently, all BGCs exhibiting low similarity were listed and compared with the Minimum Information about a Biosynthetic Gene cluster (MIBiG) database. The diversity of secondary metabolite BGCs identified so far in the genomes of the four Nostoc strains suggests broad potential for the production of bioactive compounds with multiple ecological functions and biotechnological applications. We detected clusters related to the production of lanthipeptides, which are ribosomally synthesized and post-translationally modified peptides often associated with antimicrobial activity. These compounds may play a role in defense against competing microorganisms and in regulating associated microbiota.
Furthermore, NRPS-type or similar clusters (including NRPS, NRPS-like, and NRPS + T1PKS) were identified; these are involved in the synthesis of complex peptides with potential antibiotic, cytotoxic, or siderophore activity. Additionally, hglE-KS + T1PKS clusters were found, which are related to the biosynthesis of heterocyst-specific glycolipids. Heterocysts are specialized cells crucial for nitrogen fixation and are essential for the cyanobacterium's adaptation to nitrogen-limited environments. Terpene and terpene precursor clusters were also detected. Terpenes are structurally diverse compounds with functions that may include UV protection, antimicrobial activity, and roles in ecological interactions. Therefore, in silico analyses indicate that the selected Nostoc strains possess BGCs encoding various types of secondary metabolites, suggesting biotechnological potential. However, further robust analyses are required to confirm the viability of using these cyanobacteria for biotechnological purposes.
Palavras-chave: genome mining, Nostoc, secondary metabolite
★ Running for the Qiagen Digital Insights Excellence Awards
#1128244

Allosteric Targeting of Trypanosoma cruzi Trypanothione Reductase: A Computational Screening Guided by Structural and Electronic Characterization

Autores: Guilherme Ian Spelta,Mariana Simões Ferreira,Pedro Geraldo Pascutti
Apresentador: Guilherme Ian Spelta • speltagi@biof.ufrj.br
Resumo:
Chagas disease (CD) is a highly debilitating and potentially lethal protozoan infection that affects at least 1.9 million people in Brazil and around 8 million people across the American continent. The currently available pharmacological therapy for this condition presents significant limitations, such as low efficacy in the advanced stages of the disease and a high prevalence of toxic effects. In light of this, enzymes involved in the oxidative stress control pathway mediated by trypanothione (Try), such as trypanothione reductase (TryR), have emerged as promising molecular targets, mainly due to their exclusive presence in trypanosomatids and their essential role in the parasite’s survival and infection success. This study aims to use computational tools to identify small-molecule binders of T. cruzi TryR, prioritizing regions that show a mechanical or electronic relationship to its catalytic site. Molecular Dynamics (MD) simulations of TryR were performed for 100 ns in three replicas using the NAMD3 program. Try charge parameters compatible with the CHARMM force field were derived from DFT-level single point energy calculations using ORCA software and the FFToolkit module of VMD 1.9.4. Conformations obtained along the MD trajectories were clustered through hierarchical conformational cluster analysis using the AmberTools23 software package. This approach yielded representative structures corresponding to the most probable conformational states observed in the MD trajectories. These structures were used for cavity identification through a geometric algorithm with the aid of the Cavity module from the CavityPlus server. The mechanical relationship between the catalytic cavity and the cavities identified in this step was assessed through motion correlation analysis using Normal Mode Analysis (NMA) via the CorrSite2 module of the same server, and through comparison of cavity volume variation throughout the MD trajectories using the POVME2 program.
Additionally, molecular orbital calculations were performed on the representative structures using the semiempirical PM7 method via the MOPAC program. The molecular orbital data were analyzed using the PRIMORDIA software to calculate Fukui reactivity descriptors, with the structures exhibiting the lowest reactivity over the catalytic residues selected as priority candidates for the docking studies. Following the established methodology, TryR explored three major conformational populations in its oxidized state and only one in its reduced state during MD simulations. Representative structures from these states enabled the identification of 14 cavities not observable in the enzyme’s crystal structure. Among the 35 total cavities identified, 8 showed potential allosteric behavior (z-score > 0.5) and maintained a minimum volume above 200 ų during simulations. Notably, one of these potential allosteric cavities displayed a volume anti-correlation with a catalytic cavity throughout the MD trajectories. Finally, a virtual screening will be carried out by docking small molecules from the pre-curated Hit Locator subset from ENAMINE database into the most promising cavities, using the AutoDock Vina 4 program. The outcomes of this study are expected to contribute to the development of novel, safer, and more selective therapeutic strategies against Chagas disease by identifying promising inhibitors and revealing previously unexplored allosteric cavities in T. cruzi trypanothione reductase, a parasite-specific target.
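The two cavity filters quoted above (z-score > 0.5 and a minimum volume above 200 Å³) and the volume anti-correlation check can be illustrated with a short plain-Python sketch. It is not the CorrSite2 or POVME2 implementation: the data layout, function names, and all numbers are hypothetical, and a Pearson coefficient is assumed as the correlation measure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def candidate_allosteric(cavities, min_z=0.5, min_volume=200.0):
    """Names of cavities whose z-score and smallest sampled volume pass both cutoffs."""
    return [name for name, cav in cavities.items()
            if cav["z_score"] > min_z and min(cav["volumes"]) > min_volume]


# Hypothetical per-frame volumes (Å^3) for a catalytic cavity and three others.
catalytic_volumes = [300.0, 320.0, 340.0, 360.0]
toy_cavities = {
    "cav_A": {"z_score": 0.8, "volumes": [400.0, 380.0, 360.0, 340.0]},
    "cav_B": {"z_score": 0.3, "volumes": [500.0, 510.0, 505.0, 495.0]},
    "cav_C": {"z_score": 0.9, "volumes": [250.0, 150.0, 260.0, 255.0]},
}
selected = candidate_allosteric(toy_cavities)
anticorrelation = pearson(catalytic_volumes, toy_cavities["cav_A"]["volumes"])
```

In this toy data only `cav_A` passes both filters, and its volume shrinks exactly as the catalytic cavity expands, giving a coefficient of -1.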
Palavras-chave: Chagas Disease, Structural Biology, Protein Dynamics, Molecular Modelling, Allostery
★ Running for the Qiagen Digital Insights Excellence Awards
#1128254

IRACEMA: A Database Management System for Bioactive Compounds Isolated and Characterized by Brazilian Researchers

Autores: Thais De Andrade Lourenço,Gustavo Henrique Goulart Trossini,Marcus Tullius Scotti
Apresentador: Thais De Andrade Lourenço • thais.andrade.lou@gmail.com
Resumo:
Bioactive compounds are substances with extra-nutritional properties found in natural sources or synthesized in laboratories. These substances play regulatory roles in metabolic processes, essential for human health. Over time, bioactive compound discovery has become strategically important for scientific advancement, particularly in the pharmaceutical and healthcare industries. Brazil, known for its rich biodiversity and strong expertise in organic chemistry, holds great potential for discovering bioactive molecules, both natural and synthetic. However, the exponential growth of biological and chemical data poses significant challenges in terms of integration, accessibility, and analysis. In this context, bioinformatics plays a pivotal role by supporting the organization, annotation, and computational exploration of biological data to predict bioactivity and elucidate mechanisms of action. The IRACEMA project ("Innovative Research, Analysis and Computational Exploration of Molecules Assembled in Brazil") was created to address these challenges. It aims to build the first national database dedicated to biologically active compounds identified in Brazil, integrating bioinformatics and cheminformatics approaches to accelerate translational research and foster collaboration between academia and the pharmaceutical sector. To achieve this, IRACEMA combines bibliographic review, data collection, predictive modeling, and the development of an interactive web platform for molecular visualization and analysis. The project leverages React and Next.js for the frontend, NestJS and Node.js for backend development, PostgreSQL for data management, and Python/Flask with RDKit and other microservices for bioinformatics and cheminformatics applications. Agile methodologies, including tools like Jira and GitHub, support efficient project management and version control. 
Key challenges include standardization of biological descriptors, integration of heterogeneous datasets, and the representation of quantitative structure–activity relationships (QSAR) in a biologically meaningful way. By addressing these challenges, IRACEMA not only strengthens Brazil’s role in bioactive compound research but also drives innovation in medicinal chemistry by making this data accessible, bridging the gap between academic discoveries and real-world applications.
Palavras-chave: Bioactive Compounds, Chemoinformatics, Bioinformatics, Database, Drug Discovery, Biological Activity Prediction, Data Integration
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1128282

Search for Specific Targets of Streptococcus agalactiae for the Development of a Diagnostic Method for Neonatal Sepsis Based on Epitope Prediction

Autores: Ana Carolini Natividade Peçanha,Ana Carolina Rennó Sodero,DANIEL ERNESTO RODRIGUEZ FERNANDEZ,HELENA KEIKO TOMA
Apresentador: Ana Carolini Natividade Peçanha • carolinipecanha@gmail.com
Resumo:
Neonatal sepsis remains one of the leading causes of morbidity and mortality among newborns, with Streptococcus agalactiae as a primary etiological agent. This condition represents a significant global health challenge, where early diagnosis is crucial for reducing mortality rates and improving patient outcomes. This study aimed to identify specific protein targets in S. agalactiae for the development of an epitope-based diagnostic method. To achieve this, genomic analysis of the GBS85147 strain was performed, revealing three unique proteins with potential diagnostic applications: Type VII secretion protein EssB, IS30 family transposase, and Serine-rich repeat glycoprotein Srr2. These proteins were selected based on their specificity for S. agalactiae, ensuring minimal cross-reactivity with other bacterial species. Subsequently, structural modeling and computational epitope predictions were carried out using multiple bioinformatics servers, incorporating various methodologies to predict both linear and conformational epitopes. A consensus analysis of predicted epitopes identified promising antigenic regions tailored for diagnostic use. Type VII secretion protein EssB, for instance, plays a critical role in bacterial virulence, making it an ideal candidate for diagnostic applications. Similarly, IS30 family transposase and Serine-rich repeat glycoprotein Srr2 demonstrated strong antigenicity, reinforcing their potential as biomarkers. Preliminary findings highlight the significant promise of these proteins for diagnosing S. agalactiae. This study underscores the importance of integrating bioinformatics tools into the development of diagnostic methods. By leveraging genomic data and epitope predictions, this approach paves the way for faster, more specific, and accessible diagnostic solutions.
Future experimental validation of these proteins and their predicted epitopes is essential to confirm their applicability in clinical settings. The adoption of such innovative diagnostic strategies could drastically reduce neonatal sepsis incidence and optimize treatment outcomes, ultimately contributing to improved public health.
Palavras-chave: Epitope prediction, Bioinformatics, Neonatal sepsis
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1128324

DBAMPRECORD: A WEB-BASED DATABASE PLATFORM FOR STORAGE AND MANAGEMENT OF ANTIMICROBIAL PEPTIDES SEQUENCES AND ANNOTATED PHYSICOCHEMICAL METADATA

Autores: Rafael Lucas da Silva,Carlos André dos Santos Silva,Madson Allan de Luna Aragão,Ana Maria Benko-Iseppon
Apresentador: Rafael Lucas da Silva • rafael.lucass@ufpe.br
Resumo:
As omics technologies continue to expand, generating vast quantities of genomic and proteomic data, the need for specialized tools to store, manage, and query these sequences becomes increasingly critical. Proteins in the defense systems of living organisms form a complex arsenal of signaling molecules involved in both attack and defense mechanisms against microorganisms. Among these protein components are the antimicrobial peptides (AMPs), a class of short proteins with a wide range of bioactive functions. Increasingly recognized in biomedical research, AMPs hold great promise as novel therapeutic agents due to their potent antimicrobial properties, especially in the context of rising antimicrobial resistance and the limited development of new antimicrobial drugs. However, the absence of specialized and comprehensive databases presents significant challenges for the management and integration of both natural and synthetic AMP sequences within large-scale datasets, ultimately hindering advances in bioinformatics research and therapeutic development. In response to these challenges, we introduce DBAMPRecord, a novel web-based platform designed to efficiently store, query and manage AMP sequences, three-dimensional structures, physicochemical metadata and experimental data from wet-lab screening and optimization assays. The platform offers fast access and an intuitive user interface tailored to both bioinformaticians and researchers from non-computational backgrounds. DBAMPRecord was built on PostgreSQL, a robust relational database management system, ensuring scalability, data integrity and rapid querying capabilities. Its relational schema supports the storage of AMP sequences, physicochemical descriptors, structural annotations and experimental metadata. The front-end interface was developed using the Flask web framework integrated with Bootstrap, providing a modular and responsive architecture.
The user interface enables structured data entry, sequence visualization and multi-parameter filtering. Data ingestion uses the FASTA format, allowing standardized parsing, insertion and retrieval of sequences. The platform offers essential functionalities including user registration, advanced search options, sequence visualization and analytical tools that facilitate researchers' workflow in AMP analysis. Users can perform detailed searches based on multiple criteria such as sequence and antimicrobial signatures, significantly enhancing research efficiency and effectiveness. By providing structured access to AMP data, DBAMPRecord serves as a potent resource for bioinformaticians, biologists and researchers, contributing to the streamlined discovery and characterization of novel antimicrobial agents. Future developments will expand analytical capabilities, incorporate predictive modeling tools and continuously update the database with curated AMP data, reinforcing its role as a key resource for antimicrobial research and development.
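As an illustration of the FASTA-based ingestion step described above, the sketch below parses FASTA text into (header, sequence) records using only the Python standard library. It is a hypothetical example, not DBAMPRecord's code; the function name and the sample records are assumptions, and a real ingestion step would go on to validate residues and insert the records into the database.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Sequence lines belonging to one record are concatenated and upper-cased;
    blank lines are ignored.
    """
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical input, for illustration only.
example_fasta = """>amp_001 example peptide
GIGKFLHS
AKKFGKAF

>amp_002
kwklfkki
"""
example_records = parse_fasta(example_fasta)
```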
Palavras-chave: Data Integration; User Experience; Bioinformatics; Data Management; AMPs
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
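The FASTA ingestion and multi-criteria search described in the DBAMPRecord abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the standard-library `sqlite3` module stands in for PostgreSQL so the example is self-contained, and the table name `peptide`, its columns, and the helper functions are all hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def load_peptides(conn: sqlite3.Connection, fasta_text: str) -> int:
    """Create the (hypothetical) peptide table and insert parsed records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS peptide ("
        "id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, "
        "sequence TEXT NOT NULL, length INTEGER NOT NULL)"
    )
    rows = [(name, seq, len(seq)) for name, seq in parse_fasta(fasta_text)]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO peptide (name, sequence, length) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


def search_by_motif(conn: sqlite3.Connection, motif: str,
                    max_length: int) -> List[str]:
    """Return names of peptides containing `motif` and no longer than `max_length`."""
    cur = conn.execute(
        "SELECT name FROM peptide "
        "WHERE sequence LIKE ? AND length <= ? ORDER BY name",
        (f"%{motif}%", max_length),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not database entries.
    toy = ">toy_pep_a\nGIGKFLHS\nAKKF\n>toy_pep_b\nKWKLFKKI\n"
    db = sqlite3.connect(":memory:")
    load_peptides(db, toy)
    print(search_by_motif(db, "KKF", 20))
```

Parameterized queries keep user-supplied search terms out of the SQL text, which matters for a public web interface regardless of the database engine behind it.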
#1128333

A COMPARATIVE MACHINE LEARNING FRAMEWORK FOR MULTICLASS BACTERIOCIN IDENTIFICATION USING CLASSICAL MODELS AND DEEP NEURAL NETWORKS

Autores: Rafael Lucas da Silva,Madson Allan de Luna Aragão,Ana Maria Benko-Iseppon
Apresentador: Rafael Lucas da Silva • rafael.lucass@ufpe.br
Resumo:
Bacteriocins are small antimicrobial peptides (AMPs) produced by bacteria that exhibit both bacteriostatic and bactericidal activities through a variety of structures and mechanisms. They are classified into three main groups: Class I bacteriocins contain lanthionine bridges that lock them into ringed structures; Class II bacteriocins are unmodified peptides of roughly 25-80 amino acids, many bearing a conserved “YGNGV” motif, and often function as two-peptide systems to form membrane pores; and Class III bacteriocins are larger proteins (200-600 amino acids) that kill target cells via diverse enzymatic or pore-forming mechanisms. Their potent, often wide-spectrum activity against closely related strains has positioned bacteriocins as promising alternatives to conventional antibiotics in the face of rising antimicrobial resistance. In this work, we developed and benchmarked four multiclass classifiers, based on Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest (RF) and a deep neural network (DNN), to discriminate between class 1, class 2 and class 3 bacteriocins and non-AMP sequences. The SVM, KNN and RF models were implemented in scikit-learn, while the DNN was built with TensorFlow, allowing direct performance comparison on a unified feature set. The dataset comprised class 1 sequences (n=499), class 2 sequences (n=229) and class 3 sequences (n=93) retrieved in FASTA format from the BAGEL4 database, plus a negative set of 1,000 randomly generated peptide sequences (length 15-150 amino acids) created by a custom Python script and confirmed by existing predictors to lack predicted antimicrobial activity. Using the peptides.py library, we extracted a panel of physicochemical descriptors, including molecular weight, isoelectric point, sequence length, Boman index, aliphatic index, hydrophobicity, hydrophobic moment and net charge, for every sequence.
After stratified splitting of the dataset into training (80%) and test (20%) subsets and normalization with StandardScaler, each algorithm was trained on identical feature sets for direct comparison. Performance metrics included precision, recall, F1-score, Matthews Correlation Coefficient (MCC), multiclass ROC analysis and permutation-based feature importance. The SVM achieved an F1 of 0.84 (MCC=0.70; AUC>0.85); KNN reached an F1 of 0.83 (MCC=0.71); RF yielded an F1 of 0.84 (MCC=0.73), with molecular weight and isoelectric point as its top predictors, while the DNN outperformed all classical methods with an F1 of 0.91 (MCC=0.86), demonstrating superior capacity to capture complex nonlinear relationships. Confusion-matrix analysis for the DNN revealed sensitivities of 0.98 for negatives, 0.95 for class 1, 0.94 for class 2 and 0.98 for class 3, paired with specificities above 0.96 and misclassification rates below 5%, confined almost exclusively to adjacent classes. These results indicate that the DNN maintains balanced sensitivity and specificity while minimizing false positives and negatives when compared with the classical models. Our findings suggest that deep learning offers accuracy and robustness for multiclass bacteriocin identification and classification, providing an effective in silico pre-screening tool. Future studies employing complementary computational approaches and experimental validation will further accelerate the discovery of novel bacteriocin-inspired antibiotic candidates.
Palavras-chave: Antimicrobial peptides; Bacteriostatic peptides; Bactericidal peptides; Deep neural networks; Physicochemical descriptor analysis; AMP in silico screening
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
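The confusion-matrix metrics reported in the bacteriocin abstract above (per-class sensitivity and specificity, and the multiclass MCC) can be derived as in the following sketch. It uses only the standard library and assumes rows are true classes and columns are predicted classes; the function names are hypothetical and the matrix in the usage example is toy data, not the study's results. The MCC expression is the standard multiclass generalization, which should agree with the value scikit-learn reports for the same predictions.

```python
from math import sqrt
from typing import Dict, List, Sequence


def per_class_rates(cm: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """One-vs-rest sensitivity and specificity for each class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp   # other classes predicted as c
        tn = total - tp - fn - fp
        rates.append({
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    return rates


def multiclass_mcc(cm: Sequence[Sequence[int]]) -> float:
    """Multiclass Matthews correlation coefficient from a confusion matrix."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]
    pred_counts = [sum(cm[r][j] for r in range(k)) for j in range(k)]
    numerator = correct * total - sum(
        t * p for t, p in zip(true_counts, pred_counts)
    )
    denominator = sqrt(total ** 2 - sum(p * p for p in pred_counts)) * sqrt(
        total ** 2 - sum(t * t for t in true_counts)
    )
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy 4-class matrix (negatives, class 1, class 2, class 3); illustrative only.
    toy_cm = [
        [48, 1, 1, 0],
        [2, 45, 3, 0],
        [1, 4, 44, 1],
        [0, 0, 2, 48],
    ]
    for label, r in zip(["neg", "c1", "c2", "c3"], per_class_rates(toy_cm)):
        print(label, round(r["sensitivity"], 3), round(r["specificity"], 3))
    print("MCC:", round(multiclass_mcc(toy_cm), 3))
```

With imbalanced classes such as those in the dataset described above (499, 229 and 93 bacteriocins against 1,000 negatives), MCC is a useful complement to F1 because it accounts for all cells of the confusion matrix.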
#1128486

Structural and Dynamic Effects of IDUA Mutations in Mucopolysaccharidosis Type I: A Molecular Dynamics Study

Autores: Yago Mendes Paes,Carlos Robson Costa Cruz,Veridiana Piva Richter,Guilherme Baldo,Adriano Werhli,Karina Machado
Apresentador: Veridiana Piva Richter • veripiva@gmail.com
Resumo:
Mutations in the IDUA (alpha-L-iduronidase) gene, which encodes an essential enzyme for the degradation of the glycosaminoglycans heparan sulfate and dermatan sulfate, can lead to the development of mucopolysaccharidosis type I (MPS I). This is a rare autosomal recessive disorder characterized by the progressive accumulation of these macromolecules in lysosomes, severely compromising multiple cellular and physiological functions. MPS I has a progressive course and can result in severe cognitive impairment, in addition to leading to premature death due to systemic complications, including cardiac, respiratory, and musculoskeletal dysfunctions. A recent unpublished study conducted at the Porto Alegre Clinical Hospital with Brazilian MPS I patients identified 15 novel mutations in the IDUA gene, spanning different types of genetic alterations, including splicing, missense, nonsense, and frameshift mutations. These mutations can directly impact the structure and function of alpha-L-iduronidase by altering its stability, catalytic capacity, and substrate interaction. Consequently, enzymatic activity in these MPS I patients is either significantly reduced or completely lost, driving the progressive pathological effects of the disease. In this study, four protein variants with missense mutations were modeled using the AlphaFold 3 structural modeling tool. Starting from the best models obtained, molecular dynamics (MD) simulations were performed for 200 ns using the GROMACS MD package, comparing the structure of the wild-type protein with the four mutated proteins, all bound to a glycan. These simulations aimed to assess the potential impacts of these mutations on the structural stability of the enzyme, identifying conformational changes that could compromise its biological function.
Preliminary results revealed significant variations in the root mean square deviation (RMSD) of the protein system, with greater fluctuations observed in two mutations (missense 7 and missense 12), particularly in a residue region located within 5 angstroms of the mutation site. Additionally, differences in the number of hydrogen bonds established in each system were observed, suggesting alterations in stability and intermolecular interactions. Analyses of the root mean square fluctuation (RMSF) highlighted changes in the flexibility of specific residues, indicating potential impacts on protein dynamics. These findings provide valuable structural insights into the effects of these mutations on alpha-L-iduronidase, contributing to a better understanding of the molecular mechanisms of MPS I. Furthermore, this study may support the development of therapeutic strategies to mitigate the structural and functional consequences of these mutations.
Palavras-chave: Enzyme dysfunction, Lysosomal storage disease, Protein modeling, Structural bioinformatics, Pathogenic variants
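The RMSD and RMSF analyses mentioned in the IDUA abstract above follow two short formulas, sketched here in plain Python. The sketch assumes every frame has already been superimposed on the reference structure (MD packages normally perform this fitting step before reporting either quantity), and the function names and toy coordinates are hypothetical. Values come out in whatever length unit the coordinates use.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root mean square deviation of one frame from a reference structure.

    Assumes the frame is already aligned to the reference (no fitting is done).
    """
    if len(frame) != len(reference):
        raise ValueError("frame and reference must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return sqrt(squared / len(frame))


def rmsf(trajectory: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom root mean square fluctuation around the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        mean = [sum(f[i][d] for f in trajectory) / n_frames for d in range(3)]
        msf = sum(
            sum((f[i][d] - mean[d]) ** 2 for d in range(3)) for f in trajectory
        ) / n_frames
        fluctuations.append(sqrt(msf))
    return fluctuations


if __name__ == "__main__":
    # Two-atom, two-frame toy trajectory; not simulation output.
    toy = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    ]
    print("RMSD of frame 1 vs frame 0:", rmsd(toy[1], toy[0]))
    print("RMSF per atom:", rmsf(toy))
```

RMSD summarizes how far the whole structure drifts from the reference over time, while RMSF localizes flexibility to individual residues, which is why the two are usually reported together when comparing wild-type and mutant systems.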
#1128538

Single-cell metatranscriptogenomics: investigation and characterization of viral infections in human tonsillar tissues

Autores: Noilson Oliveira,Helder Takashi Imoto Nakaya,Eurico Arruda
Apresentador: Noilson Oliveira • noilson@usp.br
Resumo:
The combination of scRNA-seq with metatranscriptogenomics enables the investigation of viral infections at the single-cell level by integrating the analysis of host cell mRNA with the detection of viral mRNAs, which may also include full genomes of positive-sense RNA viruses, often captured during sequencing. This study analyzed palatine tonsils from 17 patients who underwent tonsillectomy, with scRNA-seq data obtained from the NCBI SRA repository, generating around 20 TB of intermediate data throughout processing. In all, 51,789,332 cells were processed, of which 1,251,266 met quality control criteria and were used in downstream analyses. After quality filtering with FASTP, reads were aligned to the human reference genome (GRCh38/hg38) using BWA, and unmapped sequences were extracted with SAMtools and taxonomically classified using Kraken2. Viral classification was validated with BLASTN, and the cell barcodes associated with viral reads were retrieved from R1 files using custom Python scripts built on the Biopython library. A total of 15,142 viral reads were identified across 2,711 infected cells. Gene expression analysis was conducted using Scanpy, with cell type annotation performed via CellTypist and pathway enrichment analysis using GSEAPY. The viruses detected in infected cells belonged to genera including Erythroparvovirus, Cardiovirus, Enterovirus, Parechovirus, Simplexvirus, Lymphocryptovirus, Alphapapillomavirus, and Mastadenovirus, and affected various cell populations such as CD4+ T cells, CD8+ T cells, Naive/Memory B cells, Germinal Center B cells, Plasma cells, Epithelial cells, Plasmacytoid Dendritic cells, Follicular Dendritic cells, and Myeloid cells. Differential expression analysis between infected and non-infected cells revealed biological pathways and processes modulated by viral infection.
This highlights the potential of integrated metatranscriptogenomics and scRNA-seq approaches to further our knowledge about immune responses and cellular heterogeneity in human tonsil tissues.
Palavras-chave: scRNA-seq, metatranscriptogenomics, viral infection, differential expression
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
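The barcode-retrieval step in the tonsil abstract above (mapping viral reads back to cells through the R1 file) can be sketched as follows. The authors used Biopython; this illustration uses only the standard library. It assumes a droplet-style layout in which the first 16 nucleotides of each R1 read are the cell barcode, which is an assumption of this sketch and not stated in the abstract, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Set, Tuple


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it, "").strip()
        next(it, None)  # '+' separator line
        next(it, None)  # quality line (unused here)
        # "@read1 1:N:0:..." -> "read1"
        yield header.strip()[1:].split()[0], seq


def viral_reads_per_cell(r1_lines: Iterable[str],
                         viral_read_ids: Set[str],
                         barcode_len: int = 16) -> Counter:
    """Count viral reads per cell barcode.

    viral_read_ids: IDs of reads classified as viral (e.g. from the R2 mates).
    barcode_len: assumed barcode length at the start of R1 (hypothetical default).
    """
    counts: Counter = Counter()
    for read_id, seq in iter_fastq(r1_lines):
        if read_id in viral_read_ids and len(seq) >= barcode_len:
            counts[seq[:barcode_len]] += 1
    return counts


if __name__ == "__main__":
    # Toy R1 records; barcodes and read names are made up.
    toy_r1 = [
        "@r1 1:N:0", "ACGTACGTACGTACGTTTTTTTTTTTTT", "+", "I" * 28,
        "@r2 1:N:0", "ACGTACGTACGTACGTGGGGGGGGGGGG", "+", "I" * 28,
        "@r3 1:N:0", "TTTTCCCCGGGGAAAACCCCCCCCCCCC", "+", "I" * 28,
    ]
    print(viral_reads_per_cell(toy_r1, {"r1", "r2"}))
```

The resulting barcode counts can then be joined to the expression matrix so that each cell carries an infected or non-infected label for differential expression analysis.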
#1128586

Exploring a Natural Product Chemical Space via Molecular Fingerprinting: A Chemoinformatic Strategy for Melanoma

Autores: Cassiana Maurer De Carli,Geisa Paulino Caprini Evaristo,Giselle Pinto de Faria Lopes,Ana Carolina Ramos Guimarães
Apresentador: Cassiana Maurer De Carli • cassiana.dcarli@gmail.com
Resumo:
Chemoinformatics is a powerful approach in drug discovery, offering computational tools to navigate the vast chemical space of natural products and effectively guide the search for new leads. Natural products, due to their structural diversity and biological richness, are valuable sources of anticancer compounds. In melanoma, an aggressive cancer with a high mutation rate and multidrug resistance, new therapeutic strategies are urgently needed. Elatol, a marine sesquiterpene, has shown cytotoxic activity in melanoma models, motivating the search for structurally related bioactive compounds through chemoinformatics. Molecular fingerprints are binary representations of chemical structures that capture the presence or patterns of specific structural features. As they can identify compounds that share key chemical and spatial features with known bioactives, a fingerprint-based approach is an alternative computational strategy that enables the screening of natural product analogs across large chemical databases. Given that chemical similarity often correlates with biological activity, fingerprints can efficiently uncover bioactive analogs for further development. Twelve types of binary molecular fingerprints were used to assess structural similarity with elatol via the Tanimoto coefficient. Compounds were retrieved from five curated databases: Nuclei of Bioassays, Ecophysiology and Biosynthesis of Natural Products Database (NuBBEDB), Comprehensive Marine Natural Products Database (CMNPD), Collection of Open Natural products (COCONUT), ChEMBL and PubChem. Principal Component Analysis (PCA) was applied to evaluate the distribution and clustering of candidate compounds based on key pharmacophoric and structural features. Among the retrieved compounds, 42,220,554 demonstrated high similarity (Tanimoto coefficient ≥ 0.8) with elatol.
The PCA showed that pharmacophoric fingerprints were particularly effective at grouping compounds based on relevant chemical groups and spatial interactions. The results provide a ranked list of elatol-like candidates for further investigation. Selected candidates are being prioritized for molecular docking and dynamics simulations to evaluate their potential affinity with melanoma-related targets.
Palavras-chave: natural products, melanoma, molecular fingerprints, elatol
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
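The similarity screen in the elatol abstract above rests on the Tanimoto coefficient, T = |A ∩ B| / |A ∪ B|, computed over the bits set in two binary fingerprints. A minimal sketch follows, representing each fingerprint as a set of on-bit indices. The fingerprints in the usage example are toy sets, not real elatol fingerprints, and the helper names are hypothetical; the 0.8 cutoff is the one stated in the abstract.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query: Set[int],
                       library: Dict[str, Set[int]],
                       threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above `threshold`, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy bit sets for illustration only.
    query_fp = {1, 2, 3, 4, 5}
    toy_library = {
        "cmpd_a": {1, 2, 3, 4, 5},
        "cmpd_b": {1, 2, 3, 4},
        "cmpd_c": {1, 2, 9},
    }
    print(rank_by_similarity(query_fp, toy_library))
```

Because the score depends on which features a fingerprint encodes, the same pair of molecules can fall above the cutoff under one fingerprint type and below it under another, which is one reason to compare several fingerprint types as the study does.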
#1128593

Genomic mining of secondary metabolites with biotechnological potential through reference genomes of cyanobacteria of the genus Dolichospermum

Autores: Elias Pereira Silva Neto,Mabele Cristina Silva Torres,Victor Emmanuel Cantanheide Barbosa Dos Santos,Marianna de Jesus Fernandes Pereira,Adriele Vieira de Oliveira,Gabriel José Viana Rosa,Sarah Allexia Gama Bastos,Mariana do Nascimento Moraes Rego,Leonardo Teixeira DallAgnol
Apresentador: Elias Pereira Silva Neto • eliaspereirasilvaneto@gmail.com
Resumo:
Cyanobacteria have biotechnological potential in different areas, and with the increasing availability of omics data and tools for in silico analysis, it is possible to find targets for future experimental studies. One of the genera explored for this purpose is Dolichospermum, which has been studied for applications ranging from space exploration to new agricultural strategies. This work therefore aimed to perform in silico prospecting of biosynthetic pathway gene clusters (BCGs) in five reference genomes of the genus: GCA_047275885.1, GCA_014696825.1, GCA_014698305.1, GCA_014698735.1 and GCA_014697105.1, available in the GenBank of the National Center for Biotechnology Information (NCBI) and corresponding respectively to the following species: Dolichospermum azotica SJ-1, Dolichospermum sphaerica FACHB-251 (1), Dolichospermum lutea FACHB-196, Dolichospermum catenula FACHB-362 and Dolichospermum subtropica FACHB-260 (2). After retrieval, the genomes were annotated using Rapid Annotation using Subsystem Technology (RAST) to classify gene products into metabolic pathways and cellular processes. They were then submitted to antiSMASH bacterial version 8.0 (beta) in relaxed mode to identify low-similarity BCGs, which were listed and checked against the Minimum Information about a Biosynthetic Gene cluster (MIBiG) database. Mining resulted in BCGs of the categories NRPS, NRP-metallophore, T1PKS, lanthipeptide class II, lanthipeptide class V, and hglE-KS that were originally described in other cyanobacteria such as Rivularia, Nostoc, Scytonema, Microcystis, Fischerella, Synechocystis, and Cylindrospermum. In total, 12 low-similarity clusters were prospected. Among them, we highlight “micropeptin K139” (1) from region 45.1, originally isolated from the cyanobacterium Microcystis aeruginosa K-139 and still little studied, although some studies have explored the anti-neuroinflammatory potential of this category of metabolite.
Another interesting BCG is responsible for the biosynthetic pathway of hapalosin (2) in regions 15.1 and 30.1; this compound is important for the chemical defense of cyanobacteria in their natural environment and may be applied to reverse multidrug resistance. In silico prospecting of possible BCGs can therefore be an essential tool for biotechnological solutions to everyday problems. However, it should be used with caution: low similarity with the databases means that more experimental studies are needed, both in Dolichospermum and in other genera of cyanobacteria, before genomic mining in these groups can be considered reliable.
Palavras-chave: genomic mining, Dolichospermum, secondary metabolites
#1128693

Insights into miRNAs of the stingless bee Melipona quadrifasciata

Autores: Dalliane Oliveira Soares,Lucas Yago Melo Ferreira,Gabriel Victor Pina Rodrigues,João Pedro Nunes Santos,Lucas Barbosa,Ícaro Santos Lopes,Tatyana Chagas Moura,Isaque João da Silva de Faria,Roenick Proveti Olmo,Marco Antonio Costa,Weyder Cristiano Santana,Eric Roberto Guimarães Rocha Aguiar
Apresentador: Dalliane Oliveira Soares • dalli.biotec@gmail.com
Resumo:
MicroRNAs (miRNAs) are key post-transcriptional regulators involved in a wide range of biological processes in insects; however, little is known about their roles in stingless bees. In this study, we present the first characterization of miRNAs in Melipona quadrifasciata using small RNA (sRNA) deep sequencing. A total of 193 high-confidence mature miRNAs were identified, including 106 sequences exclusive to M. quadrifasciata. Expression analysis revealed that mqu-miR-1 and mqu-miR-276 together account for over 70% of all miRNA reads, suggesting central roles in development and reproduction. Comparative analyses showed greater conservation of M. quadrifasciata miRNAs with other hymenopterans, especially Apis mellifera and Bombus species. The methodology involved sample collection, library preparation, and small RNA deep sequencing, followed by data preprocessing. miRNA populations were characterized through genomic analysis using bioinformatics tools such as miRDeep2 for miRNA prediction and quantification, and BEDtools for genomic data manipulation and comparison. miRNA annotation was performed based on the miRBase v22.0 database. Conservation analysis was carried out by comparing miRNAs with those from other hymenopteran species. In addition, putative target genes were predicted using a consensus approach, and functional annotation indicated involvement in several regulatory biological pathways. This study represents the first comprehensive identification of the miRNA repertoire in stingless bees using sRNA sequencing and provides a valuable foundation for understanding miRNA-mediated gene regulation in this ecologically and economically important pollinator.
Palavras-chave: microRNAs, RNA deep sequencing, Melipona quadrifasciata, stingless bee, Meliponini.
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1128699

Genomic Insights of a Methicillin-Resistant Mammaliicoccus sciuri ST52 Strain Isolated from Canine Otitis in Peru: Evidence of a mecA1 Gene and Intrinsic Resistance

Autores: Gerald Moreno-Morales,Miguel Alcántara-Cueto,Thalia Silvestre-Espejo,Jhonny Calla-Rosa,Claudia Maguiña-Molina,Jhonathan Bazalar-Gonzales,Dennis Carhuaricra-Huaman,Luis Luna-Espinoza,Raquel Enma Hurtado Castillo,Lenin Maturrano-Hernández
Apresentador: Raquel Enma Hurtado Castillo • raquelgen1@gmail.com
Resumo:
The opportunistic pathogen of animals and humans, Mammaliicoccus sciuri (formerly Staphylococcus sciuri), is recognized for its potential as a reservoir of antimicrobial resistance genes, including mecA1 and mecC. We report for the first time M. sciuri associated with otitis in a four-year-old male dog from the Lima region, Peru, in 2024. The initial microbiological diagnosis was Staphylococcus sp., and the isolate was referred to our research laboratory for more detailed analysis. Using the Kirby-Bauer method, we detected resistance to oxacillin (1 µg) and penicillin (10 U), intermediate resistance to clindamycin (2 µg), and susceptibility to erythromycin (15 µg), gentamicin (10 µg), and enrofloxacin (5 µg). The minimum inhibitory concentration (MIC) assay for oxacillin determined a value of 1 µg/mL. Additionally, the mecA gene was not detected by PCR, likely due to primer non-specificity when the target is not S. aureus. Long-read sequencing (Oxford Nanopore) was performed on the isolate. Genomic analysis using PubMLST tools revealed the species M. sciuri ST52 (strain R70_GMM), with a genome size of 2,805,519 bp and 32.6% GC content. Using ResFinder v4.7.2 and AMRFinderPlus v4.0.19, we identified the following genes: mecA1 (99.95% identity), blaZ, fosD, sal(A), arsB, arnC, and arsR, each showing at least 84% identity. A homologous region to orfX (81.5% identity) was detected 2,344 bp upstream of mecA1; however, no regulatory genes (mecR, mecI) or mobile element components (SCCmec) were found when exploring the 2,500 bp regions flanking mecA1. Using Abricate v1.0.1, homologs of virulence genes were identified, mainly stress response and capsular genes: htpB, cap8P, isdE, clpC, clpP, and cap8O, with at least 65% identity. Analysis with PathogenFinder v1.1 estimated a 0.589 probability that the strain could be a human pathogen, suggesting moderate pathogenic potential, identifying eight families of pathogenic genes and five non-pathogenic ones. 
Phylogenomic analysis using a Maximum Likelihood approach based on core genome SNPs, alongside other members of clonal complex 52 (CC52), revealed that this isolate forms a distinct branch compared to previously reported ST52 sequences, including the Brazilian strain S. sciuri SS02, which was found in a cat with sepsis and acute respiratory distress syndrome. The predicted MecA1 protein of our strain was modeled using AlphaFold3 and structurally compared with the Pbp2a protein model of S. aureus (UniProt A0A0N9EFF2) using PyMOL, resulting in an RMSD of 0.848 Å. The findings of this study support M. sciuri strain R70_GMM as an ancestral reservoir of mecA, appearing in this case as a resident gene, with a product conformationally near-identical to Pbp2a, capable of generating intrinsic resistance to some β-lactams. Continued surveillance studies using a One Health approach, focusing on methicillin-resistant Gram-positive bacteria, are recommended to determine the clinical significance of this enigmatic species.
Palavras-chave: Genome, antimicrobial resistance, Staphylococcus sciuri, Methicillin-Resistant
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
#1128775

Computational approaches for lncRNA discovery using long-read RNA-seq data

Autores: Otávio Pereira Carreiro De Sousa,Caio César Maia Medeiros,Maria Bárbara Borges de Santana,Thaís Gaudencio do Rêgo,Vinicius Maracaja Coutinho
Apresentador: Otávio Pereira Carreiro De Sousa • opcsousa@gmail.com
Resumo:
The advent of third-generation sequencing technologies has significantly advanced the identification and investigation of long non-coding RNAs (lncRNAs), enabling broader and more accurate sequencing. This has deepened the understanding of lncRNA functions and interactions with other molecules. The ability to generate long, high-precision reads has been instrumental in uncovering the structural and functional complexity of lncRNAs in the context of both research and clinical applications. A wide range of computational tools can be employed to process the massive volume of data generated by long-read sequencing technologies, particularly in RNA-seq assays. We therefore present a literature review highlighting tools and databases related to lncRNA studies using long-read RNA-seq approaches. Key workflow steps include demultiplexing, quality control (QC), filtering, trimming, and mapping of long RNA reads. Tools identified for demultiplexing include ONTbarcoder and Guppy. ONTbarcoder offers real-time barcoding, giving a rapid overview of the barcodes obtained within minutes of sequencing. Guppy is a traditional demultiplexing tool with strong performance, but it is specific to Oxford Nanopore technologies. FastQC offers fast, versatile visualization of quality metrics across platforms. For filtering and trimming, Nanoq offers fast processing for nanopore reads and fits automated pipelines, while Filtlong has slower processing and higher memory usage but allows more complex read filtering. Fastplong, in turn, is designed to perform QC, filtering and trimming in a single fast pass. Subsequently, MultiQC aggregates the analyses of multiple tools into a single report. For mapping, Minimap2, Graphmap2, and deSALT were identified. Minimap2 is a general-purpose aligner, also supporting short-read alignment to long-read assemblies with splice-aware mapping.
Graphmap2 targets long error-prone reads, while deSALT provides fast analysis of large data volumes but demands greater computational resources. Subsequent steps in lncRNA transcriptome assembly include identifying and characterizing lncRNA isoforms, normalization, and differential expression analysis. Key tools here are Bambu, IsoQuant, and StringTie2. Bambu focuses on reference-guided transcript reconstruction but has limitations in de novo detection. StringTie2 excels at de novo assembly, while IsoQuant is versatile for both annotation-free discovery and reference-guided analyses. Functional characterization of lncRNAs was also addressed, including enrichment and co-expression analysis, lncRNA-protein interactions, RNA-RNA interactions, and RNA-DNA interactions. ClusterProfiler and CEMITool were highlighted for enrichment and co-expression analysis. ClusterProfiler supports a wide range of gene set enrichment analyses, while CEMITool offers an automated pipeline for co-expression module identification and analysis. For lncRNA-protein interactions, LncADeep and LPIH2V were identified. ASSA, RIBlast, and LASTAL are prominent for RNA-RNA interaction studies, providing efficient search algorithms for interaction prediction. Triplexator and LongTarget were the main tools cited for RNA-DNA interactions, facilitating the discovery of triplex-forming sites between RNA and DNA molecules. Additionally, a survey of 21 frequently cited databases relevant to lncRNA research was conducted, covering multiple organisms and data types, such as LncBook, LNCipedia 5 and lncATLAS. The lack of standardized nomenclature within the academic community poses a challenge to the use of these databases. Finally, this review aims to assist researchers in selecting the most appropriate tools, considering different usage purposes and biological contexts.
Palavras-chave: Bioinformatics, Transcriptomics, lncRNA, Long-read, RNA-seq
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
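The read-filtering step discussed in the lncRNA review above (the task performed by tools such as Nanoq and Filtlong) comes down to a per-read decision on length and quality. The sketch below illustrates that decision in plain Python and does not reproduce any tool's actual implementation. It assumes Phred+33 quality strings and averages error probabilities before converting back to a Phred score, a convention commonly used for long reads; the thresholds and function names are hypothetical.

```python
from math import log10


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean read quality computed via error probabilities (assumes Phred+33).

    Averaging probabilities penalizes low-quality stretches more than
    averaging the Phred scores directly would.
    """
    if not quality:
        raise ValueError("empty quality string")
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * log10(sum(probs) / len(probs))


def keep_read(sequence: str, quality: str,
              min_length: int = 200, min_quality: float = 7.0) -> bool:
    """Return True if the read passes both (illustrative) thresholds."""
    return len(sequence) >= min_length and mean_phred(quality) >= min_quality


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '+' encodes Phred 10 under Phred+33.
    print(round(mean_phred("I+"), 2))          # dominated by the low-quality base
    print(keep_read("A" * 250, "I" * 250))     # long, high-quality read
    print(keep_read("A" * 100, "I" * 100))     # too short for this threshold
```

A minimum length of 200 nt mirrors the length commonly used to define lncRNAs, but in practice both thresholds would be tuned to the sequencing chemistry and to how much data can be sacrificed.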
#1128782

Benchmarking and prompt engineering small-parameter LLMs for genomic annotation analysis

Autores: Maria Victória Grisi Pinheiro Fabião De Araújo,Luis Henrique Fernandes de Carvalho,Icaro Rodrigues Mori,Maria Bárbara Borges de Santana,Vinicius Maracaja Coutinho,Yuri de Almeida Malheiros Barbosa,Thaís Gaudencio do Rêgo
Apresentador: Maria Victória Grisi Pinheiro Fabião De Araújo • vivigfabiao@gmail.com
Resumo:
Genomic annotations, especially when found in GFF and GTF formats, lack universal rules capable of standardizing data structure. As a result, differences in formatting, granularity, and structure are common, making standardized exploration of genomic data challenging. Faced with the challenges of analyzing and processing this data, bioinformaticians have been using several AI tools to power and accelerate analysis. Large Language Models (LLMs) are currently among the most used tools, but many researchers lack sufficient computational resources to run them due to the high number of parameters. Therefore, we benchmarked the small-parameter LLMs Gemma 2 (9b), DeepSeek-R1 (14b), Phi-4 (14b), Qwen 2.5 (14b) and GPT-4o Mini (8b) to generate responses and SQL code in Python to analyze human (Homo sapiens) genomic annotations from Ensembl. Each model was given a simple and straightforward prompt describing its tasks, aiming to evaluate the models' core self-context on genetics, genomic annotations, and SQL. Tasks were proposed as 20 questions divided into 5 categories: 1) Gene Count and Distribution by Chromosome, 2) Specific Gene Characteristics, 3) Chromosomal Location, 4) Gene and Isoform Features and 5) Data Visualization and Distribution. Each LLM's performance was evaluated based on accuracy, completeness, quality and explained applicability of the code. GPT-4o Mini performed best despite refined context not being provided, with accuracy of 57.5%, while DeepSeek-R1 was the worst, with accuracy of 22.5%, followed by Qwen 2.5 (32.5%), Gemma 2 (37.5%), and Phi-4 (45%). In general, for all models, 55% of the answers were completely or partially correct with minimal adjustments to fit data context, and 45% were wrong. GPT-4o Mini succeeded especially in categories 2 (80%) and 3 (75%), followed by 4 (~33.3%) and 5 (~16.7%). All models struggled mainly with more complex tasks, such as querying multiple attributes and data visualization.
Overall accuracy per category across all models was: 1 (62%), 2 (30%), 3 (52.5%), 4 (~13.33%), and 5 (~23.33%). The tested models exhibited varying performance. DeepSeek-R1 demonstrated a tendency towards hallucinations and verbose responses, similar to an “excessive overthinking”, frequently providing lengthy and descriptive outputs that lacked accuracy. Conversely, GPT-4o Mini offered concise and adequately explained solutions, efficiently addressing the tasks through direct approaches. All models displayed a baseline understanding of genetics and molecular biology; erroneous queries, while incorrect, maintained a degree of logical consistency, given the models' lack of direct data access. Observed errors could be mitigated through a few more specific adjustments or well-structured prompt instructions for SQL queries. Finally, our benchmarking reveals that small-parameter LLMs can facilitate genomic annotation analysis when provided with clear instructions and relevant context. This study establishes a foundation for the development and usage of LLMs as bioinformatics tools. The project is currently underway (github.com/voczie/LLM-benchmarking), and our next steps will focus on optimizing these models through prompt engineering, parameter tuning, Retrieval Augmented Generation, Chain of Thought reasoning, and fine-tuning for specific genomic annotation tasks to maximize their potential in genomic data analysis.
Palavras-chave: LLMs, Genomic Annotations, Benchmarking
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
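To make the benchmark tasks in the LLM abstract above concrete, the sketch below shows the kind of "SQL in Python" answer a category 1 question (gene count by chromosome) calls for. It is an illustration and not taken from the project's repository: the table layout, column names and helper functions are hypothetical, the annotation lines are toy records in GTF's nine-column layout, and `sqlite3` is used so the example runs without external dependencies.

```python
import re
import sqlite3
from typing import Dict, Iterable

# GTF attributes look like: gene_id "G1"; gene_biotype "protein_coding";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def load_gtf(conn: sqlite3.Connection, gtf_lines: Iterable[str]) -> int:
    """Load GTF records into a (hypothetical) `feature` table; return rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature ("
        "seqname TEXT, feature TEXT, start_pos INTEGER, end_pos INTEGER, "
        "strand TEXT, gene_id TEXT, gene_biotype TEXT)"
    )
    rows = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed records
        attrs = dict(ATTRIBUTE_RE.findall(cols[8]))
        rows.append((cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6],
                     attrs.get("gene_id"), attrs.get("gene_biotype")))
    with conn:
        conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def genes_per_chromosome(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count distinct genes on each chromosome."""
    cur = conn.execute(
        "SELECT seqname, COUNT(DISTINCT gene_id) FROM feature "
        "WHERE feature = 'gene' GROUP BY seqname ORDER BY seqname"
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    # Toy records with made-up identifiers; not real Ensembl entries.
    toy_gtf = [
        "#!genome-build toy",
        '1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";',
        '1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
        '1\ttoy\tgene\t2000\t2500\t.\t-\t.\tgene_id "G2"; gene_biotype "lncRNA";',
        'X\ttoy\tgene\t50\t400\t.\t+\t.\tgene_id "G3"; gene_biotype "protein_coding";',
    ]
    db = sqlite3.connect(":memory:")
    load_gtf(db, toy_gtf)
    print(genes_per_chromosome(db))
```

Filtering on `feature = 'gene'` is the detail a model has to get right here: counting every row for a chromosome would also count transcripts and exons, which is the sort of schema-dependent error that explicit context in the prompt can help avoid.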
#1128800

First genomic mining initiative in Capilliphycus salinus ALCB114379

Autores: Anna Clara Vieira Pinto,Elias Pereira Silva Neto,Victor Emmanuel Cantanheide Barbosa Dos Santos,Thallyson Gabriel Martins Correia Fontenele,Leonardo Teixeira DallAgnol
Apresentador: Anna Clara Vieira Pinto • clara.anna@discente.ufma.br
Resumo:
Capilliphycus salinus is a little-studied cyanobacterium that was described in 2019, with the holotype found in Salvador, in the state of Bahia, at Pedra do Sal Beach (12°57′06′′ S, 38°20′42′′ W). The aim of this study was to carry out an exploratory search for BCGs (biosynthetic gene clusters) using the genome published in January 2025 in the GenBank of the National Center for Biotechnology Information (NCBI) under accession GCA_047276495.1. The genome, although already annotated, was submitted to Rapid Annotation using Subsystem Technology (RAST) to classify the gene products into metabolic pathways and cellular processes. The genome was then submitted to antiSMASH bacterial version 8.0 (beta) in relaxed mode to identify BCGs. Once the clusters were obtained, the compound produced by each was checked in the Minimum Information about a Biosynthetic Gene cluster (MIBiG) database. Twenty-five subsystems were identified in RAST, but without clear relationships with the mining products. Mining resulted in 8 regions of possible BCGs, with only three showing any level of similarity with the database, reinforcing the need for further research to characterize gene clusters in Capilliphycus salinus and in phylogenetically close cyanobacteria to feed future databases. The three regions with similarity to the database, 1.1 (1), 1.5 (2) and 1.6 (3), showed high, medium and low similarity, respectively. The first BCG (1) is possibly responsible for the production of microcyclamide (RiPP), whose cytotoxic potential has been studied in P388 murine leukemia cells and in the development of Danio rerio (zebrafish). The second BCG (2) possibly produces p-chlorophenylacetic acid (PCPA), an aromatic fatty acid with anticancer properties. The third BCG (3) supposedly produces puwainaphycin F/minutissamide A, compounds recognized for their cytotoxic and antimicrobial activities.
Thus, the first effort to mine this genome reveals the potential of this cyanobacterium for the production of molecules of pharmacological interest.
Keywords: genomic mining, Capilliphycus salinus, secondary metabolites
#1128812

Comparative Analysis of Protein Structure Prediction Tools for Missense Mutations Associated with Drug Resistance in Mycobacterium tuberculosis

Authors: Veridiana Piva Richter, Adriano Werhli, Karina Machado
Presenter: Veridiana Piva Richter • veripiva@gmail.com
Abstract:
Proteins are composed of one or more extended, intricately folded chains of amino acids. These biological macromolecules play a crucial role in the structure, function, and regulation of cellular processes and metabolic pathways in living organisms. Structural bioinformatics is the field that draws on a variety of data repositories, algorithms, and software tools to explore, analyze, predict, simulate and interpret biomolecular structures and their interactions. Advances in structural bioinformatics tools and machine learning algorithms, growth in computational capacity, and the increasing number of structures deposited in the Protein Data Bank (PDB) have made possible the development of tools and deep learning models (such as AlphaFold 3) that can now predict protein structures with high reliability. However, one of the challenges in structural biology is predicting the structure of proteins carrying single-point mutations, given that AlphaFold2 does not appear to be suitable for predicting the effect of missense mutations on protein structure. Missense mutations are single-nucleotide substitutions that replace one amino acid with another, which can modify important protein properties such as conformation, stability, and flexibility, and can lead to drug resistance. Experimental analyses yield insights into protein stability but are both resource-intensive and time-consuming, making comprehensive experimental testing of all potential mutations impractical. Since many protein structures remain undetermined experimentally, computational methods are employed to generate three-dimensional protein structures from their amino acid sequences. In this work, to better understand the impact of a single-point mutation on a protein structure, we propose to evaluate different structure prediction algorithms to determine which are the most appropriate for mutated proteins. 
As structure prediction tools, we are using ColabFold, AlphaFold3, SWISS-MODEL, I-TASSER, MODELLER, OmegaFold and Phyre2. The quality of the generated models was then evaluated using MolProbity, Verify3D, and ERRAT. We focused on important target proteins of Mycobacterium tuberculosis that are known to carry mutations related to drug resistance, modeling 15 proteins and a total of 384 mutant variants. The analysis includes the following genes: atpE, Rv0678, tlyA, ddn, embB, ethA, inhA, katG, gyrA, gyrB, rplC, pncA, rpoB, gid and rpsL. Based on the findings, we hope to develop strategies to understand whether a given mutation can destabilize the protein and to identify the best tools for predicting the structures of mutated proteins. It is important to analyze the behavior of these tools to determine how protein structure prediction is affected by the presence of mutations.
Keywords: Protein structure prediction, Missense mutations, Structural bioinformatics, Drug resistance, Tuberculosis
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
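Editorial illustration for the abstract above: before any of the listed predictors can be compared, each missense variant has to be turned into a mutant sequence. The sketch below shows one way to do that, assuming mutations are written in the common `<wild type><position><mutant>` form (for example `A4V`). The function names and the toy sequence are hypothetical and are not taken from the authors' workflow.

```python
import re

# Missense mutation written as <wild type><1-based position><mutant>, e.g. "A4V".
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_missense(sequence: str, mutation: str) -> str:
    """Return a copy of `sequence` carrying one amino acid substitution."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # mutation positions are numbered from 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Hypothetical toy sequence, not a real M. tuberculosis protein.
wild_type_seq = "MKTAYIAKQR"
mutant_seq = apply_missense(wild_type_seq, "A4V")
print(to_fasta("toy_A4V", mutant_seq), end="")
```

Checking the wild-type residue before substituting catches numbering mismatches between a mutation catalogue and the reference sequence, a frequent source of silently wrong mutant models.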
#1128845

Characterization of Single-Cell Tumor-Specific Splicing Events in HGSOC

Authors: Gabriel Fernando Costa Da Fonseca, Nayara Evelin de Toledo, Ana Carolina Pires e Silva, Nayara Gusmão Tessarollo, Glenerson Baptista, Mariana Boroni
Presenter: Gabriel Fernando Costa Da Fonseca • gafonseca@id.uff.br
Abstract:
Ovarian cancer is the deadliest form of gynecologic cancer, with the high-grade serous ovarian carcinoma (HGSOC) subtype being particularly challenging due to its high treatment resistance and mortality rates. Alternative splicing (AS) is an essential RNA processing mechanism that allows cells to produce different isoforms from the same gene by including or excluding specific exons and introns. This process is vital for various biological functions, including cell differentiation, and plays a crucial role in cancer progression by generating isoforms that can promote tumor growth. Dysregulation of AS can lead to the production of isoforms that help tumor cells evade immune detection or enhance immune suppression, contributing to therapy resistance. In addition, tumor-specific isoforms generated through AS may influence interactions with immune cells, such as T cells or natural killer (NK) cells, potentially worsening treatment response. This project aims to identify AS-related isoforms produced by different tumor cells in HGSOC. Using publicly available single-cell RNA-seq (scRNA-seq) data from HGSOC patients (phs002262.v3.p1), we identified specific splicing events and their corresponding isoforms. Raw scRNA-seq data from two different datasets were retrieved from the SRA using the SRA Toolkit. These data were then processed with Cell Ranger to generate BAM files, which were submitted to STARsolo for identification of AS events. The resulting data were processed through a pipeline for preprocessing, integration, and identification of different cell populations using the Seurat R package. AS profiles associated with cell types were then identified using the MARVEL tool. In total, we identified 14 different cell populations, in which we detected an average of 18,013 protein-coding genes, 11,622 long non-coding RNAs, and other classes of RNAs expressed at low levels. 
Initially, we compared epithelial cell populations and malignant cell clones, identifying AS events and isoforms associated with a tumor-specific profile. To achieve this, we conducted a differential splice junction analysis, identifying 4,088 splice junctions across 938 genes that were significantly up- or down-regulated among tumor clones. We identified four different types of AS profiles (ISO-switch, coordinated, complex, and opposing) and determined the chromosomal coordinates corresponding to the AS events and the isoforms associated with the identified genes. Next, we will evaluate the impact of these isoforms on patient survival using the TCGA cohort. By examining these patterns, we hope to uncover novel biomarkers associated with treatment response or poor prognosis.
Keywords: scRNA-seq, Alternative Splicing, Ovarian cancer, Biomarkers
★ This work is running for the Next Generation Bioinfo Award
★ Running for the Qiagen Digital Insights Excellence Awards
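Editorial illustration for the abstract above: differential splicing between cell populations is often summarized with percent spliced-in (PSI), the fraction of junction reads that support inclusion of an event. The sketch below uses that generic definition. The abstract does not give a formula, so the exact calculation used by MARVEL may differ, and the `min_reads` coverage cutoff and the read counts are hypothetical.

```python
from typing import Optional, Tuple


def percent_spliced_in(
    inclusion_reads: int, exclusion_reads: int, min_reads: int = 10
) -> Optional[float]:
    """Return PSI in [0, 1], or None when junction coverage is too low."""
    total = inclusion_reads + exclusion_reads
    if total < min_reads:  # assumed cutoff; too few reads give unstable PSI values
        return None
    return inclusion_reads / total


def delta_psi(
    group_a: Tuple[int, int], group_b: Tuple[int, int]
) -> Optional[float]:
    """PSI difference (a minus b) from pooled (inclusion, exclusion) counts."""
    psi_a = percent_spliced_in(*group_a)
    psi_b = percent_spliced_in(*group_b)
    if psi_a is None or psi_b is None:
        return None
    return psi_a - psi_b


# Hypothetical pooled junction counts for one event in two cell populations.
tumor_clone = (30, 10)   # (inclusion reads, exclusion reads)
epithelial = (10, 30)
print(delta_psi(tumor_clone, epithelial))
```

Returning `None` for low-coverage events keeps sparse single-cell counts from producing extreme PSI values that would look like strong splicing differences.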
#1134065

Trends in Small Open Reading Frames Research

Authors: Nayane de Souza, Bruno Thiago de Lima Nichio, Fabiana Rodrigues de Góes, Liliane Santana, Alexandre Rossi Paschoal
Presenter: Nayane de Souza • nayane.d.souza@gmail.com
Abstract:
Short open reading frames (sORFs) are genomic regions that have the potential to encode microproteins, typically comprising fewer than 100 amino acids. Despite their functional significance in various physiological and pathological processes, microproteins have historically been under-annotated due to their small size, limited evolutionary conservation, localization in noncanonical regions, and the challenge of distinguishing them from translational noise. However, advances in deep sequencing, ribosome profiling (Ribo-seq), and mass spectrometry (MS) have revealed that many of these sORFs are indeed translated into functional peptides, fueling growing interest in the field. sORFs can be classified based on their genomic location relative to canonical coding sequences, including upstream ORFs (uORFs), downstream ORFs (dORFs), overlapping ORFs (oORFs), and ORFs located within noncoding RNAs (ncORFs). In this study, we manually curated 184 review articles and conducted a science mapping analysis to investigate the research landscape surrounding sORFs. Our analysis indicates that only a few countries are currently leading in research output and collaboration within this field. Additionally, we found that topics such as noncoding RNAs (ncRNAs) and cancer have been central to recent research efforts, with an increasing trend toward studies specifically focused on long noncoding RNAs (lncRNAs). Our findings provide a comprehensive overview of the sORF research landscape, highlight critical knowledge gaps, and propose future research directions—ultimately contributing to a more coherent and accessible framework for sORF discovery and analysis.
Keywords: Small Open Reading Frames, sORFs
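Editorial illustration for the abstract above: the working definition of an sORF (an open reading frame encoding fewer than 100 amino acids) can be made concrete with a minimal forward-strand scanner. This is a sketch, not a method from the reviewed literature. The `min_aa` lower bound and the toy sequence are hypothetical, and real annotation additionally relies on Ribo-seq or mass spectrometry evidence, as the abstract notes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(dna: str, max_aa: int = 100, min_aa: int = 10):
    """Return sorted (start, end, aa_length) tuples for forward-strand sORFs.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Only the first ATG of each open frame is used as the start.
    """
    dna = dna.upper()
    results = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                aa_length = (pos - start) // 3  # codons before the stop
                if min_aa <= aa_length < max_aa:
                    results.append((start, pos + 3, aa_length))
                start = None
    return sorted(results)


# Hypothetical toy sequence: ATG, eleven alanine codons, then a stop codon.
toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
print(find_sorfs(toy))
```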
#1134399

How Do Microbial Communities Deal with DNA Damage in the Antarctic Subseafloor?

Authors: Rayana dos Santos Feltrin, Tiago Antonio de Souza, Alan Durham, Carlos Frederico Martins Menck
Presenter: Rayana dos Santos Feltrin • rayanafeltrin@gmail.com
Abstract:
The genetic material of organisms is under constant threat from various sources that can damage DNA. In deep-sea regions and the Antarctic environment, conditions such as low temperatures and high hydrostatic pressure further challenge the stability of this molecule. However, cells are equipped with many DNA damage response pathways that ensure the integrity of their genomes. In this work, we investigated the DNA repair mechanisms that allow microorganisms to survive under deep-sea conditions in Antarctica. To address this, we integrated original data generated in this study with publicly available sequencing datasets, most of them from deep-sea sediment samples. Our original data were obtained by sequencing metagenomes from two Antarctic locations: King George Island (KGI), at sea level (soil sample), and Bransfield Strait (BFS), at a depth of 1250 m (marine sediment sample). DNA was extracted from these samples and sequenced using the Illumina platform. Metagenome assembly was performed with metaSPAdes v3.15.5, assembly quality was assessed with metaQUAST v5.3.0, and metagenome annotation was carried out using Prokka v1.14.6. We also conducted binning and bin refinement using the metaWRAP pipeline. Taxonomic classification of the resulting metagenome-assembled genomes (MAGs) was conducted with GTDB-Tk v1.7.0, as well as by comparison to the Genomic catalog of Earth's Microbiomes (GEM). MAG annotation was performed using the Prokaryotic Genome Annotation Pipeline (PGAP). For the public datasets, we assembled metagenomes from sequences in the Sequence Read Archive (SRA/NCBI) using the same approach. To address our main question, we analyzed the presence of DNA repair genes in these metagenomes using HMMER v3.4, considering hits with e-value < 1E-5 as orthologs. From our original samples, we recovered six MAGs from the KGI metagenome and five from BFS. 
Most of them belong to psychrophilic taxa, indicating adaptation to cold environments. The DNA repair pathways in the metagenomes were, in general, well conserved. However, we observed a clear difference between KGI and BFS MAGs in a light-dependent DNA repair system. Whereas photolyase, a light-dependent DNA repair enzyme, was detected in some sea-level MAGs, no hit was obtained in any of the recovered deep-sea MAGs, suggesting a strong adaptation to a dark seafloor environment. Altogether, our analyses reveal that the microbial communities inhabiting Antarctic depths rely on several DNA repair mechanisms, possibly as an adaptation to the extreme conditions. We also shed light on the possible evolutionary loss of a light-dependent DNA repair system in deep-sea microbes, although further investigation is needed.
Keywords: DNA repair, deep-sea, Antarctica
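Editorial illustration for the abstract above: the ortholog call described there (HMMER hits with e-value < 1E-5) amounts to a filter on the full-sequence E-value column of HMMER's tabular output. The sketch below assumes the whitespace-delimited `--tblout` layout, in which the first, third and fifth columns hold the target name, query name and full-sequence E-value. The sample rows and profile names are hypothetical, and the authors' actual scripts are not described in the abstract.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    target: str
    query: str
    evalue: float


def parse_tblout(text: str, max_evalue: float = 1e-5) -> List[Hit]:
    """Keep rows whose full-sequence E-value is strictly below `max_evalue`."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        evalue = float(fields[4])  # full-sequence E-value column
        if evalue < max_evalue:
            hits.append(Hit(target=fields[0], query=fields[2], evalue=evalue))
    return hits


# Hypothetical rows in tblout-like layout (only the leading columns are shown).
SAMPLE = """\
# target name  accession  query name          accession  E-value  score  bias
orf_001        -          photolyase_profile  -          3.2e-40  140.1  0.1
orf_002        -          photolyase_profile  -          2.0e-03   12.4  0.0
orf_003        -          recA_profile        -          1.0e-05   25.0  0.0
"""

for hit in parse_tblout(SAMPLE):
    print(hit.target, hit.query, hit.evalue)
```

The comparison is strict, so a hit with an E-value of exactly 1E-5 is excluded, matching the "< 1E-5" wording of the abstract.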
#1134423

Tryp-ncRNA: A Comprehensive Pipeline for Predicting non-coding RNAs in Trypanosomatids Using RNA-Seq Data

Authors: Raquel Enma Hurtado Castillo, Patricia de Cássia Ruy, Lissur Orsine, Angela Kaysel Cruz
Presenter: Raquel Enma Hurtado Castillo • raquelgen1@gmail.com
Abstract:
RNA sequencing enables both qualitative and quantitative analysis of transcriptomes, as well as a deeper understanding of regulatory processes. In addition to protein-coding genes, a diversity of noncoding RNAs (ncRNAs), ranging from small RNAs (e.g., miRNAs, snRNAs) to long noncoding RNAs (lncRNAs), plays essential roles in gene regulation and cell development, as demonstrated in several eukaryotes. However, in Trypanosomatidae species, the diversity and functions of ncRNAs remain largely unexplored. Our laboratory is currently investigating their presence and roles in Leishmania parasites. These parasites possess a unique genetic organization: transcription is constitutive and occurs in polycistronic transcription units (PTUs), and mature mRNAs are generated through trans-splicing. Such features pose significant challenges for omics-based computational analyses (PMID: 27177350). Building upon the individual Perl scripts described in Ruy's work (doi: 10.1080/15476286.2019.1574161), we developed the Tryp-ncRNA pipeline. The scripts were reimplemented in Python and Shell, integrating and optimizing their functionality. The pipeline consists of two modules: (i) the prediction module, which calculates read-mapping coverage and transcript size using Bowtie, IGVTools, and in-house scripts; and (ii) the characterization module, which includes the programs PORTRAIT, ptRNApred, RNAcon, tRNAscan-SE, and snoscan. The Tryp-ncRNA pipeline is currently available to our laboratory on GitHub and has been implemented for use in Conda and Docker environments; it will be made publicly available within 6 months, after some rounds of testing. To evaluate the predictions and improvements of the pipeline, we used total RNA Illumina raw data from three L. braziliensis (MHOM/BR/75/M2903) life stages analyzed by Ruy: procyclic promastigote (PRO), metacyclic promastigote (META), and amastigote (AMA), available under accession number SRP162992. 
Our pipeline was applied to an improved genome assembly of the same strain (GCA_964014055.1, kindly made available by S.M. Beverley and J. Cotton), and a total of 10,433 ncRNAs were identified, of which 3,598 were small ncRNAs and 6,835 were long ncRNAs. Among them, 9,360 ncRNAs were on the PTU strand and 974 in the "antisense" direction. Regarding genomic location, 2,076 ncRNAs were predicted to be in mRNA 3' UTRs and 1,479 in 5' UTRs. Of the candidates, 6,184 (59.27%) were predicted as ncRNAs by PORTRAIT, 10,398 (99.66%) by ptRNApred, 344 as snoRNAs by snoscan, and 37 as tRNAs by tRNAscan-SE. When Tryp-ncRNA was run on the same LbrM2903 genome (V30) used by the original pipeline (Ruy, 12,050 ncRNAs), it identified 12,046 ncRNAs, 11,436 of which matched the original predictions by BLASTn. Differences were mainly due to minor depth variations below the 100x or 50x thresholds. With the updated version of the genome, the Tryp-ncRNA pipeline predicted 10,434 ncRNAs, 7,721 of which were shared. The long-read sequencing of the new genome assembly corrected repetitive regions and resolved overlapping contigs into complete chromosomes, improving the accuracy of ncRNA positioning and sizing. The automated pipeline developed here advances transcriptomic analysis of trypanosomatids. By integrating a robust genome assembly, leveraging DIAMOND for rapid sequence alignment, and using updated Pfam database versions, the pipeline delivers faster analyses and improves result precision. Funded by FAPESP (2023/03015-0; 2024/09300-1).
Keywords: L. braziliensis, transcriptome, pipeline, noncoding RNA
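Editorial illustration for the abstract above: the reported percentages follow directly from the counts given for the improved assembly. The short check below recomputes them. It uses only numbers stated in the abstract, and the variable and function names are hypothetical.

```python
# Counts reported in the abstract for the improved genome assembly.
TOTAL_NCRNAS = 10_433
SMALL_NCRNAS = 3_598
LONG_NCRNAS = 6_835
PREDICTED_BY = {"PORTRAIT": 6_184, "ptRNApred": 10_398}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Small and long ncRNAs should add up to the reported total.
assert SMALL_NCRNAS + LONG_NCRNAS == TOTAL_NCRNAS

for tool, count in PREDICTED_BY.items():
    print(f"{tool}: {count} of {TOTAL_NCRNAS} = {percent(count, TOTAL_NCRNAS)}%")
```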
Official AB3C Product