Gujarati, a morphologically rich, low-resource Indian language, poses significant challenges for natural language processing (NLP) tasks such as news article classification and summarization, owing to limited annotated datasets and its linguistic complexity. This research proposes a comprehensive framework that combines transformer-based models (BERT, mBERT, XLM-R) with deep learning and traditional machine learning approaches to improve Gujarati news classification and summarization. The study explores data preprocessing techniques, transfer learning, and multilingual embeddings to overcome data scarcity while preserving semantic and contextual accuracy. Experimental results demonstrate that transformer architectures outperform conventional methods, achieving higher classification accuracy and more coherent abstractive summaries, and offering a scalable approach that can be extended to other low-resource languages. The proposed methodology advances low-resource NLP, supporting the development of intelligent news analytics systems and broadening access to digital content in Gujarati.
Keywords: Gujarati news classification, news summarization, transformer models, deep learning, machine learning, low-resource NLP, transfer learning, multilingual embeddings, BERT, XLM-R
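To make the transfer-learning setup named in the abstract concrete, the sketch below shows how a pretrained multilingual transformer such as XLM-R can be loaded and applied to a Gujarati news classification task with the Hugging Face `transformers` library. This is a minimal illustration only: the label set, example headline, and sequence length are assumptions for demonstration, not details drawn from the paper's experiments, and the classification head is untrained until fine-tuned on annotated Gujarati data.

```python
# Minimal sketch: pointing a pretrained multilingual transformer (XLM-R)
# at Gujarati news category classification. The labels and example text
# below are illustrative assumptions, not values from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["politics", "sports", "business", "entertainment"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(LABELS)
)

# Example Gujarati headline (translation: "India won the cricket match")
text = "ભારતે ક્રિકેટ મેચ જીતી"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits
pred = LABELS[logits.argmax(dim=-1).item()]
print(pred)  # arbitrary until the classification head is fine-tuned
```

Because XLM-R's multilingual pretraining already covers Gujarati, fine-tuning this setup on even a modest annotated corpus is the standard way such frameworks mitigate data scarcity; the same recipe transfers to other low-resource languages by changing only the training data and label set.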