Hindi Document Extractive Summarization: Neural Method on a New Data Set

Abstract

Extractive summarization is a vital task in text analysis and natural language processing. Although Hindi is one of the world's most widely spoken languages and thousands of Hindi documents are published online every day, most existing text summarization work focuses on English. Neural network-based summarizers are popular for abstractive summarization but, apart from a few recent studies, have not been explored for extractive summarization. The present work uses a neural extractive summarization model to develop an extractive summarizer for Hindi. The contribution of the paper is two-fold. First, we generated a new Hindi text summarization data set from the popular Hindi news channel AajTak. The code to generate the data set is available at https://tinyurl.com/sonaa-hindi-text. Second, we use this data set to train a neural extractive summarization model, which learns its word embeddings jointly during training. On the test data, the model achieves promising ROUGE-1 F1 and ROUGE-2 F1 scores of 39.81 and 20.02, respectively.
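
For readers unfamiliar with the metric, the following is a minimal sketch of how a ROUGE-N F1 score can be computed from n-gram overlap between a candidate summary and a reference summary. It is an illustration only, not the evaluation code used in the paper: the function names `ngrams` and `rouge_n_f1` are hypothetical, and plain whitespace tokenization is an assumption (Hindi text may need a proper tokenizer, and standard ROUGE toolkits apply additional preprocessing).

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 from clipped n-gram overlap.

    Assumption: both strings are tokenized by whitespace.
    Returns a value in [0, 1].
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
rouge_1 = rouge_n_f1(candidate, reference, n=1)  # 5 of 6 unigrams match
rouge_2 = rouge_n_f1(candidate, reference, n=2)  # 3 of 5 bigrams match
```

The sketch returns scores between 0 and 1. The figures quoted in the abstract appear to be on a 0 to 100 scale, which would correspond to multiplying such values by 100.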

Date
Dec 3, 2022 11:30 AM — 1:00 PM
Location
Online
Suman Kundu
Assistant Professor of Computer Science and Engineering

My research interests include social network analysis, network data science, streaming algorithms, big data, granular computing, soft computing, fuzzy and rough sets.