Discovering the thematic structure of the Quran using probabilistic topic model
Topic modeling refers to extracting topics from text. A topic model is a statistical model that aims to discover topics in a large collection of documents. A topic consists of a collection of words that are likely to occur together in the context of that topic or theme. This paper applies a topic model to discover the thematic structure of the Quran. For centuries, the Quran has been widely studied for the topics it contains and the relationships among them. The Holy Quran holds a tremendous amount of information addressing various aspects of human life, social as well as individual. This information is conceptually related, although its individual pieces may appear unstructured and scattered. This paper uses a computational method to identify this hidden thematic structure automatically. We treated each surah of the Quran as a document and used Latent Dirichlet Allocation, a probabilistic topic modeling algorithm, to discover the topics/themes. The original Arabic text of the Quran was used as the corpus rather than a transliteration or translation. Our results are promising: we were able to discover the major themes in the surahs, along with the most important terms that describe these themes.
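To make the setup concrete, the following is a minimal sketch of Latent Dirichlet Allocation with one token list per document, which is how a surah would be represented here. The abstract does not say which implementation, inference method, number of topics, or hyperparameters were used, so everything below is an assumption for illustration: inference is done by collapsed Gibbs sampling, the names `lda_gibbs` and `top_terms` are hypothetical, the values of `alpha`, `beta`, and `iters` are arbitrary, and the toy corpus uses placeholder tokens rather than Quranic text.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of tokenized documents.

    alpha, beta and iters are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topic counts per document, word counts per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_terms(vocab, topic_word, n=3):
    """Return the n highest-count terms for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Placeholder corpus: each inner list stands in for one tokenized surah.
docs = [
    ["a1", "a2", "a3", "a1", "a2"],
    ["a2", "a3", "a1", "a3"],
    ["b1", "b2", "b3", "b1"],
    ["b2", "b3", "b1", "b2", "b3"],
]
vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, terms in enumerate(top_terms(vocab, topic_word)):
    print(f"topic {k}: {terms}")
```

With real input, each inner list would hold the tokens of one surah in Arabic, after whatever normalization and stop-word handling is chosen. The highest-count terms per topic then correspond to the "most important terms" the abstract mentions, and each row of `doc_topic` shows how one surah is distributed across topics.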