Comparison of Chi-Square and Information Gain Feature Selection Methods for Support Vector Machine-Based Sentiment Analysis

Case Study: Vidio Application Reviews on Google Play Store

Authors

  • Vitta Margaret Sinambela, Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran
  • Herlina Napitupulu, Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran
  • Nurul Gusriani, Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran

DOI:

https://doi.org/10.37278/sisinfo.v8i1.1352

Keywords:

Sentiment Analysis, Support Vector Machine, Chi-Square, Information Gain, Vidio

Abstract

Vidio is a local streaming platform that dominates the Indonesian market but still struggles with user satisfaction, as reflected in its 3.5 rating. Improving the application requires insight into user experience, which can be obtained through sentiment analysis. This study analyzes the sentiment of Vidio user reviews and compares the performance of a Support Vector Machine (SVM) model under two feature selection methods, Chi-Square and Information Gain. The dataset comprises 4,670 reviews collected from July 1 to November 30, 2024. Models are evaluated with Balanced Accuracy, and hyperparameters are tuned to maximize this metric so that the assessment remains fair on imbalanced data. Chi-Square feature selection yields the best performance, with a peak Balanced Accuracy of 94.78%, obtained with a computationally efficient linear kernel. Information Gain reaches a lower Balanced Accuracy of 94.20% despite using a more complex polynomial kernel. These findings indicate that Chi-Square offers a better trade-off between classification accuracy and model complexity, making it the more robust choice for this sentiment analysis task.
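The two feature selection criteria and the evaluation metric named in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes binary term-presence features, two sentiment classes, the classic 2x2 contingency-table form of the Chi-Square statistic, and the entropy-based definition of Information Gain. The toy reviews and every function name below are hypothetical. The sketch scores each term, keeps the top k terms, and computes Balanced Accuracy as the mean of per-class recall; the SVM training step itself is omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus (NOT data from the study): each review is a set of
# tokens, i.e. binary term presence, paired with a sentiment label.
REVIEWS = [
    ({"good", "smooth"}, "pos"),
    ({"good", "great"}, "pos"),
    ({"smooth", "great"}, "pos"),
    ({"buffering", "slow"}, "neg"),
    ({"slow", "ads"}, "neg"),
    ({"good", "ads"}, "neg"),
]


def contingency(docs, term, cls):
    """Return (A, B, C, D) for the 2x2 term/class table.

    A: term present, class = cls    B: term present, class != cls
    C: term absent,  class = cls    D: term absent,  class != cls
    """
    a = b = c = d = 0
    for tokens, label in docs:
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(docs, term, cls):
    """Assumed form: chi2 = N (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    a, b, c, d = contingency(docs, term, cls)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def entropy(counts):
    """Shannon entropy in bits of a collection of class counts."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(docs, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]."""
    labels = Counter(label for _, label in docs)
    with_t = Counter(label for tokens, label in docs if term in tokens)
    without_t = labels - with_t  # class counts of documents lacking the term
    n, n_t = len(docs), sum(with_t.values())
    conditional = 0.0
    if n_t:
        conditional += n_t / n * entropy(with_t.values())
    if n - n_t:
        conditional += (n - n_t) / n * entropy(without_t.values())
    return entropy(labels.values()) - conditional


def top_k(docs, k, scorer):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    vocab = set().union(*(tokens for tokens, _ in docs))
    return sorted(vocab, key=lambda t: (-scorer(t), t))[:k]


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(1 for i in idx if y_pred[i] == cls) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    for term in sorted(set().union(*(t for t, _ in REVIEWS))):
        print(f"{term:10s} chi2={chi_square(REVIEWS, term, 'pos'):.3f} "
              f"IG={information_gain(REVIEWS, term):.3f}")
    print("chi2 top-3:", top_k(REVIEWS, 3, lambda t: chi_square(REVIEWS, t, "pos")))
    print("IG top-3:", top_k(REVIEWS, 3, lambda t: information_gain(REVIEWS, t)))
    # Majority-class predictor on an imbalanced sample (4 pos, 1 neg).
    y_true = ["pos", "pos", "pos", "pos", "neg"]
    print("balanced accuracy:", balanced_accuracy(y_true, ["pos"] * 5))
```

In the usage example, a classifier that always predicts "pos" on a sample with four positive reviews and one negative review reaches 0.8 plain accuracy but only 0.5 Balanced Accuracy, which is why the metric suits imbalanced review data. In a full pipeline, scikit-learn's `SelectKBest` with `chi2` or `mutual_info_classif`, followed by `SVC`, would be a likely choice (an assumption about tooling, not something the abstract states). Note that scikit-learn's `chi2` computes the statistic from feature values such as term counts rather than from this full presence/absence table, so its scores can differ from the ones produced here.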

Published

2026-02-27

How to Cite

Sinambela, V. M., Napitupulu, H., & Gusriani, N. (2026). Comparison of Chi-Square and Information Gain Feature Selection Methods for Support Vector Machine-Based Sentiment Analysis: Case Study: Vidio Application Reviews on Google Play Store. SISINFO : Jurnal Sistem Informasi Dan Informatika, 8(1), 43–51. https://doi.org/10.37278/sisinfo.v8i1.1352

Section

Articles