Consumer Review Classification Using Random Forest and SMOTE

  • Nurul Istiqamah Institut Teknologi dan Bisnis Nobel Indonesia
  • Muhammad Rijal Institut Teknologi dan Bisnis Nobel Indonesia

Abstract

A text review summarized only by its rating-scale score is not a sufficient basis for determining sentiment, because such scores are often biased. This study classifies consumer reviews by applying sentiment classification at the document level, with a focus on the class imbalance problem in consumer review classification. Class imbalance is handled with the Synthetic Minority Oversampling Technique (SMOTE) and Random Under-sampling (RUS) at the pre-processing stage, and classification is performed with Random Forest. Two data features are used: the review text (for sentiment analysis and classification) and the rating (for labeling). Applying class-imbalance techniques (here, SMOTE and RUS) at the pre-processing stage improved accuracy and allowed the classifier to recognize data from classes that were initially treated as the minority.
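The two resampling ideas named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The toy two-dimensional vectors stand in for vectorized review text, and the function names `smote` and `random_under_sample`, the neighbour count `k`, and the target class sizes are all assumptions made for illustration. The abstract does not state the resampling ratio or whether SMOTE and RUS were combined or compared.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Sketch of SMOTE: build n_new synthetic minority samples by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def random_under_sample(majority, n_keep, seed=0):
    """Sketch of RUS: keep a random subset of the majority class."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Hypothetical toy data: 20 majority vectors and 4 minority vectors.
data_rng = random.Random(42)
majority = [[data_rng.random(), data_rng.random()] for _ in range(20)]
minority = [[2 + data_rng.random(), 2 + data_rng.random()] for _ in range(4)]

# Oversample the minority class and undersample the majority class
# until both have 10 samples (an arbitrary target chosen for this sketch).
synthetic = smote(minority, n_new=6, k=3)
minority_balanced = minority + synthetic
majority_balanced = random_under_sample(majority, n_keep=10)

print(len(majority), len(minority))
print(len(majority_balanced), len(minority_balanced))
```

In practice the resampled training set would then be passed to a Random Forest classifier. Library implementations such as `SMOTE` and `RandomUnderSampler` in imbalanced-learn and `RandomForestClassifier` in scikit-learn cover these steps, and resampling is normally applied to the training split only so that evaluation data keeps its original class distribution.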

Published
2024-01-22