Indonesian Text Dataset for Determining Sentiment Classification Using a Machine Learning Approach

Indra Edy Syahputra, Tulus Tulus, Syahril Efendi


Advances in information technology and the rapid growth of online media have produced a virtually unlimited volume of textual information, creating a need to present that information without reducing its value. Datasets are a basic concept in almost every discipline: they provide the empirical information on which research activities are based. Sentiment analysis examines opinions or feelings about an issue, or identifies and classifies trends in the information about that issue. This study analyzes a dataset for sentiment classification using supervised machine learning techniques, which learn from labeled input data to predict the output for new data. The experiments and tests show that supervised machine learning techniques can classify the sentiment of tweet text well, although accuracy can still be improved. The reported results are: baseline, 100 (days) and 83 (weeks); Naive Bayes, 100 (days) and 82 (weeks); MaxEnt, 100 (days) and 83 (weeks); and SVM, 100 (days) and 83 (weeks).
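As an illustration of the supervised setting described in the abstract, the following is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one smoothing, written in plain Python. It is not the authors' implementation: the toy tweets in `TRAIN`, the label names, and the helper functions `tokenize`, `train_naive_bayes`, `predict`, and `accuracy` are all hypothetical and chosen only to show how a classifier learns from labeled text.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data for illustration only; this is NOT the authors' dataset.
TRAIN = [
    ("pelayanan bagus dan cepat", "positive"),
    ("saya suka produk ini", "positive"),
    ("produk bagus sekali", "positive"),
    ("pelayanan buruk dan lambat", "negative"),
    ("saya kecewa produk rusak", "negative"),
    ("pengiriman lambat sekali", "negative"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothed log likelihood of the word.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the given label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = train_naive_bayes(TRAIN)
print(predict(model, "produk bagus"))
print(predict(model, "pengiriman lambat"))
```

In practice, accuracy would be measured on held-out tweets rather than on the training examples, and the same labeled data could be passed to other supervised classifiers such as MaxEnt or SVM for comparison.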


Machine learning, classification, sentiment, dataset









This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.