Indonesian language email spam detection using N-gram and Naïve Bayes algorithm

Vernanda, Yustinus and Hansun, Seng and Kristanda, Marcel Bonar (2020) Indonesian language email spam detection using N-gram and Naïve Bayes algorithm. Bulletin of Electrical Engineering and Informatics (BEEI), 9 (5). ISSN 2302-9285

Full text not available from this repository.

Abstract

Indonesia ranks eighth among the world's countries for global spammers. A web-based spam filter service exposed as a REST API can be used to detect Indonesian-language email spam on an email server or in various email server applications. Through the REST API, applications exchange data in JSON format using existing HTTP commands. One commonly used type of spam filter is Bayesian filtering, in which the Naïve Bayes algorithm serves as the classification algorithm. In this study, the N-gram method is used to increase the accuracy of the Naïve Bayes implementation. N-gram and Naïve Bayes algorithms for detecting Indonesian-language spam email were successfully implemented, with accuracy ranging from 0.615 to 0.94, precision from 0.566 to 0.924, recall from 0.96 to 1.00, and F-measure from 0.721 to 0.942. The best result was obtained with the 5-gram method, which achieved the highest accuracy at 0.94, with precision of 0.924, recall of 0.96, and an F-measure of 0.942.
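To make the combination of N-grams and Naïve Bayes concrete, the following is a minimal sketch and not the authors' implementation. It assumes word-level N-grams, whitespace tokenization, a multinomial model, and add-one (Laplace) smoothing, none of which the abstract specifies. The names `word_ngrams`, `NgramNaiveBayes`, and `f_measure`, as well as the toy training messages, are hypothetical. The last function uses the standard definition of F-measure as the harmonic mean of precision and recall.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return word-level n-grams of text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.gram_counts = {}      # label -> Counter of n-gram frequencies
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()         # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = word_ngrams(text, self.n)
            self.gram_counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = word_ngrams(text, self.n)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.gram_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, for illustration only (not from the study).
train_texts = [
    "menang hadiah uang tunai sekarang",
    "klik di sini untuk hadiah uang tunai",
    "rapat tim besok pagi di kantor",
    "laporan proyek sudah saya kirim",
]
train_labels = ["spam", "spam", "ham", "ham"]

# n=2 is used here only because the toy messages are short.
model = NgramNaiveBayes(n=2).fit(train_texts, train_labels)
```

With this toy model, `model.predict("hadiah uang tunai untuk anda")` returns `"spam"` and `model.predict("rapat tim besok pagi")` returns `"ham"`. Applying `f_measure` to the reported 5-gram precision and recall, `f_measure(0.924, 0.96)`, gives about 0.942, which agrees with the F-measure stated in the abstract.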

Item Type: Article
Subjects: 000 Computer Science, Information and General Works > 000 Computer Science, Knowledge and Systems > 004 Computer Science, Data Processing, Hardware
000 Computer Science, Information and General Works > 000 Computer Science, Knowledge and Systems > 006 Special Computer Methods
Divisions: Faculty of Engineering & Informatics > Informatics
Depositing User: Administrator UMN Library
Date Deposited: 21 Oct 2021 02:41
Last Modified: 21 Oct 2021 02:41
URI: https://kc.umn.ac.id/id/eprint/18919
