Review and Visualization of Facebook's FastText Pretrained Word Vector Model

Young, Julio Cristian and Rusli, Andre (2019) Review and Visualization of Facebook's FastText Pretrained Word Vector Model. 2019 International Conference on Engineering, Science, and Industrial Applications (ICESI). ISSN 2521-3814

Full text not available from this repository.

Abstract

Word2Vec is one of the most popular machine learning methods for natural language processing. As with several other machine learning methods, the interpretability of the resulting model is a concern. This paper reviews and analyzes FastText, the pretrained word vector model for Bahasa Indonesia released by Facebook. The analysis begins by comparing the words in the pretrained model with those in the official dictionary of the Indonesian language (KBBI); the words in the model are then visualized for further analysis and review. Principal Component Analysis (PCA) combined with the t-SNE algorithm is used as the dimensionality reduction technique to visualize the word set. Based on the analysis and visualization results, the paper proposes several considerations for using the FastText pretrained word vector model to process Indonesian text, such as whether common text preprocessing techniques are needed.
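
The first step of the analysis, comparing the model vocabulary with a dictionary word list, could look roughly like the sketch below. It is not the authors' code. It assumes the pretrained vectors are available in the plain-text .vec layout (a header line with the word count and dimension, then one word per line followed by its vector values) and that the dictionary is a simple list of words. The function names, the `lowercase` option, and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def read_vec_vocabulary(lines: Iterable[str]) -> List[str]:
    """Collect the words from a .vec-style text file.

    Assumed layout: a header line "<count> <dim>", then one line per
    word, with the word first and its vector values after it.
    """
    it = iter(lines)
    next(it, None)  # skip the header line
    words = []
    for line in it:
        parts = line.rstrip("\n").split(" ")
        if parts and parts[0]:
            words.append(parts[0])
    return words


def vocabulary_overlap(
    model_words: Iterable[str],
    dictionary_words: Iterable[str],
    lowercase: bool = False,
) -> Tuple[List[str], List[str]]:
    """Split model words into those found in the dictionary and those not.

    With lowercase=True, both sides are lowercased before comparison,
    which mimics one common preprocessing step.
    """
    norm = str.lower if lowercase else (lambda w: w)
    dictionary = {norm(w) for w in dictionary_words}
    model_words = list(model_words)
    found = [w for w in model_words if norm(w) in dictionary]
    missing = [w for w in model_words if norm(w) not in dictionary]
    return found, missing


# Toy data invented for illustration only (not taken from the paper).
toy_vec = [
    "4 3",
    "makan 0.1 0.2 0.3",
    "Makan 0.1 0.2 0.4",
    "minum 0.0 0.5 0.1",
    "mkn 0.9 0.1 0.2",
]
toy_dictionary = ["makan", "minum", "tidur"]

words = read_vec_vocabulary(toy_vec)
found, missing = vocabulary_overlap(words, toy_dictionary)
found_ci, missing_ci = vocabulary_overlap(words, toy_dictionary, lowercase=True)
print(len(found), len(missing))        # case-sensitive comparison
print(len(found_ci), len(missing_ci))  # after lowercasing both sides
```

Running the comparison with and without lowercasing gives a rough sense of how much a preprocessing step changes the overlap between the model vocabulary and the dictionary, which is the kind of consideration the abstract raises.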

Item Type: Article
Subjects: 000 Computer Science, Information and General Works > 000 Computer Science, Knowledge and Systems > 004 Computer Science, Data Processing, Hardware
700 Arts and Recreation > 740 Graphic Arts and Decorative Arts
700 Arts and Recreation > 770 Photography, Computer Art, Film, Video > 776 Computer Art (Digital art)
Divisions: Faculty of Engineering & Informatics > Informatics
Depositing User: Administrator UMN Library
Date Deposited: 05 Oct 2021 09:18
Last Modified: 13 Oct 2021 00:46
URI: https://kc.umn.ac.id/id/eprint/18536
