Lately, I have become interested in learning R (an open-source statistical
program available at https://www.r-project.org/) because of text mining.
Stata also has a textual analysis package called txttool, but I haven't
studied it yet.
So over the holidays, I undertook a
project to analyze textual documents. Data mining of textual documents is
on the rise because it can provide insights into your research if you want to
surface dominant themes. I did some content analysis before using Nvivo
at the University of Melbourne, and even with Nvivo I found it quite challenging to
analyze qualitative data. So, let's begin with our small project.
Step 1: Install R (go to the link above)
Step 2: Install RStudio (a user interface
for R that is very helpful for newbies)
This is what RStudio looks like.
Step 3: Install packages
install.packages('tm', dependencies=TRUE)
install.packages('wordcloud', dependencies=TRUE)
The tm (text mining) and wordcloud packages allow you to mine the text in any
document and then convert the most frequent words into word clouds.
Step 4: Save your documents as text files using Notepad.
I suggest that you create a folder for the purpose of this project.
In my case, I saved the text files on C:/Users/grace/Desktop/txtmining
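To quickly check that R can see your files, you can run something like this (the path below is just my example folder, so use your own):
list.files("C:/Users/grace/Desktop/txtmining")
## This lists the text files saved in the folder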
Step 5: Type the code
This is not the best possible code for text mining. I borrowed it
from other R blog sites and only kept the parts
that provide a simple solution to my problem of mining text documents.
library(wordcloud)
library(tm)
## You have to load the packages with library() so that you can use them.
setwd("C:/Users/grace/Desktop/txtmining")
## This sets the working directory
txtdata <- Corpus(DirSource("C:/Users/grace/Desktop/txtmining"))
## This loads every text file in the folder into a corpus
inspect(txtdata)
## This displays a summary of the documents in the corpus
txtdata <- tm_map(txtdata, stripWhitespace)
## This strips extra white space (multiple spaces become one)
txtdata <- tm_map(txtdata, content_transformer(tolower))
## This transforms uppercase to lowercase (e.g. 'DEPED' to 'deped')
txtdata <- tm_map(txtdata, removeWords, stopwords('en'))
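## This removes common English stopwords such as 'the', 'and', and 'of'
To actually draw the word cloud from the cleaned text, here is one simple way to finish (a minimal sketch; the object names tdm and wordfreq are just placeholders I chose, and min.freq is a cutoff you can adjust):
tdm <- TermDocumentMatrix(txtdata)
## This builds a table counting how often each word appears in each document
wordfreq <- sort(rowSums(as.matrix(tdm)), decreasing=TRUE)
## This adds up the counts across documents and ranks words from most to least frequent
wordcloud(names(wordfreq), wordfreq, min.freq=3)
## This draws the word cloud, keeping only words that appear at least 3 times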