Lately, I have become interested in learning R (an open-source statistical
program available at https://www.r-project.org/) because of text mining.
Stata also has a textual analysis package called txttool, but I haven't
studied it yet.
So over the holidays, I undertook a
project to analyze textual documents. Data mining of textual documents is
on the rise because it can provide insights into your research if you want to
surface dominant themes. I did some content analysis before using Nvivo
at the University of Melbourne, and even with Nvivo I found it quite challenging to
analyze qualitative data. So, let's begin with our small project.
Step 1: Install R (go to the link above)
Step 2: Install RStudio (a user interface
for R that is very helpful for newbies)
This is what RStudio looks like.
Step 3: Install packages
install.packages('tm', dependencies=TRUE)
install.packages('wordcloud', dependencies=TRUE)
The tm (text mining) and wordcloud packages allow you to mine the text in any
document and then convert the most frequent words into word clouds.
Step 4: Save your documents as text files using Notepad.
I suggest that you create a folder for the purpose of this project.
In my case, I saved the text files on C:/Users/grace/Desktop/txtmining
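To quickly check that R can see your files, you can run something like this (the path below is just my example folder, so use your own):
list.files("C:/Users/grace/Desktop/txtmining")
## This lists the text files saved in the folder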
Step 5: Type the code
This is not the best possible code for text mining. I borrowed it
from other R blog sites and only kept the parts
that provide a simple solution to my problem of mining text documents.
library(wordcloud)
library(tm)
## You have to load the packages with library() so that you can use them.
setwd("C:/Users/grace/Desktop/txtmining")
## This sets the working directory
txtdata <- Corpus(DirSource("C:/Users/grace/Desktop/txtmining"))
## This loads every text file in the folder into a corpus
inspect(txtdata)
## This displays a summary of the documents in the corpus
txtdata <- tm_map(txtdata, stripWhitespace)
## This strips extra white space (multiple spaces become one)
txtdata <- tm_map(txtdata, content_transformer(tolower))
## This transforms uppercase to lowercase (e.g. 'DEPED' to 'deped')
txtdata <- tm_map(txtdata, removeWords, stopwords('en'))
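## This removes common English stopwords such as 'the', 'and', and 'of'
To actually draw the word cloud from the cleaned text, here is one simple way to finish (a minimal sketch; the object names tdm and wordfreq are just placeholders I chose, and min.freq is a cutoff you can adjust):
tdm <- TermDocumentMatrix(txtdata)
## This builds a table counting how often each word appears in each document
wordfreq <- sort(rowSums(as.matrix(tdm)), decreasing=TRUE)
## This adds up the counts across documents and ranks words from most to least frequent
wordcloud(names(wordfreq), wordfreq, min.freq=3)
## This draws the word cloud, keeping only words that appear at least 3 times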