Monday, December 8, 2014

Convert string to date in Stata

Stata is a powerful tool for data analysis. It is also a data management tool that makes work easier.

Currently, I am exploring the Philippine employment data from 2002 to 2014, which look like this:


The problem I ran into after copying and pasting these data into Stata is that the period variable is stored as a string.
For time series work, period has to be converted to a date type.  The command that does this is:

generate period2 = date(period, "MY", 2020)

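The date() function takes up to 3 arguments.  The first is the string variable you want to convert.  The second is a mask that gives the order of the date components in the string; here "MY" means month followed by year.  The third is optional and is the top year: the latest year that a two-digit year may be mapped to.  With a top year of 2020, "02" is read as 2002 rather than 1902.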
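To see what the third argument does, here is a small sketch that uses display with a literal string in the same Mon-YY form as period.  The "M20Y" mask in the last line is an alternative I am assuming you might prefer when every year falls in the 2000s; it is not part of the command above.

```stata
* Two-digit year with a top year of 2020: "02" is read as 2002
display %td date("Jan-02", "MY", 2020)

* Same string without a top year: date() cannot resolve the
* two-digit year and returns missing (.)
display date("Jan-02", "MY")

* Alternative (assumption): state the century in the mask instead
display %td date("Jan-02", "M20Y")
```

Because the mask has no day component, date() places each period on the first day of the month.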

format period2 %td

This command tells Stata to display period2 as a daily date (for example, 01jan2002) instead of as the raw number of days since 01jan1960.

generate quarterly = qofd(period2)
format quarterly %tq

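Finally, the qofd() function takes the daily date in period2 and converts it to the corresponding quarter.  I think qofd() stands for "quarter of date".  The format quarterly %tq line then makes Stata display the result as a quarter, such as 2002q1.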

Type list period emprate quarterly in 1/5 to confirm that period has been properly converted to a date.

     period     emprate     quarterly

1.   Jan-02     89.7        2002q1
2.   Apr-02     86.1        2002q2
3.   Jul-02     88.8        2002q3
4.   Oct-02     89.8        2002q4
5.   Jan-03     89.4        2003q1
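If you want to try the whole conversion without the full data set, the sketch below types in only the five rows listed above and runs the same commands on them.  The assert lines are my own addition as a sanity check; they stop with an error if a period fails to convert or lands in the wrong quarter.

```stata
clear
input str6 period emprate
"Jan-02" 89.7
"Apr-02" 86.1
"Jul-02" 88.8
"Oct-02" 89.8
"Jan-03" 89.4
end

* String -> daily date (first day of the month), shown as a date
generate period2 = date(period, "MY", 2020)
format period2 %td

* Daily date -> quarterly date, shown as e.g. 2002q1
generate quarterly = qofd(period2)
format quarterly %tq

* Sanity checks (added for illustration)
assert !missing(period2)
assert quarterly == tq(2002q1) in 1
assert quarterly == tq(2003q1) in 5

list period emprate period2 quarterly
```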

Interrupted Time Series

When I first heard about this design, I had the impression that it was a difficult impact evaluation design.  It was really confusing at first, but I persisted and looked for simpler reading material that explains the concept.  Since then I have been intrigued by the approach, and I even went as far as buying two books on time series from Amazon.

So what is interrupted time series?

Interrupted time series (ITS) is a quasi-experimental design that uses historical data to show the impact of an intervention and to infer a causal connection between the intervention and the outcome of interest.  In this design, there is no random allocation of treatment to program participants.  The counterfactual, or the situation that would have held if the intervention had not been introduced, is the forecasted value of the variable being measured.  The reasoning behind this design is that if an "interruption" or break in the trend can be shown at the time the intervention was introduced, then that break supports a causal argument in favor of the intervention, provided there are no other competing explanations.
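One common way to put this idea into practice is segmented regression: fit a trend before the break, allow a jump in level and a change in slope after it, and extend the pre-break trend forward as the counterfactual.  The sketch below is only an illustration of that technique.  It runs on simulated numbers (y_sim), not on the employment data, and the break at 2005q1, the coefficients and all variable names are assumptions made up for the example.

```stata
clear
set obs 52                         // 52 quarters: 2002q1 through 2014q4
set seed 2014

generate quarterly = tq(2002q1) + _n - 1
format quarterly %tq
tsset quarterly

* Segmented-regression terms (break date is an assumption)
generate time       = _n                                  // overall trend
generate post       = quarterly >= tq(2005q1)             // 1 after the break
generate time_after = post * (quarterly - tq(2005q1))     // quarters since break

* Simulated outcome: NOT real data, made up for illustration only
generate y_sim = 88 + 0.05*time + 3*post + rnormal(0, 0.5)

* post = change in level, time_after = change in slope
regress y_sim time post time_after

* Counterfactual: the pre-break trend carried forward
generate counterfactual = _b[_cons] + _b[time]*time

twoway (line y_sim quarterly) ///
       (line counterfactual quarterly, lpattern(dash)), ///
       tline(2005q1)
```

The coefficient on post estimates the jump at the break, and the gap between y_sim and counterfactual after 2005q1 is the estimated effect under this model.  A break in the trend alone does not rule out other explanations for it.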


I tried playing around with the employment rate from 2002 to 2014 to see if I could apply ITS to this data set.  Lo and behold! There is an interruption in the trend in 2005!  The blue line represents the intervention.  What happened in 2005 that caused the abrupt change in the employment trend?  That is a subject for proper research.

Source of Data: NSCB