Introduction

The goal of this section is to find the average treatment effect of carbon pricing policy interventions on the national level on greenhouse gas emissions. In other words, we want to see whether implementing carbon pricing policies will help us decrease greenhouse gas emissions in a country. This is an observational time series cross section (TSCS) study.

While many economic and clinical studies make use of randomized controlled trials to measure the impact of an intervention, we cannot do this using an observational study like the one we have here. In social science, a conventional solution to this would be to perform a linear regression analysis using time and unit fixed effects. However, this sometimes introduces biases into the results, so in this study, we will be using a score-matching model with the difference-in-difference approach to help us evaluate the average treatment effect of carbon pricing policies on greenhouse gas emissions on the national level. The foundation of this TSCS empirical model was first created by Imai, Kim, and Wang, 2020 in hopes of bridging the gap between the matching models approach and TCSC studies. We will be applying their model to our study.

The model works as follows and heavily relies on the PanelMatching package, available at https://CRAN.R-project.org/package=PanelMatch:

For each treated observation, we have a set of control observations from other units in the same period that have the same treatment history
We then refine the matched set by using either standard matching or weighting methods
After choosing the best match or weights, we then apply it to a difference-in-difference model. This means that we readjust the timeline so that the point of intervention is t=0. Effects after 1 year would be seen at t=1, and so on. Note that we do this so we can compare time-varying interventions across different countries. For example, Canada received treatment in 2006, but Singapore received treatment in 2012. We reset the intervention year 2006 and 2012 to t=0 for an easier and more accurate comparison.

More information can be found about the model in the following links:

Model

With the climate change crisis being a global threat, countries have been working together to help slow and reverse the impacts of rising greenhouse gas emissions. Multiple mitigation efforts have been used including research and development, carbon pricing, and clean energy standards, just to name a few.

Our model focuses on carbon pricing policies, which can be broken down into two sub-categories: carbon tax and emissions trading systems (ETS). This will be our main independent variable (X). And since we are trying to find the relationship between carbon pricing policies and greenhouse gas emissions, total greenhouse gas emissions will naturally be our dependent (y or outcome) variable. To create a more accurate model, covariates will also be used, and these include GDP per capita, population, and fertility rates. The model can benefit from even more covariates, but further refinements to the above described model will be left for future studies.

Note that we will use the year of the first carbon pricing policy introduced in a country as the benchmark for treatment. For example, if a country implemented multiple carbon pricing policies varying across a time span, we will use the first date as the treatment implementation date.

Code and Analysis

Load Libraries

library(ggplot2)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ✓ purrr   0.3.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(tidyr)
library(dplyr)
library(readxl)
library(reshape2)

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

## The following object is masked from 'package:purrr':
## 
##     compact

library(PanelMatch)

Read Files

carbon_price <- read.csv('carbontax_policies.csv')
head(carbon_price, 10)

climate_indicators <- read.csv('climatechangeindicatorsdata.csv')
head(climate_indicators)

legend <- read.csv('indicatorslegend.csv')
head(legend)

temperature <- read.csv('GlobalLandTemperaturesByState.csv')
head(temperature)

ghg <- read.csv('GHG.csv')
head(ghg)

policies <- read.csv('laws_and_policies_global.csv')
head(policies)

gdp <- read_csv('GDPPC.csv')

## Parsed with column specification:
## cols(
##   Entity = col_character(),
##   Code = col_character(),
##   Year = col_double(),
##   `GDP per capita` = col_double(),
##   `145446-annotations` = col_logical()
## )

## Warning: 21 parsing failures.
##   row                col           expected                                           actual        file
## 19635 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## 19636 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## 19637 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## 19638 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## 19639 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## ..... .................. .................. ................................................ ...........
## See problems(...) for more details.

head(gdp)

population <- read_csv('population.csv')

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   `Country Name` = col_character()
## )

## See spec(...) for full column specifications.

head(population)

fertility <- read_csv('fertility.csv')

## Parsed with column specification:
## cols(
##   Entity = col_character(),
##   Code = col_character(),
##   Year = col_double(),
##   `Fertility rate (Select Gapminder, v12) (2017)` = col_double()
## )

head(fertility)

#Clean Data

# Selecting carbon pricing policies that have been implemented or abolished, remove under consideration countries
carbon_price <- carbon_price[which(carbon_price$Status %in% c('Implemented', 'Abolished')), ]

# Creating dummy variables for type of jurisdiction covered
carbon_price$National <- ifelse(carbon_price$Type.of.juridiction.covered == 'National', 1, 0)
carbon_price$Subnational <- ifelse(carbon_price$Type.of.juridiction.covered == 'Subnational', 1, 0)
carbon_price$Regional <- ifelse(carbon_price$Type.of.juridiction.covered == 'Regaional', 1, 0)

# Creating dummy variable for status of policy
carbon_price$Implemented <- ifelse(carbon_price$Status == 'Implemented', 1, 0)
carbon_price$Abolished <- ifelse(carbon_price$Status == 'Implemented', 0, 1)

head(carbon_price)

Some rows have data on the regional or sub-national level, so we relabel the jurisdiction for easier matching with datasets later on.

# Relabel Canadian provinces under Canada jurisdiction 
relocate_carbon_price <- carbon_price[which(carbon_price$National == 0),]
relocate_carbon_price[c(1:4, 14:21, 24), 5] = 'Canada'

# Relabel Chinese provinces under China jurisdiction 
relocate_carbon_price[c(6, 8, 10, 11:12, 25:27), 5] = 'China'

# Relabel Japanese prefectures under Japan jurisdiction 
relocate_carbon_price[c(23, 28), 5] = 'Japan'

# Relabel US regions/states under US jurisdiction
relocate_carbon_price[c(5, 7, 13, 22, 29), 5] = 'United States'

# Relabel Zacatecas under Mexico jurisdiction
relocate_carbon_price[c(30), 5] = 'Mexico'

head(relocate_carbon_price)

# Relabel EU (27), Norway, Iceland, Liechtenstein under separate country jurisdictions
relocate_carbon_price <- rbind(relocate_carbon_price, relocate_carbon_price[rep(9, 30), ])

eu <- c('Austria', 'Italy', 'Belgium', 'Latvia', 'Bulgaria', 'Lithuania', 'Croatia', 'Luxembourg', 'Cyprus', 'Malta', 'Czechia', 'Netherlands', 'Denmark', 'Poland', 'Estonia', 'Portugal', 'Finland', 'Romania', 'France', 'Slovakia', 'Germany', 'Slovenia', 'Greece', 'Spain', 'Hungary',    'Sweden', 'Ireland')

relocate_carbon_price[c(31:60), 5] = c(eu, 'Norway', 'Iceland', 'Liechtenstein')

relocate_carbon_price <- relocate_carbon_price[-9,]
head(relocate_carbon_price)

# Match relocate_carbon_price with carbon_price 
carbon_price[match(relocate_carbon_price$Name.of.the.initiative, carbon_price$Name.of.the.initiative), ] <- relocate_carbon_price

# Check if Norway, Iceland, Countries in `eu` have been left out; add to carbon_price
carbon_price[which(carbon_price$Name.of.the.initiative %in% c('EU ETS')), ]

carbon_price <- rbind(carbon_price, relocate_carbon_price[31:58, ])
head(carbon_price)

After cleaning up the jurisdiction column, we realized that having sub-national levels represent national level interventions my skew our analysis, so we decided to remove sub-national interventions and only focus on national level data.

# Keep only national data
carbon_price <- carbon_price[which(carbon_price$Type.of.juridiction.covered %in% 'National'),]
carbon_price <- carbon_price[-2,]

# Clean up dataframe
carbon_price <- carbon_price[, c('Type', 'Jurisdiction.covered', 'Year.of.implementation')]
head(carbon_price)

As stated above, some countries implemented policies at different times, but I will be using the first policy ever implemented as the treatment.

# Sort dataframe, find earliest year of intervention and keep it in dataframe
carbon_price <- carbon_price[order(carbon_price[, 'Jurisdiction.covered'], carbon_price[,'Year.of.implementation']), ]
carbon_price <- carbon_price[!duplicated(carbon_price$Jurisdiction.covered), ]
carbon_price <- carbon_price[,c(2, 3)]
colnames(carbon_price) <- c('country', 'year_implemented')

head(carbon_price)

# Restructure dataframe so it is easier to work with
country <- carbon_price$country

df <- data.frame(
  country= country,
  dates=c('1950-2019')
)

df <- cbind(df, colsplit(df$dates, '-', c('start', 'end')))

df <- ddply(df, .(country), function(x){
            data.frame(
                    country=x$country,
                    yrobs=seq(x$start, x$end),
                    yrstart=x$start,
                    yrend=x$end
            )
        }
)

df <- df[, c(1, 2)]

df <- join(df, carbon_price, by='country')

df$treatment <- ifelse(df$yrobs < df$year_implemented, 0, 1)
colnames(df)[2] <- 'Year'
head(df)

# Clean data by pivoting year columns to rows
ghg <- pivot_longer(ghg, cols=5:53, names_to='Year', values_to='emissions')

# Clean `Year` column by removing X character
ghg$Year <- sub('.', '', ghg$Year)
ghg$Year <- as.numeric(ghg$Year)

# Rename column name
colnames(ghg)[4] <- 'country'
ghg <- ghg[, c(4,5,6)]

head(ghg)

# Merge ghg dataset with df
df <- join(ghg, df)

## Joining by: country, Year

head(df)

# Select relevant columns and rename 
gdp <- gdp[, c(1, 3, 4)]
colnames(gdp) <- c('country', 'Year', 'gdppc')

# Drop data before 1970
gdp <- gdp[!(gdp$Year < 1970), ]
head(gdp)

df <- join(gdp, df)

## Joining by: country, Year

df$treatment[is.na(df$treatment)] <- 0
head(df)

# Pivot table 
population <- pivot_longer(population, cols=2:61, names_to='Year', values_to='Population')

# Rename column
colnames(population)[1] <- 'country'

# Merge population with df
df <- join(df, population)

## Joining by: country, Year

head(df)

# Select relevant columns and rename
fertility <- fertility[, c(1, 3, 4)]
colnames(fertility) <- c('country', 'Year', 'Fertility')

# Merge fertility with df 
df <- join(df, fertility)

## Joining by: country, Year

head(df)

The PanelMatch package uses country_codes instead of strings. In addition, it requires consecutive integer values for year. Below I will be readjusting the data for these columns so it is compatible with PanelMatch functions.

df$country_code <- as.numeric(factor(df$country))

df <- df[order(df$country, df$Year),]
head(df)

# Find if year is consecutive
df$diff <- c(0, abs(diff(df$Year)) == 1)
remove_df <- df[which(df$diff == 0 & df$Year != 1970),]
df <- df[!(df$country %in% remove_df$country), ]
df$Year <- as.integer(df$Year)
head(df)

# Drop country, year_implemented, and diff 
df <- df[,-c(1, 5, 11)]
head(df)

Estimation

# Display distribution of treatment variable within data (represented in red)
DisplayTreatment(unit.id = "country_code",
                 time.id = "Year", legend.position = "none",
                 xlab = "Year", ylab = "country",
                 treatment = "treatment", data = df)

# Create sets of matching treated units to control units and determine the weights for each control unit
PM.results.none <- PanelMatch(lag = 4, time.id = "Year", unit.id = "country_code", 
                         treatment = "treatment", refinement.method = "none", 
                         data = df, match.missing = TRUE, 
                         size.match = 5, qoi = "att", outcome.var = "emissions",
                         lead = 0:4, forbid.treatment.reversal = FALSE, 
                         use.diagonal.variance.matrix = TRUE)

# Use Mahalanobis distance matching
PM.results <- PanelMatch(lag = 4, time.id = "Year", unit.id = "country_code", 
                         treatment = "treatment", refinement.method = "mahalanobis",  
                         data = df, match.missing = TRUE, 
                         covs.formula = ~ gdppc + Population + Fertility, 
                         size.match = 5, qoi = "att" , outcome.var = "emissions",
                         lead = 0:4, forbid.treatment.reversal = FALSE, 
                         use.diagonal.variance.matrix = TRUE)

# Lagged values Mahalanobis distance matching
PM.results1 <- PanelMatch(lag = 4, time.id = "Year", unit.id = "country_code", 
                         treatment = "treatment", refinement.method = "mahalanobis", 
                         data = df, match.missing = TRUE, 
                         covs.formula = ~ I(lag(gdppc, 1:4)) + 
                           I(lag(Population, 1:4)) + 
                           I(lag(Fertility, 1:4)) + 
                           I(lag(emissions, 1:4)), # lags
                         size.match = 5, qoi = "att", outcome.var = "emissions",
                         lead = 0:4, forbid.treatment.reversal = FALSE, 
                         use.diagonal.variance.matrix = TRUE)

Now that we have different matching and weighting possibilities, we will use covariate balance tables to choose the best model for further analysis. Note that I did not use propensity score matching because there were limited number of matches.

summary(PM.results.none$att)

## $overview
##    country_code Year matched.set.size
## 1            42 1992              129
## 2            55 1990              132
## 3            58 2014              119
## 4            72 2010              124
## 5            77 2010              124
## 6            81 2012              123
## 7           103 2014              119
## 8           114 2008              126
## 9           120 1991              130
## 10          128 1990              132
## 11          144 1996              128
## 12          148 2014              119
## 13          152 1991              130
## 14          153 2008              126
## 15          167 2013              122
## 
## $set.size.summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   119.0   122.5   126.0   125.5   129.5   132.0 
## 
## $number.of.treated.units
## [1] 15
## 
## $num.units.empty.set
## [1] 0
## 
## $lag
## [1] 4

plot(PM.results.none$att)

# Covariate balance tables
get_covariate_balance(PM.results.none$att, df, covariates = c('gdppc', 'Population',
                                                         'Fertility', 'emissions'), plot = FALSE, ylim = c(-2,2))

##        gdppc Population Fertility emissions
## t_4 1.573986 -0.1809702 -5.653637 0.2375909
## t_3 1.578307 -0.1897384 -5.452527 0.2075156
## t_2 1.572509 -0.1989132 -5.254124 0.1958354
## t_1 1.565128 -0.2090071 -5.076611 0.1882795
## t_0 1.568980 -0.2191036 -4.917760 0.1559899

get_covariate_balance(PM.results$att, df, covariates = c('gdppc', 'Population', 
                                                         'Fertility', 'emissions'), plot = FALSE, ylim = c(-2,2))

##        gdppc Population Fertility emissions
## t_4 1.611029 0.06037861 -5.189295 0.4511161
## t_3 1.610575 0.05529593 -4.991314 0.4382125
## t_2 1.614358 0.04912887 -4.800636 0.4350622
## t_1 1.617832 0.04168611 -4.637545 0.4360364
## t_0 1.623161 0.03385483 -4.494333 0.4140819

get_covariate_balance(PM.results1$att, df, covariates = c('gdppc', 'Population', 
                                                         'Fertility', 'emissions'), plot = FALSE, ylim = c(-2,2))

##         gdppc Population  Fertility emissions
## t_4 0.6660306  0.3373572 -1.3903303 0.4349581
## t_3 0.6608417  0.3390550 -1.2734890 0.4292266
## t_2 0.6397183  0.3405779 -1.1558363 0.4283131
## t_1 0.6310038  0.3411686 -1.0633369 0.4333829
## t_0 0.6489127  0.3418486 -0.9919339 0.4116878

# Plot covariate balances
get_covariate_balance(PM.results.none$att,
                      data = df,
                      covariates = c('gdppc', 'Population', 'Fertility', 'emissions'), 
                      plot = TRUE, # visualize by setting plot to TRUE
                      ylim = c(-4, 3))

get_covariate_balance(PM.results$att,
                      data = df,
                      covariates = c('gdppc', 'Population', 'Fertility', 'emissions'), 
                      plot = TRUE, # visualize by setting plot to TRUE
                      ylim = c(-2, 2))

get_covariate_balance(PM.results1$att,
                      data = df,
                      covariates = c('gdppc', 'Population', 'Fertility', 'emissions'), 
                      plot = TRUE, # visualize by setting plot to TRUE
                      ylim = c(-2, 2))

While there is some unevenness in our covairate balances, the last model (PM.result1) looks like the best one out of the three. It uses lagged values on a Mahalanobis distance matching method. Below, we will apply this to the difference-in-difference approach to get average treatment effect.

PE.results <- PanelEstimate(sets = PM.results1, data = df)
names(PE.results)

##  [1] "estimates"              "bootstrapped.estimates" "bootstrap.iterations"  
##  [4] "standard.error"         "method"                 "lag"                   
##  [7] "lead"                   "confidence.level"       "qoi"                   
## [10] "matched.sets"

PE.results[['estimates']]

##        t+0        t+1        t+2        t+3        t+4 
##  -7095.811  -4340.528  -6570.553  -9065.459 -10796.747

summary(PE.results)

## Weighted Difference-in-Differences with Mahalanobis Distance
## Matches created with 4 lags
## 
## Standard errors computed with 1000 Weighted bootstrap samples
## 
## Estimate of Average Treatment Effect on the Treated (ATT) by Period:

## $summary
##       estimate std.error      2.5%    97.5%
## t+0  -7095.811  4954.782 -16266.48 2906.549
## t+1  -4340.528  6582.904 -16752.02 9323.534
## t+2  -6570.553  6169.473 -17675.52 6186.391
## t+3  -9065.459  6890.980 -21548.31 4454.044
## t+4 -10796.747  7324.496 -23923.05 3827.369
## 
## $lag
## [1] 4
## 
## $iterations
## [1] 1000
## 
## $qoi
## [1] "att"

plot(PE.results)

Conclusion

From the above summary statistic and plots, we see that carbon pricing interventions help decrease greenhouse gas emissions, but the difference is statistically insignificant (estimates are within 95% confidence interval). While this is the case, further study can be done to this model to verify the conclusion. This may be suggestive evidence that carbon pricing policies alone are not enough for us to help reduce greenhouse gas emission and reach our climate goals.

Examples of further steps/study include:

adding more covariates to the model to adjust for better matching or weighting
keeping in mind that other methods in addition to carbon taxing can be used
trying difference dependent variables like average temperature changes to further evaluate carbon pricing interventions

LEGALST 123 Modeling Part 1

Maggie Siu, Ashutosh Bhadouria, Jessica Ornowski

4/23/2021