The goal of this section is to find the average treatment effect of carbon pricing policy interventions on the national level on greenhouse gas emissions. In other words, we want to see whether implementing carbon pricing policies will help us decrease greenhouse gas emissions in a country. This is an observational time series cross section (TSCS) study.
While many economic and clinical studies make use of randomized controlled trials to measure the impact of an intervention, we cannot do this using an observational study like the one we have here. In social science, a conventional solution to this would be to perform a linear regression analysis using time and unit fixed effects. However, this sometimes introduces biases into the results, so in this study, we will be using a score-matching model with the difference-in-difference approach to help us evaluate the average treatment effect of carbon pricing policies on greenhouse gas emissions on the national level. The foundation of this TSCS empirical model was first created by Imai, Kim, and Wang, 2020 in hopes of bridging the gap between the matching models approach and TCSC studies. We will be applying their model to our study.
The model works as follows and heavily relies on the PanelMatching
package, available at https://CRAN.R-project.org/package=PanelMatch:
More information can be found about the model in the following links:
With the climate change crisis being a global threat, countries have been working together to help slow and reverse the impacts of rising greenhouse gas emissions. Multiple mitigation efforts have been used including research and development, carbon pricing, and clean energy standards, just to name a few.
Our model focuses on carbon pricing policies, which can be broken down into two sub-categories: carbon tax and emissions trading systems (ETS). This will be our main independent variable (X). And since we are trying to find the relationship between carbon pricing policies and greenhouse gas emissions, total greenhouse gas emissions will naturally be our dependent (y or outcome) variable. To create a more accurate model, covariates will also be used, and these include GDP per capita, population, and fertility rates. The model can benefit from even more covariates, but further refinements to the above described model will be left for future studies.
Note that we will use the year of the first carbon pricing policy introduced in a country as the benchmark for treatment. For example, if a country implemented multiple carbon pricing policies varying across a time span, we will use the first date as the treatment implementation date.
library(ggplot2)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
library(dplyr)
library(readxl)
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
library(PanelMatch)
carbon_price <- read.csv('carbontax_policies.csv')
head(carbon_price, 10)
climate_indicators <- read.csv('climatechangeindicatorsdata.csv')
head(climate_indicators)
legend <- read.csv('indicatorslegend.csv')
head(legend)
temperature <- read.csv('GlobalLandTemperaturesByState.csv')
head(temperature)
ghg <- read.csv('GHG.csv')
head(ghg)
policies <- read.csv('laws_and_policies_global.csv')
head(policies)
gdp <- read_csv('GDPPC.csv')
## Parsed with column specification:
## cols(
## Entity = col_character(),
## Code = col_character(),
## Year = col_double(),
## `GDP per capita` = col_double(),
## `145446-annotations` = col_logical()
## )
## Warning: 21 parsing failures.
## row col expected actual file
## 19635 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## 19636 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## 19637 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## 19638 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## 19639 145446-annotations 1/0/T/F/TRUE/FALSE United States, Canada, Australia and New Zealand 'GDPPC.csv'
## ..... .................. .................. ................................................ ...........
## See problems(...) for more details.
head(gdp)
population <- read_csv('population.csv')
## Parsed with column specification:
## cols(
## .default = col_double(),
## `Country Name` = col_character()
## )
## See spec(...) for full column specifications.
head(population)
fertility <- read_csv('fertility.csv')
## Parsed with column specification:
## cols(
## Entity = col_character(),
## Code = col_character(),
## Year = col_double(),
## `Fertility rate (Select Gapminder, v12) (2017)` = col_double()
## )
head(fertility)
#Clean Data
# Selecting carbon pricing policies that have been implemented or abolished, remove under consideration countries
carbon_price <- carbon_price[which(carbon_price$Status %in% c('Implemented', 'Abolished')), ]
# Creating dummy variables for type of jurisdiction covered
carbon_price$National <- ifelse(carbon_price$Type.of.juridiction.covered == 'National', 1, 0)
carbon_price$Subnational <- ifelse(carbon_price$Type.of.juridiction.covered == 'Subnational', 1, 0)
carbon_price$Regional <- ifelse(carbon_price$Type.of.juridiction.covered == 'Regaional', 1, 0)
# Creating dummy variable for status of policy
carbon_price$Implemented <- ifelse(carbon_price$Status == 'Implemented', 1, 0)
carbon_price$Abolished <- ifelse(carbon_price$Status == 'Implemented', 0, 1)
head(carbon_price)
Some rows have data on the regional or sub-national level, so we relabel the jurisdiction for easier matching with datasets later on.
# Relabel Canadian provinces under Canada jurisdiction
relocate_carbon_price <- carbon_price[which(carbon_price$National == 0),]
relocate_carbon_price[c(1:4, 14:21, 24), 5] = 'Canada'
# Relabel Chinese provinces under China jurisdiction
relocate_carbon_price[c(6, 8, 10, 11:12, 25:27), 5] = 'China'
# Relabel Japanese prefectures under Japan jurisdiction
relocate_carbon_price[c(23, 28), 5] = 'Japan'
# Relabel US regions/states under US jurisdiction
relocate_carbon_price[c(5, 7, 13, 22, 29), 5] = 'United States'
# Relabel Zacatecas under Mexico jurisdiction
relocate_carbon_price[c(30), 5] = 'Mexico'
head(relocate_carbon_price)
# Relabel EU (27), Norway, Iceland, Liechtenstein under separate country jurisdictions
relocate_carbon_price <- rbind(relocate_carbon_price, relocate_carbon_price[rep(9, 30), ])
eu <- c('Austria', 'Italy', 'Belgium', 'Latvia', 'Bulgaria', 'Lithuania', 'Croatia', 'Luxembourg', 'Cyprus', 'Malta', 'Czechia', 'Netherlands', 'Denmark', 'Poland', 'Estonia', 'Portugal', 'Finland', 'Romania', 'France', 'Slovakia', 'Germany', 'Slovenia', 'Greece', 'Spain', 'Hungary', 'Sweden', 'Ireland')
relocate_carbon_price[c(31:60), 5] = c(eu, 'Norway', 'Iceland', 'Liechtenstein')
relocate_carbon_price <- relocate_carbon_price[-9,]
head(relocate_carbon_price)
# Match relocate_carbon_price with carbon_price
carbon_price[match(relocate_carbon_price$Name.of.the.initiative, carbon_price$Name.of.the.initiative), ] <- relocate_carbon_price
# Check if Norway, Iceland, Countries in `eu` have been left out; add to carbon_price
carbon_price[which(carbon_price$Name.of.the.initiative %in% c('EU ETS')), ]
carbon_price <- rbind(carbon_price, relocate_carbon_price[31:58, ])
head(carbon_price)
After cleaning up the jurisdiction column, we realized that having sub-national levels represent national level interventions my skew our analysis, so we decided to remove sub-national interventions and only focus on national level data.
# Keep only national data
carbon_price <- carbon_price[which(carbon_price$Type.of.juridiction.covered %in% 'National'),]
carbon_price <- carbon_price[-2,]
# Clean up dataframe
carbon_price <- carbon_price[, c('Type', 'Jurisdiction.covered', 'Year.of.implementation')]
head(carbon_price)
As stated above, some countries implemented policies at different times, but I will be using the first policy ever implemented as the treatment.
# Sort dataframe, find earliest year of intervention and keep it in dataframe
carbon_price <- carbon_price[order(carbon_price[, 'Jurisdiction.covered'], carbon_price[,'Year.of.implementation']), ]
carbon_price <- carbon_price[!duplicated(carbon_price$Jurisdiction.covered), ]
carbon_price <- carbon_price[,c(2, 3)]
colnames(carbon_price) <- c('country', 'year_implemented')
head(carbon_price)
# Restructure dataframe so it is easier to work with
country <- carbon_price$country
df <- data.frame(
country= country,
dates=c('1950-2019')
)
df <- cbind(df, colsplit(df$dates, '-', c('start', 'end')))
df <- ddply(df, .(country), function(x){
data.frame(
country=x$country,
yrobs=seq(x$start, x$end),
yrstart=x$start,
yrend=x$end
)
}
)
df <- df[, c(1, 2)]
df <- join(df, carbon_price, by='country')
df$treatment <- ifelse(df$yrobs < df$year_implemented, 0, 1)
colnames(df)[2] <- 'Year'
head(df)
# Clean data by pivoting year columns to rows
ghg <- pivot_longer(ghg, cols=5:53, names_to='Year', values_to='emissions')
# Clean `Year` column by removing X character
ghg$Year <- sub('.', '', ghg$Year)
ghg$Year <- as.numeric(ghg$Year)
# Rename column name
colnames(ghg)[4] <- 'country'
ghg <- ghg[, c(4,5,6)]
head(ghg)
# Merge ghg dataset with df
df <- join(ghg, df)
## Joining by: country, Year
head(df)
# Select relevant columns and rename
gdp <- gdp[, c(1, 3, 4)]
colnames(gdp) <- c('country', 'Year', 'gdppc')
# Drop data before 1970
gdp <- gdp[!(gdp$Year < 1970), ]
head(gdp)
df <- join(gdp, df)
## Joining by: country, Year
df$treatment[is.na(df$treatment)] <- 0
head(df)
# Pivot table
population <- pivot_longer(population, cols=2:61, names_to='Year', values_to='Population')
# Rename column
colnames(population)[1] <- 'country'
# Merge population with df
df <- join(df, population)
## Joining by: country, Year
head(df)
# Select relevant columns and rename
fertility <- fertility[, c(1, 3, 4)]
colnames(fertility) <- c('country', 'Year', 'Fertility')
# Merge fertility with df
df <- join(df, fertility)
## Joining by: country, Year
head(df)
The PanelMatch
package uses country_codes instead of strings. In addition, it requires consecutive integer values for year. Below I will be readjusting the data for these columns so it is compatible with PanelMatch
functions.
df$country_code <- as.numeric(factor(df$country))
df <- df[order(df$country, df$Year),]
head(df)
# Find if year is consecutive
df$diff <- c(0, abs(diff(df$Year)) == 1)
remove_df <- df[which(df$diff == 0 & df$Year != 1970),]
df <- df[!(df$country %in% remove_df$country), ]
df$Year <- as.integer(df$Year)
head(df)
# Drop country, year_implemented, and diff
df <- df[,-c(1, 5, 11)]
head(df)
# Display distribution of treatment variable within data (represented in red)
DisplayTreatment(unit.id = "country_code",
time.id = "Year", legend.position = "none",
xlab = "Year", ylab = "country",
treatment = "treatment", data = df)
# Create sets of matching treated units to control units and determine the weights for each control unit
PM.results.none <- PanelMatch(lag = 4, time.id = "Year", unit.id = "country_code",
treatment = "treatment", refinement.method = "none",
data = df, match.missing = TRUE,
size.match = 5, qoi = "att", outcome.var = "emissions",
lead = 0:4, forbid.treatment.reversal = FALSE,
use.diagonal.variance.matrix = TRUE)
# Use Mahalanobis distance matching
PM.results <- PanelMatch(lag = 4, time.id = "Year", unit.id = "country_code",
treatment = "treatment", refinement.method = "mahalanobis",
data = df, match.missing = TRUE,
covs.formula = ~ gdppc + Population + Fertility,
size.match = 5, qoi = "att" , outcome.var = "emissions",
lead = 0:4, forbid.treatment.reversal = FALSE,
use.diagonal.variance.matrix = TRUE)
# Lagged values Mahalanobis distance matching
PM.results1 <- PanelMatch(lag = 4, time.id = "Year", unit.id = "country_code",
treatment = "treatment", refinement.method = "mahalanobis",
data = df, match.missing = TRUE,
covs.formula = ~ I(lag(gdppc, 1:4)) +
I(lag(Population, 1:4)) +
I(lag(Fertility, 1:4)) +
I(lag(emissions, 1:4)), # lags
size.match = 5, qoi = "att", outcome.var = "emissions",
lead = 0:4, forbid.treatment.reversal = FALSE,
use.diagonal.variance.matrix = TRUE)
Now that we have different matching and weighting possibilities, we will use covariate balance tables to choose the best model for further analysis. Note that I did not use propensity score matching because there were limited number of matches.
summary(PM.results.none$att)
## $overview
## country_code Year matched.set.size
## 1 42 1992 129
## 2 55 1990 132
## 3 58 2014 119
## 4 72 2010 124
## 5 77 2010 124
## 6 81 2012 123
## 7 103 2014 119
## 8 114 2008 126
## 9 120 1991 130
## 10 128 1990 132
## 11 144 1996 128
## 12 148 2014 119
## 13 152 1991 130
## 14 153 2008 126
## 15 167 2013 122
##
## $set.size.summary
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 119.0 122.5 126.0 125.5 129.5 132.0
##
## $number.of.treated.units
## [1] 15
##
## $num.units.empty.set
## [1] 0
##
## $lag
## [1] 4
plot(PM.results.none$att)
# Covariate balance tables
get_covariate_balance(PM.results.none$att, df, covariates = c('gdppc', 'Population',
'Fertility', 'emissions'), plot = FALSE, ylim = c(-2,2))
## gdppc Population Fertility emissions
## t_4 1.573986 -0.1809702 -5.653637 0.2375909
## t_3 1.578307 -0.1897384 -5.452527 0.2075156
## t_2 1.572509 -0.1989132 -5.254124 0.1958354
## t_1 1.565128 -0.2090071 -5.076611 0.1882795
## t_0 1.568980 -0.2191036 -4.917760 0.1559899
get_covariate_balance(PM.results$att, df, covariates = c('gdppc', 'Population',
'Fertility', 'emissions'), plot = FALSE, ylim = c(-2,2))
## gdppc Population Fertility emissions
## t_4 1.611029 0.06037861 -5.189295 0.4511161
## t_3 1.610575 0.05529593 -4.991314 0.4382125
## t_2 1.614358 0.04912887 -4.800636 0.4350622
## t_1 1.617832 0.04168611 -4.637545 0.4360364
## t_0 1.623161 0.03385483 -4.494333 0.4140819
get_covariate_balance(PM.results1$att, df, covariates = c('gdppc', 'Population',
'Fertility', 'emissions'), plot = FALSE, ylim = c(-2,2))
## gdppc Population Fertility emissions
## t_4 0.6660306 0.3373572 -1.3903303 0.4349581
## t_3 0.6608417 0.3390550 -1.2734890 0.4292266
## t_2 0.6397183 0.3405779 -1.1558363 0.4283131
## t_1 0.6310038 0.3411686 -1.0633369 0.4333829
## t_0 0.6489127 0.3418486 -0.9919339 0.4116878
# Plot covariate balances
get_covariate_balance(PM.results.none$att,
data = df,
covariates = c('gdppc', 'Population', 'Fertility', 'emissions'),
plot = TRUE, # visualize by setting plot to TRUE
ylim = c(-4, 3))
get_covariate_balance(PM.results$att,
data = df,
covariates = c('gdppc', 'Population', 'Fertility', 'emissions'),
plot = TRUE, # visualize by setting plot to TRUE
ylim = c(-2, 2))
get_covariate_balance(PM.results1$att,
data = df,
covariates = c('gdppc', 'Population', 'Fertility', 'emissions'),
plot = TRUE, # visualize by setting plot to TRUE
ylim = c(-2, 2))
While there is some unevenness in our covairate balances, the last model (PM.result1
) looks like the best one out of the three. It uses lagged values on a Mahalanobis distance matching method. Below, we will apply this to the difference-in-difference approach to get average treatment effect.
PE.results <- PanelEstimate(sets = PM.results1, data = df)
names(PE.results)
## [1] "estimates" "bootstrapped.estimates" "bootstrap.iterations"
## [4] "standard.error" "method" "lag"
## [7] "lead" "confidence.level" "qoi"
## [10] "matched.sets"
PE.results[['estimates']]
## t+0 t+1 t+2 t+3 t+4
## -7095.811 -4340.528 -6570.553 -9065.459 -10796.747
summary(PE.results)
## Weighted Difference-in-Differences with Mahalanobis Distance
## Matches created with 4 lags
##
## Standard errors computed with 1000 Weighted bootstrap samples
##
## Estimate of Average Treatment Effect on the Treated (ATT) by Period:
## $summary
## estimate std.error 2.5% 97.5%
## t+0 -7095.811 4954.782 -16266.48 2906.549
## t+1 -4340.528 6582.904 -16752.02 9323.534
## t+2 -6570.553 6169.473 -17675.52 6186.391
## t+3 -9065.459 6890.980 -21548.31 4454.044
## t+4 -10796.747 7324.496 -23923.05 3827.369
##
## $lag
## [1] 4
##
## $iterations
## [1] 1000
##
## $qoi
## [1] "att"
plot(PE.results)
From the above summary statistic and plots, we see that carbon pricing interventions help decrease greenhouse gas emissions, but the difference is statistically insignificant (estimates are within 95% confidence interval). While this is the case, further study can be done to this model to verify the conclusion. This may be suggestive evidence that carbon pricing policies alone are not enough for us to help reduce greenhouse gas emission and reach our climate goals.
Examples of further steps/study include: