Influence of media on juveniles’ violent behavior

by Adelaide Chen, Courtney Yeh, Yinbei Xu, Zehra Ali

Overview

According to market research in 2019, the average video gamer spends seven hours and seven minutes each week playing video games such as Call of Duty: Modern Warfare, Apex Legends, Red Dead Redemption II, and Super Smash Bros: Ultimate. Although these games have very different storylines and playing styles, they all share the common factor of violence. For instance, Call of Duty follows soldiers through various war scenarios abroad, in which gamers are required to kill the “terrorists” in order to succeed in the game; however, this often results in realistic civilian casualties on screen. Due to the graphic and gruesome nature of these types of violent games, many politicians, such as President Donald Trump, and concerned parents claim that violent behaviors of juveniles are the result of violent video games. Thus, the question we wish to address is whether media (violent video games, violent films, news, uncensored social media) influence juveniles to become more inclined to commit violent crimes. In order to analyze these factors, we must assess the various short-term and long-term priming effects these video games have on these individuals.

To understand the relationship between factors that can form violent behavior and acts of violence among juveniles, we use linear regression methods as well as decision trees and random forests to measure the level of correlation between these factors, and to test whether increases in factors that can form violent behavior can predict increases in violence among youth. We use public datasets containing information about juvenile arrest rates, grossing data for movies (broken down by genre to isolate the effect of violent films), video game sales, and mass or school shootings (as proxies for the number of news articles related to violence).

Data

Game Sales Data

This dataset represents the sales of video games over the years that may contribute to an increase in the rate of violence in youths. The data was collected from a website that compiles video game sales data annually for different countries. This information is publicly available.

According to research, 21% of computer game sales were attributed to juveniles (Statista). If we assumed that video game customers were spread evenly across the age range of 10 to 78 (life expectancy in the US), and took juveniles to be those aged 10 to 17, we would expect approximately 12% of sales (8 of the 68 years in that range) to be attributed to juveniles. The actual share of sales attributed to juveniles is considerably higher than this expected value.

Additionally, it could be hypothesized that video games have the potential to normalize violence, neutralizing the psychological effects of committing acts of violence. Due to these reasons, video games could have the potential to increase rates of violent crime among juveniles.

Given the combination of these reasons, video game sales could be a strong predictor of violent behavior in juveniles.

Source: https://www.vgchartz.com/yearly/

game sales chart

In this dataset, the 100 top-selling video games for each year between 2005 and 2018 are given. The website that collected this data also records how these games are categorized. Based on these standardized categories, we can limit the dataset to only the genres of interest (Action, Shooter, Fighting), as shown above.

One issue with using the Action, Shooter and Fighting categories is that games in these categories do not always exhibit violence. For instance, one of the games categorized as an “Action” game is Spongebob SquarePants: The Yellow Avenger. However, as there are 600 games in this dataframe, it is not practical to manually go through and eliminate each game that might be a slight outlier like this. This should be taken as a limitation of the models that use this data to generate predictions.
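As an illustration of this filtering step, the following is a minimal sketch in pandas. The rows are made-up placeholders rather than values from the dataset, and the column names (`year`, `genre`, `sales_millions`) are assumptions, not necessarily those used in the original dataframe.

```python
import pandas as pd

# Placeholder rows standing in for the yearly top-100 table.
# Column names and all values here are hypothetical.
games = pd.DataFrame(
    {
        "year": [2005, 2005, 2005, 2006, 2006, 2006],
        "title": ["A", "B", "C", "D", "E", "F"],
        "genre": ["Action", "Sports", "Shooter", "Fighting", "Puzzle", "Racing"],
        "sales_millions": [2.0, 3.0, 1.5, 1.0, 0.5, 2.5],
    }
)

# Genres treated as "violent" in the text.
VIOLENT_GENRES = {"Action", "Shooter", "Fighting"}

# Label each game by group, then total the sales per year and group.
games["group"] = games["genre"].isin(VIOLENT_GENRES).map(
    {True: "violent_sales", False: "nonviolent_sales"}
)
yearly = (
    games.groupby(["year", "group"])["sales_millions"]
    .sum()
    .unstack(fill_value=0.0)
)
print(yearly)
```

Keeping the non-violent total next to the violent one makes it possible to compare the two trends year by year, which is what the graphs in this section do.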

game sales graphs

The top graph shows that violent video game sales peaked in 2013, then gradually fell until 2018. The bottom graph shows that non-violent games do not follow the same trend, so it can be concluded that the increase in violence-related video game sales is not simply due to the overall growth of the video game industry.

game sales data

The table above shows the final data used to predict juvenile violent crime rates.

Movie Grossing Data

Similar to video games’ effects on individuals, violent movies may downplay the repercussions of violent acts, and may even make such acts look heroic or attractive. Therefore, the revenues generated by movies related to action, horror or crime may be correlated with the number of juvenile arrests.

movie sales chart

movie sales graph

This graph compares the relative popularity of different genres of movies. Most notable is the year 2013, where nearly all violent films see an increase while nonviolent movies experience a dip. Overall, however, there were more movies produced in 2013 compared to previous years.

Mass Shootings Data

Although mass shootings may not have as direct a connection to juveniles’ actions and psychological states (unless they are directly affected by the shooting), the coverage of mass shootings in mass media and social media may again affect youths in a way that might give rise to acts of violence.

mass shootings chart

mass shootings juvenile vs adult histogram

After restricting the dataframe to the desired timeframe and categorizing the data, we can use the chart and histogram in the analysis of our question. These visuals show that juveniles account for roughly five percent of the mass shooting incidents in the United States from 2005 to 2018. This is notable because it suggests that violent media did not affect minors enough to generate a disproportionate rise in mass shooting delinquents.

Juvenile Arrest Rates

Although the juvenile arrest rates are not equivalent to the number of juvenile convictions relating to acts of violence, they are still a good proxy for the number of cases that involve juveniles who commit violent acts.

juvenile arrests chart

juvenile arrest graph

The dataframe originally contained all the juvenile arrests in the United States. After restricting the dataframe to cases from 2005-2018, it appears to show an exponential decay in the number of incidents over the years. There was a small peak in 2006 at roughly six thousand incidents, followed by a drop to about two thousand cases in 2018. One possible sociological explanation for this drop is that the emergence of video games and the internet allows these individuals to indulge in an activity at home rather than resorting to crime. Although there is no evidence to support this claim, the graph does suggest that violent media does not have a significant impact on violent juvenile crimes.

Models

In this section of the project, we used different models to predict the rate of juvenile arrests for violent crimes from our 3 features related to violence in the media:

(1) game sales that relate to violence (categories “Fighting”, “Shooter”, “Action”)
(2) movie sales related to Action, Horror or Crime
(3) mass shootings (actions of those older than 18).

Given that the data related to each of these factors was collected on an ad-hoc basis rather than being part of an overarching, collective dataset, the models we chose had less of an emphasis on feature selection.

Additionally, because the question at hand is about predicting a rate rather than classification or clustering, classification models like logistic regression are not appropriate, and neural nets are impractical with so few observations. Furthermore, the years in which data was available for all features are limited to 2006 to 2016, restricting the number of times the dataset could be split. In light of these two factors, a validation set was not used during the model selection process.

Beyond ordinary least-squares regression, LASSO and Ridge regression, Principal Component Regression and Decision Tree/Random Forest models were also considered in order to find the most suitable model.

Shown below are the data used in the regressions. ‘juvenile arrest rates’ refers to juvenile arrest rates relating to violent crime in a given year.

final data

Linear Regression

Ordinary least-squares, LASSO and Ridge regression

First, we set violent game sales, violent movie sales, and the number of mass shooting cases as the independent variables in the regression model, and juvenile arrest rates for violent crimes as the response variable (referred to as “juvenile arrest rates” for short). Upon fitting OLS, LASSO and Ridge models and comparing their errors, we find that there is little difference among the three regression methods, with Ridge regression fitting our data best.

Because the three methods perform so similarly, we can focus on the output of the OLS regression. Its $R^{2}$ is close to 0.85, meaning the three predictors we chose explain much of the variation in the juvenile arrest rate. At the same time, however, none of these features has a statistically significant influence on the juvenile arrest rate. We therefore try to improve the model by scaling the predictors.
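A comparison of this kind could be sketched as follows with scikit-learn. This is only an illustration under stated assumptions: the data are synthetic placeholders (11 rows, one per year), the regularization strengths `alpha=1.0` are arbitrary, the predictors are standardized inside a pipeline, and the error reported is the training mean squared error, since no validation set was used.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data: columns stand in for violent game sales,
# violent movie sales and mass shooting counts. Not the real dataset.
rng = np.random.default_rng(0)
n_years = 11
X = rng.uniform(low=[50, 1000, 5], high=[150, 4000, 30], size=(n_years, 3))
y = 300 - 0.5 * X[:, 0] - 0.02 * X[:, 1] + rng.normal(0, 5, size=n_years)

# The alpha values are arbitrary choices for illustration.
models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "LASSO": Lasso(alpha=1.0),
}

train_mse = {}
for name, model in models.items():
    # Standardize predictors so the penalties treat them comparably.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X, y)
    train_mse[name] = mean_squared_error(y, pipe.predict(X))

for name, mse in train_mse.items():
    print(f"{name}: training MSE = {mse:.2f}")
```

Note that on the training data OLS always attains the lowest squared error of the three by construction, so a ranking between these methods is only informative when the error is measured on held-out data.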

OLS Regression Output

OLS with scaled data

Attributing the undesirable result of the OLS regression to the large scale differences among the features, we consider fitting ordinary least squares on scaled data.

In this regression, scaled violent game and movie sales do have a significant influence on juvenile arrest rates (both p-values are smaller than 0.05). This time, $R^{2}$ drops slightly to 0.785. We therefore look for an approach that improves both the significance of the features and the goodness of fit of the model.

Scaled OLS Output

OLS with logarithmic data

Because variables like violent game and movie sales are on a large scale, taking their logarithm may help to increase their significance in the regression model.

Compared with the initial OLS model, $R^{2}$ increases to over 0.95, which is very close to 1. At the same time, although the three features in this model have no significant influence on juvenile arrest rates at the 0.05 significance level, their p-values are very close to 0.1. Both the significance of the independent variables and the coefficient of determination ($R^{2}$) of this model are therefore better than in the initial OLS model.
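The log transformation of the predictors can be sketched with NumPy alone. The data below are synthetic placeholders, the helper `fit_ols` is a hypothetical name introduced for this example, and only the predictors are logged (the response is left on its original scale, which is an assumption about how the model was set up).

```python
import numpy as np

# Synthetic placeholder data, not the real dataset.
rng = np.random.default_rng(1)
n = 11
game_sales = rng.uniform(50, 150, size=n)
movie_sales = rng.uniform(1000, 4000, size=n)
shootings = rng.integers(5, 30, size=n).astype(float)
arrest_rate = (
    50 + 30 * np.log(game_sales) + 10 * np.log(movie_sales)
    + rng.normal(0, 2, size=n)
)


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta, r2


X_raw = np.column_stack([game_sales, movie_sales, shootings])
# The logarithm requires strictly positive values in every column.
X_log = np.log(X_raw)

beta_raw, r2_raw = fit_ols(X_raw, arrest_rate)
beta_log, r2_log = fit_ols(X_log, arrest_rate)
print(f"R^2 with raw predictors:    {r2_raw:.3f}")
print(f"R^2 with logged predictors: {r2_log:.3f}")
```

With logged predictors, each coefficient describes the change in the arrest rate associated with a proportional (rather than absolute) change in the predictor.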

OLS with logarithmic data output

Principal Components Regression

Though this model has only three predictors, we can consider a dimension reduction procedure because there is some relationship between violent game and movie sales.

Using the two principal components obtained from Principal Component Analysis (PCA) as regressors, we see that the first principal component (to which violent game and movie sales contribute most) has a clearly significant influence on the juvenile arrest rate. However, principal components are less interpretable as predictors than the features used in the other regression approaches, and $R^{2}$ does not improve.
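A principal components regression along these lines can be sketched with NumPy. The data are synthetic placeholders in which game and movie sales are deliberately correlated; standardizing the predictors before PCA is an assumption made for this example rather than a confirmed detail of the original analysis.

```python
import numpy as np

# Synthetic placeholder data with correlated game and movie sales.
rng = np.random.default_rng(2)
n = 11
game_sales = rng.uniform(50, 150, size=n)
movie_sales = 20 * game_sales + rng.normal(0, 100, size=n)
shootings = rng.uniform(5, 30, size=n)
X = np.column_stack([game_sales, movie_sales, shootings])
y = 200 - 0.5 * game_sales + rng.normal(0, 5, size=n)

# Standardize each predictor (zero mean, unit variance).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via SVD: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S**2 / np.sum(S**2)

# Keep the first two components and use their scores as regressors.
n_components = 2
scores = Z @ Vt[:n_components].T

design = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ beta
r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print("explained variance ratio:", np.round(explained, 3))
print("loadings of first component:", np.round(Vt[0], 3))
print(f"R^2 using two components: {r2:.3f}")
```

The loadings in `Vt[0]` show how much each original predictor contributes to the first component, which is the only route back to an interpretation in terms of the original features.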

PCA Regression Output

Summary of linear regression

Comparing the models above, ordinary least squares on logarithmic data fits this dataset best, which means that with this model we can predict the juvenile arrest rate well from violent game sales, violent movie sales and the number of mass shooting cases.

Decision Tree Model

Decision trees can be thought of as asking a set of questions about the data at hand in order to reach accurate prediction values. They work for both numerical predictions and classifications. In the scope of the task at hand, the decision tree will ask questions such as “Are game sales larger than a certain value?” to help partition data into different branches, which will then be used to answer other threshold questions. We repeat this process to ultimately reach predictions for each datapoint.

Upon training a decision tree model, we see that it achieves an $R^{2}$ of 1 on the training data, but on test data it performs worse than the linear regression models, with an $R^{2}$ of 0.78 and higher error.
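The gap between training and test performance can be reproduced in a small sketch with scikit-learn. The data are synthetic placeholders, the 8/3 train/test split is an arbitrary choice for illustration, and the feature names are hypothetical labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic placeholder data, not the real dataset.
rng = np.random.default_rng(3)
X = rng.uniform(low=[50, 1000, 5], high=[150, 4000, 30], size=(11, 3))
y = 300 - 0.5 * X[:, 0] - 0.02 * X[:, 1] + rng.normal(0, 5, size=11)

# Arbitrary split: first 8 rows for training, last 3 for testing.
X_train, X_test = X[:8], X[8:]
y_train, y_test = y[:8], y[8:]

# With no depth limit, the tree keeps splitting until every training
# point sits in its own leaf, so it reproduces the training data exactly.
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_train, y_train)

train_r2 = tree.score(X_train, y_train)
test_r2 = tree.score(X_test, y_test)
print(f"training R^2: {train_r2:.3f}")
print(f"test R^2:     {test_r2:.3f}")

# Print the threshold questions the tree asks at each split.
print(export_text(tree, feature_names=["game_sales", "movie_sales", "shootings"]))
```

A training $R^{2}$ of exactly 1 is therefore expected from an unrestricted tree and says little about how well it generalizes; only the test score is informative.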

Random Forest Model

Although the decision tree was able to perfectly predict the arrest rates in the training set, it did not perform as well on the test set. Therefore, it would make sense to try out a random forest model. Random forest models create multiple decision tree models, each trained on a sample drawn from the data with replacement (similar to bootstrapping). Then, by averaging the results of the decision tree models, the random forest model is often able to produce better predictions.

The random forest model performed worse than the decision tree model on the training set, with an $R^{2}$ of 0.95, and performed much worse on the test set. Therefore, a random forest model is not the best fit to create predictions for this data.
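The bootstrap-and-average procedure can be sketched with scikit-learn's `RandomForestRegressor`. As before, the data are synthetic placeholders and the 8/3 split and `n_estimators=200` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic placeholder data, not the real dataset.
rng = np.random.default_rng(3)
X = rng.uniform(low=[50, 1000, 5], high=[150, 4000, 30], size=(11, 3))
y = 300 - 0.5 * X[:, 0] - 0.02 * X[:, 1] + rng.normal(0, 5, size=11)
X_train, X_test = X[:8], X[8:]
y_train, y_test = y[:8], y[8:]

# Each tree is trained on a bootstrap sample of the training rows
# (bootstrap=True is the default).
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

train_r2 = forest.score(X_train, y_train)
test_r2 = forest.score(X_test, y_test)
print(f"training R^2: {train_r2:.3f}")
print(f"test R^2:     {test_r2:.3f}")

# The forest's prediction is the average of its trees' predictions.
per_tree = np.stack([t.predict(X_test) for t in forest.estimators_])
print("mean of tree predictions:", np.round(per_tree.mean(axis=0), 2))
print("forest prediction:       ", np.round(forest.predict(X_test), 2))
```

Because each tree sees only a resampled subset of an already small training set, averaging cannot be expected to help much when there are only around ten observations in total.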

Conclusion

Our task was to study how violence in media (such as movies, video games, and reports of mass shootings) might affect the number of violent crimes involving juveniles. In order to do this, we used movie revenues and video game sales as proxies for popularity, and directly used the number of mass shootings.

We then evaluated the accuracy and effectiveness of several different models: OLS, ridge regression, LASSO, principal component regression, decision trees, and random forest.

OLS, ridge regression, LASSO: limited effectiveness, all with p-values above the significance threshold of 0.05. However, an OLS model with logged predictors, which effectively reduces their scale, performs better than the model fitted on the initial dataset, and its goodness of fit suggests it is a good model.
Principal Component Regression: uses fewer predictors (only two principal components) than OLS, but does not improve the model significantly because of its lower $R^{2}$ and lack of interpretability. Dimension reduction is therefore not necessary in a model with only three predictors.
Decision Trees and Random Forest: The decision tree model performed worse than OLS regression, both in $R^{2}$ and error. Furthermore, the random forest model also performed worse than OLS regression.

Looking at the results of our models, it is possible to conclude that we have generated a model that could predict the violent crime rate among juveniles from three statistics: violent video game sales, violent movie sales, and the number of mass shootings per year. Given that the log-OLS model had an $R^{2}$ of 0.96, this model may be able to make predictions with high accuracy in the future.

Having a model that is able to predict violence among youths may lead to better resource allocations for the years to come (e.g. increasing the availability of counseling services for juveniles, or putting effort towards reducing illegal gun sales to minors).

This model has much room for improvement. The model draws upon only three features. These could be expanded to include other media-related factors, such as increases in violent TV show viewership, or factors unrelated to media, to create a more holistic approach to predicting violent crime rates among juveniles. Additionally, to further enhance the predictive accuracy of these models, more datapoints could be included (stretching the training data farther into the past) to see if accuracy increases; at the same time, going farther into the past may decrease relevance to the current-day context and hence make the model less relevant.

Ethically, these models will not have a very direct impact on individuals’ lives, as the predictions will not lead to certain groups being incarcerated or policed at higher rates. This is mainly due to the anonymity of the data types chosen as explanatory variables: moviegoers or those who watch the news are not disproportionately from one ethnic group or economic status.

Beyond being used to generate predictions, these models (mostly the regression models) can also give insight into potential causal links, if there is a hypothesis that violence observed in media has psychological effects that increase violence in juveniles. For these models to be used for inference, however, they will have to be adjusted and controlled for many other variables. This simply shows the versatility of regression models.