Who is at Fault? Metis Project 4

4/5/2020

Finally it had come: the project where we were going to use natural language processing (NLP), and I knew I wanted to do something a bit out of the box.

My PhD involved a fair amount of moral psychology work and thinking about what others take offense to cross-culturally. We know some basic things: pretty much every culture is appalled by incest, for example. But what else could we find out?

Because I also worked (in various capacities) at UChicago's Booth School of Business, I knew that teaching soft skills is a serious priority. How can we best get along with others and make sure that group work flows smoothly? Being able to predict when others think we have crossed a line, or better understanding how they might react to our behavior more broadly, could help us to have more fruitful interactions with each other, leading, potentially, to more productivity within business contexts.

Where might I find narratives detailing a moment when someone was unsure whether they were at fault? Where could I find crowdsourced determinations of whether that person was at fault?

REDDIT!

In this project, I sought to answer three questions:

1. Under what circumstances are people unsure of whether they’re at fault?
2. How do others respond to those narratives?
3. Can we predict whom others will find at fault?

The Data

My data came from a subreddit called Am I the Asshole?, or AITA. This subreddit is the butt of many a pop culture joke, which gave me all the more reason to take it seriously as something to analyze and make sense of.

Using the Reddit API, I was able to gather around 800 posts and their comments from AITA. I stored these data in MongoDB.

1. When Are We Unsure of Fault?

To answer my first question, I used topic modeling on all of the posts I had. I ended up using a non-negative matrix factorization (NMF) model with a term frequency-inverse document frequency (TF-IDF) matrix. So what topics came up?

It's worth noting here that "family (kid)" means that you’re the child in the family, while "family (adult)" means that you’re the adult in the family.

Which topics were discussed the most?

People seem to talk most about work and friends. This makes sense: these are situations where the impression you make likely matters to you. There is a middle level of closeness here, as opposed to family, who are stuck with you, and people you meet in passing, whom you won’t see again. These middling levels of knowing someone create more need for image management, implying a potentially greater likelihood of second-guessing one's own behavior.

2. How Do Others Respond?

Though people talked the most about work and friends, the most commented-upon topic was, by far, family where the writer is the adult. Secondarily, people also like to comment on posts about weddings and posts related to family where the writer is the child.

What this possibly means is that other people really have opinions on how one should run one’s family, but people are somewhat less worried about how their actions will be received in their own families. When we tell narratives about our own families, we might not expect that others are evaluating our behavior, but, in fact, they are.

Something nice here, though, is that while you might be very worried about how your friends and colleagues perceive you and your actions with them, it's possible that they're not really all that worried about it.

Another really nice finding is that the more positive we are, the more positive others are in response:

I used TextBlob and IBM Watson's Tone Analyzer to get sentiment (positive, negative) and tone (a range of emotions) for each post and comment. What I found is that people's sentiment actually mimics the sentiment of the post they're responding to. There's been a lot of research on how humans mirror each other, usually in person, but this is preliminary evidence that mirroring occurs in terms of sentiment, and via written text as well.
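One way to quantify this kind of mirroring is a Pearson correlation between each post's polarity and the average polarity of its comments. The sketch below uses only the standard library. The polarity values are made up for illustration (in practice they could come from something like TextBlob's `sentiment.polarity`, which ranges from -1 to 1), and `pearson_r` is a hypothetical helper, not the code used in this project.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical polarity scores in [-1, 1]: one entry per post.
post_polarity = [-0.4, -0.1, 0.0, 0.2, 0.5]
mean_comment_polarity = [-0.3, -0.2, 0.1, 0.1, 0.4]

r = pearson_r(post_polarity, mean_comment_polarity)
print(f"r = {r:.2f}")
```

A value of `r` near 1 would mean comments tend to be positive when the post is positive. It says nothing about which way the influence runs.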

Practically, this is really interesting as a best practice for how we should engage with each other. Though this is just a correlation, it is possible that acting positively inspires others to be positive.

3. Can We Predict Who is at Fault?

Finally, the question you've all been waiting for! Who is the asshole?! Is it me?!

The short answer is, sadly, no: we can't predict who will be at fault given the data we have. All metrics were similar across people who were deemed assholes and people who were not deemed assholes. A classification model also didn't have much explanatory power.

If I had to guess, the types of violations, rather than the topics discussed (the people violated), or perhaps some combination of the two, are what would actually allow us to predict who is deemed at fault. Thus, answering this particular question might require a qualitative approach followed by a quantitative one.

BUT! I did find one difference between people deemed at fault and people deemed not at fault:

Among other metrics, the average score indicated that posts where the author was not at fault received more upvotes.
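A comparison like this can be sketched as a difference in mean scores plus a simple permutation test to check whether the gap could plausibly be chance. This is an illustration under stated assumptions, not the analysis behind the finding above: the scores are invented, and `mean_score_gap` and `permutation_p_value` are hypothetical helper names.

```python
import random
from statistics import mean


def mean_score_gap(at_fault, not_at_fault):
    """Mean score of not-at-fault posts minus mean score of at-fault posts."""
    return mean(not_at_fault) - mean(at_fault)


def permutation_p_value(at_fault, not_at_fault, n_shuffles=5000, seed=0):
    """Two-sided permutation test for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean_score_gap(at_fault, not_at_fault))
    pooled = list(at_fault) + list(not_at_fault)
    n = len(at_fault)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffle the verdict labels by shuffling the pooled scores.
        rng.shuffle(pooled)
        if abs(mean_score_gap(pooled[:n], pooled[n:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical upvote scores, grouped by verdict.
at_fault_scores = [12, 30, 8, 45, 20]
not_at_fault_scores = [150, 90, 300, 60, 210]

gap = mean_score_gap(at_fault_scores, not_at_fault_scores)
p = permutation_p_value(at_fault_scores, not_at_fault_scores)
print(f"mean gap = {gap:.1f} upvotes, permutation p = {p:.3f}")
```

Upvote counts are usually heavily skewed, so a permutation test (or a comparison of medians) is a reasonable choice here because it does not assume normally distributed scores.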

What I think this means is that people are upvoting or downvoting to flag as "asshole" or "not the asshole" instead of writing their opinion in the comments, which would then be tallied by the Reddit bot.

This might be why people think that this subreddit is full of apologists: you’re more likely to see upvoted posts at the top (due to Reddit's algorithm), and upvoted posts are more likely to be flagged as not at fault.

So what can we conclude about determining fault overall? Topic, sentiment, and tone (emotion) do not signal whether someone is at fault. I do think something is giving this signal, but it likely has to do with violations related to autonomy and obligation to others, neither of which was picked up by the metrics used.
