No Man’s Sky: text mining of comments related to the game

No Man’s Sky is an action-adventure survival video game released in August 2016. Right from its first reveal, the game was highly praised, which put excessive expectations and attention on the small team designing it. Unfortunately, when the game finally launched, players’ reactions were often negative, citing the lack of features that had been shown in earlier coverage.

This article covers that change in tone and how feelings about the game evolved over time. For this analysis, I extracted a sample of 15,000 Reddit comments from the year 2016. Using these comments, I will try to recreate the narrative of the game’s history, with particular attention to the mood of the users.


A. Data preparation

My first task was to extract interesting text about the game over time. I chose Reddit as the source because of interesting quotes found on the game’s Wikipedia page. It seems that Reddit, especially the No Man’s Sky subreddit, was an extremely active hub with a tense atmosphere.

  • Quote from Wikipedia before the launch date:

No Man’s Sky developed a dedicated fan-base before its release, with many congregating in a subreddit to track and share information published about the game. Sam Zucchi writing for Kill Screen proposed that the players anxiously awaiting No Man’s Sky were a kind of religion, putting faith in Hello Games to be able deliver an experience that has otherwise never been offered by video games before, the ability to explore a near-infinite universe.

  • Quote from Wikipedia after the launch date:

The subreddit forum had become hostile due to a lack of updates from Hello Games or Sony, leading one moderator to delete the subreddit due to the toxicity of the comments, later undoing that action on further review.

Now, thanks to some helpful redditors, all I had to do was create a free account on Google BigQuery to get access to a warehouse with gigabytes of Reddit comments. Unfortunately, the free account limited the size of my extraction, so I narrowed the query to:

  • 15,000 random comments,
  • written in the year 2016,
  • from the No Man’s Sky subreddit.

I exported the comments to a CSV file and continued with my usual tool: R.
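The three filters above can be sketched in a few lines. This is only an illustration in Python (the analysis itself used BigQuery and R); the record keys and the subreddit name `NoMansSkyTheGame` are assumptions about how the export is laid out, not details taken from the original query.

```python
import random
from datetime import datetime, timezone

# Assumption: the subreddit's name as stored in the data; adjust to your export.
SUBREDDIT = "NoMansSkyTheGame"
SAMPLE_SIZE = 15000


def sample_comments(comments, year=2016, subreddit=SUBREDDIT,
                    n=SAMPLE_SIZE, seed=0):
    """Keep comments from one subreddit and year, then draw a random sample.

    Each comment is a dict with hypothetical keys 'subreddit',
    'created_utc' (Unix seconds) and 'body'.
    """
    def written_in_year(comment):
        ts = datetime.fromtimestamp(comment["created_utc"], tz=timezone.utc)
        return ts.year == year

    kept = [c for c in comments
            if c["subreddit"] == subreddit and written_in_year(c)]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(kept, min(n, len(kept)))
```

Fixing the seed is a design choice: it makes the "random" sample repeatable, so later figures can be regenerated from the same comments.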

 

 

B. The big picture

B.1. Date distribution

My first questions were about dates: when were most of the comments written? As seen in figure B.1., a large proportion were posted around the launch date, in August. This first spike was expected, but three other spikes are of interest: when the launch date was announced in March, when the launch was delayed in May, and when an important update came out in November.


Figure B.1. – Distribution of comments over the year 2016: bars represent the number of comments written for each week of 2016 (from a sample of 15,000 comments).
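The weekly counts behind such a bar chart can be computed with a simple tally. The sketch below is in Python rather than the R used for the analysis, and it assumes ISO week numbering, which may differ from the binning used for the original figure.

```python
from collections import Counter
from datetime import date


def comments_per_week(dates):
    """Count comments per ISO week; keys are (ISO year, ISO week) pairs."""
    counts = Counter()
    for d in dates:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return counts
```

One quirk of ISO weeks worth knowing: January 1, 2016 was a Friday, so it is counted in week 53 of ISO year 2015 rather than in week 1 of 2016.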

B.2. Comments polarity

Having seen the numbers, I wanted a general impression of the comments’ polarity (or mood) over the year. Here, polarity tells whether the opinion shared in a comment is positive, negative or neutral. A very simple and effective method for estimating polarity is to count the positive and negative words. To implement this method, I used the syuzhet package, which implements an emotion lexicon and returns the number of positive and negative words in each comment.
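To make the word-counting idea concrete, here is a minimal Python sketch. The two word sets are a tiny made-up lexicon for illustration only; the real analysis relied on the much larger lexicon that ships with the R package.

```python
import re

# Toy lexicon for illustration only; a real emotion lexicon is far larger.
POSITIVE = {"love", "great", "excited", "best"}
NEGATIVE = {"scam", "horrible", "flaw", "delay"}


def polarity_counts(comment):
    """Return (number of positive words, number of negative words)."""
    words = re.findall(r"[a-z']+", comment.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos, neg


def polarity(comment):
    """+1 if positive words dominate, -1 if negative words do, else 0."""
    pos, neg = polarity_counts(comment)
    return (pos > neg) - (pos < neg)
```

A comment with no lexicon words, or with as many positive as negative ones, is treated as neutral.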

Now, did sentiments evolve through the year? My intuition was that comments would be more negative following the official launch date. As seen in figure B.2., the year started with positive comments but went downhill afterwards. There was a spike of positivity before and after the launch, but the launch itself was very negative. At the end of the year, maybe because of the Foundation Update, comments became more positive again. The green line is essentially a weighted average of the comments’ polarity (for more details, consult the source code). There is more uncertainty at the beginning and the end of the year, as there were fewer comments.


Figure B.2. – Comments polarity over the year: the green line is a smoothed relative measure of the comments’ mood, based on whether each comment has more positive or more negative words.
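The article only says that the green line is "essentially a weighted average". One way to build such a line, shown here as a Python sketch and not as the method actually used, is a Gaussian-weighted average: comments written close to a given day count more than distant ones. The kernel and the 14-day bandwidth are assumptions.

```python
import math


def smoothed_polarity(days, scores, at_day, bandwidth=14.0):
    """Gaussian-weighted average of comment scores around `at_day`.

    `days` are day-of-year numbers, `scores` the matching +1/0/-1 polarities.
    `bandwidth` (in days) is an arbitrary illustrative choice.
    """
    weights = [math.exp(-0.5 * ((d - at_day) / bandwidth) ** 2) for d in days]
    total = sum(weights)
    if total == 0:
        return 0.0  # no comment close enough to carry any weight
    return sum(w * s for w, s in zip(weights, scores)) / total
```

With few comments near a given day, a handful of scores dominate the average, which is why the curve is less reliable at the sparse ends of the year.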

 

B.3. Time intervals

To better understand the evolution of the comments and keep things simple, I divided the year 2016 into six parts. The cuts were placed so that important events separate the sections. Figure B.3. shows all six color-coded time intervals. The intervals are:

  • January 1 to May 24: This interval starts at the beginning of the year and ends before the announcement that the launch was delayed. During this period, the initial launch date, June, was announced. (Announcement)
  • May 25 to July 28: This period runs from the delay announcement until the first video of a leaked copy appeared online. (Delay)
  • July 29 to August 8: This period starts with the unfortunate leak and stops before the launch date. (Leak)
  • August 9 to August 18: This period runs from the launch date until the first real update of the game: patch 1.04. (Launch)
  • August 19 to November 26: This period sees the game evolve from patch 1.04 until just before patch 1.10. It must be noted that there were no updates from September 18 until November 27. (Minor patches)
  • November 27 to December 31: This period closes the year, starting with the Foundation Update (patch 1.10), which was a major upgrade of the game. (Major update)

Figure B.3. – The six time intervals investigated: announcement, delay, leak, launch, minor patches and major update
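Assigning each comment to one of the six intervals is a lookup on its date. A minimal Python sketch follows, using the boundary dates from the list above; the function and variable names are mine.

```python
from bisect import bisect_right
from datetime import date

# First day of each period, taken from the list of intervals.
PERIOD_STARTS = [date(2016, 1, 1), date(2016, 5, 25), date(2016, 7, 29),
                 date(2016, 8, 9), date(2016, 8, 19), date(2016, 11, 27)]
PERIOD_NAMES = ["Announcement", "Delay", "Leak",
                "Launch", "Minor patches", "Major update"]


def period_of(day):
    """Return the name of the 2016 period that contains `day`."""
    if not date(2016, 1, 1) <= day <= date(2016, 12, 31):
        raise ValueError("only dates in 2016 are covered")
    # bisect_right finds how many period starts are on or before `day`.
    return PERIOD_NAMES[bisect_right(PERIOD_STARTS, day) - 1]
```

Storing only the start dates keeps the intervals contiguous by construction, so no date in 2016 can fall between two periods.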

 

 

 

C. Analysis of the six periods

Now, which words and concepts best describe each time interval? Figure C.1 shows the words that are important for each interval. Word size is proportional to computed importance. Some technical details:

The size of each word in the cloud is proportional to the weight yielded by the tf-idf algorithm. Important words, in this context, are words that appear often, but only in a limited number of time intervals.

Additionally, all words were stemmed, meaning that words were reduced to their root (e.g. waiting → wait), y transformed to i, long words shortened, etc. This ensured that variations of the same word would not be counted separately.
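To give a feel for what stemming does, here is a deliberately crude Python sketch. It is a toy suffix stripper of my own, not the stemmer used for the analysis, and real stemmers apply many more rules and exceptions.

```python
def toy_stem(word):
    """Very rough suffix stripping; real stemmers apply many more rules."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when a reasonably long root remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Turn a final y into i, but only after a consonant (happy -> happi).
    if word.endswith("y") and len(word) > 2 and word[-2] not in "aeiou":
        word = word[:-1] + "i"
    return word
```

With this rule set, "delayed" and "delays" both collapse to "delay", which is the point of the exercise: the variants get counted as one word.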


Figure C.1:  The most important words for the six periods of 2016
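The tf-idf weighting behind the word sizes can be sketched as follows, treating each time interval as one document. This Python version uses one common variant of the formula (term frequency times the log of inverse document frequency); the exact variant used for the figure is not stated in the article.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights; `docs` maps a period name to its word list.

    Uses tf = count / document length and idf = log(N / document frequency),
    one common variant among several.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))  # count each word once per period

    weights = {}
    for name, words in docs.items():
        counts = Counter(words)
        weights[name] = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return weights
```

A word that appears in every period gets a weight of zero, however frequent it is, which is why generic words like "game" do not dominate the clouds.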

Figure C.2 shows only words that are associated with non-neutral feelings: green words are associated with positive feelings, red words with negative feelings. At a quick glance, we see obvious words like destruct, delay, flaw and complaint. Unfortunately, some lexicon labels do not carry over well to our context. The fact that the word refund is in green is an obvious example of this; we can safely say that it should be red.


Figure C.2: The most important words related to positive or negative sentiments for the six periods of 2016 (green is positive, red is negative)
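One possible remedy for mislabeled words such as refund, which the original analysis did not apply, is a small list of domain-specific overrides layered on top of the general lexicon. The Python sketch below uses made-up lexicon entries purely for illustration.

```python
# Toy general-purpose lexicon; the entries are illustrative, not the real one.
BASE_LEXICON = {"refund": "positive", "flaw": "negative",
                "complaint": "negative", "love": "positive"}

# Hypothetical game-specific corrections, applied on top of the base lexicon.
DOMAIN_OVERRIDES = {"refund": "negative"}

COLORS = {"positive": "green", "negative": "red"}


def word_sentiment(word, base=BASE_LEXICON, overrides=DOMAIN_OVERRIDES):
    """Look a word up, letting domain overrides win; default is 'neutral'."""
    if word in overrides:
        return overrides[word]
    return base.get(word, "neutral")


def cloud_color(word):
    """Color used in the sentiment word cloud, or None for neutral words."""
    return COLORS.get(word_sentiment(word))
```

Keeping the overrides separate from the base lexicon makes every manual correction visible and easy to review.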

The next paragraphs reveal a glimpse of the story for each of the six periods.

First period – January 1 to May 24 – Announcement

In the first period, people were excited about the game, discussing the rare footage that existed at that time. Of course, June is the most important word here, as the initial launch date was announced in that period. People were also speculating about the potential of the game, so we get general words that relate to potential features: loot, creation, destruct[ion], respawn and offline.

Here are some defining quotes from that period of time:

  • This is one of the best videos yet! The possibilities of what we may see just has me so excited. The silhouettes of the different bases just blew me away. So hard waiting for June 21st.

  • I’m sure this is something that is going to be tweaked as they polish the game up to release.

Second period – May 25 to July 28 – Delay

The second period starts with the announcement that the game would be delay[ed] until August. Interestingly, optimistic people said that the developers were polish[ing] it for the eventual release. Were they adding features like feed[ing] the animals or tradit[ional] multiplayer? Notable Reddit comments for this period:

  • So all of the trailers, footage, interviews, trusted news sources, etc. we’ve seen of a fully functional game was false? If it is delayed, they’re just taking the time to polish the game. It’s not uncommon at all for delays. Just look at Uncharted 4, delayed too many times, released as a great game.

  • The delay was probably for polishing.

Third period – July 29 to August 8 – Leak

The third period covers a little more than a week before the game launched on Tuesday, August 9. It starts around the date a Reddit user leak[ed] footage from a copy of the game he got from eBay. As other people got leaked copies, the initial impressions were not positive, citing exploits that let players bypass much of the game and some design flaws that tarnished it. Notable Reddit comments for this period:

  • The leaks confirmed my purchase. Sony should be thanking them.

  • Basically, for the past few years this whole sub has overhyped the game so far that their expectations are way too high. The leaker gave some info saying the game is just a normal video game and some stuff wasn’t as expected, so the sub imploded into itself like a black hole.

Fourth period – August 9 to August 18 – Launch

Finally, the game came out at the beginning of the fourth period. Now that it is out, people talk about the actual specs of their characters and how they manage their slots, which are limited in number for both upgrades and resource space. They talk about the Gek, one of the sentient species. The game requires a powerful computer, so people also talk about computer parts: GTX, Radeon and AMD graphics cards. Unfortunately, many people could not enjoy the game, whether because of unmet expectations or an inadequate computer, and requested a refund. Notable Reddit comments for this period:

  • It’s pretty horrible that this game got so much attention and a AAA pricepoint. It should have just released on PS4 with half the marketing budget. Add one to the refund column. Just not playable on my xeon, 950, 16 GB system regardless of settings.

  • I have tons of slots for my suit, but none for my ship. Driving me nuts.

Fifth period – August 19 to November 26 – Minor patches

In the fifth period, the Reddit community is home to lots of toxic comments. The small updates are not bringing much value, which means that refund is still a strong word. Redditors go as far as to say that the game was a scam. Tweet appears as a notable word because the game’s developer, Hello Games, hilariously tweeted through its official account “No Man’s Sky was a mistake”. Of course, they blamed it on a hack. Still, there are some threads about the actual game, with words like Gek, Nada (a character from the game), slot and waypoint coming out as important. Notable Reddit comments for this period:

  • Oh yes possibly the biggest gaming scam in recent history, it sure was “something unique, vast and inventive”.

  • Get that refund and buy the game again whenever you feel like it’s worth the money 🙂

Sixth period – November 27 to December 31 – Major update

In the sixth and final period, the comments follow the Foundat[ion] Update (patch 1.10). Following this encourag[ing] update, many resources appear in the word cloud, e.g. plutonium, zinc, iron, heridium and platinum. A lot of the popular words in this time range belong to the technical lexicon of No Man’s Sky. There are still some complaint[s], but this is a sign that the subreddit is indeed healing from its wounds. Notable Reddit comments for this period:

  • Plutoniummmmmm! My most beloved element!

  • I bought it after the update and love it. I really hope they add 4K support and maybe PSVR support in the future. I’d also like to see them stack ship component upgrades so save slots.
