Grabbing a GDF from Facebook data

Sometimes I am asked to do quick social network analysis for friends or family, and I am often surprised by how many people offer me their Facebook password to do the analysis! One thing you should never do is give out your Facebook password to anyone, no matter how well you know them.

To do the analysis I require a GDF file. There is a Facebook app that will generate this for you, and this is a quick post to show people how to get the GDF for me. There is a whole other question around how much you want to trust an app to poke around your data, but the app is well used and is limited in what it can access by the Facebook development tools, so it can’t steal your password and the like. Here is how to grab the GDF.

1) Log in to Facebook

2) In the search bar search for ‘netvizz’ like so:


3) Click the personal network link. On the next page make sure “friends’ like and post count” is not ticked and press ‘start’:

Screen Shot 2014-07-18 at 16.11.32

4) Right click the ‘gdf file’ and click save as or your browser’s equivalent.

Screen Shot 2014-07-18 at 16.18.21
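If you are curious about what is inside the file you have just saved: as far as I understand the format, a GDF is plain text with a "nodedef>" header line followed by one row per person, then an "edgedef>" header line followed by one row per friendship. Here is a minimal Python sketch that reads that shape. The sample data and the parse_gdf name are made up for illustration, and a real export will have more columns than this.

```python
import csv


def parse_gdf(text):
    """Split GDF text into node rows and edge rows (each a list of dicts)."""
    nodes, edges = [], []
    columns, target = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("nodedef>") or line.startswith("edgedef>"):
            # A header line switches which list the following rows belong to.
            target = nodes if line.startswith("nodedef>") else edges
            header = line.split(">", 1)[1]
            # Each header entry looks like "name VARCHAR"; keep only the column name.
            columns = [field.split()[0] for field in header.split(",")]
            continue
        if target is None:
            continue  # ignore anything before the first header
        values = next(csv.reader([line]))  # csv handles quoted labels with commas
        target.append(dict(zip(columns, values)))
    return nodes, edges


# Made-up sample in the assumed GDF shape, not real Facebook data.
SAMPLE = """nodedef>name VARCHAR,label VARCHAR
1,Alice
2,Bob
3,Carol
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
2,3
"""

nodes, edges = parse_gdf(SAMPLE)
print(len(nodes), "people,", len(edges), "friendships")
```

To try it on your own file, read the file into a string and pass that to parse_gdf instead of SAMPLE.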

All Done! Still stuck? Follow the steps in this very short YouTube video:

Posted in Data Analytics

Big Data Problems

Writing about an ongoing conversation that you haven’t quite got your head around yet is a difficult task. It does help you to think the problems through, but it leaves you with a mess of a blog post.

Mark and I have been having discussions around big data analytics; in particular we have been talking about a blog post that Mark has written worrying about the outputs from many Big Data projects, such as the pretty node/edge pictures and tables of key phrases that frequent my Twitter stream. Having been a student of Mark’s for some time and now having him as a PhD supervisor, I think I know Mark quite well, and when he asks you to read blog posts of his it is usually an intervention to make you think. At the core of the blog post Mark likens the creation of these outputs to the magician pulling a rabbit from a big hat: it looks cool and we all gasp, but he says it’s magic, not science, produced by algorithms that only a handful understand. His worry, in his own words, is that outputs are produced by “vast calculations that lead to the dulling of critical thought” and that ‘Science is underpinned by ethics and politics; magic isn’t’.

Mark has nudged me to write a response to this post, and at this moment, while I sit staring at my screen, I find it difficult to know how to start my reply. Not because I disagree with Mark, but because I find myself with more questions, which I suspect was his aim as my PhD supervisor.

When trying to get the idea of big data over to somebody new to the term, both Mark and I often try to find a practical example of successful big data analytics in the wild. The example we often cite is the mining of opinions during the 2012 U.S. election by the electoral teams and how this mining affected the use of social networks by the candidates. This is an interesting example to cite because earlier in the year the technical adviser to the Democratic electoral team, Harper Reed, rallied critics of big data with his famous “Big data is bullshit” talk. He was saying that ‘Big’ is just a buzzword to make you buy into storage technologies, that data analytics is just analytics, and that if ‘big data’ is in fact a thing we should forget it as a buzzword and concentrate on ‘big answers’. Harper tells us that this sort of analytics is the sort of thing we do in Excel or with a database query and that the ‘Big’ is all business bullshit hype.

I take this as a suggestion that the jump from just plain data analytics to big data analytics is simply about the size of the dataset, that the techniques are the same and that we should all just be doing stuff to data, thinking about it and not being put off by everything having to be so damn big. While Harper may be right that the techniques are the same, I don’t think the situation is the same. What people are calling Big Data Analytics are the things that lead to Big Answers; these big answers lead to Big Innovations, which end up as Big Fuck Ups. When I think of big data there are a few big interventions that spring to mind: Tempora and PRISM.

I think Mark and Harper are right about different things. Harper is probably right that being Big makes us think we are doing something extra special clever at the macro level when the techniques are just the same but scaled up. Mark is probably right when he says many of the techniques are akin to pulling rabbits out of a hat.

Thinking about my own experiences of playing with data and running some analytical technique over them, I don’t think of myself as a scientist or a magician; I’m just playing. I don’t consider myself a journalist when putting data in to the social networks and I don’t consider myself a scientist when I extract it and run an analysis over it. In fact I don’t really initially expect anything much more than a laugh. Something recognisable does come out of the analysis though, and I kick this output around the same networks we are analysing. At this micro level of analysis I always feel there is a message saying ‘this is play’. At this level I find it a little unfair to tell people playing with analytical techniques and making graphs that they should be scientists. Play is important!

Even at this low level, while playing with my own ‘little’ data, there are hands trying to influence how I play. The data collection services have an interest in what I put in to their services and the algorithm writers are defining what comes out. They have an interest in how I play, which makes the analysis of the data very hard, but this is exactly why playing is important: to me play is part of a process of working out who is trying to influence what. This isn’t an easy thing to do, as both the services and algorithms are well aware of my playing, want me to be the hero in the story and will display the data in the way they know I want to see it. It feels kind of relaxing when Twitter shows me tweets relevant to my life, or the network graph makes me the biggest node.

So when I see the message that these playful techniques can easily be scaled up to big data analytics and eventually big answers, I do worry. The resulting big innovations have real effects on lives and, unlike the scenario where I play with spreadsheets to explore who is manipulating me using what data, I have no control over what data the NSA is collecting or what algorithm they are using.

I don’t just see magicians and scientists, I see people playing too, and while play is all fun and games there are hands influencing just how we play, and perhaps it isn’t so playful scaled up to big intervention level. The whole scenario gives me somewhat of a split personality. On the one hand play may be the way to find out how these data services and algorithms are guiding the information that goes in and how it is analysed, yet on the other hand those exact same data services and algorithms are guiding how we play so that they get the results they want. When scaled up, these skewed results and their resulting big interventions have major effects on society. The transition to ‘Big’ anything is where we must be scientists.

Posted in Data Analytics

Sim University 5 Game Review


You know those people who establish dynasties so powerful that their names echo through the centuries? I am one of them. – El Vice Chancellor

Sim University (U.S name: Theme University) is a series of hit real-time strategy simulation games created and developed by a one man indie outfit. The game play sits at a crossroads between SimCity and The Settlers.

In the latest installment, Sim University 5, the player is given the role of El Vice Chancellor, a character with a CV full of honorary doctorates who has been installed to preside over the Democratic People’s Republic University. In campaign mode a different scenario is presented to the player at the start of each mission, and the player is typically responsible for developing the University towards a certain goal. In early missions the goal is set by the cabinet of the fictitious country in which the University is based, but as the game progresses the goals become crazy schemes that El and the Capitalists (See: Factions) dream up at the end of the previous scenario. Each scenario typically involves the building of new buildings, satisfying the staff/students’ needs, issuing edicts and embezzling funds from the treasury, all while keeping various powers and non player characters happy. At the end of each scenario a final score is determined based on the overall happiness of the students, the size of the University’s treasury, obnoxiousness of the VC’s personal number plate, size of the Vice Chancellor’s Swiss bank account and number of ‘friends made in high places’.


El gets to give speeches from his balcony to scare the masses. Fear is a simple but effective way to control NPCs.

Money will be needed to achieve the scenario goal and it is generated by students, so ensuring these students are trapped in the system is an early objective in most scenarios. Tools such as the ‘sexy course name generator’ will only be effective for short periods of time, and the player will need to invest in ways to trap students in ‘the system’ so they continually pay up. The few needs students have are represented by floating meters above their heads. A low aggregated score across these meters will have negative effects such as the sacking of El (Game Over!) or lower student numbers (which results in less money coming in for things such as a sexy license plate, and therefore a lower score). All is not lost though, as these effects can be negated using a number of edicts that El has access to, such as rigging surveys or simply declaring that a league table doesn’t count.

The University will need staff to teach the students. These staff also have meters that represent their basic needs and the extent to which they are being met; these link to the catering quality (hunger), office quality (housing) and entertainment. Furthermore, each staff member has an affiliation with a political faction, which links their respect for the VC to the happiness of the faction’s leader and how well the faction’s goals are being met. The factions are as follows:

  • Communists: Mostly the lowly support staff and lecturers of the University. Communists like to see more people employed, everyone with an office over their head and a low income disparity. While it is not that important to keep these happy, doing so will stop the odd library closure and toilet malfunction, and will stop Unison invading your University (which ultimately does nothing anyway). You can use edicts on the communist class to improve the entertainment meters of the capitalist class (see: force uniform edict).
  • Capitalists: The upper class citizens of the University. Like to see luxurious offices, high-class entertainment, and a growing treasury. Can be difficult to replace if they get upset, and often require an ‘embarrassing secret’ to keep in line. The Capitalist faction is valuable for keeping wealthy degree tourists flocking to your University for honorary degrees which in turn influences the Cabinet opinion of you, which is important to stay in power. Throughout the scenario missions your capitalist advisers will usually end up arrested, exiled or kidnapped and replaced with clones forcing you to start again from scratch for the next scenario.
  • Intellectuals: The educated staff in the University. They like high quality lectures for students and need high liberty ratings to stay happy. Generally a small faction.
  • Militarists: The security staff in the University. They like the University to be an ‘orderly’ society (the average safety rating higher than the liberty rating). This puts them at odds with the Intellectuals, who prefer more freedom and less military presence. High militarist support is needed for special actions like declaring Martial Law.
  • Loyalists: El’s die-hard fans. They value a strong and pompous Vice Chancellor, and think the idea of elections, free or not, is generally preposterous since El is the only candidate you will ever need.


Building a room to deal with academics with ‘Bloaty head syndrome’ returns in the 5th of the series.

The game is reasonably well balanced and I enjoyed the first couple of missions. The problem is that Sim University doesn’t give you enough reasons to replay each scenario, and as the game progresses they become predictably similar. Unbelievable and crazy scenarios are laid out in front of the player mission after mission; you expect the game to let up at some point, but eventually you get bored of the repeating pattern.

A well designed game let down by repeatability

Posted in Computer Games, Education

Reoccurring phrases in the Facebook comments section of political parties

A while ago I used a combination of Facepager and R to find reoccurring phrases in the comments section of the BNP’s Facebook page. The idea was based around a conversation with a friend where I explained that I found it hard to talk to BNP supporters because I just couldn’t get to grips with where they were coming from. I wondered if we could use data mining techniques to get an idea of what makes people with extremist political views tick. I also wondered if we could use similar techniques over news articles to find out which news stories these types of people grasped on to and why.

While originally I wrote a script to work with the BNP’s Facebook data, I have now updated my script to include other UK political parties and an insane organisation, Britain First. I’ve just realised that I have forgotten UKIP. While temporarily forgetting UKIP is a pleasant experience that I recommend to everybody, it has left me annoyed that I will have to run the script again and update the post.

Just to reiterate what I am doing; I am using Facepager to grab the last 10 pages of posts from the Facebook profiles of various political organisations, then I am digging down further to grab the comments on each of these posts; it is this data I am finding ngrams in.

You can find the script at the end of my post. The file it generated was huge and I’m not a pro at making data look good, so you will have to put up with this huge image it has generated (click to enlarge):

I’ve not had time to go through the data but there are still some interesting observations that may be worth investigating. I think what really stands out is that data mining doesn’t give you any answers on its own. It just gives you more questions and you really need to know the data well. Here are my immediate thoughts:

1) The curious case of “89dbd2 nice unanimity lisbon”

This is a phrase that pops up in the comments on Green Party, Conservative, Labour and Liberal Democrat pages and I guess it is to do with qualified majority voting. It’s still interesting that the phrase pops up, and I will have to find out what was in the news and explore the dataset to see why it comes up (could be spam, see #2). The 89dbd2 is also puzzling: why do all the patterns across all the parties include it? Does this suggest that the comment is copied and pasted across party message boards and the letters are some kind of shortcode the Facebook API uses?

2) It is difficult to separate real opinion and spam

Both Labour and Conservative posts were dominated by comments containing the phrase “vote ukip” or a variant of it. Many of the phrases looked like spam rather than a legitimate opinion or comment. The phrases were also littered with the odd looking code “d9ԍ”. I wonder what this is; is it a short code or an encoding error? It appears that this is more of a problem the more popular the party, which makes sense I guess: if you want to spam a message then you post it to the most popular page.

3) Are there lots of issues in the Green Party around the deaf?

Again, until I have explored the data set further it is hard to tell if something is a message that is being spammed or a legitimate capture of a recurring phrase, but it seems that issues around the deaf have lots of support from the Green Party, with the phrase “support for the deaf” and variants of it being the most common in the Green Party data set. Before I go and explore the dataset it really does make me think through the issues with data: are these spam comments, or was a particular post around deafness shared more? Perhaps the Greens have lots of policies regarding help for the deaf that I was unaware of? Maybe there was a big news story regarding the lack of support for the deaf?

4) At the end of the day

My absolute favorite thing about this data set is that the biggest six-gram from the BNP was the phrase “at the end of the day”. The phrase uttered when there isn’t an argument because it’s just common sense, guv.

5) Supporters of BNP and Britain First are obsessed with “this country”, what people should be doing and practically anything to do with Islam

In most of the data sets the most popular trigram is the name of the party, for example “the green party” or “the labour party”. This was true for all the parties apart from the BNP and Britain First, which both went with “in this country”. Both of these parties were telling people what they should be doing, with phrases like “it should be”, “we need to”, “they should be”, “get rid” and of course “if they don’t like it” you can “send them back”. Muslims really get a hard time in the 6-grams column, with “this is not a muslim country” and similar variants coming top for Britain First. It’s all very predictable; in fact I’m thinking of making a hate speech phrase generation app using this data so you can fit in down your local skinhead pub.
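One cheap way to start separating spam from real opinion (point 2 above) is to count how often the same comment turns up after stripping case and punctuation. The following is a minimal Python sketch of that idea, not part of my R script; the function names and the sample comments are made up for illustration.

```python
import re
from collections import Counter


def normalise(comment):
    """Lower-case and drop punctuation so near-identical pastes compare equal."""
    words = re.findall(r"[a-z0-9']+", comment.lower())
    return " ".join(words)


def repeated_comments(comments, min_count=2):
    """Return (normalised comment, count) pairs seen at least min_count times."""
    counts = Counter(normalise(c) for c in comments)
    return [(text, n) for text, n in counts.most_common() if text and n >= min_count]


# Invented comments standing in for a column of the Facepager CSV.
comments = [
    "Vote UKIP!",
    "vote ukip",
    "VOTE UKIP!!!",
    "I think the policy on housing is wrong",
    "More support for the deaf please",
]

print(repeated_comments(comments))
```

A comment that repeats many times word for word is a candidate for spam, but it still needs a human look: a popular slogan and a copy-pasted spam message look the same to a counter.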

I used Facepager to generate the CSV, here is my R script:





Posted in Data Analytics

Installing Learning Locker on Mac OS X with MAMP

The Learning Locker requirements page recommends a LAMP style stack to install on. I use Mac OS X, which is close enough, and because I want to keep my Apache, MySQL and PHP stuff self contained I use the free version of MAMP. It is great for the most part but requires a little bit of extra fiddling when you want to compile new PHP extensions, like the Mongo extension that you will need for Learning Locker; the documentation says I can use MySQL, but I didn’t want to go against the grain. Installing MAMP is a case of downloading the application and moving it to your Applications folder. I also use Homebrew, a package manager for Mac OS, which I really recommend if you are looking to install *nix like tools, although to be honest something like that should be included in the OS by default.

1. Install Composer

Composer is a dependency manager for PHP and required for Learning Locker, installed via Homebrew at the terminal with

2. Create Project

Then you will need to create your project. You can create it wherever you like, but it makes sense to serve it from the same directory that Apache is serving from. I don’t serve it from the default MAMP directory but from /Users/David/Sites. You can create the project with:

3. Artisan

Change in to the learninglocker directory and run the following artisan command:

4. Install and run MongoDB

Did this easily with brew

5. Install MongoDB PHP extension

MAMP doesn’t ship with the PHP source needed to compile extensions, so I had to download the PHP source in to the MAMP directory for the PHP version I was using, i.e. /Applications/MAMP/bin/php/php5.5.3/includes/php. This means I had to go to the PHP site, download the source for 5.5.3 and put it in includes/php. Make sure you get the right version! Then:

Find the appropriate php.ini in MAMP and add this line:

Reset MAMP

6. Final config

Add the MongoDB config to database.php

Add the following service provider to app/config/app.php in ‘Autoloaded Service Providers’


Then you can head over to localhost/learninglocker to get started.

Posted in Data Analytics, Education, PhD

The recurring phrases of BNP members.

I wanted to find out what people were saying on the Facebook pages of extreme political parties. At this stage I wasn’t bothered so much about what the party was saying, but about what the comments on the posts were saying. The task was to find n-grams in a CSV file and I decided to do it in R. Originally the CSV was created from comments on 10 pages of BNP Facebook posts. I generated the CSV quite quickly using an application called Facepager; it was very easy to do, and if you are interested you can find instructions in this post here.

The final script is here and is quite easy to follow:

To change the number of words in the sequence, look for this line and change the number to whatever you want:

I haven’t finished playing with the data yet, but if you are interested I have dug the most popular phrases out. It is quite obvious that, ‘at the end of the day’, to the BNP it’s a case of us vs them.

3 word phrases:

Count Phrase

156    in this country
130    all the way
103    a lot of
99    bnp all the
89    to do with
87    got my vote
85    in our country
80    in the uk
78    dont like it
78    if you dont
75    the bnp are
75    the rest of
75    the right to
73    we need to
72    the british people

4 word phrases:

Count Phrase

49    in our own country
47    nothing to do with
43    if they dont like
39    if you dont like
35    the rest of the
34    the end of the
32    have the right to
30    if you want to
30    in the first place
30    they don’t like it
29    at the end of
29    end of the day
28    in the name of
28    this is our country
26    our way of life
26    send them all back

5 word phrases:

Count Phrase

26    at the end of the
26    if they dont like it
26    the end of the day
16    has nothing to do with
14    if you dont like it
12    for the sole purpose of
12    sole purpose of child exploitation
12    the sole purpose of child
11    bring back the death penalty

6 word phrases:

Count Phrase

25    at the end of the day
12    for the sole purpose of child
12    the sole purpose of child exploitation
8    any plans for you to review
5    for you to review cannabis laws
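Counts like the ones above come from sliding a window of n words along each comment and tallying every phrase the window lands on. Here is a minimal Python sketch of that idea. It is not the R script from this post, the function names are my own, and the two sample comments are invented rather than taken from the data set.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, lower-cased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(comments, n):
    """Tally n-grams comment by comment, so phrases never span two comments."""
    counts = Counter()
    for comment in comments:
        counts.update(ngrams(comment, n))
    return counts


# Invented comments for illustration only.
comments = [
    "At the end of the day it is our country",
    "at the end of the day, who knows",
]

for phrase, count in count_ngrams(comments, 6).most_common(3):
    print(count, phrase)
```

Changing the 6 to 3, 4 or 5 gives the shorter phrase lengths. To run it over a real export you would read the comment column of the CSV into the comments list first; the column name depends on how the file was exported, so I have left that part out.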

Posted in Data Analytics

Using Facepager to find comments on Facebook page posts

I’ve been trying to find ways to poke the Facebook graph that will be easy for people who find working with the API directly difficult, and currently I am using a tool called Facepager to extract data. After a few minutes working with the program it seems easy enough to get it to poke the Facebook Graph for the comments and then store them in a SQLite database. I’ll be looking to work with the results in R… but one thing at a time!


Grab posts from Facebook in five easy steps

1) After downloading and extracting Facepager for your system you should see a screen like the one in the image. Click the new database button and give it a filename. This will be your SQLite database. I’ll be using this in R in other posts, but for now you can just reload it in Facepager to do stuff.

2) Then you want to click add node. The node name is the name of the Facebook page you want to explore. For example, if you want to get all the comments from posts on the Minecraft page, then the node name is minecraft.

3) Select what you are after in the ‘Resources’ tab; I went with <page>/posts.

4) You need an access token to get any data out of Facebook, so press the login to Facebook button and log in.

5) Press fetch data.

Easy! Now you have all the results viewable in Facepager. Each post only shows the first 25 comments though, and I want them all. To find them all I just clicked each individual node and changed the resource to post/comments. A video on how I did it:
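Because the database created in step 1 is an ordinary SQLite file, you can also peek inside it outside of Facepager. I don’t want to guess at Facepager’s table layout here, so this minimal Python sketch only lists whatever tables a SQLite file contains. The list_tables name, the demo file and its example_comments table are stand-ins I made up so the snippet runs on its own.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Build a throwaway database so the example is self-contained.
# The table name is a placeholder, not Facepager's real schema.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
with closing(sqlite3.connect(demo_path)) as conn:
    conn.execute("CREATE TABLE example_comments (id INTEGER PRIMARY KEY, message TEXT)")
    conn.commit()

print(list_tables(demo_path))
```

Point list_tables at the file you created in Facepager to see what is actually in there before deciding how to pull the comments out.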


Posted in Data Analytics

Teenagers fill the UK sex education gap

Sex education at my high school was limited to a single day. The boys would go in to a classroom with their P.E teacher equipped with a condom and a banana, and the girls would get a visit from a lady from Procter & Gamble equipped with bags of free sanitary products. By the end of the day most children would be left with more questions. You could either ask your friends and risk the humiliation of not knowing what those with bigger sisters knew or, much worse, you could ask your parents. I always thought that it was part of the plan that kids would go and ask. It appears that this particular gap in education still exists; a few days ago I read a BBC report that said education in school is not up to scratch.

Talking to my wife about the BBC article, she made the comment that this isn’t the same situation as it was when we were at school. She had picked up on a learning network on YouTube where teenagers were coming together and creating networks to learn from each other, swap tips and furthermore create solutions to gaps in the market. I had heard of learning networks before, they are those pretty pictures of nodes and edges, but it is quite hard to make the leap from pictures to what is actually happening in the interactions between people who want to learn.

One stand out channel that my wife showed me is called Precious Stars, and the story behind it is pretty interesting. The channel itself is run by a 17 year old girl from the UK who, at the age of 11, was diagnosed with ME and joined an online support forum. On the forum she found a group of girls discussing reusable menstrual products. Interested, she researched further, only to find a lack of easily accessible information and products for young girls. She joined forums, started working out what she didn’t know, and shared this information on other forums and with friends at school. Fast forward 5 years and the girl runs one of many popular interlinked YouTube channels where girls discuss their experiences with reusable products and offer an alternative to what is taught by the rep that comes in to school.

The teenager took it further. Finding that the market didn’t sell exactly what she was looking for, Bree convinced her parents to invest in a sewing machine and set up a shop. Finding that people trusted her advice and wanted to buy other products from her, she got in touch with a few suppliers and now runs a little side business. More recently she has started to get her business involved with charities to help girls who cannot afford sanitary products.

You can watch her journey on her YouTube channel:

I find this absolutely fascinating. What interests me about her learning network is that it was never planned. Reading her blog posts, it appears that the network grew organically to fill a gap in her own education and she found that talking about it with others was the way to learn quickest. There was certainly never a business plan or investors. In this circle of YouTube channels there are videos on similar subjects such as well being or diets, all of them supporting each other and leaving comments.

Precious Stars is a nice little example to demonstrate how teenagers are doing incredible things on YouTube, but it doesn’t stop at sanitary products. Boys are particularly interested in demonstrating and sharing their skills around games such as Call of Duty or Minecraft.

If only we could bottle up whatever it is these teenagers have that spurs them on to create these learning networks. When it comes to online learning perhaps we should be taking more of a lead from them. I wonder what they would make of some of the analytics tools we use on learner data when applied to their audiences.

Posted in Education

A Learning Technologist

Welsh writer Dylan Thomas, who was very fond of a drink, once quipped that ‘an alcoholic is someone you don’t like who drinks as much as you do.’

When I first came across the quote by Dylan Thomas I liked it because in my mind I was changing it to something akin to ‘a Learning Technologist is someone you don’t like who talks about MOOCs as much as you do.’ I guess my mind did this because it’s very easy to criticise some of the current theory behind online learning. At a recent meeting on the subject of work from a well-known Learning Technologist, I jokingly asked ‘Is he a brilliant satirist or is he serious?’

Later on that day, thinking about my question, I remembered back 10+ years to my college class, where many of us were perhaps a bit too big for our boots. It was a computer programming BTEC and had a classroom full of boys who were perhaps not top of their class in high school but suddenly found themselves getting excellent marks in something they enjoyed, in an environment that afforded them much more freedom than school. The situation was a recipe for disaster, but our personal tutor was very experienced and would gently drop the right piece of advice to calm down the lads without crushing their enthusiasm. I have a memory of some of the lads verbally insulting the tutor, who calmly replied that calling out the flaws in others is recognition of some sort of personal flaw you are anxious about.

My own comment was about a learning technologist whose work is leading the field but whose work I am not comfortable with. The Dylan Thomas quote is scary because it’s easy to label him as one of those ‘Learning Technologists’ and, instead of thinking really hard about why I am not comfortable with his work, I can just pretend that I am not the same. I’m pretty sure my college tutor would have something really good to say to me now.

Posted in Education

Education and friendship

As I was coming up to my final exam of my masters my tutor said something that really hit me hard. On the subject of us all getting jobs he said something along the lines of:

Most students find being in a job hard, not because the job itself is hard but because in school, college and university they are surrounded by friends. A job isn’t a laugh with friends and they find themselves suddenly isolated.

This was a really scary thing for me to hear. I have had great friends in school, college and uni and I have very fond memories of education because of them.

While I have lots of great memories of the education system with friends, many of them revolve around big things that happened in our personal lives. Friends would rally together around a big birthday or a friend departing to a different part of the country. I also remember the final few weeks spent with groups of friends in educational establishments. Sitting on the field cramming as much as I could in before my history GCSE, finishing college coursework early and setting up a Sega Mega Drive in the networks lab, sneaking in a few games of freeciv during the computer security revision: there is something special about the times you spend with people when you know you are all about to move on to the next stage of your life.

After this week I go on leave from work for a while; I’m going to get married! It’s one of those times that, if you were in uni for a lesson, your friends would take a break from their assignments, rally round, wish you well, tell the tutor to shove it and stick on the Mega Drive, and good memories would be made. While my tutor’s comment really hit me hard at the time, I hadn’t really thought about it much since finishing my masters until recently. I had been thinking that there is something special about education that creates a unique bond between people, and that in turn that bond is part of what gets you through education and makes you strive to do your best. I thought back to my tutor’s original comment and wondered what was different about the workplace and why we couldn’t replicate those friendships and special moments. It is certainly a two way thing: the people you meet and friendships you make sit at the core of your experience with education.

I had a wonderful day yesterday. Friends in work gathered round, wished me well, wrote messages for me and gave me a gift to set me on my way. It very much reminded me of school and made me think my tutor was wrong. I didn’t feel isolated in work. I can’t help but wonder why he said what he did. Has he had bad experiences in work? I wonder why my experience has been different to his. I was wondering if it was because I work in education, but he was my tutor, so he did too!

I am driven in many of the things I do now, like work on projects, my PhD and my blog, because of the people around me. A few days ago I wrote about a MOOC being a lonely experience and Sheila commented on my post. Sheila made a point that assessment is something that education does to ruin things. I think that was a great point. It sounds silly, but the process of a friend coming and making a comment on something I had written was equally as important as the point itself. So on the one hand my experience of the MOOC makes me worry technology might take that all-important friendship x factor out of education. On the other hand, after I wrote about being alone I got a comment from a friend in a matter of minutes; technology is a funny thing. Friendship isn’t something you can design in to your course, but perhaps as we explore new models for education it is something we should think about.

Posted in Education, General Chatter

My Videos

Read CSV in to R with R studio

Extract Facebook Data and save as CSV

First attempt at Oculus Rift in the browser

Oculus Rift Plugin in Unity 3d Setup

Enable Oculus Rift support in Torque 3D