I’ve recently been looking for a way to archive tweets for the #laceproject and it turns out there are no good ways to store tweets. This is actually by design on Twitter’s part, because if you want access to old tweets they would like you to pay them a lot of money.
There do seem to be two routes.
Route A) DIY:
You can go down the route of creating an application to read the Twitter API (which goes back a week I think) and save the tweets yourself in a local database. This really has to be something I can run ‘in the cloud’ as I don’t want it to depend on my laptop being turned on often enough. You would think lots of people would have already done this, and there are lots of links to dead projects, but the only living example I can find is the Google spreadsheet Tweet collection thingy by Martin Hawksey. The spreadsheet is a tidgy bit fiddly to set up but it is the best thing out there I can find. However, there are two things that worry me about this approach:
1) It only captures the things you tell it to catch, so when you realise that people used the hashtag #laceprojectmeetup for all the events there is no way to go back and get them. In fairness this is a problem for anyone who doesn’t pay Twitter for access to the firehose.
2) Twitter are very protective of their data and are starting to slap the wrists of anybody who thinks it is a good idea to store tweets. I seem to recall a post from Martin saying Twitter warned him about some aspect of his work and the terms and conditions of the Twitter API (although I can’t find it now, so I could be dreaming). It worries me that the reason we don’t see many of these tools is because Twitter are killing them, and I don’t want them to kill my application halfway through the project.
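For what it’s worth, the storage half of the DIY route is the easy bit. Below is a minimal sketch in Python using the standard-library sqlite3 module. It deliberately leaves out the part that talks to the Twitter API, and it assumes each tweet arrives as a dictionary with `id`, `created_at`, `author` and `text` keys; those field names, the table layout and the function names are all my own assumptions for illustration, not anything Twitter or Martin’s spreadsheet defines.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a local tweet archive. The schema is a made-up example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id         TEXT PRIMARY KEY,  -- tweet id, used to skip duplicates
               created_at TEXT,
               author     TEXT,
               text       TEXT,
               hashtag    TEXT               -- the search term that caught it
           )"""
    )
    return conn


def save_tweets(conn, tweets, hashtag):
    """Store tweets, ignoring any already archived. Returns how many were new."""
    before = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, created_at, author, text, hashtag) "
            "VALUES (?, ?, ?, ?, ?)",
            [(t["id"], t["created_at"], t["author"], t["text"], hashtag)
             for t in tweets],
        )
    after = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    return after - before


if __name__ == "__main__":
    # Invented sample data standing in for whatever the API fetch returns.
    sample = [
        {"id": "1", "created_at": "2014-05-01T10:00:00Z",
         "author": "someone", "text": "Hello #laceproject"},
        {"id": "2", "created_at": "2014-05-01T10:05:00Z",
         "author": "someone_else", "text": "Also at #laceproject"},
    ]
    conn = open_archive()
    print(save_tweets(conn, sample, "#laceproject"))
    print(save_tweets(conn, sample, "#laceproject"))  # same tweets again
```

Because duplicates are ignored, the same script can be run on a schedule against overlapping search results. It does nothing about worry 1, though: you still only catch the hashtags you thought to ask for.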
Route B) SaaS:
You can use a software-as-a-service type thing run by a business. While Twitter seem to be killing off all the decent homebrew applications that store tweets, they don’t seem to be killing off *all* the Silicon Valley start-ups. A quick Google search for ‘archive tweets’ gives me lots of hits for BuzzFeed-style blackhat SEO pages of “top 10 services to archive tweets”. Again, many of these seem to have died quickly. Whether Twitter was killing them off or not I don’t know, but there is a common theme among the living that makes me think there is a reason they are still alive: you cannot export your tweets as anything useful at all. You can look but you can’t touch, which for a project around analytics is not a great solution. While some of the tools seemed to have fancy graphing engines, there seems to be no way to actually get the data out in a decent format to play with in any other type of application.
Generating data to sell
In route B I found a service called Twubs the most interesting because it was free, which made me wonder what the business model is. They could be gearing up for a tiered purchase plan like many of the Silicon Valley start-ups do: tempt you with a freebie and then bang on a charge. I hope not though, as I found it more interesting to think about the datasets that a Twubs user is generating and how they can profit off that. Twubs are basically getting you to build a dataset of the things that are interesting to your project/business/institution on behalf of a different company, and they must see value in either that or in selling you the tools around it.
Since they aren’t even giving you the data in a format anybody wants, I propose a new service called Twubucket. You tell me about all the Twubs you are interested in and in return I’ll aggregate them and post the aggregated results to an Adobe Flash based dashboard. You can’t download the results, so naturally Twubs won’t shut me down, but it’s free, as I’ll be selling the data. You don’t have to worry that you can’t download the data anyway, because you can then sign up to my sister company Twubucketotairballon where I will aggregate your Twubuckets and put them all on massive screens on the side of Zeppelins that go past your house. That’s also free because I’ll be selling Zeppelins.
I find the whole creation of recursive closed business models very interesting. But what does it mean for doing anything analytical?