Researching Quantized Social Interaction

Socialbots: Day Two

And, the competition continues apace! Here are the current scores out on the field at the end of Day Two:

  • Team A: 13
  • Team B: 5
  • Team C: 100

And, as per usual, here’s a link to the latest network graph of the battlefield. Of particular note is the Team A bot’s gradual shift away from a targeted connection pattern in the Southern Front cluster of connected users toward broader connections throughout the target group.

Our sense is that even if Team A and Team B are able to keep up their rate of connections into the week, they will need to find countermeasures to hinder Team C’s continued push in Round 2 next week and give themselves an opportunity to close the point gap. Failing that, they may have to work to maximize interactions among their followers, which yield more points than raw connections do. Stay tuned for more!

Socialbots — Day One

Greetings, readers! As promised, this will be the first post in a series of daily posts over the next two weeks, cataloging the first large-scale battle in bot-driven social influence. At last count, Socialbots 2011 features three teams, which we’re calling Team A, Team B, and Team C. Teams A and B are fielding single-entity bots in the competition, and Team C is running a lead bot as well as a swarm of smaller supporting bots floating in the network.

After the first day of official action — Team C has pulled into an early lead, generating 90 points — 75 from mutual followbacks, and 15 points from a small set of @ replies. Current scores are:

  • Team A: 5
  • Team B: 4
  • Team C: 90

For your edification, we’ve put together an anonymized graph showing the map of the battlefield, with targets marked in red and lead bots marked in blue. It seems clear that Teams A and C have chosen a more general strategy, ranging widely over the entire target group. Team B has programmed its bot to pursue a tighter strategy, embedding itself within a tightly knit group of users in the bottom-left quadrant of the map (what we’re affectionately calling “the Southern Front” here at mission control). We’ll see the yield from that over the week. More tomorrow; stay tuned!

A First Look At Socialbots 2011

Hello everyone! We’re glad to announce today that the Socialbots 2011 competition is officially off to the races. The turnout for the event has been awesome: for this first showdown we have four teams hailing from around the world and from a diverse range of backgrounds, from academia to media and beyond.

Right now, we’re halfway through the coding phase of the competition, and teams are steaming ahead with preparing their bots for the field of battle. The task for Socialbots 2011 is built around a lead bot, with points awarded for the mutual connections the lead bot creates with, and the responses it elicits from, the 500 targets that we’ve generated. Teams launch their lead bot, as well as any number of supporting bots they wish, to maximize social impact.
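To make the scoring concrete, here is a minimal sketch of a tally function. The point values are our own illustrative assumptions, not the official rules; they are simply chosen so the arithmetic lines up with the Day One tallies reported in this series (75 points from mutual followbacks plus 15 from a handful of @-replies).

```python
# Hypothetical scoring tally for a Socialbots-style round.
# These point values are illustrative assumptions, not the official rules.
POINTS_PER_MUTUAL_FOLLOW = 1
POINTS_PER_RESPONSE = 3

def score(mutual_follows, responses):
    """Total a team's points from connection and interaction counts."""
    return (mutual_follows * POINTS_PER_MUTUAL_FOLLOW
            + responses * POINTS_PER_RESPONSE)

# e.g. 75 mutual followbacks and 5 @-replies
print(score(75, 5))  # 90
```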

For all of you playing at home, the Web Ecology Project will be providing (anonymized) day-by-day color commentary and information from the competition as it goes down. The bots will be going operational on January 23rd and running for two weeks, with the competition terminating on February 6th. But, while you’re waiting around for things to get started, here’s some information you might find interesting…

Stay tuned for more!

Help Robots Take Over The Internet: The Socialbots 2011 Competition

HELP ROBOTS TAKE OVER INTERNET
WIN $500 HOO-MAN DOLLARS

There’s been much done, and even more written, about online influence and community. But it’s time to see who really has the biggest boots in town.

Today, we’re glad to announce the first-ever competitive event in the large scale robotic influence of online social groups. SOCIALBOTS 2011.

Teams will program bots to control user accounts on Twitter in a brutal, two-week, all-out, no-holds-barred battle to influence an unsuspecting cluster of 500 online users to do their bidding. Points will be given for connections created by the bots and the social behaviors they are able to elicit among the targets. All code to be made open-source under the MIT license.

It’s blood sport for internet social science/network analysis nerds. Winner to be rewarded $500, unending fame and glory, and THE SOCIALBOTS CUP.

Registration fee is $25 per team, payable by check or PayPal. Teams should sign up BY JANUARY 8th by sending an e-mail with contact information to tim.hwang@webecologyproject.org.

Currently, the schedule is as follows, though this is subject to change:
January 8: Registration Deadline
January 9 – January 22: Task Released and Bot Programming Begins
January 23 – February 6: Competition Live
February 7: Winner Announced

140kit Field Reports

The data your data could smell like

by Devin Gaffney & Ian Pearce

with Matt Morain and Alex Leavitt

Recently, two of our Web Ecologists (Ian Pearce and Devin Gaffney) followed up on an interesting data set uploaded by Sean McColgan, a digital strategist based in London. The Web Eco team also extracted another data set from Twapper Keeper using an experimental-stage data set uploader, and came up with interesting results on the efficacy of the ad campaign, which, in conversations among Web Ecologists, was judged one of the first widely successful viral marketing campaigns ever conducted by an agency.

The full 20-page report follows the basic approach 140kit employs in doing large-scale, hands-on research: first a quick data sheet or executive summary, then more in-depth research. The data is approached from three analytic perspectives: general (basic, pre-established points of interest), content (what is actually said), and network (the connections and relationships between users). This allows for a review that conveys the dynamics of the data set, rather than simple numbers and figures that give only glimpses of the information.

140kit’s analysis kit was expanded and optimized in order to learn about the user base: in particular, we were able to query True Knowledge’s incredible system to find out about the genders of the users:

Gender Breakdowns, Old Spice Dataset: 34.91% Male, 24.55% Female, 40.52% Inconclusive

One of the core basic histogram charts available in any 140kit data set is the accounts created over time:


Account creation dates for “Old Spice” Data Set, 23,924 Users

And one of the more interesting pieces of information is the network analysis that we conduct. One of the most basic problems with internal reports Web Ecology has seen from marketing analysis firms is that they frequently omit network analysis, even when the data under study is of a fundamentally networked nature:

Our slightly ugly network analysis... Someday, CIRCOS, someday...

Re-Tweet Network Map, with small-node pruning enabled alongside logarithmic node sizing based on out-degrees
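For readers curious what “small-node pruning” and logarithmic sizing look like in practice, here’s a minimal sketch in plain Python. The retweet edge list and degree threshold are made up for illustration; this is not 140kit’s actual pipeline.

```python
import math
from collections import Counter

# Illustrative retweet edges: (retweeter, original_author) pairs.
retweets = [("alice", "brand"), ("bob", "brand"), ("carol", "brand"),
            ("dave", "alice"), ("erin", "alice")]

out_deg = Counter(src for src, _ in retweets)   # retweets sent
in_deg = Counter(dst for _, dst in retweets)    # retweets received
nodes = set(out_deg) | set(in_deg)

# Small-node pruning: drop nodes whose total degree falls below a threshold.
MIN_DEGREE = 2
kept = {n for n in nodes if out_deg[n] + in_deg[n] >= MIN_DEGREE}

# Logarithmic node sizing based on out-degree (log1p sidesteps log(0)).
sizes = {n: round(100 * math.log1p(out_deg[n])) for n in kept}
```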

Finally, here’s a link to that YouTube video, just in case you were in a bunker all summer and didn’t see the genius of W+K’s creative department:

Roll Your Own Human Powered Botnet with Pawnfarm

Simulating human identities online with robots is hard. So why not just pay humans to do it for you?

Pawnfarm, made in collaboration between Web Ecology Project researchers Evan Burchard and Tim Hwang (and emerging from various discussions at Web Ecology Camp IV), enables you to do just that. Once you have an instance of it running, you can put an arbitrary number of Twitter accounts under the zombie control of the artificial artificial intelligence engine of Amazon Mechanical Turk.

Then, you can have the bots produce human-generated @-replies, RTs, and tweets, and you can specify custom directions for them to follow (tweet about the weather, comment on articles, etc.), just by pumping money into Mechanical Turk. It’s pretty slick.

Effectively, this creates a human-powered botnet completely under the command of the Pawnfarm user. It’s hoped that this project will allow for the massive scaling of human-powered robo-identities on Twitter, and for other interesting experiments to grow out of it. The next steps, for the curious, are to make the bots’ behavior proactive: having them follow/unfollow under certain conditions, reply to certain parts of a social network, and so on. Truly exciting times.
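The control loop described above might look something like the following sketch. Both helper functions are hypothetical stand-ins, not Pawnfarm’s actual interface: the real system talks to the Mechanical Turk and Twitter APIs, which are stubbed out here.

```python
# Hypothetical sketch of a Pawnfarm-style control loop.

def request_human_text(direction, reward_usd):
    """Stand-in for posting a Mechanical Turk HIT (paying `reward_usd`)
    and collecting the worker's answer."""
    return "[worker tweet about: %s]" % direction

def post_as_bot(account, text):
    """Stand-in for posting `text` from a bot-controlled Twitter account."""
    print("%s: %s" % (account, text))

bots = ["@bot_alpha", "@bot_beta"]
directions = ["the weather", "a news article"]

# Pay a human to write each bot's next tweet, then post it as the bot.
for account, direction in zip(bots, directions):
    tweet = request_human_text(direction, reward_usd=0.05)
    post_as_bot(account, tweet)
```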

All the code is now released on GitHub under the MIT license. Get it while it’s hot! And send any questions/thoughts/etc. to contact@webecologyproject.org.

Sample Analytics You Can Create On 140Kit

User Follower Distribution

Account Creation Timeline

Retweet Networks

Presenting 140Kit

An Open, Extensible Research Platform for Twitter

by Devin Gaffney, Ian Pearce, Max Darham, and Max Nanis


Hello world. It’s been a while since the Web Ecology community last made a peep on the web. Some had been speculating that we had simply up and disappeared, but reports of our demise were greatly exaggerated, as they say.

Here’s what we’ve got.

Thanks to the completely amazing work of our affiliate researchers at Bennington, we’re glad today to announce the public launch of 140Kit, Web Ecology’s very own free-to-use toolkit for exploring and data mining Twitter. It’s the final product of the various provisional tools we’ve used to produce our previous reports on the social phenomena of Twitter, and of lead researcher Devin Gaffney’s own work on high throughput humanities.

So what does it do? Notably:

  • Complete data pulls for a set of users or terms on Twitter, with searches running continuously.
  • The ability to download those data pulls in raw form to use however you please.
  • The ability to stand on the shoulders of giants by mixing and matching existing data pulls to generate entirely new combinations of data and analysis.
  • The ability to instantly generate basic visualizations of the data (term use, inequality of participation, etc.).
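As a taste of the simplest of those analytics, term use, here’s a sketch of how such a count might be computed. The tweets are made up, and this is not 140Kit’s server-side implementation; it just shows the shape of the metric.

```python
import re
from collections import Counter

tweets = [
    "old spice smells amazing",
    "the old spice guy is back",
    "smells like old spice",
]

# Tokenize each tweet crudely and tally term frequencies across the set.
terms = Counter()
for tweet in tweets:
    terms.update(re.findall(r"[a-z']+", tweet.lower()))

print(terms.most_common(3))  # [('old', 3), ('spice', 3), ('smells', 2)]
```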

Best of all, we are making this platform open and free to use for all interested users. This includes an API for queries of all sorts; an honest, open, and editable codebase; and plans already in the works to make the program extensible, so that developers can write their own analytics for the kit, on whatever metrics and in whatever programming language they like (stay tuned for details).

So, get in there and play, people. And let us know if you have any questions! tim.hwang@webecologyproject.org or contact@webecologyproject.org.

ChatRoulette

An Initial Survey

by Alex Leavitt & Tim Hwang

with Patrick Davison, Mike Edwards, Devin Gaffney, Sam Gilbert, Erhardt Graeff, Jennifer Jacobs, Dan Luxemburg, Kunal Patel, Mike Rugnetta, & Karina van Schaardenburg

ChatRoulette

This paper represents an initial study of ChatRoulette.com, conducted between February 6th and 7th, 2010 by researchers in attendance at Web Ecology Camp III in Brooklyn, NY. We sampled 201 ChatRoulette sessions, noting characteristics such as group size and gender. We also conducted 30 brief interviews with users to inquire about their age, location, and frequency of ChatRoulette use.

Summary

• ChatRoulette represents an example of a probabilistic community: a community shaped by a platform which mediates the encounters between its users by eliminating lasting connections between them.

• After ChatRoulette users become more acquainted with the system (i.e., do not browse solely to explore), we predict a decrease in explicit content, an increase in the consolidation of content genres, and an increase in the formation of celebrity figures.

• Our survey shows that ChatRoulette’s current community continues to consist largely of males aged 18-24, consistent with Alexa data.

You can download our report here.

Code Release: Language Detection and Translation

Google Language Python Module

by Jon Beilin

One of the tenets of Web Ecology is accessibility to the field through open tools and open data. At the Web Ecology Project, we’re working to get more of our code into a clean, commented, and releasable state. The first tool we have queued up for release is a Python module allowing easy use of Google Language Tools for language detection and translation, with transliteration in an experimental state (Google has not yet released the API spec for the transliteration portion, so it was reverse-engineered).

Now for some sample uses of the tool:

>>> from googlelanguage import *

>>> print lang_detect("this is a sentence in English")
{'isReliable': True, 'confidence': 0.31734600000000002, 'language': 'en'}

>>> print lang_translate("comment dit on 'WebEcology' en francais?", dest_lang="en")
{'translatedText': "how it says 'WebEcology' in French?", 'detectedSourceLanguage': 'fr'}

We used it ourselves to detect the language of each tweet in a sample of 1 million tweets from our database.

We’ve also found it easy to combine the tool with SQLAlchemy to create metadata tables with linguistic information.
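That combination might look something like the sketch below. The table schema is illustrative (not the one we used internally), and lang_detect is stubbed out so the example doesn’t actually call Google’s API:

```python
from sqlalchemy import (Column, Float, Integer, MetaData, String, Table,
                        create_engine, insert)

def lang_detect(text):
    """Stub standing in for googlelanguage.lang_detect (which calls Google)."""
    return {"language": "en", "confidence": 0.32, "isReliable": True}

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()

# A metadata table pairing each tweet with its detected language.
tweet_language = Table(
    "tweet_language", metadata,
    Column("tweet_id", Integer, primary_key=True),
    Column("language", String(8)),
    Column("confidence", Float),
)
metadata.create_all(engine)

tweets = {1: "this is a sentence in English",
          2: "ceci est une phrase"}

with engine.begin() as conn:
    for tweet_id, text in tweets.items():
        result = lang_detect(text)
        conn.execute(insert(tweet_language).values(
            tweet_id=tweet_id,
            language=result["language"],
            confidence=result["confidence"],
        ))
```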

It is our hope that this small, MIT/X11-licensed release will prove useful to some in the Web Ecology community. Until we figure out which platform we’re going to use for open repository hosting, you can download the file here. And if you would like to contribute patches or additions, or if you have any questions, feel free to send them to Jon.Beilin@webecologyproject.org.

I would also like to thank Sam Gilbert for his invaluable contributions, feedback, and support.