Today I discovered telenovelas. Telenovelas are short, limited-run programs similar to soap operas; they are popular in Spanish-language countries, and they are serious business. I stumbled across a clip on YouTube and was instantly hooked.

I headed to Wikipedia to find out more, only to find that telenovelas are a very large phenomenon, with so much information that I didn’t know where to start. So I headed to DBpedia to do some basic exploring.

One of the first Wikipedia pages I stumbled across was about a popular children’s telenovela, Chiquititas. Checking an article’s DBpedia entry is always a good place to start, so I looked up the resource by its URI using DBpedia’s install of Virtuoso here: http://dbpedia.org/page/Chiquititas. I noticed it had the property dbprop:genre with value dbpedia:Telenovela.

[Image: dbpedia:genre]

Using this, it’s quite easy to write a SPARQL query that pulls back a list of all telenovela articles in Wikipedia.
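
As a rough sketch, a query of this kind can be built as an R string, ready to hand to the SPARQL package later on. It assumes the endpoint predefines the dbpedia-owl:, dbpedia: and foaf: prefixes, as the final code in this post does. The variable name genre is only there for illustration.

```r
# Sketch of a query listing every resource whose genre is Telenovela.
# Assumption: the endpoint predefines the dbpedia-owl:, dbpedia: and
# foaf: prefixes, so no PREFIX declarations are written out here.
genre <- "dbpedia:Telenovela"  # illustrative variable, not from the post

query <- paste(
  "SELECT ?show ?name WHERE {",
  sprintf("  ?show dbpedia-owl:genre %s .", genre),
  "  ?show foaf:name ?name",
  "}",
  sep = "\n"
)

# Print the query so it can be checked or pasted into the endpoint's web form
cat(query, "\n")
```

Swapping the value of genre for another DBpedia resource should give the equivalent list for a different genre, provided the articles carry that property.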

[Image: telenovela SPARQL query]

This gave me a list of all the programs in the genre in Wikipedia. There are 139 of them apparently, which seems a very low number, but then we don’t know how many articles are missing the structured information the query needs to work.

[Image: list of telenovelas from Wikipedia]

I wondered if we could work out which country had the most of them, so I extended the query to include:

?show dbpprop:country ?country .

?country foaf:name ?countryname

which I could then plot with the following:

[Image: plotting a bar chart using R]

To get:

[Image: Telenovela articles in Wikipedia by Country]

So Brazilians can’t get enough of the stuff. I’m not surprised either, judging from the clip I saw earlier.

I decided to take a look at the cast members of each telenovela series. This has to be taken with a pinch of salt, as Wikipedia is short on this information. The SPARQL query looked like this:

[Image: SPARQL query to find how telenovela stars are related by the shows they are in]

This pulls back 645 names and the show each was in; some names repeat because they have appeared in more than one show. I created a matrix of who has been in shows with whom and then exported it as GraphML so I could explore the data in Gephi.
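
To make the matrix step concrete, here is a minimal sketch on made-up data standing in for the query results. The actor and show names are placeholders. Multiplying the person-by-show table by its transpose is one common way to count shared shows, though not necessarily the only one.

```r
# Toy data standing in for the query results (names are made up).
df <- data.frame(
  name     = c("Actor A", "Actor B", "Actor B", "Actor C", "Actor A"),
  showname = c("Show 1",  "Show 1",  "Show 2",  "Show 2",  "Show 3"),
  stringsAsFactors = FALSE
)

# Rows are people, columns are shows; each cell counts appearances.
incidence <- unclass(table(df$name, df$showname))

# Multiplying by the transpose gives a person-by-person matrix whose
# off-diagonal cells count the shows two people have in common.
costars <- incidence %*% t(incidence)
diag(costars) <- 0  # drop each person's link to themselves

print(costars)
```

A matrix like costars is the kind of input an adjacency-based graph constructor expects, with each non-zero cell becoming a link between two cast members.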

[Image: creating a matrix from a data frame]

Finally, this creates a graph; zoom in for names. It seems Sabine Moussier, Elaine Giardine and Tony Ramos are all big names central to telenovelas. The big blue circle at the top is from an American soap called Hollywood High.

[Image: links between actors in telenovelas]

The final code:

[codesyntax lang=”text”]

library("SPARQL")
library("igraph")

endpoint = "http://dbpedia.org/sparql"

# Telenovelas and the name of the country each one comes from
query = "SELECT ?name ?countryname {
?show dbpedia-owl:genre dbpedia:Telenovela .
?show foaf:name ?name .
?show dbpprop:country ?country .
?country foaf:name ?countryname
}"

qd = SPARQL(endpoint, query)
df = qd$results

# Count articles per country and plot the counts as a bar chart
counts <- table(df$countryname)
barplot(counts, main="Telenovela articles in Wikipedia by Country",
xlab="Country")

# Cast members and the telenovelas they starred in
query = "SELECT ?name ?showname where {
?person foaf:name ?name .
?show dbpprop:starring ?person .
?show dbpedia-owl:genre dbpedia:Telenovela .
?show foaf:name ?showname .
}"

qd = SPARQL(endpoint, query)
df = qd$results

# Person-by-show table of appearances
M = as.matrix( table(df) )
# Person-by-person matrix of shared shows (assumed definition of Mrow)
Mrow = M %*% t(M)
iMrow = graph.adjacency(Mrow, mode = "undirected")
# Weight each edge by its multiplicity, then collapse duplicate edges and loops
E(iMrow)$weight <- count.multiple(iMrow)
iMrow <- simplify(iMrow)
# Export for Gephi
write.graph(iMrow, file="graph.graphml", format="graphml")

[/codesyntax]

 

