When Arlinda asked me to write something about my master’s thesis on the company blog, I was stumped. This is not because I have nothing to say about the thesis. On the contrary, I have basically been living the thesis for the last eight months, which makes it difficult to pick any single viewpoint on the work.
The topic of my thesis is Bayesian clustering of huge friendship networks. It discusses methods for finding structure in large networks, such as networks of friendships, and presents an algorithm for this purpose, originally devised by Janne Sinkkonen, the chief researcher here at Xtract.
The algorithm finds an overlapping group (i.e., cluster) structure among the nodes. The nodes in each group are expected to share similar traits, and each node may belong to several groups.
The method studied in the thesis uses a Bayesian model of friendship formation, which is based on the theory of homophily, studied by sociologists since the 1950s. What is nice about the approach is that it tells not only which groups each node belongs to, but also how certain those group assignments are.
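To make the idea of overlapping groups with a degree of certainty more concrete, here is a minimal Python sketch. It is not the algorithm from the thesis: it simply assumes that each friendship has already been labelled with one group (the function name `membership_probabilities`, the input format and the toy data are all invented for this illustration) and turns those labels into per-node membership shares.

```python
from collections import Counter, defaultdict


def membership_probabilities(edge_assignments):
    """Turn per-friendship group labels into soft, overlapping node memberships.

    edge_assignments: iterable of ((u, v), group) pairs. This input format is
    an assumption made for the illustration, not the thesis implementation.
    Returns {node: {group: share of that node's friendships in the group}}.
    """
    counts = defaultdict(Counter)
    for (u, v), group in edge_assignments:
        # Both endpoints of a friendship get one vote for its group.
        counts[u][group] += 1
        counts[v][group] += 1

    probs = {}
    for node, group_counts in counts.items():
        total = sum(group_counts.values())
        probs[node] = {g: n / total for g, n in group_counts.items()}
    return probs


# Toy data, invented for illustration only.
edges = [
    (("ann", "bob"), "school"),
    (("ann", "cat"), "school"),
    (("ann", "dan"), "work"),
    (("dan", "eve"), "work"),
]
probs = membership_probabilities(edges)
for node in sorted(probs):
    print(node, probs[node])
```

In this toy network, ann ends up in two groups at once with different weights, while dan belongs to a single group. A real Bayesian treatment would derive such weights from the posterior distribution of the model rather than from fixed labels.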
It was interesting to do the thesis for a company, because there was a real need for the algorithm: the algorithm and its implementation are now being put to use in customer projects. This is different from academia, where the developer is often the only one who ever uses a reference implementation.
You can read more about the algorithm in the abstract we wrote for the MLG’07 workshop: “Inferring vertex properties from topology in large networks”.
