
Archive for the ‘Laaarge data sets’ Category

Christoffer Langenskiöld
User Experience designer

Social networking websites – Japan vs US

Reading Jay Alabaster’s article on how Japanese users behave on social networking websites made me realise how difficult it must be for some companies to extract any customer insight from their user base.

According to Jay, “the vast majority of mixi’s roughly 15 million users don’t reveal anything about themselves” and stick to tight-knit groups. He adds that “fewer than half of Match’s paying members in Japan are willing to post their photos, compared with nearly all members in the U.S”.

It must be so frustrating to sit on that much data and be unable to extract any useful insight from it. I wonder how companies like mixi.jp handle it, given that users have fake profiles, or how companies like match.com cope with users of the same service behaving so differently from one culture to another.


Xtract
admin 

Matrices in the land of Tintin

This week, the University of Antwerp has been hosting the ECML-PKDD conference. It is a good opportunity to hear the newest thinking in machine learning and knowledge discovery, and to talk directly to researchers. The organizers have worked very hard to make the conference a success. One of their many good ideas was to have every paper presented both as a talk and as a poster, so if the talk left questions unanswered, the author could explain the work again using the poster as an aid.

On Tuesday I had the opportunity to chair the Matrix Factorization session, arguably the strongest research session of the conference: of the four papers presented, one received the Best Paper in Machine Learning award and another the Best Student Paper in Knowledge Discovery award.

To those of us who didn’t take Linear Algebra 101, matrix factorization may sound imposing, but it is really a beautiful, unifying idea: approximate a large data matrix (say, customers by products) as the product of two much smaller matrices. That idea lies behind many techniques, such as community discovery, document classification (e.g. sorting emails into spam and non-spam), and collaborative filtering, which is what Amazon or Netflix do when they recommend an item to you based on how your previous purchases compare with those of other customers.
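As a rough illustration of the collaborative filtering case, here is a minimal Python sketch. It assumes the common textbook approach of fitting two low-rank factor matrices by stochastic gradient descent on the known ratings only; it is not the method of any paper from the session. The ratings matrix and the names `factorize` and `predict` are hypothetical, and `None` marks a rating the user never gave.

```python
import random


def factorize(ratings, k=2, epochs=5000, lr=0.01, reg=0.02, seed=0):
    """Approximate a users x items ratings matrix R by P @ Q^T.

    P has one k-dimensional row per user, Q one per item. Only observed
    entries (not None) contribute to the squared-error loss, which is
    minimized by stochastic gradient descent with L2 regularization.
    """
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    P = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n_items)]
    observed = [(u, i, r) for u, row in enumerate(ratings)
                for i, r in enumerate(row) if r is not None]

    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on (r - p_u . q_i)^2 + reg * (|p_u|^2 + |q_i|^2);
                # the constant factor 2 is folded into lr.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u: dot product of their factor rows."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


# Hypothetical ratings (1-5 stars); None means the user has not rated the item.
ratings = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]

P, Q = factorize(ratings)
print(round(predict(P, Q, 1, 1), 2))  # model's guess for user 1, item 1
```

The learned factors fill in the `None` cells: `predict(P, Q, 1, 1)` is the model's guess for how user 1 would rate item 1, based on users with similar rating patterns. The `reg` term discourages large factor values, which helps the model avoid overfitting the handful of known ratings.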

In the session, Ajit Singh gave a talk on how the matrix factorization idea encompasses several methods that might not look like matrix algebra on the surface. Alexandros Karatzoglou explained several improvements to Maximum Margin Matrix Factorization, one of the hottest collaborative filtering methods around. Pauli Miettinen discussed factorizing binary matrices, a problem quite different from the ones the usual linear algebra methods address, and Bin Cao et al.’s paper presented a new adaptive way to compute a similarity metric for collaborative filtering.
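To see why binary matrices are a different problem, it helps to compare the Boolean matrix product (where 1 + 1 = 1, i.e. OR replaces addition) with the ordinary product. The sketch below is a hypothetical toy example, not Miettinen’s algorithm: the 0/1 matrix `A` is exactly the Boolean product of two smaller 0/1 matrices `B` and `C`, but the ordinary product of the same factors is not even binary.

```python
def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is 1 if B[i][k] AND C[k][j] for some k."""
    return [[int(any(B[i][k] and C[k][j] for k in range(len(C))))
             for j in range(len(C[0]))]
            for i in range(len(B))]


def ordinary_product(B, C):
    """Ordinary integer matrix product, for comparison."""
    return [[sum(B[i][k] * C[k][j] for k in range(len(C)))
             for j in range(len(C[0]))]
            for i in range(len(B))]


# Hypothetical 0/1 data: rows are users, columns are items.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

# Two binary "patterns" (rows of C) and which users exhibit them (rows of B).
B = [[1, 0],
     [1, 1],
     [0, 1]]
C = [[1, 1, 0],
     [0, 1, 1]]

print(boolean_product(B, C))   # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  (equals A)
print(ordinary_product(B, C))  # [[1, 1, 0], [1, 2, 1], [0, 1, 1]]
```

In ordinary arithmetic the two overlapping patterns double-count user 1’s interest in item 1 (the 2 in the middle row), so real-valued methods such as the SVD do not directly give binary factors.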

Photo: Dessert served in style at the conference banquet on Wednesday

Date
Thursday, September 18th, 2008

Tags

Academic, Academical, Communities, Laaarge data sets