Archive for the ‘Academic’ Category

Xtract
admin 

Matrices in the land of Tintin

This week, the University of Antwerp has been hosting the ECML-PKDD conference. It is a good opportunity to hear the newest thinking in machine learning and knowledge discovery, and talk directly to researchers. The organizers have worked very hard to make the conference a success. One of their many good ideas is to have every paper be presented both as a talk and as a poster, so if you have questions that were not answered in the talk, the author can explain the work again using the poster as an aid.

On Tuesday I had the opportunity to chair the Matrix Factorization session, arguably the highest-quality research session at the conference, since out of the four papers presented one received the Best Paper in Machine Learning award, and another one the Best Student Paper in Knowledge Discovery award.

To those of us who didn’t take Linear Algebra 101, Matrix Factorization may sound imposing, but really it is a beautiful, unifying idea behind many techniques such as community discovery, document classification (e.g. into spam and non-spam emails), and collaborative filtering, which is what Amazon or Netflix do when they recommend an item to you based on your previous purchases compared to those of other customers.
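
For the curious, here is a minimal sketch of the collaborative filtering idea. It is not the method of any paper in the session, just plain stochastic gradient descent on made-up ratings; the function names, the ratings and all parameter values are assumptions chosen for illustration. Each customer and each item gets a short vector of numbers, and a rating is approximated by the dot product of the two.

```python
import random

def predict(user_vec, item_vec):
    """Predicted rating: dot product of a customer's and an item's factors."""
    return sum(p * q for p, q in zip(user_vec, item_vec))

def factorize(observed, n_users, n_items, rank=2, epochs=2000,
              lr=0.01, reg=0.02, seed=0):
    """Fit factor vectors to the observed ratings only (illustrative settings)."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    items = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), rating in observed.items():
            err = rating - predict(users[u], items[i])
            for k in range(rank):
                pu, qi = users[u][k], items[i][k]
                # Gradient step with a small penalty on large factors.
                users[u][k] += lr * (err * qi - reg * pu)
                items[i][k] += lr * (err * pu - reg * qi)
    return users, items

def mean_squared_error(observed, users, items):
    """Average squared error over the ratings we actually know."""
    return sum((r - predict(users[u], items[i])) ** 2
               for (u, i), r in observed.items()) / len(observed)

# Hypothetical ratings (1-5): (customer, item) -> rating. Missing pairs are unrated.
observed = {
    (0, 0): 5, (0, 1): 1,
    (1, 0): 4, (1, 2): 5,
    (2, 1): 5, (2, 2): 1,
    (3, 0): 1, (3, 1): 4,
}
users, items = factorize(observed, n_users=4, n_items=3)

# A recommendation score for a pair that was never rated:
guess = predict(users[0], items[2])
```

The point of the exercise is the last line: once the factors are fitted to the known ratings, the same dot product gives a score for pairs nobody has rated yet.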

In the session, Ajit Singh gave a talk on how the matrix factorization idea encompasses several methods that might not look like matrix algebra on the surface. Alexandros Karatzoglou explained several improvements on Maximum Margin Matrix Factorization, one of the hottest collaborative filtering methods around. Pauli Miettinen discussed factorizing binary matrices, which is quite a different problem from usual linear algebra methods, and Bin Cao et al.’s paper was about a new adaptive way to compute a similarity metric for collaborative filtering.

Dessert served in style at the conference banquet on Wednesday

Date
Thursday, September 18th, 2008

Tags

Academic, Academical, Communities, Laaarge data sets

Xtract
admin 

See you in MLG’08!

Xtract is sponsoring the MLG 2008 Event

Xtract is proud to sponsor the 6th International Workshop on Mining and Learning in Graphs, which features such keynote speakers as Fan Chung (University of California, San Diego), Thorsten Joachims (Cornell University), Mohammad Mahdian (Yahoo! Research) and Hannu Toivonen (University of Helsinki). Registration is still open, at a discount price for today only. The workshop will be held in Helsinki, our home city, on 4-5 July.

Quoting from the conference website, MLG’08

“will be the premier forum for bringing together different sub-disciplines within Machine Learning and Data Mining that focus on the analysis of structured data. Of particular interest is data that consists of interrelated parts or is characterized by collections of objects that are interrelated and linked together into complex graphs and structures.”

Last year our team participated in MLG’07 in Venice with the paper Inferring vertex properties from topology in large networks (Janne Sinkkonen – Xtract, Janne Aukia – Xtract, Samuel Kaski – TKK) and won a prize for distinguished contribution.

Our team has a paper in the workshop this year, too. I’m excited about meeting you all there, in a cozy scientific atmosphere and a venue for insightful presentations and discussions.

Date
Wednesday, June 18th, 2008

Tags

Academic, Social Network Analytics, research

Christoffer Langenskiöld
User Experience designer
Chris 

Personality recogniser

Last week I came across Francois Mairesse’s open source Personality recogniser. It has a web demo where you can input emails, essays, chat logs, thoughts or other texts you have written and get your personality scores for all Big Five traits (Extraversion, Emotional stability, Agreeableness, Conscientiousness and Openness to experience) as well as the model used to compute them. The statistical models available are SVM with Linear Kernel, M5 Model Tree, M5 Regression Tree and Linear Regression, the Support Vector Machine (SVM) being the most general one.

Obviously, the more text you feed the recogniser the more accurate it gets.

How this works is quite interesting. Here’s a summary of how markers are grouped (Mairesse et al., 2007):

  • standard counts (e.g. word count, words per sentence, syllables per word, frequency of use, words longer than 6 letters, negations, articles, pronouns)
  • psychological processes (e.g. positive & negative emotions, causation, tentative, references to people)
  • relativity (e.g. past tense verb, future tense verb, up, down, inclusive, exclusive, motion)
  • personal concerns (e.g. school, work, achievements, TV, movies, music, money, religion, death, sexuality, eating, sleeping)
  • utterance (e.g. ratio of commands, questions, assertions)
  • other dimensions (e.g. punctuation, swear words, fillers, familiarity rating, Paivio meaningfulness norm, different Kucera-Francis frequencies)
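
To give a feel for the first group, here is a rough sketch of how a few of the standard counts could be computed. This is not the recogniser's own code: the function name, the tokenising rules and the tiny negation and article word lists are assumptions made up for this example.

```python
import re

# Tiny illustrative word lists; a real tool would use far larger dictionaries.
NEGATIONS = {"no", "not", "never", "none", "nobody", "nothing"}
ARTICLES = {"a", "an", "the"}

def standard_counts(text):
    """Compute a handful of surface markers from a piece of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words)
    return {
        "word_count": n_words,
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        "long_words": sum(1 for w in words if len(w) > 6),
        "negations": sum(1 for w in words if w in NEGATIONS),
        "articles": sum(1 for w in words if w in ARTICLES),
    }

sample = "I do not like the rain. The weather is never wonderful!"
markers = standard_counts(sample)
```

Counts like these become the input columns of whichever statistical model is chosen, which is also why a longer text gives steadier numbers.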
Date
Friday, June 6th, 2008

Tags

Academic, research

Janne Aukia
Janne A 

From technical jargon to marketing speak

Around one month ago I moved from the research team here at Xtract to marketing. This change is an interesting one, since often people consider research and marketing to be opposites: marketing deals with presenting concepts to customers in a concrete fashion as quickly as possible, while researchers work on abstract concepts which might take years to develop into something useful.

What is surprising is that marketing and research are in many ways quite similar and even many of the skills required are the same: in both jobs, one needs to be able to innovate, present ideas effectively and communicate in writing. Also, in both jobs one has to have a vision of where the business is moving and how we can help our customers with our tools and skills.

Now I just need to get some grasp of the marketing speak, which is quite different from the technical jargon I am more familiar with. Instead of “mixture models”, “Bayesian inference” and “functional programming”, I now need to be able to talk fluently about “permission marketing”, “value propositions” and “behavior targeting”. In a way, this is like learning a new language.

Date
Tuesday, May 20th, 2008

Tags

Academic, Marketing

Janne Aukia
Janne A 

Award for the Best Pattern Recognition Master’s Thesis in 2007

I shall start with a question: “Is there still snow in Oulu?”. To this one I happen to know the answer.

I spent most of last year studying methods for finding communities in social networks. Although interesting, writing the thesis was quite a job, since I wasn’t familiar with the area of study.

This is why I was flattered to receive an award for “The Best Pattern Recognition Master’s Thesis in Finland in 2007” from the Pattern Recognition Society of Finland. The only challenge was that to get the award, I had to travel to Oulu, a city up north where I had never been before.

Despite my prejudices, the people I met in Oulu were friendly and it was fascinating to hear about the research they do on pattern recognition, machine learning and robotics. They had a lot of co-operation with industry partners, such as Nokia, in implementing intelligent mobile solutions and building robot cars.

The presentations by the other award recipients provided a wide view of the different disciplines combining intelligence with data analysis. The other award recipients were Esa Rahtu (Doctoral thesis), Mathias Creutz (Doctoral thesis), and Kimmo Palander (Master’s thesis).

Rahtu spoke about methods for aligning images and Creutz gave a presentation on methods for finding the structure in written text automatically. Palander spoke about methods for aligning microscope images for 3D modeling. Quite exciting, indeed!

Finally, to answer the question, no, sadly there wasn’t any snow left in Oulu! Oh well, maybe next time.

Date
Friday, May 9th, 2008

Tags

Academic, Events

Christoffer Langenskiöld
User Experience designer
Chris 

“How to model personality traits and possibly affects from mobile user experience data”

… is the topic of my master’s thesis for HUT. I use “possibly” when speaking of modeling affects from mobile user experience data, because I’ve gotten the feeling these last weeks that the “recognising affects” part just might be the topic of my PhD :) Now, roughly in the middle of my thesis, I give you a summary of what’s happened.

What field is this exactly?

The fields concerned are affective computing, behaviorism, personality psychology and social psychology.

What do I call “mobile user experience data”?

In his article “Experiencing Experience”, Tom Guarriello defines the ultimate experience as a cluster of smaller experiences. Consequently, he continues, the user experience is best measured by considering as many elements in the cluster as possible.

For example, consider a romantic dinner. What makes it a great experience is the cluster: the candles, the dim light, the flower, the view, the slow background music, the quietness and sharing it with the other person. Any one of these alone doesn’t make the dinner romantic. Two, maybe a little. The more elements, the better the experience. This is an analogy to multimodality: the more the merrier. Multimodality should be at the core of accurate affect recognition models, as emotions are intrinsically multifaceted. Having multiple modalities not only brings the accuracy of triangulation; keeping the behavioral components of moods separate rather than fusing them also makes it possible to present the emotional context, and may help to draw other conclusions when complemented by other contextual information.

How do you recognise emotions?

Recognising the user’s emotions can be done through many modalities, the most popular being heart rate, galvanic skin resistance, speech tones and patterns, facial expressions, body postures and self-report. But bearing in mind the capabilities of current mobile phones, and that the goal is to reach as many users as possible with as little trouble as possible, it wouldn’t make sense to collect that kind of data, so I am focusing mainly on behavior data. Behavior data also gives a lower accuracy in time, so considering the spectrum of affective phenomena (Fig. 1), with time as the main differentiating variable, I will focus on moods rather than emotions. Privacy being a concern, it is essential to process private data on the mobile phone itself, at low processing and battery cost.


Fig. 1: Spectrum of affective phenomena (from Oatley, 2006)

These phenomena can be categorized into:

  • episodes of emotions (seconds)
  • moods (from hours to weeks)
  • emotional disorders (from months to years)
  • personality traits (lifetime)

In the context of advertising and communication, moods give a good emotional context for mobile ad targeting and filtering.

How do you recognise personality traits?

There are two main theories of personality: Eysenck’s model of personality (P-E-N) and the Big Five personality traits. Both are based on the concept that a few broad dimensions or factors (3 and 5 respectively) are enough to describe personality. Two factors that both models have in common are neuroticism (the tendency to experience negative emotions) and extraversion (the tendency to enjoy positive events, especially social events).

The data I am focusing on to recognise personality traits is mobile phone usage, communication behavior, mobile internet behavior and social network characteristics.

General mobile phone usage:

  • How much people personalise their phone (wallpaper and ringtones)
  • Time spent playing with the phone

Communication behavior:

  • The time spent making calls
  • The number of incoming calls
  • Time spent sending and receiving SMS messages
  • Preference for voice communication or messaging
  • How many people are around the person when he speaks on the phone
  • The time people take to respond to a missed call or SMS
  • Whether the person speaks on the phone without a headset while driving
  • The number of MMS messages sent
  • How often one checks if he has missed any calls or SMS messages

Internet behavior:

  • Time spent on mobile Internet
  • Time spent for social purposes
  • Time spent searching

Social network characteristics:

  • Size of social network
  • Number of contacts
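
As a rough illustration of how some of the communication markers above could be derived, here is a small sketch working on a made-up phone log. The `LogEntry` record, its fields and the feature names are all hypothetical; they do not describe any real phone API or the data format used in the thesis.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    kind: str         # "call" or "sms"
    direction: str    # "in" or "out"
    seconds: int = 0  # call duration; left at 0 for messages

def communication_features(log):
    """Summarise a phone log into a few communication behavior markers."""
    calls = [e for e in log if e.kind == "call"]
    messages = [e for e in log if e.kind == "sms"]
    total = len(calls) + len(messages)
    return {
        # Time spent making (outgoing) calls
        "outgoing_call_seconds": sum(e.seconds for e in calls
                                     if e.direction == "out"),
        # Number of incoming calls
        "incoming_calls": sum(1 for e in calls if e.direction == "in"),
        # Preference for voice over messaging, as a share of all contacts
        "voice_share": len(calls) / total if total else 0.0,
    }

log = [
    LogEntry("call", "out", 120),
    LogEntry("call", "in", 60),
    LogEntry("sms", "out"),
    LogEntry("sms", "in"),
]
features = communication_features(log)
```

Because the summary is a handful of sums and counts, it could be computed on the phone itself, so the raw log never has to leave the device.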

Once personality traits are modeled, positive and negative moods could be extracted using relevant theories of personality (Gray, Eysenck or Newman). These theories, which relate mood, personality and behavior, draw a positive correlation between extraversion and positive mood, and between neuroticism (low emotional stability) and negative mood (Gomez, 2000).

Triggers of negative mood:

  • Frustrative nonreward (Gray)
  • Punishment (Gray)
  • Novelty (inversed) (Gray)
  • Displeasure (Eysenck)

Triggers of positive mood:

  • Reward (Gray)
  • Nonpunishment (Gray)
  • Pleasure (Eysenck)

What could correspond to such stimuli for a mobile user? In other words, what puts you in a good or bad mood when you use your phone?

E.g.

  • Out of credit = displeasure
  • Often receiving happy smileys = pleasure, reward
  • Calls not returned = frustrative nonreward
  • Regularity in patterns (or entropy of life) = novelty
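
The mapping above can be sketched as a pair of lookup tables. The event names are made up for illustration, and counting every event with equal weight is an assumption, not something taken from the theories. The novelty example is left out because its direction (it is listed as "inversed") would need a real definition of regularity first.

```python
# Trigger -> mood direction, following the two trigger lists above.
TRIGGER_VALENCE = {
    "frustrative nonreward": "negative",
    "punishment": "negative",
    "displeasure": "negative",
    "reward": "positive",
    "nonpunishment": "positive",
    "pleasure": "positive",
}

# Hypothetical phone events -> triggers, following the examples above.
EVENT_TRIGGERS = {
    "out_of_credit": ["displeasure"],
    "happy_smiley_received": ["pleasure", "reward"],
    "call_not_returned": ["frustrative nonreward"],
}

def mood_tally(events):
    """Count how many events point towards a positive or a negative mood."""
    tally = {"positive": 0, "negative": 0}
    for event in events:
        # An event counts once per direction, even if it maps to two triggers.
        directions = {TRIGGER_VALENCE[t] for t in EVENT_TRIGGERS.get(event, [])}
        for direction in directions:
            tally[direction] += 1
    return tally

day = ["out_of_credit", "happy_smiley_received", "call_not_returned", "battery_low"]
tally = mood_tally(day)
```

Events that are not in the table, such as `battery_low` here, are simply ignored.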