Archive

Archive for the ‘Academic’ Category

jaakkos 

See you in MLG’08!

Xtract is sponsoring the MLG 2008 Event

Xtract is proud to sponsor the 6th International Workshop on Mining and Learning in Graphs, which features keynote speakers Fan Chung (University of California, San Diego), Thorsten Joachims (Cornell University), Mohammad Mahdian (Yahoo! Research) and Hannu Toivonen (University of Helsinki). Registration is still open, at a discounted price for today only. The workshop will be held in Helsinki, our home city, on 4-5 July.

Quoting from the conference website, MLG’08

“will be the premier forum for bringing together different sub-disciplines within Machine Learning and Data Mining that focus on the analysis of structured data. Of particular interest is data that consists of interrelated parts or is characterized by collections of objects that are interrelated and linked together into complex graphs and structures.”

Last year our team participated in MLG’07 in Venice with the paper “Inferring vertex properties from topology in large networks” (Janne Sinkkonen - Xtract, Janne Aukia - Xtract, Samuel Kaski - TKK) and won a prize for distinguished contribution.

Our team has a paper in the workshop this year, too. I’m excited about meeting you all there, in a cozy scientific atmosphere and a venue for insightful presentations and discussions.

Date
Wednesday, June 18th, 2008

Tags

Academic, Social Network Analytics, research

Christoffer Langenskiöld
User Experience designer
Chris 

Personality recogniser

Last week I bumped into Francois Mairesse’s open source Personality recogniser. It has a web demo where you can input emails, essays, chat logs, thoughts or other texts you have written and get your personality scores for all Big Five traits (Extraversion, Emotional stability, Agreeableness, Conscientiousness and Openness to experience), as well as the model used to compute them. The available statistical models are SVM with Linear Kernel, M5 Model Tree, M5 Regression Tree and Linear Regression, the Support Vector Machine (SVM) being the most general one.

Obviously, the more text you feed the recogniser the more accurate it gets.

How this works is quite interesting. Here’s a summary of how markers are grouped (Mairesse et al., 2007):

  • standard counts (e.g. word count, words per sentence, syllables per word, frequency of use, words longer than 6 letters, negations, articles, pronouns)
  • psychological processes (e.g. positive & negative emotions, causation, tentative, references to people)
  • relativity (e.g. past tense verb, future tense verb, up, down, inclusive, exclusive, motion)
  • personal concerns (e.g. school, work, achievements, TV, movies, music, money, religion, death, sexuality, eating, sleeping)
  • utterance (e.g. ratio of commands, questions, assertions)
  • other dimensions (e.g. punctuation, swear words, fillers, familiarity rating, Paivio meaningfulness norm, different Kucera-Francis frequencies)
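
To make the first group of markers more concrete, here is a rough Python sketch of how a few of the standard counts could be computed from a text. This is not the recogniser's own code: the function name standard_counts, the tokenisation rules and the tiny word lists are all assumptions made for illustration, and the real tool presumably relies on much larger dictionaries.

```python
import re

# Hypothetical, deliberately tiny word lists, for illustration only.
NEGATIONS = {"no", "not", "never", "none", "nothing"}
ARTICLES = {"a", "an", "the"}
PRONOUNS = {"i", "me", "my", "you", "your", "he", "she", "it", "we", "they"}


def standard_counts(text):
    """Return a few simple surface counts for a piece of text."""
    # Naive sentence split on runs of ., ! or ? (an assumption, not the tool's rule).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased words, keeping an inner apostrophe (e.g. "don't") as one token.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    n_words = len(words)
    return {
        "word_count": n_words,
        "words_per_sentence": n_words / len(sentences) if sentences else 0.0,
        "long_words": sum(1 for w in words if len(w) > 6),
        "negations": sum(1 for w in words if w in NEGATIONS),
        "articles": sum(1 for w in words if w in ARTICLES),
        "pronouns": sum(1 for w in words if w in PRONOUNS),
    }


if __name__ == "__main__":
    print(standard_counts("I never liked the movie. It was not interesting!"))
```

Counts like these would then be fed as input features to one of the statistical models mentioned above. The other marker groups need dictionaries or norms that are not reproduced here.
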
Date
Friday, June 6th, 2008

Tags

Academic, research