Our 21 students are working in labs from NC (Duke) to MA (Harvard and MIT), and on topics from computer languages to tissue formation. Join us here to read weekly updates from their time in the lab!

Visit the EXP page on the Peddie website: peddie.org/EXP.

Thursday, June 20, 2013

NLP week 1: visualizing social networks

This summer I am working at the Center for Computational Learning Systems at Columbia University.  It's great to work at a computer "lab," because we basically go to work whenever we want, and leave whenever we want.  Why?  Because computer science people are typically pretty motivated and self-disciplined, and they probably work even more at home (at night) than at work during the day.

The focus of my mentor, Apoorv, is to have computers extract a social network from text--that's right, it means computers will have to "read and understand" English text!

My job this first week is simply to build a web interface to the system: it accepts arbitrary text input from anyone on the Internet, passes it on to the program that Apoorv and his team have built, collects the output of their program, parses it, and visualizes it in the user's browser.  Sound too abstract?  See the following example.


In the screenshot above, the circles (vertices) are "occurrences"--each time a name appears in the text, it is an occurrence.  The arrows (arcs) between vertices denote observations (as in "I see you in the restaurant," where "I" am aware of your existence, but "you" are not aware of "my" existence).  Another type of connection is called interaction, where both parties are aware of each other.  If you are interested in this notation, read Apoorv's paper.
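To make the notation concrete, here is a minimal sketch of how occurrences and the two kinds of connections could be represented. The class names, the `kind` field, and the `directed_pairs` helper are my own illustrative assumptions, not the data model in Apoorv's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One appearance of a name in the text (a vertex)."""
    id: int
    name: str


@dataclass(frozen=True)
class Arc:
    """A connection between two occurrences, referenced by id."""
    source: int
    target: int
    kind: str  # hypothetical labels: "observation" or "interaction"


def directed_pairs(arcs):
    """Expand arcs into (source, target) arrows.

    An observation is one-way (only the source is aware of the target);
    an interaction is mutual, so it yields an arrow in each direction.
    """
    pairs = []
    for arc in arcs:
        pairs.append((arc.source, arc.target))
        if arc.kind == "interaction":
            pairs.append((arc.target, arc.source))
    return pairs


# "I see you in the restaurant": "I" observes "you", but not vice versa.
occurrences = [Occurrence(0, "I"), Occurrence(1, "you")]
arcs = [Arc(0, 1, "observation")]
print(directed_pairs(arcs))
```

An interaction between the same two occurrences would expand to arrows in both directions, which is how I would expect it to be drawn.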

In the next step, I will post-process the result from Apoorv's system and merge occurrences--as you can see, Charlie appears three times in the generated graph above, and in the next version those three occurrences will be merged into one.
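A rough sketch of what that merging step might look like, assuming occurrences can be matched by exact name (a real system may need something smarter, such as handling nicknames or pronouns). The function name and the sample names other than Charlie are made up for illustration.

```python
def merge_occurrences(names, arcs):
    """Collapse occurrences that share a name into a single vertex.

    names: list where names[i] is the name of occurrence i
    arcs:  list of (source_index, target_index) pairs between occurrences
    Returns (unique_names, merged_arcs) with arcs re-pointed at the
    merged vertices; self-loops and duplicate arcs are dropped.
    """
    index_of = {}
    unique_names = []
    for name in names:
        if name not in index_of:
            index_of[name] = len(unique_names)
            unique_names.append(name)

    merged_arcs = []
    seen = set()
    for source, target in arcs:
        pair = (index_of[names[source]], index_of[names[target]])
        if pair[0] == pair[1] or pair in seen:
            continue
        seen.add(pair)
        merged_arcs.append(pair)
    return unique_names, merged_arcs


# Charlie occurs three times (indices 0, 2 and 4) before merging.
names = ["Charlie", "Alice", "Charlie", "Bob", "Charlie"]
arcs = [(0, 1), (2, 3), (1, 4)]
unique, merged = merge_occurrences(names, arcs)
print(unique)
print(merged)
```

After merging, the three Charlie vertices become one, and every arc that touched any of them now points at that single vertex.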

In other words, I am just building a demo system so that more users can try our system.  You may have noticed that the arrows point in completely the wrong directions, and that some of the connections should have arrows in both directions.  Yes--it's a known bug, and we are still trying to figure out why.

Let me show you the stack.  From the bottom up, there is Apoorv's Java program, and then a TCP socket server that I wrote in Java, which listens locally for requests, parses the results in .net format and generates JSON results.  On top of my socket server is a Node.js program that I wrote with the Express framework; it simply serves the webpage above, passes English sentences down to the Java program and transfers JSON results back to the browser.  Visualization happens mostly in your browser, where I used the D3.js library to calculate the locations of the circles and lines according to the laws of physics, and SVG to actually draw them.
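As an illustration of the "generates JSON results" step, here is a small sketch of a payload in the nodes-and-links shape that D3's force layout works with, where each link refers to its endpoints by their index in the node list. It is written in Python for brevity (my actual socket server is in Java), and the exact field names and the `to_graph_json` helper are assumptions for this example, not the real schema.

```python
import json


def to_graph_json(names, arcs):
    """Serialize a graph as {"nodes": [...], "links": [...]}.

    names: list of vertex names
    arcs:  list of (source_index, target_index) pairs into `names`
    """
    payload = {
        "nodes": [{"name": name} for name in names],
        "links": [{"source": s, "target": t} for s, t in arcs],
    }
    return json.dumps(payload)


# A tiny graph with a single arrow from Charlie to a made-up second person.
text = to_graph_json(["Charlie", "Alice"], [(0, 1)])
print(text)
```

The browser side can then hand the parsed nodes and links straight to the layout and draw one SVG circle per node and one line per link.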

Bonus: a graph for you to play around with (a modern browser that supports SVG is required).   You can drag anyone to move them around!
Hope you liked it!

The real research hasn't started yet.  Hopefully I will actually get to the natural language processing part as early as next Monday, after I finish working on this web interface.  So, see you next week!
