Our 21 students are working in labs from NC (Duke) to MA (Harvard and MIT), and on topics from computer languages to tissue formation. Join us here to read weekly updates from their time in the lab!

Visit the EXP page on Peddie website: peddie.org/EXP.

Showing posts with label Columbia University. Show all posts

Tuesday, September 3, 2013

Computer Networking Lab Week 9

Hi, this is Sohan, checking in for the last time. I have finished my work in a computer networking lab at Columbia University this summer.

I finished on August 1st, after Varun left for India for the rest of the summer. I did more of the same kind of work: ran tests on the Orbit-lab, processed the .pcap files using the Python scripts, and then graphed the data using the MATLAB scripts. This week, however, I spent most of my time debugging parts of the MATLAB script. Some of the nodes in the Orbit-lab create .pcap and .txt files that are different from those of the other nodes; they include extraneous information that the MATLAB script cannot process. My job was to write code so that the script could automatically detect a defective file, delete it, and then move on to the next intact file.
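The detect-and-skip logic can be sketched in Python (a sketch only: the four-byte magic-number check and the delete-on-failure policy here are illustrative stand-ins for the real script's test for extraneous information):

```python
import os

def clean_data_dir(directory, required_header=b"\xd4\xc3\xb2\xa1"):
    """Remove defective .pcap files and return the ones safe to process.
    A file counts as defective here if it doesn't start with the expected
    libpcap magic number (an illustrative check, not the lab's actual one)."""
    good = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".pcap"):
            continue
        path = os.path.join(directory, name)
        with open(path, "rb") as f:
            header = f.read(4)
        if header != required_header:
            os.remove(path)   # defective file: delete it and move on
        else:
            good.append(path)
    return good
```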

On August 1st, Josiah, the visiting undergraduate student from Northern Arizona University, presented his final project. He did a very good job, and even after working with him and performing the same tasks, I got the chance to clarify some aspects of the work we did and gained some insight into the changes he made to the scripts as well.

Lastly, I held off writing this post since Varun had wanted me to come back and present my work and research to the lab group. Unfortunately, the circumstances did not work out: he had to stay in India for a little while longer, and thus I was unable to return. Nonetheless, I greatly enjoyed my time working under Varun and alongside the other members of Dr. Zussman's group.

Saturday, August 24, 2013

Week 7 Mendelsohn Lab

This was my last week at the Mendelsohn lab, and it went by very quickly. I briefly learned how to use a digital microscope to take pictures of the slides that I had stained. Other than that, I did much of the same paraffin sectioning and staining, which by now feels surprisingly comfortable.

One of the digital microscopes
On the last day, I bought doughnuts to share and thanked everyone for giving me the opportunity to work at the lab. I really enjoyed my seven-week experience at the Mendelsohn lab and would highly recommend it to anyone interested in working with cellular biology and genetics. Hopefully, I will be able to come back next summer.


This is the building where I worked

Saturday, August 17, 2013

NLP week 7: finishing up pipeline wrapper and web interface

Hi again, my name is Jiehan Zheng.  I worked on NLP and some machine learning at Columbia University.

I skipped writing about week 6 because we were working on something secret!  We will publish that work during the upcoming fall term if things go smoothly.  So I am writing about my work during week 7.  I was too busy working on the project to post updates to this blog...

Since week 7 was the last week I physically worked at CCLS at Columbia University this summer, we chose to finish the things that require face-to-face collaboration first, so we wouldn't have to wait on each other afterwards.  The web interface and the pipeline wrapper were the pieces we had to finish together before I left--so in the last week I mainly worked on those two.

Apoorv's work is on the pipeline that takes in sentences, gold dependency parse trees, semantic parse trees, and entity annotations, and spits out a file in graph modeling language containing the interactions between entities.  In order to make the pipeline work on any unprocessed text and return a social network, it has to be wrapped in some extra code--I named that part the "pipeline wrapper," which I feel is a pretty fitting name, isn't it?

So the pipeline wrapper has to take in raw text, split it into sentences, call various parsers, and process the parsers' results into the format that the pipeline expects.  There was existing code for that, but it no longer worked--and even when it was working, it was poorly written and inefficient.  I rewrote the wrapper in a more organized way.  For instance, the old wrapper had to call NYU Jet's main method twice to get named entities and split sentences separately; I read Jet's source code and managed to get both with a single call, making it faster.  I also prevented Jet from performing time-consuming operations we don't need, like relation extraction.
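In outline, the wrapper's control flow looks something like this (a minimal sketch: the naive sentence splitter and the parser/pipeline callables are placeholders, not the actual Jet or pipeline APIs):

```python
def run_pipeline_wrapper(raw_text, parsers, pipeline):
    """Sketch of the wrapper: split raw text into sentences, hand the
    sentences to every parser, then feed everything to the pipeline."""
    # a naive splitter stands in for Jet's real sentence splitting
    normalized = raw_text.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in normalized.split(".") if s.strip()]
    # each parser turns the sentences into the form the pipeline expects
    parses = {name: fn(sentences) for name, fn in parsers.items()}
    return pipeline(sentences, parses)
```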

The pipeline then gets dependency parses from the Stanford parser.  My refactoring also enables us to run multiple tasks in parallel.  For instance, we are going to run CMU's SEMAFOR semantic parser in the future, and SEMAFOR takes a long time.  Had we added SEMAFOR to the old wrapper, it would have had to wait until the Stanford parser finished its job.  With the new structure, SEMAFOR and the Stanford parser run in different processes, so they can take advantage of multiple CPU cores and run at the same time, cutting the running time by at least 50%.  SEMAFOR integration is a bit harder than the other parsers, so I decided to work on it after I go back to China.
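The parallelization idea can be sketched like so (my own illustration, not the lab's code: the parser callables are placeholders for the real Stanford/SEMAFOR invocations, and threads suffice on the Python side because each real parser runs as an external process the thread merely waits on):

```python
from concurrent.futures import ThreadPoolExecutor

def run_parsers_in_parallel(parsers, sentences):
    """Launch every parser at the same time instead of one after another,
    so a slow parser no longer blocks the fast ones."""
    with ThreadPoolExecutor(max_workers=len(parsers)) as pool:
        # submit all parsers first, then collect results as they finish
        futures = {name: pool.submit(fn, sentences)
                   for name, fn in parsers.items()}
        return {name: future.result() for name, future in futures.items()}
```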

After we have all the parses and other files, the wrapper calls the pipeline with them and waits for the pipeline to finish.  Once it gets the interactions in text form, the wrapper calls the postprocessor I made during week 2, which merges duplicate entities, finds the best name for each entity, analyzes the interactions, and finally organizes this information and outputs a social network file.
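The duplicate-entity merge could be sketched along these lines (a hedged sketch: the containment rule and longest-mention heuristic below are my illustration, not the postprocessor's actual logic):

```python
def merge_entities(mentions):
    """Group mentions that look like the same entity and pick a 'best'
    name for each group.  Here two mentions are merged when one name
    contains the other (case-insensitively), and the best name is simply
    the longest mention in the group--both choices are illustrative."""
    groups = {}
    for mention in mentions:
        key = mention.lower().strip()
        merged = False
        for existing in list(groups):
            if key in existing or existing in key:
                groups[existing].append(mention)
                merged = True
                break
        if not merged:
            groups[key] = [mention]
    # best name = longest surface form in each group
    return [max(group, key=len) for group in groups.values()]
```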

The web interface is just a pure programming effort and is nowhere near as interesting as working on the pipeline wrapper and the machine learning aspects.  My work on the pipeline wrapper, postprocessor, and web interface has been included in a demo paper that is going to be presented at IJCNLP 2013 this October in Japan, and I've been made a co-author on that paper--I am very excited about that!

Apoorv and I have made an arrangement with Mr. Corica that I will continue our work on that "secret project" as an independent project at Peddie during the fall term.  This is a very precious opportunity for me to learn more machine learning--from implementing tools, extracting features, running experiments, and tuning SVM parameters and our features, to finally evaluating the results.

As for the rest of my summer, I did figure out a way to integrate SEMAFOR, so I will spend some time enhancing the web interface and pipeline wrapper by adding SEMAFOR integration.  I will describe more in my next blog post!

Friday, August 16, 2013

Week 4 at Chandran Laboratory

My name is Anna, and I'm working at Dr. Kartik Chandran's Laboratory at Columbia University in Earth & Environmental Engineering.

So this week was mostly defined by transitions. Our batch reactors finally reached their stable population, so we switched the reactor into chemostat mode. Whereas batch reactors have nothing going in or out (technically) and are used to watch change over time, chemostat reactors have influent and effluent moving at the same rate. This means that nothing changes: the population is constant, as is the amount of ammonia, nitrite, etc. The reactor is going to spend two weeks stabilizing in chemostat, and then the really interesting stuff will begin. Even though I won't be there for it, the next step will be disturbance, or increased-feed chemostat. In this phase, the bacteria will be subjected to one hour of ammonia loading. Our strain, Nitrosomonas eutropha, is known to prefer larger quantities of ammonia compared to its N. europaea cousins, and hopefully this means that it produces NO and N2O gases differently (i.e., less of them). However, this ammonia loading will require hourly testing for 12-15 hours every day, so I'm not completely heartbroken to be missing it. After those two weeks, tests will be done to see if the bacteria retained any of the previous traits.
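The batch-versus-chemostat difference can be seen in a toy biomass model (purely illustrative, not the lab's actual kinetics): biomass X follows dX/dt = (mu - D) * X, growth rate mu minus dilution rate D. In batch mode D = 0 and X grows; in chemostat mode with D = mu, washout exactly balances growth and X stays constant.

```python
def simulate_biomass(mu, D, X0, dt=0.01, steps=1000):
    """Euler-integrate the toy model dX/dt = (mu - D) * X:
    growth at specific rate mu, washout at dilution rate D."""
    X = X0
    for _ in range(steps):
        X += (mu - D) * X * dt
    return X
```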

We will be testing for the next few days to get baselines for ammonia, nitrite, hydroxylamine, and some mRNA work as well. We will also be creating our own standard curves for the aforementioned chemicals, because we are finally getting into work that could be publishable. Next week I will also be working on some poster drafts to present to Medini and Dr. Chandran.
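A standard curve is just a calibration line fit through measurements of known concentrations, which you then invert to read off an unknown sample. A sketch of the arithmetic (the numbers in the example are made up, not our lab's data):

```python
def fit_standard_curve(concentrations, absorbances):
    """Ordinary least squares fit of absorbance = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(concentrations, absorbances))
             / sum((x - mean_x) ** 2 for x in concentrations))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the curve: read an unknown sample's concentration off the line."""
    return (absorbance - intercept) / slope
```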

In the past week, there has been an influx of people coming to the lab, including high school students and new grad students. (I'm very thankful that I've had my own desk this whole time.)
Although I can't stick around any longer, I look forward to periodically seeing how this project develops and maybe working with Medini again.

Tuesday, August 13, 2013

Week 6 Mendelsohn lab

Hi again. This is a summary of what I did during my sixth week at the Mendelsohn lab.

I briefly worked on Sol's silk bladder augmentation project by paraffin sectioning some of his blocks. I found these paraffin blocks much more difficult to cut, because the silk embedded in the bladder tissue made it harder to slice fully without the section ripping apart.

Throughout my stay at the lab, I have been paraffin sectioning many mouse embryos without fully understanding where and how these embryos had been embedded in the paraffin wax. During the week, Katya brought an E17 pregnant mouse into the lab to remove its embryos. She removed all 13 embryos from the mouse and placed them in 1x PBS while I prepared 13 tubes of diluted formaldehyde fixing agent. Katya showed me the steps to dissect a mouse embryo and told me to dissect the remaining twelve. First, I removed the amniotic sac and cut a small piece of the tail to be genotyped later through PCR. Then I bisected the embryo under the arms, removing the upper half (because we are only looking at the lower half). After bisecting, I moved under the microscope to clean out the rest of the embryo, removing everything but the bladder and kidneys. After removing the limbs and tail, I placed the embryos into the 13 formaldehyde tubes to be fixed and eventually paraffin blocked at the histology department for future sectioning.
Mouse embryo E17
                          
                                  Dissecting microscope
Under the microscope

The next day we ran a PCR on the small pieces of embryo tail (that I previously mentioned), and the following day we ran a gel to confirm the Cre genotyping of the embryos--to see which had the gene and which didn't--because eventually we want to cross mice that carry Cre with mice that have Apaf mutations.
Gel
If you have been reading along, you may have noticed that I haven't explained what this project is really about. Basically, the Apaf project (the one involving mouse embryos) is about the connections between the ureter and the bladder. In embryos, the ureters are joined to the nephric duct through the common nephric duct. Normally, the ureters detach from the nephric duct and fuse with the bladder epithelium. This project is trying to analyze Apaf (apoptotic protease activating factor, one of the major proteins that form the apoptotic regulatory network) mutants to determine whether apoptosis is required for ureter insertion.

Week 5 Mendelsohn Lab

This is Jason again from the Mendelsohn Lab at Columbia. I have fallen behind on writing my blog posts and will update you all as soon as possible. I am not actually writing this during my fifth week, but here is what I did anyway.

Throughout the week, I have again been practicing and learning how to stain slides using immunohistochemistry (briefly mentioned in my previous post). After paraffin sectioning and a day to let the tissue dry and settle onto the slide, these slides were ready for immunostaining. First the slides were deparaffinized in a xylene solution and rehydrated through ethanol so that the paraffin wax fully dissolved, leaving only the desired tissue on the slides. After the deparaffinization process, the slides underwent heated antigen retrieval, which means the slides are placed back to back in a pH 9 buffer at 100˚C and steamed for 30 minutes. We do this because when the tissue is processed into paraffin blocks for sectioning, it is treated with fixatives that mask and cross-link its proteins, making successful antibody binding almost impossible. In the hot buffer, these fixed proteins unfold, allowing our specific antibodies to bind. After the 30-minute steam, the slides went straight into PBS with 0.1% Triton (a very common buffer solution) for a 15-minute wash. Then horse serum blocking solution was applied to the slides for 90 minutes to reduce background, or nonspecific, staining. After the blocking solution, the slides were ready for the specific antibody application. Lastly, I applied DAPI (a fluorescent DNA stain), washed in the PBS 0.1% Triton, and put on the cover slips.

Staining hood (Deparaffinization on the right)
Slides deparaffinized and hydrated
Heated antigen retrieval
It might seem like a lot to remember at first, but after a couple of practice runs you get the hang of it.
My bench
On Wednesday, we had a formal lab meeting where everyone in the lab presented their work. I listened and learned as the other five lab members explained their projects. Katya, Kerry, Tammer, Hanbin, and Sol all amazed me with the work they were doing. The Mendelsohn lab's focus spans far beyond bladder cancer and touches on several different areas within the field of urology. For example, Sol's project involves bladder augmentation, using silk fibers as a scaffold to increase the size of the bladder and lower its pressure for impaired bladders.

I have continued my work on the BBN and Apaf projects and will explain them later in my next post. 

Monday, August 5, 2013

Week 2 and 3: Reactors and Reacting

My name is Anna Piwowar, and I am currently working at Dr. Kartik Chandran's lab at Columbia University, working with ammonia-oxidizing bacteria in batch reactors and studying their kinetics.

This week began with Medini and me cleaning out the reactors and setting them up for a new cycle. Between rinsing and autoclaving, it was a lengthy process, and we had to overcome many problems (pieces that didn't fit, screws that wouldn't unscrew, and everything in between). Finally, we thought we were ready to inoculate our sparkling clean reactors and begin looking at cell growth. However, problems kept cropping up, and Murphy's Law held true: everything that could go wrong did. (Well, not everything has gone wrong yet--I don't want to jinx anything--but we are still able to move forward with the process.) The DO (dissolved oxygen) probe on one of the reactors was found to be defunct, so now we are in the process of ordering a new one (and they do not come cheap). We do have one reactor working properly enough for the time being, and hopefully we'll start collecting data on cell growth and nitrite formation.

While the process runs, I have a lot of time to read. Medini has many textbooks to offer me, and on Tuesday it was all about studying reactors--all the types and the equations. Sadly, I lack the calculus to understand some of it, but after an arduous tutoring session with Medini, I understand what the kinetics are, what we need, and why.

By the end of the week we decided to go forward with the second reactor without the DO probe, opting instead to measure the dissolved oxygen manually. Even though we can't track growth simultaneously, we will at least have two sets of data.

The third week was all about data collection. We went forward with the second reactor, and now we're just working on maintaining the reactors and improving my laboratory skills. My PI is out a lot, but I will be meeting with him soon to talk about other things I might pursue for my last few weeks here.

Other than that, Columbia is beautiful, especially with the cooler weather, and having the chance to travel around Manhattan is wonderful.

Monday, July 29, 2013

Computer Networking Lab Week 8

Hi again, this is Sohan and I am working in a computer networking lab this summer at Columbia University.

Once again, my routine this past week was essentially the same as in the weeks prior: try to run the test on the Orbit-lab, process the .pcap files, and graph the results. This time, however, after several attempts, I was finally able to obtain some data, as the test bed did not throw up too many insurmountable issues. With this new data set, I am now running the Python scripts and MATLAB script to complete the post-processing, instead of processing and graphing previous data sets. Given that this is a new data set, the post-processing also poses issues: some of the nodes from the test bed cannot properly create .pcap files, and thus the Python script cannot process them. My new challenges include figuring out which .pcap files are causing the issues and further developing the script so that it can process the .pcap files smoothly without any errors arising.

Also, this past week we had a lab meeting where all the Ph.D. students presented their recent work, and a few practiced the poster presentations they will give at upcoming gatherings. The most intriguing--and the one I could probably understand best--was on cascading power failures and how to predict which power lines will go out depending on which line fails. Since Varun was at Bell Labs that day (our project is in conjunction with Bell Labs), Josiah and I had to present in his place. The other Ph.D. students provided very constructive criticism and offered solutions for processing the .pcap files more efficiently, such as eliminating certain steps--like storing the values in an Excel file--that can be skipped entirely to save time without altering anything.

Lastly, since our lab is under renovation, we are currently working in another work area. This past Friday, some workers came to put in cable trays, which can be used to pass wires through and create a mini network of sorts. Instead of taking one hour to finish, as they said it would, it took them five. So the five of us working in this area went into the kitchen and watched a few episodes of Breaking Bad to pass the time--one of my favorite days on the job!




Monday, July 22, 2013

Computer Networking Lab weeks 6&7


Hi again, this is Sohan and I am working in a computer networking lab this summer at Columbia University.

These past two weeks have been more of the same. I have continued to run tests on the Orbit-lab, process the .pcap files using the Python scripts, and graph the results using the MATLAB scripts. Though it may seem monotonous and redundant, each session on the Orbit-lab presents its own adventure, with its own unique challenges.

More recently, the nodes in the network have been successfully loading the requisite version of Ubuntu, which solved one of my major problems. However, other problems have occurred, such as the nodes failing to communicate effectively with one another and, sometimes, the drivers not being installed properly. (Drivers are the software that lets the operating system communicate with the hardware and vice versa; for example, the trackball on older mice sensed the motion of the mouse and sent that information to the mouse driver, which interpreted the signal and moved the pointer on the screen in the designated direction.) In our case, the driver interprets the commands we input and makes the hardware act accordingly; for example, if we want the WLAN card in the AP to transmit a signal at 18 Mbps, we input a command, and the driver interprets it and signals the WLAN device to transmit at 18 Mbps. When the drivers are not installed properly, our commands can be ineffective and futile, making the experiment very difficult to run. Nonetheless, there have been times when the Orbit-lab does cooperate and work adequately, and I have obtained data sets to analyze.

Also, over the past few weeks, Varun has had me debug, clean up, and develop the Python and MATLAB scripts. Some information, such as the time and date, previously had to be entered manually; I have changed the scripts so that they obtain it directly from files in our data sets. Similarly, since we use two Python scripts to process the .pcap files, and the second script is based on the outputs of the first, I have automated them so that the second script retrieves its data from the text files the first one outputs, making the process less complicated and smoother to run.
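The manual-entry fix mostly amounts to parsing metadata out of the files themselves. For example, pulling a run timestamp out of a file name might look like this (the "YYYY-MM-DD_HHMMSS" naming pattern is an assumption for illustration, not our actual file layout):

```python
import re

def extract_timestamp(filename):
    """Pull the run date and time out of a data file name, so the user
    no longer has to type them in by hand."""
    match = re.search(r"(\d{4}-\d{2}-\d{2})_(\d{6})", filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    date, hms = match.groups()
    # reformat HHMMSS as HH:MM:SS
    return date, f"{hms[:2]}:{hms[2:4]}:{hms[4:]}"
```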

Lastly, one of the members of the lab is leaving for Korea tomorrow. He will be starting graduate school at Michigan in the fall and will spend the rest of his summer at home in Korea. We had a nice little going-away party for him this past week, and at the same time welcomed a new member from China who will essentially take his place.

So far, work has been going well and hopefully, we have some more productive weeks ahead!


Chandran Laboratory Week 1: Dawn in Morningside Heights

So unlike my peers, who are mostly finishing up their lab work, I have only just begun my work at the Chandran Laboratory at Columbia University, working with a graduate student studying ammonia-oxidizing bacteria and nitrous oxide and nitric oxide emissions.

My new role as a commuter began when I was dropped off at the PATH station, gearing up for what I assumed would be a two-hour commute to upper Manhattan. To my delight, everything went much faster than expected. I arrived, found the correct building with only minimal trouble, and met my mentor, Medini, a first-year graduate student whose research I'm helping with. My PI, Dr. Chandran, was doing "field research" with another graduate student, and I would soon learn that he is often out of the office and very busy (although we did get a chance to meet and talk about the lab's goals).

For the first few days I passed the time observing Medini and her pure-culture batch reactor. Thus far, all she had been doing was observing the cell growth and troubleshooting reactor problems. The machinery is pretty cool: a tub full of media and cells, connected via tubes to a fancy machine that regulates everything and displays what's happening--except cell growth and product formation, which would be our job to track. Basically, her work involves Nitrosomonas eutropha, an ammonia-oxidizing bacterium vital in nitrification, the conversion of ammonia into nitrate (via a few intermediate steps). Because the batch is a pure culture, meaning that no other bacteria are growing in it, every piece of equipment and everything around it has to be kept extremely sterile. Right off the bat, in order to retrieve a sample from the reactor, Medini had to use a Bunsen burner to switch out the tubing. Until my safety training, I could only observe, take notes, and hopefully absorb some skills.

Friday came quickly, and just in time for a new cycle of cell growth to begin. We had to make new media and autoclave many things, and we will soon be inoculating the reactor. While the cells grow in the reactor, we will first observe their batch growth curve, which will involve lots of cell counting. Then we will look at product formation--in this case nitrite--using a spectrophotometer. Once the cell population stabilizes, the real experimentation can begin.

Fortunately, the heat wave is subsiding, for which I am particularly grateful, seeing as I have to walk through the heat in my lab-appropriate long pants and cardigans. Today I met another high school student who is beginning a two-year stint in the lab for Intel and other lab research work, so it's good to see a familiar unsure face. The rest of the lab is graduate students and post-docs from all over the world. According to Medini, it's one of the most diverse groups of people around. From China to Brazil to New Jersey, the lab guarantees a lot of learning experience.

Saturday, July 20, 2013

NLP week 4, 5: a crazy (but lazy) workaround and some machine learning

Hi!  My name is Jiehan Zheng and I work at CCLS at Columbia University on natural language processing and machine learning.  I did some training-data collection, evaluation, and postprocessing work in previous weeks, and finally, in weeks 4 and 5, I got to do some real machine learning!  It's been a busy but interesting two weeks.  I did so many things that I am not sure I can recall all of them...but let me try--

After building the model comparison framework in HTML and JavaScript, Apoorv asked for a new feature: calculating p-values from the χ² statistic of McNemar's test, to indicate how differently any two models perform on the dataset.  I had no clue how to do this, even after looking at Wikipedia and several papers, so I asked Apoorv how he calculated the p-value before he had my framework.  He sent me a MATLAB function file that he used to use at IBM.

I tried MATLAB on a Columbia server and verified that the function file works.  Then another piece of software, Octave, immediately came to mind.  Octave is open source, and although it doesn't advertise itself as such, it is known as an open-source "implementation" of MATLAB's features that everyone can use freely.  So I ran the function in Octave as well, and it works too.

I then looked into the source code of that MATLAB function (although I had never used MATLAB before...) and found that although most of the calculation steps are fairly simple and straightforward, it calls a MATLAB function named chi2cdf().  From the MATLAB documentation I found that chi2cdf(), as expected, contains a definite integral.  So I basically ran into the same problem of not being able to calculate a definite integral in JavaScript.

Then, I don't know why, a crazy (but lazy) idea came to my mind...  If it is hard to calculate a definite integral in JavaScript, then why bother doing it in JavaScript?!  I could simply install Octave on my server and set up an API to pass the χ² value to Octave and ask Octave to calculate the p-value for us!  I quickly installed Octave and wrote a short Node.js program that listens for requests, spawns a new Octave process whenever it receives one, passes the χ² value to Octave's chi2cdf() function, collects the output, and returns it to the browser.  So whenever Apoorv enters the command to calculate a p-value in the browser, my code sends a request to my server, waits for the response, and displays the answer.  Luckily, the idea actually worked, and I was able to implement it in a few hours.
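For what it's worth, this particular integral has a closed form: with one degree of freedom (which is what McNemar's test uses), the chi-squared survival function reduces to a complementary error function, so no numerical integration is needed at all. A sketch in Python (my own illustration, not part of our framework):

```python
import math

def mcnemar_p_value(b, c):
    """p-value for McNemar's test with continuity correction, where b and c
    are the discordant counts (model A right / model B wrong, and vice
    versa).  For df = 1, P(X >= x) for a chi-squared variable equals
    erfc(sqrt(x / 2)), which the standard library provides directly."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    return math.erfc(math.sqrt(chi2 / 2))
```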

Well, that's all I can say for now...which covers the work I did on the afternoon of July 4.  The rest of the time I wrote code to extract features and make training examples (without any annotation from humans!) in preparation for a machine learning task (sequence labeling), and got some training data from several websites and stored it in a MongoDB database.  I also posted an answer on Stack Overflow for the first time, while looking for solutions to a PyMongo error and after reading some of MongoDB's documentation!  Unfortunately I can't share more details on the machine learning task for now, but I will in the future!

Oh, and on July 4th Apoorv asked me if I'd like to see the fireworks from Dr. Rambow's apartment building.  I went, and it was amazing!  I also changed my plane tickets to extend my internship by a few days, so now I will officially work at CCLS with Apoorv for almost 7 weeks!

Thanks for reading!  See you next week!

Thursday, July 18, 2013

Weeks 3-4 Mendelsohn Lab

Hi, this is Jason from the Mendelsohn lab at the Columbia University Medical Center. Compared to my first two weeks at the lab, I have been given a lot more work to do. For most of my time, I have been sectioning paraffin blocks embedded with either a mouse embryo or a mouse bladder using a rotary microtome. By cutting and collecting the important tissue, microns in thickness, we are able to take an even closer look at the development of bladder cancer. I have also stained many of my slides using the H&E staining method or by immunostaining, applying specific antibodies. Yesterday, I witnessed my first mouse sacrifice. Katya brought one of the mice to the lab in a small white box. I didn't know what was inside until she told me. I was startled when she opened the box and held the small black rodent in her hand. Then she put one hand on its neck and the other on its tail and pulled, dislocating its neck. I was really freaked out but tried not to show it. She laughed at my attempt to conceal my emotions. She cut open its abdomen and took out its bladder. I was amazed. I have dissected a number of animals in the classroom, but I had never witnessed the actual sacrifice of a live animal to be dissected seconds later. It was a whole new experience for me.

I can't believe it's been almost a month since I first started; the time spent at the lab has been flying by. Overall, it has been a great experience working in the lab.

(Paraffin block-mouse embryo)
(One section of the block in the water bath) 
(1 1/2 weeks of cutting and staining)
(Microtome and water bath) 

Saturday, July 13, 2013

Last Week at The Park Group - Week 5

Hi, this is Alyssa and I'm writing about my last week working as an intern at The Lenfest Center for Sustainable Energy at Columbia University.

Dr. Peretz visited my lab on Monday! I showed her around the two labs I primarily worked in, and we chatted with my PI, my Ph.D. student mentor, and other people in the lab. Reflecting on my experience, I would highly recommend my lab to future EXPers. The people here are always willing to help, and Professor Park even includes us interns in group activities. It's like a big family: we work hard and at the same time have a lot of fun.

On Monday night, we had a group dinner at a nearby Indian restaurant to celebrate Camille's new job in London and to say farewell to her. I'm grateful for her guidance over the past few weeks, and I'm glad that I got opportunities to bond with the rest of the Park group.

During the last week of my summer EXP, our experiments were mostly on Tuesday and Wednesday. Naimun and I were asked to repeat the dilution experiments on two more batches of samples (the same experiment as in week 2). So we acquired the solutions through the pump, and the two of us headed to the third floor to carry out the dilutions for the two batches as directed by the lab manual. During the rest of the week, we also used the ICP a few more times.

So that's about everything I did during my last week at the Park Group. I said goodbye to my lab on Friday and took a midnight flight back home. It was really, really nice working here this summer, and I would like to thank both Peddie and Professor Park for the opportunity. I will have a chance to present my research at Peddie next fall using a PowerPoint and a poster, so see you then :)

Wednesday, July 10, 2013

Computer Networking Lab weeks 2-5

Hi, this is Sohan and I am working in a computer networking lab at Columbia University this summer.

Over the past few weeks, my daily work has remained very similar. It mostly consists of running our experiment remotely on the Orbit Lab. Slowly I have progressed to running the experiment on all 400 nodes, which is a bigger accomplishment than it seems. More often than not, the nodes are unresponsive or unable to load the requisite operating system onto their consoles for us to run our experiment (we usually run the most basic version of Ubuntu available, but other experimenters load various other operating systems best suited to their experiments). On a good test run, typically 100 or so nodes are unresponsive and another 25 fail to load the Ubuntu operating system, but obtaining data for 275 nodes is more than enough.

More recently, Varun has had me deal with other pieces of code besides shell scripts. When we run the experiment, the data we retrieve is stored as several hundred .pcap files on the computer (.pcap files are typically used to log the traffic passing across a wireless network). In essence, a .pcap file is a collection of millions and millions of 1's and 0's, which obviously mean nothing to humans but which the computer can interpret. In order to process the .pcap files, we use a Python script (Python is a programming language) that ultimately outputs information such as the total number of bytes transmitted or the average speed at which those bytes traveled in kBps (kilobytes per second). Working with Python was very new to me since, prior to this, I had done most of my coding in languages like Java and C/C++, but learning a new language like Python can never hurt.
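
To give a flavor of what this kind of processing involves, here is a small Python sketch (my own illustration, not the lab's actual script) that walks a classic libpcap file by hand using only the standard library and reports the total bytes and average throughput:

```python
import struct

def summarize_pcap(data):
    """Return (total_bytes, duration_s, avg_kbps) from raw libpcap bytes.

    Hypothetical stand-in for the lab's processing script: it reads the
    24-byte global header, then each 16-byte per-packet record header,
    summing the original packet lengths.
    """
    magic = struct.unpack("<I", data[:4])[0]
    endian = "<" if magic == 0xA1B2C3D4 else ">"  # detect byte order
    offset = 24  # skip the global file header
    total = 0
    first_ts = last_ts = None
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        ts = ts_sec + ts_usec / 1e6
        first_ts = ts if first_ts is None else first_ts
        last_ts = ts
        total += orig_len           # bytes on the wire
        offset += 16 + incl_len     # skip the captured payload
    duration = (last_ts - first_ts) if first_ts is not None else 0.0
    avg_kbps = (total / 1024) / duration if duration > 0 else 0.0
    return total, duration, avg_kbps
```

Real scripts would also decode the packet payloads, but even this skeleton shows how throughput numbers fall out of the capture timestamps and lengths.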

The last major change in my routine has been the introduction of certain MATLAB scripts (MATLAB is another programming language). The MATLAB scripts take the numbers output by the aforementioned Python scripts and create histograms from that data. Data is usually easier to interpret when presented visually, so we can analyze the histograms to see whether the algorithm is working and whether any changes need to be made.

Outside of the work itself, the lab experience has been helpful and entertaining. There are 4 other members of the lab in the same work area as me, and we frequently have conversations as a group about things besides our research. They give me ample advice on college and on what to do (and not do) freshman year, which, typically, you can only learn from someone who has experienced it themselves.

I've really enjoyed my time working here thus far and eagerly await the next few weeks of researching!

Monday, July 8, 2013

Working with the ICP - Week 4

Hi, my name is Alyssa and I'm writing about my 4th week at the Park Group at Columbia University's Environmental Engineering Department.

My PI and PhD student finally came back from the conference, and we started using the ICP to analyze our samples. We worked primarily in a much larger lab on the 3rd floor, where the rest of the Park Group conducts experiments.

ICP here is short for Inductively Coupled Plasma mass spectrometry. The instrument is used to detect metals and several non-metals at extremely low concentrations. It ionizes the samples with an inductively coupled plasma (a type of plasma source in which the energy is supplied by induction) and then separates the ions using a mass spectrometer.

The process involves high-temperature heating in the upper chamber and a cooling mechanism below it. So as I was standing in front of the ICP, it was like being in two worlds - one sweltering and one cold.

I would say the ICP process is rather straightforward - prepare samples, position test tubes, enter commands in the computer program, and GO! But since each sample takes 6-10 minutes and I need to make sure nothing goes wrong along the way, the run has to be watched over. One night I even had to stay until 8pm waiting for an experiment to finish. The fascinating thing about the ICP and its computer program, though, is that the testing needle knows exactly where each sample is positioned and dips into each test tube accurately - all I had to do was watch it happen.

The rest of the week, I spent time reading papers from Dr. Park and helped Camille dispose of some chemicals.

Wednesday, July 3, 2013

NLP week 2, 3: evaluating results and more coding

My name is Jiehan Zheng and I work at CCLS at Columbia University on a natural language processing project about extracting social networks from text, with my mentor Apoorv, his colleague Anup, and Dr. Rambow.  Now I am into my third week here, and I am going to recap what I did in my second week and the first half of this week.  (In the first week, I worked on visualizing the generated social network.)

I first worked on postprocessing and evaluating the results from the NER (named entity recognition) system.  A named entity recognizer takes raw text as input and outputs the locations of grouped entity mentions (spans of character offsets counted from the very beginning of the text; by "grouped" I mean entity mentions of the same entity are collected together in an XML structure under one node) and the types of those entities (organizations, people, etc).  Our team did not write the NER system ourselves because NER is not Apoorv's focus--his thesis is on social network extraction.  So we have to know how well the NER system is performing and try to "improve" its results without digging into the NER system itself.

There were two problems.  First, the NER system sometimes mistakenly splits what is meant to be a single entity into multiple entities.  This messed up the generated social network, because you end up with more than one vertex for the same person, which distracts the viewer.  For instance, for Alice in Alice in Wonderland, entity #1 (the first entity that the NER gave us) had 67 entity mentions of "alice" and 3 of "poor alice", among many other mentions like "she", "her", etc.  Entity #81, meanwhile, had 38 mentions of "alice", 3 mentions of "alice hastily", etc.  We need to merge these.  To a human it is clear that #1 and #81 both refer to the main character Alice in the novel, yet how can we have computers make similar decisions?

Our solution was to find all the different entity mentions in the output and create feature vectors from them.  For the sake of simplicity, let's say that in an NER output, if we ignore all the words like "she" and "he", only "alice", "poor alice", "a little girl", "queen", "alice hastily", "her sister", and "king" were mentioned.  We then create a feature vector of (# of occurrences of "alice", # of occurrences of "poor alice", ..., # of "king") for each entity in the output.  Say E1 = {"alice"x67, "poor alice"x3}; then E1 is given the feature vector (67, 3, 0, 0, 0, 0, 0).  Similarly, E81 gets (38, 0, 0, 0, 3, 0, 0).  If we think of these as vectors in 7-dimensional space (about which I have no idea) and calculate their cosine similarity (just learned this last week from Apoorv), they turn out to have a surprisingly high similarity (> 0.99).  The implementation of entity merging generates a mapping from the IDs of one or more entities to the ID of a single entity (this part of the code is actually in the screenshot).  Say entities 1, 13 and 81 are all actually Alice; then we have a map that sends 13 to 1 and 81 to 1.  When we present the result to users, I check whether each entity is in this duplication mapping.
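
For the curious, the cosine similarity check is easy to reproduce. Here is a small Python sketch using the seven-dimensional example vectors from above (just an illustration, not our actual code):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Feature order: ("alice", "poor alice", "a little girl", "queen",
#                 "alice hastily", "her sister", "king")
e1 = (67, 3, 0, 0, 0, 0, 0)   # entity #1
e81 = (38, 0, 0, 0, 3, 0, 0)  # entity #81

print(cosine_similarity(e1, e81))  # > 0.99, so the two entities get merged
```

Both vectors point almost entirely along the "alice" axis, which is why the similarity comes out so high even though the raw counts differ.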

Running the code to merge entities and guess names

The second problem was that the NER system gives us no information about a person's real or best name.  I wrote some code to address this.  For instance, for entity #6 we have (after removing words like "she" and "he"): {"a white rabbit"=1, "the rabbit"=16, "the white rabbit"=11, "the white rabbit, who was peeping"=1, "the white rabbit, who said,"=1}.  Clearly this entity is the rabbit.  In this case, "the rabbit" is the most frequently used entity mention, so my program picks "the rabbit" as the best name.  The reason we remove the common pronouns first is that otherwise we would see a lot of "she" and "he" picked as best names, which wouldn't make sense: no one wants a social network graph with vertices called "she" and "he" all over the place, interacting with each other.  When the mention counts are tied, we choose the earliest entity mention, because most of the time the name is clear when a character is formally introduced.
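
As a rough sketch (hypothetical code, using the rabbit example above), the best-name rule boils down to: drop pronouns, then take the most frequent remaining mention, breaking ties in favor of the mention seen first:

```python
PRONOUNS = {"she", "he", "her", "him", "it", "they", "them"}

def best_name(mention_counts):
    """Pick a display name from a {mention: count} dict.

    `mention_counts` is assumed to preserve first-seen order (Python
    dicts do), so max() naturally breaks ties toward earlier mentions.
    """
    candidates = {m: c for m, c in mention_counts.items()
                  if m not in PRONOUNS}
    return max(candidates, key=candidates.get)

entity_6 = {"a white rabbit": 1, "the rabbit": 16, "the white rabbit": 11,
            "the white rabbit, who was peeping": 1,
            "the white rabbit, who said,": 1}
print(best_name(entity_6))  # the rabbit
```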

After this, I wrote a simple script in Python to crawl a website to obtain test text data for later use.

Then I wrote a program to evaluate the NER system by comparing its output against a gold standard produced by paid human annotators.  I used a simple exact span match, and it didn't work very well.  For instance, if there is a span 10000-10002 corresponding to "cat", and another span 9996-10002 corresponding to "the cat", my current program gives a score of zero--yet this "cat" vs. "the cat" mistake is not a serious one and shouldn't be punished so badly.  Because I was pulled away to work on some other programs, I haven't implemented a more flexible span matching method yet, but I will.  After this step, we also map the entities into a multidimensional space and calculate the similarity between each output entity from the NER and each entity from the gold standard to see how similar they are.
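
One flexible alternative (my own sketch of where this could go, not the implemented method) is to score two character spans by how much they overlap, rather than all-or-nothing:

```python
def span_overlap(gold, pred):
    """Jaccard-style overlap of two inclusive (start, end) character spans."""
    inter = max(0, min(gold[1], pred[1]) - max(gold[0], pred[0]) + 1)
    union = (gold[1] - gold[0] + 1) + (pred[1] - pred[0] + 1) - inter
    return inter / union

# "the cat" in the gold vs. "cat" from the NER: partial credit, not zero
print(span_overlap((9996, 10002), (10000, 10002)))  # 3/7, about 0.43
```

Under this scoring the "cat" / "the cat" near-miss earns partial credit instead of the zero that exact matching hands out.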

This Monday I started another small project using Java, HTML and JavaScript to help Apoorv, Anup and Dr. Rambow analyze experiment results for a paper that is due this Friday (I know, it's so close to the deadline now...)!  Basically, the program takes machine learning examples output from Java, displays them in a webpage, and dynamically inserts columns from experiments provided in JSON format that map example IDs to scores.  It also colors results that agree with the gold standard green, and red otherwise.  The user can type commands in the web console to filter rows (the one in the screenshot means that I want to see only the examples that model 1 got right but on which models 2, 3, 4 and 5 all failed).  It makes comparing machine learning models much easier.
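
That filtering command can be sketched in plain Python (hypothetical field names here; the real tool works on JSON rows in the browser):

```python
def filter_rows(rows, right, wrong):
    """Keep examples that every model in `right` got correct
    and every model in `wrong` got incorrect."""
    return [r for r in rows
            if all(r["correct"][m] for m in right)
            and all(not r["correct"][m] for m in wrong)]

rows = [
    {"id": "ex1", "correct": {1: True, 2: False, 3: False, 4: False, 5: False}},
    {"id": "ex2", "correct": {1: True, 2: True, 3: False, 4: False, 5: False}},
]
# Examples model 1 got right but models 2-5 all failed:
print(filter_rows(rows, right=[1], wrong=[2, 3, 4, 5]))  # only ex1 survives
```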

I expected to do the sentiment analysis starting from week 2, but obviously that didn't happen...but working on postprocessing and making all kinds of utilities is fun, too.  Hopefully I will start the coolest part of the research soon!

By the way, we still go to work on July 4th!

Monday, July 1, 2013

Week 1-2 at the Mendelsohn Lab

Hi, my name is Jason and I am working at the Mendelsohn lab for the summer. I am beginning my second week here at the lab and already feeling more comfortable. When I arrived at the Columbia Irving Cancer Research Center almost 10 days ago and stepped into the lab, I was nervous and really early. I got there thirty minutes early thinking that was the right thing to do, but ultimately spent the next thirty minutes worrying that I was in the wrong building or that they had forgotten about me. Carolina, one of the researchers at the lab, arrived minutes after my minor panic attack, and I was relieved. She had been told by my PI, Dr. Mendelsohn, that this was my first day, and she showed me to the lab. I soon got to meet the six other students and researchers - a mix of a medical student, graduate students, and post-docs - and all of them welcomed me to the lab.

The Mendelsohn lab studies urological cell biology, specifically bladder cancer. Because bladder cancer has many variations that can only be differentiated through histology (the study of tissue through examination under the microscope), the course of treatment depends on what each histologist sees, and potential treatment options are useless if the histologist makes the wrong diagnosis. The primary goal of this lab is to find certain genetic markers in the mass growing on the bladder to determine which kind of cancer it is (transitional cell bladder cancer, non-muscle-invasive bladder cancer, invasive bladder cancer, or squamous cell bladder cancer).

For the first couple of days, I spent my time following, watching, and reading. I eventually learned what my primary job at the lab would be; put simply, it is to paraffin section and to stain. Paraffin sectioning is the process of cutting thin films (5 µm) of tissue - in my case mouse bladder tissue or a mouse embryo - embedded in a block of paraffin wax, which are then put on slides for further examination.

Dan, a medical student, taught me how to paraffin section, and it was very difficult at first. The thin film of tissue is so delicate and sticky that it either folds on itself or rips apart before I can even get it into the water and onto the slide. After about two days, I got the technique down. Once I finish the paraffin sectioning, I move the slides on to staining. I have learned two types of staining so far: ABC staining and H&E staining. I just follow the protocol and time how long I leave the slides in the numerous solutions; it is very straightforward. (Actually, tomorrow I am going to learn a new staining technique called fluorescent staining. I don't really know what it is yet, but it sounds very interesting.) After an hour of reading if I have time, I end my day hopping on the A train and then the M66 to make my way home.

--- [I'll put up photos in the next post.]

Friday, June 28, 2013

More Carbon Capture Experiments & Responsibilities - Week 3

Hi, I'm Alyssa. Just as a reminder, I'm working on Fluidized Bed and Carbon Capture at Columbia University.

The third week, Dr. Park, my PhD student, and a few other members of the Park Group went to a science forum in Delaware on carbon sequestration, so I worked with a post-doc, Camille, on her carbon capture experiments.

Besides Camille and me, a few more people were in "the crew": two graduate student interns, Flora and Sarah, and post-doc June, who had just arrived at our lab on Monday. We performed 3 main experiments this week. I didn't take pictures of the machines, but I did find some detailed diagrams explaining the experiments.

1. Synthesis of Solvent for CO2 Capture: we made a nano-material solvent to capture CO2 from air. This went on for the whole week, so we did it step by step. The final solution came out on Friday, and all of us finally relaxed, since we'd have been back to square one if we had messed up any part of the process.

2. Differential Scanning Calorimetry: we carried out experiments to measure the melting points and glass transition temperatures of about 8 elements/compounds ranging from boron to CsCl; we also observed the entropy of melting. This experiment was not too hard to operate, because all we had to do was put each sample in and enter a few commands, but it was very time-consuming: we spent on average 2 hours testing each sample and squeezed in time between runs to carry out experiment 1. We finished it on Wednesday.
Differential Scanning Calorimetry
3. FTIR Spectroscopy: The objective of this experiment was to characterize molecular bonds. We put a drop of aqueous solution onto a plate that has a diamond on it (not for decorative purposes, of course =P; diamond is a very reflective material, so it enables good reflection and enhances the signal). Then we connected the machine to an oxygen channel. Even though Camille did the first setup herself, she gave us opportunities to connect, disconnect, and operate the system. It's quite a complicated system, but it was fun learning how to link the spectroscope to the different tubes. This was the shortest experiment of the three - a one-day process.


The first experiment was performed in my main laboratory on the 10th floor, which belongs to the Environmental Engineering Department; the last two were done in our other lab on the 3rd floor, at Engineering Terrace. So you can imagine us running up and down the Columbia Mudd Building, allocating time slots to fit all three experiments into our schedule. It's been a fulfilling week, and it was a great chance for me to learn more about carbon capture as well as get to know more people.

And lastly, some GOOD NEWS: the ICP machine is back!! PhD student Helios, undergraduate intern Naimun, and I will be using it next week. This is something I've been looking forward to all along, and I'm very excited for week 4!

Thursday, June 20, 2013

NLP week 1: visualizing social networks

This summer I work at the Center for Computational Learning Systems at Columbia University.  It's great to work at a computer "lab," because we basically go to work whenever we want, and leave whenever we want.  Why?  Because computer science people are typically pretty motivated and self-disciplined, and they probably work even more at home (at night) than at work during the day.

The focus of my mentor, Apoorv, is to have computers extract a social network from text--that's right, it means computers will have to "read and understand" English text!

My job this first week is simply to build a web interface to the system: it accepts arbitrary text input from anyone on the Internet, passes it on to the program that Apoorv and his team have built, collects the output of their program, parses it, and visualizes it in the user's browser.  Sounds too abstract?  See the following example.


In the screenshot above, the circles (vertices) are "occurrences"--each time a name appears in the text, it is an occurrence.  The arrows (arcs) between vertices denote observations (as in "I see you in the restaurant," where "I" am aware of your existence, but "you" are not aware of "my" existence).  Another type of connection is called interaction, where both parties are aware of each other.  If you are interested in this notation, read Apoorv's paper.

In the next step, I will post-process the result from Apoorv's system and merge occurrences--as you can see, Charlie appears three times in the generated graph above, and in the next version those occurrences will be merged into one.

In other words, I am just building a demo system to let more users try our system.  You may have noticed that the arrows point in completely wrong directions, and that some of the connections should have arrows in both directions.  Yes--it's a known bug, and we are still trying to figure out why.

Let me show you the stack.  Bottom-up, there is Apoorv's Java program, and then a TCP socket server that I wrote in Java that listens locally for requests, parses the results in .net format, and generates JSON results.  A Node.js program that I wrote with the Express framework sits on top of my socket server; it simply serves the webpage above, passes English sentences down to the Java program, and transfers JSON results back to the browser.  Visualization happens mostly in your browser, where I used the D3.js library to calculate the locations of the circles and lines according to physical laws, and SVG to actually draw them.
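
To give a flavor of the middle layer, here is a minimal Python analogue of such a socket server (the real one is in Java, and `analyze` here is a toy stand-in for Apoorv's program): each connection sends one line of text and gets back one line of JSON.

```python
import json
import socketserver

def analyze(text):
    # Toy stand-in for the real NLP backend: just count the tokens.
    return {"tokens": len(text.split())}

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One line of input text per connection...
        line = self.rfile.readline().decode("utf-8").strip()
        # ...answered with one line of JSON.
        reply = json.dumps(analyze(line)) + "\n"
        self.wfile.write(reply.encode("utf-8"))

def serve(host="127.0.0.1", port=9099):
    # Blocks forever; the web layer opens a connection here per request.
    with socketserver.TCPServer((host, port), Handler) as server:
        server.serve_forever()
```

The line-in, JSON-out contract is what lets the Node.js layer stay a thin relay: it never has to understand the analysis, only forward text down and JSON up.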

Bonus: a graph for you to play around with (a modern browser that supports SVG is required).  You can drag anyone to move them around!
Hope you liked it!

The real research hasn't started yet.  Hopefully I will actually get to the natural language processing part as early as next Monday, after I finish working on this web interface.  So, see you next week!

Wednesday, June 19, 2013

An Opportunity to Learn about Carbon Capture - Week 2

This is Alyssa; I've just finished up my second week of research on fluidized beds at Columbia University.

At the beginning of week 2, I finished up my assigned reading from the book Fluidization Engineering. At first, I thought I was going to carry out the fluidization experiments this week, but it turned out that the computer still needed a compatible program. So for most of the week, a master's student at my lab worked on MATLAB while I helped out on another project - carbon capture.

One thing I'm glad I did was reach out to another intern to learn about his carbon capture work. His experiments seemed very interesting to watch, but I hesitated a bit before asking if he needed help because he was quite busy. Nevertheless, I approached him with questions about his work and offered to help. He explained to me what carbon capture is and gave me a few things to do - mixing solutions, testing pH values, acquiring data... It is indeed an area of interest for me, and I enjoyed studying carbon capture very much. I learned from this experience that helping out on another project is always a good way to learn about other areas of study. Dr. Peretz's advice was very useful - we should always look for things to do when we are free and learn about what others in our lab are doing.

Another great experience this week was joining the other members of the Park Group at a meeting about the future of carbon capture. The lecturer talked about the present state of the project and pointed out a few problems with it. At first, researchers saw it as a promising project because, according to the experts, the cost of building carbon pipelines would be quite low. Yet, from what the lecturer said, the old age of power plants and the unexpectedly high cost of building carbon pipelines (20-30 million USD per 50 miles) are problems researchers now face. Then Dr. Park led us in a discussion of whether this project has a future, since the current economic situation does not provide researchers with sufficient capital (the marginal profit from carbon capture is low) and people are not really aware of the environmental benefits it would bring in the long run. Through the discussion, I realized that many factors influence the popularity and viability of a research project, and that scientists need to be critical of the information they are given, because it often changes.

To elaborate, carbon capture is basically a way of storing carbon and lowering greenhouse gas emissions. Here are some pictures of the carbon capture experiments I did.
pH value measuring instrument

Pump used for the water dripping experiment. Typically we put it up on a shelf, insert a syringe containing DI water and mineral, then collect samples at time intervals of 5s, 10s, 30s, and 60s


Test tubes containing liquid samples obtained in 4 attempts

I had a fun and educational time in my lab this past week. I hope everyone else's research is going well too!