Our 21 students are working in labs from NC (Duke) to MA (Harvard and MIT), and on topics from computer languages to tissue formation. Join us here to read weekly updates from their time in the lab!

Visit the EXP page on the Peddie website: peddie.org/EXP.

Saturday, August 17, 2013

NLP week 7: finishing up pipeline wrapper and web interface

Hi again, my name is Jiehan Zheng.  I've been working on NLP and some machine learning at Columbia University this summer.

I skipped writing about week 6 because we were working on something secret!  We will publish that work during the upcoming fall term if things go smoothly.  So this post is about my 7th week.  I was too busy with the project to post updates to this blog earlier...

Since week 7 was my last week physically working at CCLS at Columbia University this summer, we chose to finish the things that require face-to-face collaboration first, so we wouldn't have to wait on each other afterwards.  The web interface and the pipeline wrapper were what we had to finish together before I left--so in this last week I mainly worked on those two.

Apoorv's work is on the pipeline that takes in sentences, gold dependency parse trees, semantic parse trees, and entity annotations, and spits out a file in Graph Modelling Language (GML) describing the interactions between entities.  In order to make the pipeline work on any unprocessed text and return a social network, it has to be wrapped in some extra code--I named that part the "pipeline wrapper," and I feel like that's a smart name, isn't it?
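For readers who haven't seen GML, here's a toy illustration of the kind of output I mean.  The entities and counts below are invented for the example, and I'm using the networkx library just to show the format--the real pipeline writes its GML files itself:

```python
# Toy example: write a tiny "social network" out as GML with networkx.
# The entity names and counts here are invented for illustration.
import networkx as nx

g = nx.DiGraph()
g.add_node("Elizabeth", frequency=12)              # times the entity appears
g.add_node("Darcy", frequency=9)
g.add_edge("Elizabeth", "Darcy", interactions=5)   # detected interactions

nx.write_gml(g, "social_network.gml")              # GML: nested key-value blocks
```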

So the pipeline wrapper has to take in raw text, split it into sentences, call various parsers, and convert their results into the format the pipeline expects.  There was existing code for this, but it no longer worked--and even when it did work, it was poorly written and inefficient.  I rewrote the wrapper in a more organized way.  For instance, the old wrapper had to call NYU Jet's main method twice to get named entities and sentence splits separately; I read Jet's source code and managed to call Jet once and get both, making it faster.  I also prevented Jet from performing unnecessary operations that take time, like relation extraction.
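In outline, the new wrapper looks something like the sketch below.  The function bodies are placeholders I made up for this post--the real Jet and Stanford integration code is quite different:

```python
# A structural sketch of the pipeline wrapper, NOT the real code.
# All function bodies are hypothetical stand-ins.

def run_jet(raw_text):
    """Single Jet pass that returns both sentence boundaries and named
    entities; the old wrapper invoked Jet twice to get these."""
    sentences = raw_text.split(". ")   # placeholder for Jet's sentence splitter
    entities = []                      # placeholder for Jet's NER output
    return sentences, entities

def run_stanford_parser(sentences):
    """Placeholder for calling the Stanford parser on each sentence."""
    return [None for _ in sentences]   # one dependency parse per sentence

def to_pipeline_format(sentences, entities, parses):
    """Bundle everything into the files the pipeline expects."""
    return {"sentences": sentences, "entities": entities, "parses": parses}

def process(raw_text):
    sentences, entities = run_jet(raw_text)    # one Jet call, not two
    parses = run_stanford_parser(sentences)
    return to_pipeline_format(sentences, entities, parses)
```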

Then the wrapper gets dependency parses from the Stanford parser.  My refactoring effort also enables us to run multiple tasks in parallel.  For instance, we are going to run CMU's SEMAFOR semantic parser as well in the future, and running SEMAFOR takes a long time.  Had we added SEMAFOR to the old wrapper, it would have had to wait until the Stanford parser finished its job.  With the new structure, SEMAFOR and the Stanford parser run in different processes, so they can take advantage of multiple CPU cores and run at the same time, cutting the running time by at least 50%.  SEMAFOR integration is a bit harder than the other parsers, so I decided to work on that after I go back to China.
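The parallel part is the easy bit; in Python it can be as simple as the snippet below (the jar names and arguments are placeholders, not our real invocations):

```python
import subprocess

# Hypothetical commands: the real jar names and arguments differ.
stanford = subprocess.Popen(["java", "-jar", "stanford-parser.jar", "input.txt"])
semafor = subprocess.Popen(["java", "-jar", "semafor.jar", "input.txt"])

# Both parsers now run at the same time in separate OS processes,
# on separate cores if available; wait for each before calling the pipeline.
stanford.wait()
semafor.wait()
```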

After we have all the parses and other files, the wrapper calls the pipeline with them and waits for the pipeline to finish.  Once it gets the interactions in text form, the wrapper calls the postprocessor that I made during week 2, which merges duplicate entities, picks the best name for each entity, analyzes the interactions, and finally organizes all of this into a social network file.
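To give a flavor of what "merging duplicate entities" means, here is a toy heuristic--a much simpler rule than what the real postprocessor does:

```python
# Toy illustration of duplicate-entity merging: treat a short name as a
# duplicate of a longer name that contains it as a word, and keep the
# longest variant as the entity's "best" display name.  The real
# postprocessor is more involved than this.
def merge_entities(names):
    canonical = {}
    for name in sorted(names, key=len, reverse=True):  # longest names first
        for longer in canonical:
            if name in longer.split():  # e.g. "Darcy" is a word of "Mr. Darcy"
                canonical[longer].append(name)
                break
        else:
            canonical[name] = [name]    # new entity; this name is its label
    return canonical

print(merge_entities(["Mr. Darcy", "Darcy", "Elizabeth", "Miss Elizabeth"]))
# {'Miss Elizabeth': ['Miss Elizabeth', 'Elizabeth'],
#  'Mr. Darcy': ['Mr. Darcy', 'Darcy']}
```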

The web interface is just pure programming effort and is nowhere near as interesting as the pipeline wrapper and the machine learning aspects.  Still, my work on the pipeline wrapper, postprocessor, and web interface has been included in a demo paper that will be presented at IJCNLP 2013 this October in Japan, and I've been made a co-author on that paper--I am very excited about that!

Apoorv and I have arranged with Mr. Corica for me to continue our work on that "secret project" as an independent project at Peddie during my fall term.  This is a very precious opportunity for me to learn more machine learning--from implementing tools and extracting features, to running experiments and tuning SVM parameters and our features, to finally evaluating the results.

As for the rest of my summer: I did figure out a way to integrate SEMAFOR, so I will spend some time enhancing the web interface and pipeline wrapper by adding SEMAFOR integration.  I will describe more in my next blog post!

Friday, August 16, 2013

Weeks 8-10

Hello, this is Jacky Jiang from the McAlpine group at Princeton. It has been a long time since my last post. In the past three weeks, we kept testing how efficiently our methods produce thylakoid and what chlorophyll concentration they yield.
To get a better concentration, I needed to come up with different methods of producing the thylakoid. Since I had already tried the classic procedure, the basic steps stayed similar. First, I use a lab blender to blend the alginate leaf pieces with grinding buffer. After we get the solution, we centrifuge it to collect a pellet. The pellet then needs to go through another step, called resuspension: we put it into washing buffer and resuspend it. A few more rounds of centrifugation then give us the concentrated thylakoid. To modify the method, I tried different centrifugation speeds, which changed the composition of the pellet. This change can be critical: since nuclei and other fragments of plant cells have different densities, the centrifugation speed determines which components end up at the bottom of the pellet. What’s more, I also changed the grinding buffer used for blending. Different concentrations of tricine change how finely the tissue is ground, and therefore the size of the membrane fragments.
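To make the idea concrete, differential centrifugation works roughly like the sketch below. The speeds and fractions are generic textbook values I'm quoting for illustration, not our lab's actual settings:

```python
# Rough, illustrative numbers for differential centrifugation of a leaf
# homogenate; generic textbook ranges, NOT our exact protocol.
spin_steps = [
    (200, "intact cells and large debris"),      # (speed in g, what pellets)
    (1000, "nuclei and remaining debris"),
    (5000, "chloroplasts / thylakoid membranes"),
]

for g_force, pellet in spin_steps:
    print(f"spin at ~{g_force} g  ->  pellet enriched in: {pellet}")
```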
After trying the different methods, it turned out that the classic steps, with an appropriate concentration of grinding buffer and a high centrifugation speed, worked best. I also tried changing the order of centrifugation and buffer mixing, which didn’t turn out well in the end.
To determine whether our thylakoid is efficient enough, we still need to measure the chlorophyll concentration. The method is the same as last time: the chlorophyll concentration in the thylakoid suspension is determined by adding 0.10 mL of the suspension to 10 mL of 80% acetone in a test tube. This solution is mixed by inverting several times and then filtered through Whatman filter paper into a large cuvette using a 50 mL glass funnel. The absorbance of the green solution is measured at 663 nm and at 645 nm, using 80% acetone to zero the spectrophotometer. The concentration of chlorophyll in the original sample is then calculated from these absorbances.
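If the equation in question is Arnon's classic formula for 80% acetone extracts (I'm assuming so here, since it's the standard one for these two wavelengths), the calculation looks like this:

```python
# Chlorophyll concentration from absorbance, assuming Arnon's (1949)
# equations for 80% acetone extracts -- a standard choice, though the
# post above doesn't name the exact equation used.
def total_chlorophyll_mg_per_L(a663, a645, dilution_factor=101):
    """Total chlorophyll in the ORIGINAL suspension.

    dilution_factor: 0.10 mL suspension in 10 mL acetone -> 10.1/0.1 = 101.
    """
    extract = 20.2 * a645 + 8.02 * a663   # mg/L in the acetone extract
    return extract * dilution_factor

print(total_chlorophyll_mg_per_L(a663=0.65, a645=0.30))  # example readings
```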

Our thylakoid concentration has improved a lot since we modified the methods. For the rest of the summer, we will move on to the electrical part of our project.

Week 4 at Chandran Laboratory

My name is Anna, and I'm working in Dr. Kartik Chandran's laboratory in Earth & Environmental Engineering at Columbia University.

So this week was mostly defined by transitions. Our batch reactors finally reached their stable population, so we switched the reactor into chemostat mode. Where batch reactors have nothing going in or out (technically) and are used to watch change over time, chemostat reactors have influent and effluent moving at the same rate. This means that nothing changes: the population is constant, as is the amount of ammonia, nitrite, etc. The reactor is going to spend two weeks stabilizing in chemostat, and then the really interesting stuff will begin. Even though I won't be there for it, the next step will be disturbance, or increased-feed chemostat. In this phase, the bacteria will be subjected to one hour of ammonia loading. Our strain, Nitrosomonas eutropha, is known to prefer larger quantities of ammonia compared to its N. europaea cousins, and hopefully this means that it produces NO and N2O gases differently (i.e., fewer of them). However, this ammonia loading will require hourly testing for 12-15 hours every day, so I'm not completely heartbroken to be missing it. After those two weeks, tests will be done to see whether the bacteria retained any of the previous traits.
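For anyone curious about why "nothing changes": at steady state, the culture's growth rate settles at the dilution rate D = F/V (flow rate over reactor volume). Here is a quick illustration with made-up numbers, since I'm not quoting our reactor's actual settings:

```python
# Chemostat steady state: dilution rate D = F / V, and the bacteria's
# specific growth rate settles at D.  Numbers below are invented for
# illustration, not our reactor's actual settings.
flow_rate_L_per_h = 0.2     # influent = effluent flow, F
volume_L = 4.0              # working volume of the reactor, V

D = flow_rate_L_per_h / volume_L
print(f"dilution rate D = {D:.3f} per hour")       # 0.050 / h
print(f"mean residence time = {1 / D:.1f} hours")  # 20.0 hours
```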

We will be testing for the next few days to get baselines for ammonia, nitrite, hydroxylamine, and some mRNA measures as well. We will also be creating our own standard curves for the aforementioned chemicals, because we are finally getting into work that could be publishable. Next week I will also be working on some poster drafts to present to Medini and Dr. Chandran.

In the past week, there has been an influx of people coming to the lab, including high school students and new grad students. (Very thankful that I've had my own desk this whole time.)
Although I can't stick around any longer, I look forward to periodically seeing how this project develops and maybe working with Medini again.

Tuesday, August 13, 2013

Week 6 Mendelsohn lab

Hi again. This is a summary of what I did during my sixth week at the Mendelsohn lab.

I briefly worked on Sol's silk bladder augmentation project by paraffin sectioning some of his blocks. I found these paraffin blocks much more difficult to cut because the silk embedded in the bladder tissue made it harder to slice fully without the section ripping apart.

Throughout my stay at the lab, I have been paraffin sectioning many mouse embryos without fully understanding where and how these embryos had been embedded into the paraffin wax. During the week, Katya brought an E17 pregnant mouse into the lab to remove its embryos. She removed all 13 embryos from the mouse and placed them in 1x PBS while I prepared 13 tubes of diluted formaldehyde fixing agent. Katya showed me the steps to dissect a mouse embryo and told me to dissect the remaining twelve. First, I removed the amniotic sac and cut a small piece of the tail to be genotyped later by PCR. Then I bisected the embryo under the arms, removing the upper half (because we are only looking at the lower half). After bisecting, I moved to the dissecting microscope to clean out the rest of the embryo, removing everything but the bladder and kidneys. After removing the limbs and tail, I placed each embryo into one of the 13 formaldehyde tubes to be fixed and eventually paraffin blocked at the histology department for future sectioning.
[Photo: Mouse embryo E17]
[Photo: Dissecting microscope]
[Photo: Under the microscope]

The next day we ran a PCR on the small pieces of embryo tail (that I previously mentioned), and the following day we ran a gel to confirm the Cre genotyping of the embryos, to see which had the gene and which didn't, because eventually we want to cross mice that have Cre with mice that have Apaf mutations.
[Photo: Gel]
If you have been following along, you'll notice I haven't explained what this project is really about. Basically, the Apaf project (the one involving mouse embryos) is about the connections between the ureter and the bladder. In embryos, the ureters are joined to the nephric duct through the common nephric duct. Normally, the ureters detach from the nephric duct and fuse with the bladder epithelium. This project is trying to analyze Apaf (apoptotic protease activating factor, one of the major proteins that form the apoptotic regulatory network) mutants to determine whether apoptosis is required for ureter insertion.

Week 5 Mendelsohn Lab

This is Jason again from the Mendelsohn Lab at Columbia. I have fallen behind on writing my blog posts and will update you all as soon as possible. I am not actually writing this during my fifth week, but here is what I did anyway.

Throughout the week, I have again been practicing and learning how to stain slides using immunohistochemistry (briefly mentioned in my previous post). After paraffin sectioning and a day to let the tissue dry and settle onto the slide, the slides were ready for immunostaining. First, the slides were deparaffinized in xylene and rehydrated through ethanol so that the paraffin wax fully dissolved, leaving only the desired tissue on the slides. After deparaffinization, the slides underwent heated antigen retrieval, which means they were placed back to back in a pH 9 buffer at 100˚C and steamed for 30 minutes. We did this because when tissue is processed into paraffin blocks for sectioning, fixatives are added that mask and cross-link its proteins, making successful antibody binding almost impossible. In the hot buffer, these fixed proteins unfold, allowing our specific antibodies to bind. After the 30-minute steam, the slides went straight into PBS with 0.1% Triton (a very common buffer solution) for a 15-minute wash. Then horse serum blocking solution was applied to the slides for 90 minutes to reduce background and nonspecific staining. After blocking, the slides were ready for the specific antibody application. Lastly, I applied DAPI (a fluorescent DNA stain), washed in PBS 0.1% Triton, and put on the cover slips.

[Photo: Staining hood (deparaffinization on the right)]
[Photo: Slides deparaffinized and hydrated]
[Photo: Heated antigen retrieval]
It might seem like a lot to remember at first, but after a couple of rounds of practice you get the hang of it. I've condensed the steps into a checklist below.
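Here's the protocol as a simple checklist (the timings are the ones from this week; other labs' protocols will vary):

```python
# The immunostaining steps from this post as a simple checklist.
# Durations are the ones I used this week; other protocols vary.
protocol = [
    ("Deparaffinize in xylene, rehydrate through ethanol", None),
    ("Heated antigen retrieval, pH 9 buffer at 100 C", 30),
    ("Wash in PBS + 0.1% Triton", 15),
    ("Block with horse serum solution", 90),
    ("Apply specific antibody", None),
    ("DAPI stain, final PBS-Triton wash, coverslip", None),
]

for step, minutes in protocol:
    timing = f"{minutes} min" if minutes else "see protocol"
    print(f"- {step} ({timing})")
```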
[Photo: My bench]
On Wednesday, we had a formal lab meeting where everyone in the lab presented their work. I listened and learned as the other five lab members explained their projects. Katya, Kerry, Tammer, Hanbin, and Sol all amazed me with the work they were doing. The Mendelsohn lab's focus spans far beyond just bladder cancer and touches several different areas within the field of urology. For example, Sol's project involves bladder augmentation, using silk fibers as a scaffold to increase the size of impaired bladders and lower their pressure.

I have continued my work on the BBN and Apaf projects and will explain them in my next post.