Natural Language Processing and Me – 1

I have been working as an undergraduate researcher in Natural Language Processing for about a year now, and I haven’t written anything about it yet!

I previously wrote an introductory post on Mizo language processing. It focused mainly on Machine Translation, and even that was only a very basic introduction! So, I’d like to share a few things that I know about Natural Language Processing. I will write about NLP as a series, because NLP is a very vast field with many sub-topics.

It will not be a lecture or a tutorial, just my own opinions and ideas. Some of them may be wrong or may not apply to NLP. But still, I’d like to share them!

My colleagues and I are working on Information Retrieval for Indian languages, mainly the Mizo language! Our research paper hasn’t been published yet, so I will talk about it later.

During the last winter vacation, i.e. December 2015 to January 2016, we worked on Part-of-Speech (POS) tagging for Mizo, the first such work for the Mizo language. We have also created a Mizo POS tagger, which will soon be available for the public to download!

The Mizo language, in particular, has a lot of ambiguity! Sometimes working on it makes me feel very dizzy! But since I am working with a great programmer and a brilliant mind, we have been able to resolve most of the ambiguities!

Part-of-Speech tagging for Mizo requires a good knowledge of its grammar, because the same word can belong to different parts of speech depending on the sentence! Since POS tagging is one of the most important pieces of NLP, there is no option other than to resolve the ambiguities.
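To make the ambiguity problem a bit more concrete, here is a minimal Python sketch that scans a hand-tagged corpus and lists every word that was given more than one tag. The sentences are hypothetical English examples with Penn Treebank tags, not our Mizo data, and the function name `ambiguous_words` is just made up for illustration.

```python
from collections import defaultdict

def ambiguous_words(tagged_sentences):
    """Return words seen with more than one tag, mapped to their sorted tags."""
    tags_by_word = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            # Lowercase so "Can" and "can" count as the same word
            tags_by_word[word.lower()].add(tag)
    return {w: sorted(t) for w, t in tags_by_word.items() if len(t) > 1}

# Hypothetical hand-tagged English sentences (Penn Treebank tags), not Mizo data
corpus = [
    [("I", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("Open", "VB"), ("the", "DT"), ("can", "NN")],
    [("They", "PRP"), ("book", "VBP"), ("a", "DT"), ("room", "NN")],
    [("Read", "VB"), ("the", "DT"), ("book", "NN")],
]

print(ambiguous_words(corpus))
# → {'can': ['MD', 'NN'], 'book': ['NN', 'VBP']}
```

Every word in that result is one the tagger has to decide on from context, which is exactly where the grammar knowledge comes in.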

If you wonder why it is needed, take a look at one application that uses it, the Stanford Parser. Each word in a sentence is assigned a part of speech!

Consider a sentence:

Jeremy is a good programmer living in India

If you parse it with the Stanford Parser, you will get the following POS tagging result:

Jeremy/NNP is/VBZ a/DT good/JJ programmer/NN living/VBG in/IN India/NNP

NNP, NN, VBG, VBZ, JJ, DT, IN, etc. are part-of-speech tags (for English, the Stanford Parser uses the Penn Treebank tag set). Different languages have different POS tag sets, because their parts of speech are different!
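Output in the word/TAG format above is easy to process further. Here is a small Python sketch that splits it into (word, tag) pairs and prints a short gloss for each tag. The `PENN_GLOSS` table only covers the tags in the example sentence and is my own summary of the Penn Treebank tag descriptions; `parse_tagged` is a hypothetical helper, not part of the Stanford Parser.

```python
# Short glosses for the Penn Treebank tags that appear in the example sentence
PENN_GLOSS = {
    "NNP": "proper noun, singular",
    "VBZ": "verb, 3rd person singular present",
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "VBG": "verb, gerund or present participle",
    "IN": "preposition or subordinating conjunction",
}

def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the last '/', in case the word itself has one
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

tagged = "Jeremy/NNP is/VBZ a/DT good/JJ programmer/NN living/VBG in/IN India/NNP"
for word, tag in parse_tagged(tagged):
    # Print aligned columns: word, tag, and the gloss ('?' for unknown tags)
    print(f"{word:<12}{tag:<5}{PENN_GLOSS.get(tag, '?')}")
```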

If you want to read more about Part-of-Speech tagging, you can find more about it on Wikipedia.
As I said, I don’t have deep knowledge of POS tagging, so I had better not talk too much about it!

One thing I have learned about NLP is that before we let a machine do the work on its own, we first have to do it manually for the machine!
Our POS tagger is one such application: it will be used to tag parts of speech manually! Once we finish all the documentation, I will share it for download! Maybe someday it will be an important tool for future researchers of Mizo NLP!
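To show why the manual work matters, here is a minimal Python sketch of about the simplest tagger that learns from hand-tagged data: for every word, it remembers the tag the annotators used most often. This is the well-known most-frequent-tag baseline, not our Mizo POS tagger; the training sentences are hypothetical English examples, and tagging unseen words as NN is an assumption.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Map each (lowercased) word to the tag annotators gave it most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_words(words, lexicon, unknown_tag="NN"):
    """Tag each word with its learned tag; unseen words get unknown_tag (assumed NN)."""
    return [(w, lexicon.get(w.lower(), unknown_tag)) for w in words]

# Hypothetical hand-tagged training data (English, Penn Treebank tags), not Mizo data
training = [
    [("Jeremy", "NNP"), ("is", "VBZ"), ("a", "DT"), ("good", "JJ"), ("programmer", "NN")],
    [("India", "NNP"), ("is", "VBZ"), ("a", "DT"), ("big", "JJ"), ("country", "NN")],
    [("She", "PRP"), ("is", "VBZ"), ("a", "DT"), ("programmer", "NN")],
]

lexicon = train_most_frequent_tag(training)
print(tag_words(["Jeremy", "is", "a", "good", "teacher"], lexicon))
# → [('Jeremy', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('teacher', 'NN')]
```

Practical taggers usually also look at the surrounding words, which is what ambiguous words need, but even this baseline can only exist because someone tagged the training sentences by hand first.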