This is a list of NLP projects we would like to see realized someday. Each involves a major software component that requires programming skills. The list may provide inspiration for student projects:
Reimplementation of the dependency parser in the Frog NLP suite – (contact: Antal van den Bosch)
The parser module is currently implemented in Python and needs to be rewritten in C++ within the existing framework. The parser's accuracy also leaves room for improvement, so contributions on that front are very welcome!
Adaptation of the context-sensitive spelling checker Valkuil/Fowlt for other languages – (contact: Antal van den Bosch)
This work would mostly involve gathering training data, then devising and training various classifiers, each specialised in correcting a particular word or phrase. The website backend is written in Python/Django, and the classifier modules are written in C and use Timbl.
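To give a feel for what "a classifier specialised in one word or phrase" means, here is a minimal Python sketch for a single hypothetical confusible set ("their"/"there"). It assumes a plain memory-based approach: every occurrence in the training text is stored as a window of surrounding words, and a new occurrence is labelled by its nearest stored neighbour under the simple overlap metric (count of mismatching positions). This is only a simplified stand-in for the kind of classification Timbl performs. It does not use Timbl's API, and all names (`CONFUSIBLES`, `instances`, `classify`, `suggest`) are hypothetical.

```python
from collections import Counter

# Hypothetical confusible set; a real system would train one classifier per set.
CONFUSIBLES = {"their", "there"}


def instances(tokens, width=2, pad="_"):
    """Yield (position, context features, written word) for each confusible token."""
    padded = [pad] * width + [t.lower() for t in tokens] + [pad] * width
    for i, tok in enumerate(tokens):
        if tok.lower() in CONFUSIBLES:
            j = i + width  # index of the focus word in the padded list
            feats = tuple(padded[j - width:j] + padded[j + 1:j + 1 + width])
            yield i, feats, tok.lower()


def overlap(a, b):
    """Overlap distance: number of feature positions that differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(memory, feats, k=1):
    """Majority vote over the k nearest stored instances (no feature weighting)."""
    ranked = sorted(memory, key=lambda inst: overlap(inst[0], feats))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def suggest(memory, tokens, width=2, k=1):
    """Return (position, written, predicted) wherever the classifier disagrees."""
    out = []
    for i, feats, written in instances(tokens, width):
        predicted = classify(memory, feats, k)
        if predicted != written:
            out.append((i, written, predicted))
    return out


# Toy training corpus, assumed correctly spelled.
train = [
    "they parked their car outside".split(),
    "we went there yesterday".split(),
    "the kids lost their shoes".split(),
    "is anybody there at all".split(),
]
memory = [(feats, label) for sent in train for _, feats, label in instances(sent)]
```

For example, `suggest(memory, "they parked there car outside".split())` flags position 2 and proposes "their". With realistic amounts of data, one would add feature weighting and a confidence threshold before reporting a correction.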
Adaptation of the Frog NLP suite for other languages – (contact: Antal van den Bosch)
Frog is an NLP suite for Dutch that does Part-of-Speech tagging, morphological analysis, lemmatisation, Named Entity Recognition, Shallow Parsing and Dependency Parsing. Most modules are based on machine learning techniques with Timbl. Adaptation for other languages would involve finding new training sources and building training data for the various modules, extending them where necessary. Some C++ knowledge is necessary.
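Much of the "building training data" step comes down to converting an annotated corpus into fixed-length instances that a memory-based learner can read. Below is a minimal Python sketch for a part-of-speech module. It assumes whitespace-separated columns with the class label in the last column, a format Timbl accepts, and uses only a word window as features. Frog's actual modules use their own, richer feature sets, so treat the function name `to_instances` and the simplified tags as hypothetical.

```python
def to_instances(tagged_sentence, width=1, pad="_"):
    """Convert [(word, tag), ...] into lines of: left context, focus word, right context, tag."""
    words = [w for w, _ in tagged_sentence]
    padded = [pad] * width + words + [pad] * width
    lines = []
    for i, (_, tag) in enumerate(tagged_sentence):
        j = i + width  # position of the focus word in the padded list
        feats = padded[j - width:j + width + 1]
        lines.append(" ".join(feats + [tag]))  # class label goes in the last column
    return lines


# Illustrative Dutch sentence with simplified, CGN-inspired tags.
sentence = [("de", "LID"), ("kat", "N"), ("slaapt", "WW")]
for line in to_instances(sentence):
    print(line)
```

Running this prints `_ de kat LID`, `de kat slaapt N` and `kat slaapt _ WW`, one instance per token. For a new language, the same conversion would be applied to whatever annotated corpus can be found, with features extended as needed (for example suffixes for handling unknown words).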
wiki/software_project_ideas.txt · Last modified: 2014/09/15 13:12 by http://openid.science.ru.nl/proycon/