wiki:ponyland:software [2019/04/25 15:23] (current)
 +====== Software on Ponyland ====== ​
  
 +Several categories of software are available on Ponyland: globally installed packages, which are automatically available to everyone; an extensive LaMachine installation with much more software, which you need to activate explicitly; and extra software for which you also have to opt in explicitly. The latter is organized in so-called //namespaces//.
 +
 +===== Globally installed =====
 +
 +The globally installed packages include all common Unix tools and various compilers and interpreters. A full list of all installed packages can be obtained with ''dpkg -l''. Below is a list of the most notable software.
 +
 +==== Experiment framework ====
 +
 +  * [[wiki:​ponyland:​sge:​start|SLURM]]
 +  * parallel
 +
 +==== Compilers and interpreters ====
 +
 +  * gcc/g++ (5.4)
 +  * python (2.7)
 +  * python3 (3.5)
 +  * perl (5.22)
 +  * java/javac (Java 8, OpenJDK 1.8)
 +  * R (3.2)
 +  * matlab
 +  * octave (open-source matlab alternative)
 +
 +
 +==== Research Tools ====
 +
 +  * [[http://rug-compling.github.com/dact/|Dact]] //Dact is a tool for viewing and analyzing Alpino corpora//
 +  * [[http://www.fon.hum.uva.nl/praat/|Praat]] //doing phonetics by computer//
 +
 +==== Version control ====
 +
 +//Using some form of version control for maintaining your code is highly recommended!//
 +
 +  * [[http://​learn.github.com/​p/​intro.html|git]] (See: [[wiki:​git:​start]])
 +  * svn (subversion)
 +  * bzr (bazaar)
 +  * hg (mercurial)
 +  * cvs
 +
 +==== Editors ====
 +
 +  * vi/vim
 +  * emacs
 +  * joe
 +  * nano/​pico ​ (easiest for beginners, but very few features)
 +  * Graphical Editors (Use [[wiki:​unix:​ssh:​xforwarding]])
 +    * gedit
 +    * sublime-text (namespace ''texteditors'')
 +
 +==== Other ====
 +
 +  *  latex/​pdflatex/​tex
 +  *  gnuplot
 +  *  pdf2txt
 +  *  graphviz (dot/​circo/​etc)
 +  *  antiword
 +
 +===== LaMachine =====
 +//​(Maintained by Maarten van Gompel)//
 +
 +LaMachine groups a large collection of NLP software, including all in-house software developed by the Language Machines research group. Most of the software by Antal van den Bosch, Maarten van Gompel, Ko van der Sloot, and others can be found here. LaMachine will always contain the latest stable releases of our software. See https://​proycon.github.io/​LaMachine for complete information.
 +
 +As a lot of our software and development is Python-based, LaMachine comes with a sizeable collection of third-party libraries, in addition to our own libraries. The LaMachine installation on Ponyland is built against the globally available Python 3.5.2.
 +
 +LaMachine on Ponyland is set up as a //virtual environment//. To activate it from the shell (bash or zsh only), or from within a shell script, type:
 +
 +<code bash>
 +lm
 +</code>
 +
 +or, in its full form:
 +
 +<code bash>
 +source /vol/customopt/bin/lamachine-activate
 +</code>
 +
 +Your prompt will then be prefixed with ''(lamachine)'', and your ''$PATH'' and ''$LD_LIBRARY_PATH'' will be updated. Note that ''python'' and ''python3'' will now both refer to the Python 3 interpreter (outside the environment, ''python'' refers to the global Python 2.7 installation).
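
Inside a shell script, the activation step can be wrapped in a guard so that the script stops early when LaMachine is not available. The following is a minimal sketch, not an official helper: the function name ''activate_lamachine'' and the variable ''LM_ACTIVATE'' are made up for this example; only the path comes from the text above.

```shell
#!/bin/bash
# Sketch: activate LaMachine from a script, failing early if it is missing.
# LM_ACTIVATE and activate_lamachine are illustrative names, not part of LaMachine.
LM_ACTIVATE=/vol/customopt/bin/lamachine-activate

activate_lamachine() {
    if [ ! -r "$LM_ACTIVATE" ]; then
        echo "Cannot read $LM_ACTIVATE (not on Ponyland?)" >&2
        return 1
    fi
    # "." is the portable spelling of "source"
    . "$LM_ACTIVATE"
}
```

A script could then start with ''activate_lamachine || exit 1'' before calling any LaMachine tool.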
 +
 +
 +Our software:
 +  * **Timbl** [http://​ilk.uvt.nl/​timbl] - Tilburg Memory Based Learner
 +  * **Ucto** [http://​ilk.uvt.nl/​ucto] - Tokenizer
 +  * **Frog** [http://​ilk.uvt.nl/​frog] - Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch.
 +  * **Mbt** [http://​ilk.uvt.nl/​mbt] - Memory-based Tagger
 +  * **Wopr** [http://​ilk.uvt.nl/​wopr] - Memory-based Word Predictor
 +  * **FoLiA-tools** [http://​proycon.github.io/​folia] - Command line tools for working with the FoLiA format
 +  * **PyNLPl** [https://​pypi.python.org/​pypi/​PyNLPl] - Python Natural Language Processing Library (Python 2 & 3)
 +  * **Colibri Core** [http://proycon.github.io/colibri-core/] - Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool colibri-patternmodeller, which allows you to build, view, manipulate and query pattern models.
 +  * //C++ libraries// - ticcutils [http://​ilk.uvt.nl/​ticcutils],​ libfolia [http://​proycon.github.io/​folia]
 +  * //​FoLiA-Stats//​
 +  * //Python 3 bindings// - python-ucto [https://​github.com/​proycon/​python-ucto],​ python-frog [https://​github.com/​proycon/​python-frog],​ python-timbl [https://​github.com/​proycon/​python-timbl] ​
 +
 +Third-party Python libraries:
 +  * Numpy
 +  * Scipy
 +  * Matplotlib
 +  * Seaborn
 +  * Pandas
 +  * Scikit Learn
 +  * PyCrypto
 +  * PyCurl
 +  * Django
 +  * Cython
 +  * CherryPy
 +  * IPython
 +  * NLTK
 +  * Textblob
 +  * Keras
 +  * Theano
 +  * Tensorflow
 +
 +**NOTE: The default LaMachine environment contains the latest stable releases of our own and third-party software and is updated regularly. Different versions may affect the outcome of experiments! Always log the software versions for full reproducibility! See ''lamachine-list''**
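
One way to follow this advice is to write the output of ''lamachine-list'' to a log file next to your results. The sketch below assumes that ''lamachine-list'' prints the installed versions to standard output when run without arguments; the function name ''log_versions'' is hypothetical.

```shell
#!/bin/bash
# Sketch: record software versions alongside experiment output.
# log_versions is an illustrative helper, not part of LaMachine.
log_versions() {
    logfile=$1
    {
        echo "# Logged on $(date)"
        if command -v lamachine-list >/dev/null 2>&1; then
            # Assumption: lamachine-list prints versions to standard output.
            lamachine-list
        else
            echo "lamachine-list not found (is LaMachine activated?)"
        fi
    } > "$logfile"
}
```

For example, an experiment script could call ''log_versions versions.log'' right after activating LaMachine.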
 +
 +In addition to the stable versions, you can opt for a LaMachine virtual environment that contains the very latest bleeding-edge git versions of all software. It is more volatile, may break at any moment, and is intended for testing the newest features. Activate the development LaMachine as follows:
 +
 +<code bash>
 +lmdev
 +</code>
 +
 +or, in its full form:
 +
 +<code bash>
 +source /vol/customopt/bin/lamachine-dev-activate
 +</code>
 +
 +The stable LaMachine on ponyland will be updated regularly. For the development LaMachine, you can trigger an update yourself by running ''trigger-update.sh'' on ''applejack (mlp01)''.
 +
 +LaMachine is also available as a Virtual Machine and as a Docker container, enabling you to run our software locally and facilitate deployment. See https://​proycon.github.io/​LaMachine for more details. ​
 +
 +If you have software of your own that you want included in LaMachine, or third-party Python software that you want included in the Ponyland LaMachine virtual environment, contact [[proycon@anaproy.nl]].
 +
 +
 +===== Opt-in: namespaces =====
 +
 +Namespaces group software for which you have to opt in explicitly. This prevents conflicts with versions of the same software that you may have installed locally. Setting a namespace means that your ''$PATH'', ''$LD_LIBRARY_PATH'', ''$PYTHONPATH'' and various other environment variables are extended to include the software in the namespace. All namespaces are in ''/vol/customopt/''.
 +
 +To use a namespace, add either of the following to the bottom of the file ''​~/​.bash_profile'':​
 +
 +<code bash>
 +# Load namespace:
 +. pathadd $namespace
 +</code>
 +
 +<code bash>
 +# Silently load namespace:
 +. pathadd -s $namespace
 +</code>
 +
 +The initial ''.'' is important: it does the same as ''source'', but works in all shells. Replace ''$namespace'' with the name of the namespace, for example ''uvt-ru''. You can add multiple lines to load multiple namespaces.
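
Loading several namespaces can also be written as a loop. The sketch below is one possible ''~/.bash_profile'' fragment. It assumes that ''pathadd'' is an executable on your ''$PATH'', so that ''command -v'' can find it. The function name ''load_namespaces'' and the variable ''NAMESPACES'' are made up for this example, while the two namespace names are taken from this page.

```shell
# Sketch of a ~/.bash_profile fragment (bash).
# NAMESPACES and load_namespaces are illustrative names.
NAMESPACES="machine-learning nlptools"

load_namespaces() {
    # Assumption: pathadd can be found with command -v. If it cannot,
    # report that on stderr and return instead of failing on the "." line.
    if ! command -v pathadd >/dev/null 2>&1; then
        echo "pathadd not found; no namespaces loaded" >&2
        return 1
    fi
    for ns in $NAMESPACES; do
        # -s loads the namespace silently, as described above
        . pathadd -s "$ns"
    done
}
```

Calling ''load_namespaces'' at the bottom of ''~/.bash_profile'' then has the same effect as one ''. pathadd -s'' line per namespace.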
 +
 +The following namespaces exist and contain the software mentioned below:
 +
 +
 +==== 2. machine-translation ====
 +//​(Maintained by Maarten van Gompel)//
 +
 +  . pathadd machine-translation
 +
 +Contains 3rd party software for Machine Translation:​
 +
 +  * [[http://​www.statmt.org/​moses|Moses]] -- Moses SMT Decoder & Framework
 +  * [[http://​code.google.com/​p/​giza-pp/​|GIZA++]] -- Word alignment algorithms (IBM Models + HMM)
 +  * [[http://www.speech.sri.com/projects/srilm/|srilm]] -- SRI Language Modelling Toolkit
 +  * Machine Translation Evaluation Scripts (BLEU, Meteor, NIST, PER, WER, TER) are available separately in ''/​vol/​customopt/​machine-translation/​mtevalscripts''​
 +
 +
 +==== 3. machine-learning ====
 +//​(Maintained by [[admin]], requested by Iris Hendrickx)//​
 +
 +  . pathadd machine-learning
 +
 +Contains Machine Learning tools
 +
 +  * [[http://​nlp.stanford.edu/​projects/​glove/​|Glove]]
 +  * LCS 
 +  * [[http://​homepages.inf.ed.ac.uk/​lzhang10/​maxent_toolkit.html#​intro|Maxent]]
 +  * [[https://​github.com/​proycon/​paramsearch|Paramsearch]]
 +  * [[http://​svmlight.joachims.org/​|SVM]]
 +    * svm_classify
 +    * svm_learn
 +  * [[http://​disi.unitn.it/​moschitti/​Tree-Kernel.htm|Treekernels]]:​
 +    * svm_classify_tk
 +    * svm_learn_tk
 +  * [[http://​www.lsi.upc.edu/​~nlp/​SVMTool/​|SVMTool v 1.3.2]]  ​
 +  * [[http://​crfpp.googlecode.com/​svn/​trunk/​doc/​index.html|Conditional random fields]]
 +    * crf_learn
 +    * crf_test
 +  * [[http://​mallet.cs.umass.edu|Mallet 2.07]]:
 +    * mallet
 +  * Ripper
 +  * [[http://​www.cs.waikato.ac.nz/​ml/​weka/​|Weka 3.6.8]]
 +    *  weka \\ //Use X-forwarding//​
 +  * Word2Vec
 +
 +== LCS quirks ==
 +
 +The admin gets a lot of questions about LCS apparently not working. It does work, but a number of requirements must be met. Please read this before you ask for help:
 +  * Java requires the option to present something visually (but does not use it). Make sure you have X-forwarding enabled, or run in headless mode (''-Djava.awt.headless=true'').
 +  * LCS only seems to work in your home directory, not on network disks.
 +  * LCS caches files. Remove the cache folder before each run.
 +  * What LCS calls 'data' is where you store your models; what LCS calls 'files' is where you store your data.
 +  * "Missing too many documents in index" can also mean that your models folder (called 'files' in the config file) or cache folder is not reachable by Winnow.
 +  * When the test file has an incorrect format, LCS strangely complains about the training file being a directory.
 +  * When you are using the wrong version of Java (the correct version is Java 6), LCS in some cases fails with the error 'Comparison method violates its general contract'.
 +  * The details of the results seem to be influenced by the names of the training files.
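
If X-forwarding is not available, headless mode can be requested through the environment rather than on the ''java'' command line. The sketch below uses ''JAVA_TOOL_OPTIONS'', a standard variable that the JVM reads at startup. Whether the LCS launcher passes the environment through unchanged is an assumption, and how LCS itself is started is not shown here.

```shell
# Sketch: make every JVM started from this shell run headless.
# Existing options in JAVA_TOOL_OPTIONS are kept and the flag is appended.
JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }-Djava.awt.headless=true"
export JAVA_TOOL_OPTIONS
```

The JVM normally prints a "Picked up JAVA_TOOL_OPTIONS" notice on startup, which is a quick way to confirm that the setting reached it.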
 +
 +==== 4. SyntaxNet ====
 +
 +Because SyntaxNet has so many dependencies,​ it did not fit into an existing namespace. Instead, it is a separate virtualenv that you can access by running this command:
 +
 +  source /vol/customopt/syntaxnet/bin/activate
 +
 +==== 5. alpino ====
 +//(obsoleted in favour of LaMachine)//
 +
 +
 +==== 6. nlptools ====
 +//​(Maintained by Maarten van Gompel & [[admin]])//​
 +
 +  . pathadd nlptools
 +
 +Contains various 3rd party software for Natural Language Processing
 +
 +  * opennlp (java)
 +  * Stanford Core NLP (java)
 +  * [[http://​nlp.stanford.edu/​software/​CRF-NER.shtml#​Download|Stanford NER]] (Named Entity Recognizer)
 +  * Python binding for Stanford Core NLP
 +  * FreeLing
 +  * tdp (''​tdparse'',​ ''​tdptrain''​)
 +  * Treetagger
 +  * [[http://​www.d.umn.edu/​~tpederse/​nsp.html|Ngram Statistics Package]]
 +  * [[http://​www.gurobi.com/​index|Gurobi]]
 +
 +Stanford NER:
 +
 +  # Use shortcut:
 +  ner myfile.txt
 +  ​
 +  # Or
 +  cd /​vol/​customopt/​nlptools/​stanford-ner
 +  java -mx600m edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/​english.all.3class.distsim.crf.ser.gz -textFile sample.txt
 +  ​
 +If you want to use Gurobi, obtain your own Gurobi license as follows:
 +  * Request one here: http://user.gurobi.com/download/licenses/free-academic
 +  * Run ''grbgetkey <YOUR_LICENSE_KEY>''
 +  * Save the license to your home directory, and you should be ready to go!
 +
 +==== 7. texteditors ====
 +//​Maintained by [[admin]]//
 +
 +  . pathadd texteditors
 +
 +  * [[http://​www.sublimetext.com|sublime-text]]
 +  * [[https://​www.jetbrains.com/​pycharm/​|PyCharm]]
 +
 +==== 8. mongodb ====
 +//​Maintained by [[admin]], requested by Ali Huerriyetoglu//​
 +
 +  . pathadd mongodb
 +
 +
 +  mkdir ~/mongodb
 +  PORT=3573 # Choose a port above 1024
 +  mongod --dbpath ~/mongodb --port $PORT
 +  ​
 +See http://​docs.mongodb.org/​manual/​reference/​program/​mongod/#​bin.mongod for more options.
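
The port choice in the snippet above can be checked before starting the server. This is a minimal sketch: ''valid_port'' is a hypothetical helper, and the upper bound 65535 is simply the largest TCP port number. The ''mongod'' command is only printed here, not executed.

```shell
#!/bin/bash
# Sketch: validate the chosen port before starting mongod.
# valid_port is an illustrative helper, not part of MongoDB.
valid_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -gt 1024 ] && [ "$1" -le 65535 ]
}

PORT=3573
if valid_port "$PORT"; then
    # Print the command from the text above instead of running it.
    echo "mongod --dbpath $HOME/mongodb --port $PORT"
else
    echo "Port $PORT is not usable; choose one above 1024" >&2
fi
```

Because Ponyland is a shared system, another user may already be using the port you picked; in that case ''mongod'' will refuse to start and a different port is needed.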
 +
 +==== 9. nodebox ====
 +//Requested by Peter Berck//
 +
 +  . pathadd nodebox
 +
 +Python Library ([[http://​nodebox.net/​code/​index.php/​Linguistics|Nodebox Linguistics Library]])
 +
 +  * linguistics library (''​import en''​)
 +
 +==== 10. r3 ====
 +
 +   . pathadd r3
 +
 +R, a software package for statistical computing. This namespace replaces the older, globally installed version (2.14.1) with R 3.1.2 (January 2015).
 +
 +==== 11. python3 ====
 +//(Obsolete, abandoned)//
 +
 +
 +==== 12. python2-packages ====
 +
 +//(No longer maintained. We recommend using Python 3 instead, with LaMachine if needed; otherwise, set up your own Python 2.7 virtualenv.)//
 +
 +
 +==== 14. python36 ====
 +//​(Maintained by [[admin]])//​
 +
 +  . pathadd python36
 +
 +==== 15. python3-packages ====
 +//(No longer maintained, use LaMachine instead)//
wiki/ponyland/software.txt · Last modified: 2019/04/25 15:23 (external edit)