A curated list of resources dedicated to Natural Language Processing (NLP) for, or with support for, the Portuguese language.
Generic Software Suites
- FreeLing: an open source language analysis tool suite, providing language analysis functionalities (morphological analysis, named entity detection, PoS-tagging, parsing, Word Sense Disambiguation, Semantic Role Labeling, etc.) for a variety of languages (English, Spanish, Portuguese, Italian, French, German, Russian, Catalan, Galician, Croatian, Slovene, among others).
- Natural Language Toolkit: a platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.
- Lingua::Jspell: an open-source morphological analyzer based on the well-known ispell spell checker. It includes dictionaries for Portuguese, English and Latin.
- TreeTagger: a closed-source but free-to-use part-of-speech tagger. It includes models for a variety of languages, including English, German, Portuguese, Russian and others.
- Per-Fide Project: provides a set of parallel corpora, of different domains, in several languages (including Portuguese).
- Universal Dependencies: provides a set of annotated treebanks for different languages, including Portuguese.
- Lingua-FreeLing3: a library for language analysis with FreeLing3.
- Lingua-FreeLing3-Utils: text processing utilities built on the FreeLing3 Perl interface.
- Lingua-TreeTagger-Installer: a module that helps install the TreeTagger binary and manage its dictionaries.
- Lingua-TreeTagger: a module to use TreeTagger from within Perl.
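The Universal Dependencies treebanks listed above are distributed in the CoNLL-U format: one token per line with ten tab-separated columns, `#` comment lines, and a blank line between sentences. The sketch below is a minimal, standard-library-only reader, offered as an illustration rather than an official UD tool. It assumes well-formed input and skips multiword-token ranges (such as the Portuguese contraction "do" = "de" + "o") and empty nodes. The sample sentence and its annotation are hand-written for illustration and are not taken from an actual treebank; `parse_conllu` is a hypothetical helper name.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in the order defined by the UD guidelines.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Hypothetical helper: split CoNLL-U text into sentences of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) are ignored in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges (e.g. "2-3") and empty nodes (e.g. "8.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Illustrative, hand-annotated sample: "Gosto do livro." ("I like the book.").
rows = [
    "# text = Gosto do livro.",
    "\t".join(["1", "Gosto", "gostar", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["2-3", "do", "_", "_", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["2", "de", "de", "ADP", "_", "_", "4", "case", "_", "_"]),
    "\t".join(["3", "o", "o", "DET", "_", "_", "4", "det", "_", "_"]),
    "\t".join(["4", "livro", "livro", "NOUN", "_", "_", "1", "obl", "_", "_"]),
    "\t".join(["5", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"]),
]
sample = "\n".join(rows) + "\n"

sentence = parse_conllu(sample)[0]
print([token["form"] for token in sentence])  # ['Gosto', 'de', 'o', 'livro', '.']
```

Keeping the syntactic words ("de", "o") rather than the surface contraction ("do") matches how the annotation attaches heads and relations; a reader that needs the surface text can use the `# text` comment or the skipped range lines instead.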