Resources

A curated list of resources for Natural Language Processing (NLP) of the Portuguese language, including general-purpose tools that support it.

Generic Software Suites

  • FreeLing: an open source suite of language analysis tools (morphological analysis, named entity detection, PoS tagging, parsing, word sense disambiguation, semantic role labeling, etc.) for a variety of languages (English, Spanish, Portuguese, Italian, French, German, Russian, Catalan, Galician, Croatian, Slovene, among others).
  • Natural Language Toolkit (NLTK): a platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries.

Morphological Analyzers

  • Lingua::Jspell: an open source morphological analyzer based on the well-known ispell spell checker. It includes dictionaries for Portuguese, English and Latin.

POS Taggers

  • TreeTagger: a closed-source but free-to-use part-of-speech tagger. It includes models for a variety of languages, among them English, German, Portuguese and Russian.

Corpora

  • Per-Fide Project: provides a set of parallel corpora from different domains, in several languages (including Portuguese).
  • Universal Dependencies: provides a set of annotated treebanks for different languages, including Portuguese.
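
Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns, comment lines starting with "#", and a blank line between sentences. As a rough illustration of how such a file might be read without extra dependencies, here is a minimal sketch in plain Python. The names `Token`, `read_conllu` and `SAMPLE` are hypothetical, and the sample sentence is hand-written for this example rather than taken from an actual Portuguese treebank.

```python
from dataclasses import dataclass
from typing import Iterator, List

# The ten tab-separated CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


@dataclass
class Token:
    """A subset of the CoNLL-U columns, kept as strings for simplicity."""
    id: str
    form: str
    lemma: str
    upos: str
    head: str
    deprel: str


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence found in `text`."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        row = dict(zip(FIELDS, cols))
        # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
        if "-" in row["id"] or "." in row["id"]:
            continue
        sentence.append(Token(row["id"], row["form"], row["lemma"],
                              row["upos"], row["head"], row["deprel"]))
    if sentence:
        # Handle input that does not end with a blank line.
        yield sentence


# Hand-written sample (an assumption for illustration, not treebank data).
SAMPLE = "\n".join([
    "# text = O gato dorme.",
    "\t".join(["1", "O", "o", "DET", "_", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "gato", "gato", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "dorme", "dormir", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE):
        for tok in sent:
            print(tok.form, tok.lemma, tok.upos, tok.deprel, sep="\t")
```

For real treebank files, the same function could be applied to the file contents read with UTF-8 encoding; a dedicated CoNLL-U library may be a better fit when the morphological features or enhanced dependencies are needed.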

Perl Modules

Dictionaries