Tools for Corpus Linguistics

A hopefully comprehensive list of currently 266 tools used in corpus compilation and analysis.

This list is kept up to date by its users, so please feel free to contribute by suggesting new tools. You can also suggest changes to individual tools, such as corrections, by clicking the symbol. As this is a non-commercial side (side, side) project, checking and incorporating updates usually takes some time.


Tools [annotation]

Tool | Description | Tags | Platforms | Pricing
@nnotate | Semi-automatic annotation of corpus data | annotation | Solaris, Linux | Free (with licence agreement)
ACTRES Corpus Manager | A corpus compilation and analysis platform with a focus on multilingual and parallel corpora. | compilation, corpus management, annotation, multilingual | Web | Commercial
AMALGAM | Tool for grammatical annotation (POS and phrase structure). Tags text submitted via email. | annotation | Web | Free
ANVIL | A tool for video annotation | annotation | Windows, Linux, Mac | Free
Atomic | Multi-layer corpus annotation platform. | annotation | Linux, Mac, Windows | Free
BFSU Qualitative Coder | A tool for manual coding of corpora | coding, annotation | Windows | Free
CATMA (Computer Assisted Text Markup and Analysis) | An undogmatic, complex annotation and analysis package. | markup, analysis, visualization, annotation | Web | Free
CorefAnnotator | An annotation tool for coreference. | coreference, annotation | Windows, Linux, Mac | Open Source
Corpona | A Python library for processing XML- and JSON-based corpora. | library, XML, JSON, annotation | Python | Open Source
DART | An annotation tool and research environment for annotating dialogues. | dialogues, annotation | Windows | Free
Dexter | Tool for text annotation | annotation | Linux, Mac, Windows | Free
DISCO | Corpus pre-processing tool for a variety of languages that retrieves the semantic similarity between arbitrary words and phrases | tokenization, annotation | Windows, Linux, Solaris, and MacOS | Free
DisMo | An automatic multi-level annotator for spoken language corpora. | spoken, multilevel, multi-layer, pos tagger, annotation, tagging | - | -
ELAN | Transcription and annotation of sound or video files | transcription, annotation | Linux, Mac, Windows | Free
Emdros | A database engine for analyzed and annotated text. | database, annotation, query | Windows, Linux, Mac | Free, Open Source
EXMARaLDA | Tool for transcription, annotation, and corpus analysis of spoken data | transcription, annotation, analysis | - | Free
INCEpTION | A semantic annotation platform that offers intelligent annotation assistance and knowledge management | annotation, multi-layer annotation, computer-assisted annotation, web-based | Web | Free, Open Source
Lexonomy | A tool for writing and publishing dictionaries and other dictionary-like things. | dictionary, publishing dictionary, annotation | Web | Free
LightTag | A commercial text annotation tool focused on managing and working with teams of annotators. | annotation, tagging, ai-tagging | Web | Commercial
MMAX2 | A multi-level annotation tool | annotation, multilevel, multi-layer | Java | Free, Open Source
PACTE | A flexible collaborative text annotation platform that is currently in development. | annotation | Web | Free (for research)
PALinkA | Annotation tool | annotation | - | Down
Praaline | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3)
RSTTool | Tool that can annotate texts for constituency and rhetorical structure | annotation | Windows, Macintosh, UNIX and LINUX | Free
Sketch Engine | A corpus manager and text analysis software developed by Lexical Computing. | annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, co-occurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction | - | 30-day free trial, then starts at 4.83 €/month
SLATE | SLATE is a Python-based CLI annotation tool. It is very lightweight and can be used for various types of span-based annotation. | annotation | Python | Free, Open Source
SPPAS | A tool for the automatic annotation and analysis of speech. | speech, spoken, annotation | Windows, Mac, Linux | Free, Open Source
SPre | Tool for segmenting and annotating texts | annotation | - | Free
Synpathy | Tool for manual syntactic annotation | annotation | Windows, Mac, Linux | Free
tagtog | A text annotation tool specifically built to train AI/ML models. | machine learning, annotation | Cloud-Based | Commercial
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free
TreeTagger | Tool for annotating text with part-of-speech and lemma information | pos tagger, annotation | Windows, Mac, Linux | Free
UAM CorpusTool | Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation | annotation, multi-layer annotation, computer-assisted annotation | - | Free
UAM ImageTool | Image annotation tool for visual data corpora | annotation | - | Free
UBIAI | An NLP-oriented text annotation platform for teams with comprehensive auto-annotation features. | annotation, NLP | Web | Commercial
VideoAnt | A web-based tool to annotate and discuss web-hosted videos. | annotation, video | Web | Free
WebAnno | A web-based annotation tool | annotation, web-based | Web | Free
WebLicht | WebLicht is an execution environment for automatic annotation of text corpora, embedded in the CLARIN-D project. | annotation | Web | Free (CLARIN-D account needed)
Worldbuilder | Tool for annotation and visualisation in analyses applying Text World Theory | annotation, visualization | - | -
YEDDA | YEDDA is a Python-based collaborative text span annotation tool with support for a very wide variety of languages including Chinese. | annotation | Python | Free, Open Source

Last Updated: May 15, 2022.

In case you are interested, the data is also available in JSON format.
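
For readers who want to work with the JSON export, here is a minimal sketch of filtering tool records by tag and platform. The schema of the published file is not described on this page, so the field names used here ("name", "tags", "platforms", "pricing") and the helper filter_tools are assumptions for illustration only; the three sample records are taken from the table above.

```python
import json

# Hypothetical records: the field names ("name", "tags", "platforms",
# "pricing") are assumptions, not the documented schema of the JSON export.
SAMPLE = """
[
  {"name": "ELAN", "tags": ["transcription", "annotation"],
   "platforms": ["Linux", "Mac", "Windows"], "pricing": "Free"},
  {"name": "LightTag", "tags": ["annotation", "tagging", "ai-tagging"],
   "platforms": ["Web"], "pricing": "Commercial"},
  {"name": "SLATE", "tags": ["annotation"],
   "platforms": ["Python"], "pricing": "Free, Open Source"}
]
"""


def filter_tools(tools, tag=None, platform=None):
    """Return the names of tools matching the given tag and/or platform.

    A criterion left as None is not applied.
    """
    matches = []
    for tool in tools:
        if tag is not None and tag not in tool.get("tags", []):
            continue
        if platform is not None and platform not in tool.get("platforms", []):
            continue
        matches.append(tool["name"])
    return matches


tools = json.loads(SAMPLE)
print(filter_tools(tools, tag="annotation", platform="Web"))
```

If the real export uses different keys or stores tags as a single comma-separated string, only the lookups inside filter_tools would need adjusting.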