A hopefully comprehensive list of currently 284 tools used in corpus compilation and analysis.
This list is kept up to date by its users. Hence, please feel free to contribute by suggesting new tools.
You can also make suggestions, e.g., corrections, regarding individual tools by clicking the ✎ symbol. As this is a non-commercial side (side, side) project, checking and incorporating updates usually takes some time.
There is also a comprehensive list of all tags in the database.
Tool | Description | Tags | Platforms | Pricing |
---|---|---|---|---|
@nnotate ✎ | Semi-automatic annotation of corpus data | annotation | Solaris, Linux | Free (with licence agreement) |
ACTRES Corpus Manager ✎ | A corpus compilation and analysis platform with a focus on multilingual and parallel corpora. | compilation, corpus management, annotation, multilingual | Web | Commercial |
AMALGAM ✎ | Tool for grammatical annotation (PoS and phrase structure). Texts are submitted for tagging via email. | annotation | Web | Free |
ANVIL ✎ | A tool for video annotation. | video, annotation | Windows, Linux, Mac | Free |
Atomic ✎ | Multi-layer corpus annotation platform. | annotation | Linux, Mac, Windows | Free |
BFSU Qualitative Coder ✎ | A tool for manual coding of corpora | coding, annotation | Windows | Free |
CATMA (Computer Assisted Text Markup and Analysis) ✎ | An undogmatic, complex annotation and analysis package. | markup, analysis, visualization, annotation | Web | Free |
CorefAnnotator ✎ | An annotation tool for coreference. | coreference, annotation | Windows, Linux, Mac | Open Source |
Corpona ✎ | A Python library for processing XML- and JSON-based corpora. | library, XML, JSON, annotation | Python | Open Source |
DART ✎ | An annotation tool and research environment for annotating dialogues. | dialogues, annotation | Windows | Free |
Dexter ✎ | Tool for text annotation | annotation | Linux, Mac, Windows | Free |
DISCO ✎ | Corpus pre-processing tool for a variety of languages that allows retrieving the semantic similarity between arbitrary words and phrases | tokenization, annotation | Windows, Linux, Solaris, and MacOS | Free |
DisMo ✎ | An automatic multi-level annotator for spoken language corpora. | spoken, multilevel, multi-layer, pos tagger, annotation, tagging | ||
ELAN ✎ | Transcription and annotation of sound or video files | transcription, annotation | Linux, Mac, Windows | Free |
Emdros ✎ | A database engine for analyzed and annotated text. | database, annotation, query | Windows, Linux, Mac | Free, Open Source |
EXMARaLDA ✎ | Tool for transcription, annotation, corpus analysis of spoken data | transcription, annotation, analysis | | Free |
INCEpTION ✎ | A semantic annotation platform that offers intelligent annotation assistance and knowledge management | annotation, multi-layer annotation, computer-assisted annotation, web-based | Web | Free, Open Source |
Lexonomy ✎ | A tool for writing and publishing dictionaries and other dictionary-like things. | dictionary, publishing dictionary, annotation | Web | Free |
LightTag ✎ | A commercial text annotation tool focused on managing and working with teams of annotators. | annotation, tagging, ai-tagging | Web | Commercial |
MMAX2 ✎ | A multi-level annotation tool | annotation, multilevel, multi-layer | Java | Free, Open Source |
PACTE ✎ | A flexible collaborative text annotation platform that is currently in development. | annotation | Web | Free (for research) |
PALinkA ✎ | Annotation tool | annotation | Down | |
Praaline ✎ | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3) |
RSTTool ✎ | Tool that can annotate texts for constituency and rhetorical structure | annotation | Windows, Macintosh, UNIX and LINUX | Free |
Sketch Engine ✎ | A corpus manager and text analysis software developed by Lexical Computing. | annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, cooccurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction | | 30-day free trial then starts at 4.83 €/month |
SLATE ✎ | SLATE is a python-based CLI annotation tool. It is very lightweight and can be used for various types of span-based annotation. | annotation | Python | Free, Open Source |
SPPAS ✎ | A tool for the automatic annotation and analysis of speech. | speech, spoken, annotation | Windows, Mac, Linux | Free, Open Source |
SPre ✎ | Tool for segmenting and annotating texts | annotation | | Free |
Synpathy ✎ | Tool for manual syntactic annotation | annotation | Windows, Mac, Linux | Free |
tagtog ✎ | A text annotation tool specifically built to train AI/ML models. | machine learning, annotation | Cloud-Based | Commercial |
The Simple Corpus Tool ✎ | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free |
TreeTagger ✎ | Tool for annotating text with part-of-speech and lemma information | pos tagger, annotation | Windows, Mac, Linux | Free |
UAM CorpusTool ✎ | Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation | annotation, multi-layer annotation, computer-assisted annotation | | Free |
UAM ImageTool ✎ | Image annotation tool for visual data corpora | annotation | | Free |
UBIAI ✎ | An NLP-oriented text annotation platform for teams with comprehensive auto-annotation features. | annotation, NLP | Web | Commercial |
VideoAnt ✎ | A web-based tool to annotate and discuss web-hosted videos. | annotation, video | Web | Free |
WebAnno ✎ | A web-based annotation tool | annotation, web-based | Web | Free |
WebLicht ✎ | WebLicht is an execution environment for automatic annotation of text corpora, embedded in the CLARIN-D project. | annotation | Web | Free (CLARIN-D Account needed) |
Worldbuilder ✎ | Tool for annotation and visualisation in analysis applying text-world-theory | annotation, visualization | ||
YEDDA ✎ | YEDDA is a python-based collaborative text span annotation tool with support for a very wide variety of languages including Chinese. | annotation | Python | Free, Open Source |
QualCoder ✎ | An open source tool for qualitative data analysis that supports coding text and images. | qda, annotation | Windows, Mac, Linux, Python | Free, Open Source |
TEITOK ✎ | A web-based platform for viewing, creating, and editing corpora with rich textual mark-up and linguistic annotation. | visualization, TEI, mark-up, annotation | Linux, Mac | Free, Open Source |
Sanchay ✎ | An open source multi-purpose platform focused on South Asian languages. | annotation, tagging, chunking | Windows, Linux | Free, Open Source |
Text-Fabric ✎ | A Python library for processing corpora (especially based on ancient texts) as annotated graphs. | graph model, annotation, python | Python | Free, Open Source |
Last Updated: December 11, 2024.
In case you are interested, the data is also available in JSON format.
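As a rough illustration of working with such an export, the sketch below filters tool records by tag and platform. The field names (`tool`, `description`, `tags`, `platforms`, `pricing`) and the list-of-objects layout are assumptions that mirror the table columns above; the actual JSON file may be structured differently. The three sample records are copied from the table.

```python
import json

# Hypothetical sample mirroring the table's columns; the real export's
# field names and layout may differ.
SAMPLE_JSON = """
[
  {"tool": "ELAN",
   "description": "Transcription and annotation of sound or video files",
   "tags": ["transcription", "annotation"],
   "platforms": ["Linux", "Mac", "Windows"],
   "pricing": "Free"},
  {"tool": "LightTag",
   "description": "A commercial text annotation tool focused on managing and working with teams of annotators.",
   "tags": ["annotation", "tagging", "ai-tagging"],
   "platforms": ["Web"],
   "pricing": "Commercial"},
  {"tool": "VideoAnt",
   "description": "A web-based tool to annotate and discuss web-hosted videos.",
   "tags": ["annotation", "video"],
   "platforms": ["Web"],
   "pricing": "Free"}
]
"""


def filter_tools(records, tag=None, platform=None):
    """Return the names of tools that match the given tag and/or platform."""
    matches = []
    for record in records:
        # Skip records that lack the requested tag.
        if tag is not None and tag not in record.get("tags", []):
            continue
        # Skip records that do not list the requested platform.
        if platform is not None and platform not in record.get("platforms", []):
            continue
        matches.append(record["tool"])
    return matches


records = json.loads(SAMPLE_JSON)
print(filter_tools(records, tag="video"))
print(filter_tools(records, tag="annotation", platform="Web"))
```

To use this with the real data, replace `SAMPLE_JSON` with the contents of the downloaded file and adjust the keys to whatever the export actually uses.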