Tools for Corpus Linguistics

A comprehensive list of 262 tools used in corpus analysis.

Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.


Tags

annotation
concordancer
tagging
compilation
corpus management
multilingual
rhetorics
parser
pos tagger
text complexity
readability
ANC
sampling
search
visualization
wordlists
keywords
text analysis
converter
n-grams
p-frames
lexical bundles
lexical frames
parser generator
video
qda
mixed methods
discourse
voice
collocation
statistics
segmentation
parallel
coding
concordancer
ddl
pedagogy
language learning
analysis
crawler
parsing
markup
child language
CHILDES
cohesion
coherence
textual analysis
colligation
matching
vocabulary
lexis
web-based
concgram
python
conversational ana ...
social media
query
coreference
library
XML
JSON
exploration
searching
NER
topic models
word2vec
database
dialogues
collexeme
cleaning
annotations
tokenization
spoken
multilevel
multi-layer
rhetorical analysis
transcription
grammar
ESL/EFL
CEFR
lexical analysis
poem analysis
metaphor interpret ...
metaphor identific ...
semantics
metaphors
finnish
downloader
constructions
semantic parser
frequency
network analysis
graphs
ngrams
pattern matching
KWIC
temporal tagger
timex3
language detection
semantic tagger
ICE
multi-layer annotation
computer-assisted ...
stylometry
management
tokenizer
boilerplate remover
textual criticism
witnesses
patterns
comparison
correspondence ana ...
collocation analysis
frequency analysis
sociolinguistics
CADS
images
text mining
lexicometrics
information retrieval
lemmatizer
news
data
syntagmatic
slots
collocations
DDL
machine learning
ai-tagging
morphological tagger
syntax
style
statistical nlp
dependency parsing
MDA
sentence boundary ...
variation
dialectal data
historical
tagger
multilevel tagger
corpus creation
semantic analysis
duplicate remover
editing
regex
conversion
phonology
speech
prosody
phonetics
R
thesaurus
vocabulary
meta modelling
tokenizing
kwic
r
sentence boundary
dictionary
text-processing
xml
trends patterns
collocates
word cloud
cooccurrence
NLP
diachronic analysis
term extraction
keyword extraction
bilingual term ext ...
pos
topic modeling
lexical sophistication
word clouds
variation analysis
phraseology
twitter
SPAADIA
efl
esl
linguistics
search tool
scraping
uralic
inflection
variant detector
reading
key phrases
ebooks
political science
close reading
vocabulary profiling
language teaching
distributional sem ...
indexing
chinese
ocr
text processing
publishing dictionary

Tools

| Tool | Description | Categories | Platform | Pricing |
| --- | --- | --- | --- | --- |
| @nnotate | Semi-automatic annotation of corpus data | annotation | Solaris, Linux | Free (with licence agreement) |
| ACTRES Corpus Manager | A corpus compilation and analysis platform with a focus on multilingual and parallel corpora. | compilation, corpus management, annotation, multilingual | Web | Commercial |
| AMALGAM | Tool for grammatical annotation (POS and phrase structure) that tags text submitted via email. | annotation | Web | Free |
| ANVIL | A tool for video annotation. | video, annotation | Windows, Linux, Mac | Free |
| Atomic | Multi-layer corpus annotation platform. | annotation | Linux, Mac, Windows | Free |
| BFSU Qualitative Coder | A tool for manual coding of corpora | coding, annotation | Windows | Free |
| CATMA (Computer Assisted Text Markup and Analysis) | An undogmatic, complex annotation and analysis package | markup, analysis, visualization, annotation | Web | Free |
| CorefAnnotator | An annotation tool for coreference. | coreference, annotation | Windows, Linux, Mac | Open Source |
| Corpona | A Python library for processing XML- and JSON-based corpora. | library, XML, JSON, annotation | Python | Open Source |
| DART | An annotation tool and research environment for annotating dialogues. | dialogues, annotation | Windows | Free |
| Dexter | Tool for text annotation | annotation | Linux, Mac, Windows | Free |
| DISCO | Corpus pre-processing tool for a variety of languages that allows retrieval of the semantic similarity between arbitrary words and phrases | tokenization, annotation | Windows, Linux, Solaris, and MacOS | Free |
| DisMo | An automatic multi-level annotator for spoken language corpora. | spoken, multilevel, multi-layer, pos tagger, annotation, tagging | | |
| ELAN | Transcription and annotation of sound or video files | transcription, annotation | Linux, Mac, Windows | Free |
| Emdros | A database engine for analyzed and annotated text. | database, annotation, query | Windows, Linux, Mac | Free, Open Source |
| EXMARaLDA | Tool for transcription, annotation, and corpus analysis of spoken data | transcription, annotation, analysis | | Free |
| INCEpTION | A semantic annotation platform that offers intelligent annotation assistance and knowledge management | annotation, multi-layer annotation, computer-assisted annotation, web-based | Web | Free, Open Source |
| LightTag | A commercial text annotation tool focused on managing and working with teams of annotators. | annotation, tagging, ai-tagging | Web | Commercial |
| MMAX2 | A multi-level annotation tool | annotation, multilevel, multi-layer | Java | Free, Open Source |
| PACTE | A flexible collaborative text annotation platform that is currently in development. | annotation | Web | Free (for research) |
| PALinkA | Annotation tool | annotation | | Down |
| Praaline | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3) |
| RSTTool | Tool that can annotate texts for constituency and rhetorical structure | annotation | Windows, Macintosh, UNIX and LINUX | Free |
| Sketch Engine | A corpus manager and text analysis software developed by Lexical Computing. | annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, cooccurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction | | 30-day free trial, then starts at 4.83 €/month |
| SLATE | SLATE is a Python-based CLI annotation tool. It is very lightweight and can be used for various types of span-based annotation. | annotation | Python | Free, Open Source |
| SPPAS | A tool for the automatic annotation and analysis of speech. | speech, spoken, annotation | Windows, Mac, Linux | Free, Open Source |
| SPre | Tool for segmenting and annotating texts | annotation | | Free |
| Synpathy | Tool for manual syntactic annotation | annotation | Windows, Mac, Linux | Free |
| tagtog | A text annotation tool specifically built to train AI/ML models. | machine learning, annotation | Cloud-Based | Commercial |
| The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free |
| TreeTagger | Tool for annotating text with part-of-speech and lemma information | pos tagger, annotation | Windows, Mac, Linux | Free |
| UAM CorpusTool | Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation | annotation, multi-layer annotation, computer-assisted annotation | | Free |
| UAM ImageTool | Image annotation tool for visual data corpora | annotation | | Free |
| UBIAI | An NLP-oriented text annotation platform for teams with comprehensive auto-annotation features. | annotation, NLP | Web | Commercial |
| VideoAnt | A web-based tool to annotate and discuss web-hosted videos. | annotation, video | Web | Free |
| WebAnno | A web-based annotation tool | annotation, web-based | Web | Free |
| WebLicht | WebLicht is an execution environment for automatic annotation of text corpora embedded with the CLARIN-D project. | annotation | Web | Free (CLARIN-D Account needed) |
| Worldbuilder | Tool for annotation and visualisation in analysis applying text-world-theory | annotation, visualization | | |
| YEDDA | YEDDA is a Python-based collaborative text span annotation tool with support for a very wide variety of languages including Chinese. | annotation | Python | Free, Open Source |
| Lexonomy | A tool for writing and publishing dictionaries and other dictionary-like things. | dictionary, publishing dictionary, annotation | Web | Free |