Tools for Corpus Linguistics

A comprehensive list of 228 tools used in corpus analysis.

Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.

Tags

Everything
annotation
concordancer
parser
pos tagger
tagging
ANC
sampling
search
visualization
wordlists
keywords
compilation
text analysis
converter
n-grams
p-frames
lexical bundles
lexical frames
text complexity
video
qda
mixed methods
discourse
voice
collocation
statistics
segmentation
coding
concordancer
ddl
pedagogy
language learning
analysis
crawler
parallel
markup
cohesion
coherence
colligation
matching
concgram
parsing
exploration
searching
database
dialogues
lexis
collexeme
cleaning
annotations
tokenization
spoken
multilevel
multi-layer
rhetorical analysis
transcription
query
downloader
constructions
readability
semantic parser
word2vec
network analysis
graphs
ngrams
pattern matching
temporal tagger
timex3
language detection
semantic tagger
ICE
stylometry
management
tokenizer
boilerplate remover
textual criticism
witnesses
patterns
comparison
sociolinguistics
frequency analysis
images
text mining
lexicometrics
topic models
information retrieval
lemmatizer
news
data
syntagmatic
slots
machine learning
morphological tagger
lexical analysis
style
statistical nlp
MDA
sentence boundary ...
historical
python
tagger
multilevel tagger
corpus creation
semantic analysis
duplicate remover
editing
vocabulary
regex
conversion
phonology
speech
prosody
phonetics
R
thesaurus
meta modelling
tokenizing
kwic
r
topic modeling
lexical sophistication
word clouds
variation analysis
dictionary
text-processing
semantics
phraseology
twitter
social media
xml
frequency
SPAADIA
efl
esl
linguistics
search tool
scraping
variant detector
reading
metaphor identific ...
metaphors
web-based
ebooks
political science
close reading
indexing
chinese
coreference
pos
CADS
collocation analysis
vocabulary
word cloud
vocabulary profiling
language teaching
correspondance ana ...
dependency parsing
syntax
grammar
parser generator
collocations
DDL
distributional sem ...
cooccurrence

Tools

Tool | Description | Categories | Platform | Pricing
@nnotate | Semi-automatic annotation of corpus data | annotation | Solaris, Linux | Free (with licence agreement)
AMALGAM | Tool for grammatical annotation (POS and phrase structure); tags text submitted via email. | annotation | Web | Free
ANVIL | A tool for video annotation. | video, annotation | Windows, Linux, Mac | Free
Atomic | Multi-layer corpus annotation platform. | annotation | Linux, Mac, Windows | Free
BFSU Qualitative Coder | A tool for manual coding of corpora | coding, annotation | Windows | Free
DART | An annotation tool and research environment for annotating dialogues. | dialogues, annotation | Windows | Free
Dexter | Tool for text annotation | annotation | Linux, Mac, Windows | Free
DISCO | Corpus pre-processing tool for a variety of languages that allows retrieval of the semantic similarity between arbitrary words and phrases | tokenization, annotation | Windows, Linux, Solaris, and MacOS | Free
DisMo | An automatic multi-level annotator for spoken language corpora. | spoken, multilevel, multi-layer, pos tagger, annotation, tagging | |
ELAN | Transcription and annotation of sound or video files | transcription, annotation | Linux, Mac, Windows | Free
Emdros | A database engine for analyzed and annotated text. | database, annotation, query | Windows, Linux, Mac | Free, Open Source
EXMARaLDA | Tool for transcription, annotation, and corpus analysis of spoken data | transcription, annotation, analysis | | Free
MMAX2 | A multi-level annotation tool | annotation, multilevel, multi-layer | Java | Free, Open Source
PALinkA | Annotation tool | annotation | | Down
Praaline | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3)
RSTTool | Tool that can annotate texts for constituency and rhetorical structure | annotation | Windows, Macintosh, UNIX and Linux | Free
SPPAS | A tool for the automatic annotation and analysis of speech. | speech, spoken, annotation | Windows, Mac, Linux | Free, Open Source
SPre | Tool for segmenting and annotating texts | annotation | | Free
Synpathy | Tool for manual syntactic annotation | annotation | Windows, Mac, Linux | Free
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free
TreeTagger | Tool for annotating text with part-of-speech and lemma information | pos tagger, annotation | Windows, Mac, Linux | Free
UAM CorpusTool | Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation | annotation, multi-layer | | Free
UAM ImageTool | Image annotation tool for visual data corpora | annotation | | Free
WebAnno | A web-based annotation tool | annotation, web-based | Web | Free
WebLicht | WebLicht is an execution environment for automatic annotation of text corpora embedded in the CLARIN-D project. | annotation | Web | Free (CLARIN-D Account needed)
Worldbuilder | Tool for annotation and visualisation in analyses applying text world theory | annotation, visualization | |
CorefAnnotator | An annotation tool for coreference. | coreference, annotation | Windows, Linux, Mac | Open Source