Tools for Corpus Linguistics

A comprehensive list of 262 tools used in corpus analysis.

Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.


Tags

Everything
annotation
concordancer
tagging
compilation
corpus management
multilingual
rhetorics
parser
pos tagger
text complexity
readability
ANC
sampling
search
visualization
wordlists
keywords
text analysis
converter
n-grams
p-frames
lexical bundles
lexical frames
parser generator
video
qda
mixed methods
discourse
voice
collocation
statistics
segmentation
parallel
coding
concordancer
ddl
pedagogy
language learning
analysis
crawler
parsing
markup
child language
CHILDES
cohesion
coherence
textual analysis
colligation
matching
vocabulary
lexis
web-based
concgram
python
conversational ana ...
social media
query
coreference
library
XML
JSON
exploration
searching
NER
topic models
word2vec
database
dialogues
collexeme
cleaning
annotations
tokenization
spoken
multilevel
multi-layer
rhetorical analysis
transcription
grammar
ESL/EFL
CEFR
lexical analysis
poem analysis
metaphor interpret ...
metaphor identific ...
semantics
metaphors
finnish
downloader
constructions
semantic parser
frequency
network analysis
graphs
ngrams
pattern matching
KWIC
temporal tagger
timex3
language detection
semantic tagger
ICE
multi-layer annotation
computer-assisted ...
stylometry
management
tokenizer
boilerplate remover
textual criticism
witnesses
patterns
comparison
correspondence ana ...
collocation analysis
frequency analysis
sociolinguistics
CADS
images
text mining
lexicometrics
information retrieval
lemmatizer
news
data
syntagmatic
slots
collocations
DDL
machine learning
ai-tagging
morphological tagger
syntax
style
statistical nlp
dependency parsing
MDA
sentence boundary ...
variation
dialectal data
historical
tagger
multilevel tagger
corpus creation
semantic analysis
duplicate remover
editing
regex
conversion
phonology
speech
prosody
phonetics
R
thesaurus
vocabulary
meta modelling
tokenizing
kwic
r
sentence boundary
dictionary
text-processing
xml
trends patterns
collocates
word cloud
co-occurrence
NLP
diachronic analysis
term extraction
keyword extraction
bilingual term ext ...
pos
topic modeling
lexical sophistication
word clouds
variation analysis
phraseology
twitter
SPAADIA
efl
esl
linguistics
search tool
scraping
uralic
inflection
variant detector
reading
key phrases
ebooks
political science
close reading
vocabulary profiling
language teaching
distributional sem ...
indexing
chinese
ocr
text processing
publishing dictionary

Tools

Tool | Description | Categories | Platform | Pricing
--- | --- | --- | --- | ---
aConCorde | Multilingual concordance tool (English and Arabic) | concordancer | Linux, Mac, Windows | Free
AntConc | Corpus analysis toolkit | wordlists, concordancer, keywords | Linux, Mac, Windows | Free
AntPConc | Corpus analysis toolkit designed for working with parallel corpora | wordlists, concordancer | Windows, Mac | Free
BFSU ParaConc | A parallel concordancer | concordancer, parallel | Windows | Free
BFSU PowerConc | A fairly powerful concordancer | concordancer | Windows | Free
BNCWeb | A web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC) | analysis, concordancer | Web | Free
buzz | A Python-based linguistic analysis tool | parsing, concordancer, visualization | Python | Free, Open Source
CasualConc | A concordance program that runs natively on Mac OS X 10.9 or later | concordancer | OS X | Free
CLiC | A corpus tool to support the analysis of literary texts | concordancer | Web | Free
Collocate | Tool for the extraction of concordances and collocations | concordancer | Windows | 35 USD
Concordance Randomizer | A concordance randomizer | concordancer | Windows | Free
Concordancer | Online tool for frequency counts and text clouds | concordancer | Web | Free
CorpKit | An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora | wordlists, parsing, concordancer, visualization | Linux, Mac, Windows (Python) | Free
Corpus Presenter | Tree tagger and corpus analysis software | wordlists, parsing, concordancer, visualization | Windows | Free
gwic | A very basic KWIC tool written in Go | concordancer, KWIC | Windows, Mac, Linux | Open Source
HeidelGram Web-Based Tools | Basic corpus analysis toolkit for the HeidelGram Corpus | wordlists, concordancer | Web | Free
IMS Corpus Workbench | Tool for sorting frequencies in corpora | wordlists, concordancer | Web and local version | Free
KAT Tool | Tool for grouping patterns based on search terms | patterns, concordancer | Windows | Free
KWords | A tool for keyword identification and analysis | keywords, CADS, concordancer, collocation analysis | Windows, Linux, Mac | Free
Lextutor Web Concordancers | Web concordancers aimed at data-driven learning (DDL) | collocations, concordancer, DDL | Web | Free
MLCT | Tool for building and processing corpora | concordancer, sentence boundary detector |  | Free
MonoConc Esy | Concordancing and text search tool that supports both primary and secondary concordancing | concordancer, sentence boundary detector |  | Free for non-commercial research
OpenConc | Tool for concordancing | concordancer |  | Free
ParaConc | A bilingual/multilingual concordancer | concordancer |  | Non-free
PhraseContext | Tool for wordlists, concordancing, collocation and TTR (type-token ratio) | wordlists, concordancer |  | 35 €
Praaline | A system for metadata management, annotation, visualisation and analysis of spoken language corpora | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3)
PyXMLConc | Concordancer for XML files with automatic tag and attribute detection | concordancer | Multi (Python), Windows | Free, Open Source
ShinyConc | A framework, written in R and R Shiny, for generating custom web-based concordancers | concordancer, kwic, r | R | Free / Open Source
Simple Concordance Program | Tool for concordancing and word listing that works with many languages | concordancer | Windows, Mac | Free
Sketch Engine | Corpus manager and text analysis software developed by Lexical Computing | annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, co-occurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction |  | 30-day free trial, then from 4.83 €/month
Text Analysis Computing Tools (TACT) | A simple, fairly old concordancer | concordancer |  | Commercial
Textanz | Language analysis program that produces frequency lists, word lists and part-of-speech tags | wordlists, concordancer, pos tagger, dictionary | Any OS | Free, Open Source
TextSTAT | Tool for creating and manipulating linguistic data in different languages | corpus creation, concordancer | Windows, GNU/Linux and macOS | Free
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations | concordancer, annotation, xml, frequency | Windows | Free
The SPAADIA concordancer | A concordancer for the SPAADIA corpus | concordancer, SPAADIA | Windows | Free
The Text Feature Analyser | A tool for investigating textual features and computing various measures | text analysis, concordancer | Windows | Free
TXM | XML- and TEI-compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment | text analysis, concordancer, r, statistics, search tool, tokenizer, xml | Windows, Mac, Linux, Tomcat | Free
WConcord 3.0 | A full-featured concordancer | concordancer |  | Free
Wmatrix | Tool for corpus analysis and comparison that provides access to the CLAWS part-of-speech tagger and the USAS semantic tagger | wordlists, concordancer, pos tagger, semantic tagger, keywords, web-based | Web | £50 per username per year
WordCruncher | A tool for searching, studying and analyzing digital texts and corpora; it has been tested on corpora of up to a billion words | concordancer, wordlists, collocates, n-grams, keywords, key phrases, ebooks | Windows, Mac, iOS | Free
WordSmith | One of the most established corpus toolkits, offering a wide range of functions | concordancer, wordlists, statistics, keywords | Windows | 60 € per licence
Wordstatix | Corpus analysis tool | concordancer |  | Free
WordWanderer | A web-based visualization/analysis tool that lets users "wander" through a text | visualization, concordancer | Web | Free