Tools for Corpus Linguistics

A comprehensive list of 188 tools used in corpus analysis.

Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.


Tags

Everything
annotation
concordancer
parser
pos tagger
search
visualization
wordlists
compilation
text analysis
converter
n-grams
p-frames
lexical bundles
lexical frames
text complexity
collocation
statistics
segmentation
coding
concordancer
ddl
pedagogy
analysis
crawler
parallel
tagging
colligation
parsing
collocations
exploration
searching
database
dialogues
cleaning
annotations
tokenization
transcription
downloader
readability
semantic parser
word2vec
ngrams
pattern matching
temporal tagger
timex3
network analysis
semantic tagger
ICE
tokenizer
boilerplate remover
patterns
comparison
keywords
sociolinguistics
frequency analysis
lexis
lemmatizer
news
data
machine learning
morphological tagger
statistical nlp
MDA
sentence boundary ...
tagger
multilevel tagger
corpus creation
semantic analysis
duplicate remover
editing
vocabulary
constructions
regex
conversion
phonology
speech
prosody
spoken
phonetics
query
thesaurus
meta modelling
tokenizing
kwic
r
topic modeling
cohesion
lexical sophistication
word clouds
variation analysis
dictionary
text-processing
python
phraseology
xml
frequency
SPAADIA
efl
esl
linguistics
search tool
multi-layer
variant detector
reading
metaphor identific ...
metaphors
ebooks
political science
indexing
chinese
graphs
rhetorical analysis
textual criticism
witnesses
close reading
stylometry
management
twitter
web-based
coherence
lexical analysis
style
video
discourse
images
multilevel
qda
mixed methods
markup
anc
sampling
matching

Tools

Tool | Description | Categories | Platform | Pricing
aConCorde | Multilingual concordance tool (English and Arabic) | concordancer | Linux, Mac, Windows | Free
AntConc | Corpus analysis toolkit | wordlists, concordancer | Linux, Mac, Windows | Free
AntPConc | Corpus analysis toolkit for files encoded with UTF-8 | wordlists, concordancer | Windows, Mac | Free
BNCWeb | BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC). | analysis, concordancer | Web | Free
BSFU ParaConc | A parallel concordancer | concordancer, parallel | Windows | Free
BSFU PowerConc | A fairly powerful concordancer | concordancer | Windows | Free
CasualConc | CasualConc is a concordance program that runs natively on Mac 10.9 or later | concordancer | OSX | Free
CLiC | A corpus tool to support the analysis of literary texts. | concordancer | Web | Free
Collocate | Tool for the extraction of concordances and collocations | concordancer | Windows | 35 USD
Concordance Randomizer | A concordance randomizer | concordancer | Windows | Free
Concordancer | Online tool for frequency counts and text clouds | concordancer | Web | Free
CorpKit | An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora. | wordlists, parsing, concordancer, visualization | Linux, Mac, Windows (Python) | Free
Corpus Presenter | Tree tagger and corpus analysis software | wordlists, parsing, concordancer, visualization | Windows | Free
HeidelGram Web-Based Tools | Basic corpus analysis toolkit for the HeidelGram Corpus | wordlists, concordancer | Web | Free
IMS Corpus Workbench | Tool for sorting frequencies in corpora | wordlists, concordancer | Web and local version | Free
KAT Tool | Grouping patterns based on search terms | patterns, concordancer | Windows | Free
MLCT | Tool for building and processing corpora | concordancer, sentence boundary detector |  | Free
MonoConc Esy | Concordancing and text search tool that allows primary and secondary concordancing | concordancer, sentence boundary detector |  | Free for non-commercial research
OpenConc | Tool for concordancing | concordancer |  | Free
ParaConc | A bilingual/multilingual concordancer | concordancer |  | Non-Free
PhraseContext | Tool for wordlists, concordancing, collocation, TTR | wordlists, concordancer |  | 35€
Praaline | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3)
PyXMLConc | Concordancer for XML files with automatic tag and attribute detection. | concordancer | Multi (Python), Windows | Free, Open Source
Shinyconc | ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny. | concordancer, kwic, r | Open Source / R | Free
Simple Concordance Program | Tool for concordance and word listing that works with many languages | concordancer | Windows, Mac | Free
Text Analysis Computing Tools (TACT) | A simple, fairly old concordancer. | concordancer |  | Commercial
Textanz | Language analysis program that produces frequency lists, word lists, parts of speech tags. | wordlists, concordancer, pos tagger, dictionary | Any OS | Free, Open Source
TextSTAT | Tool for creation and manipulation of linguistic data from different languages | corpus creation, concordancer | Windows, GNU/Linux and MacOS | Free
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free
The SPAADIA concordancer | A concordancer for the SPAADIA corpus | concordancer, SPAADIA | Windows | Free
The Text Feature Analyser | A tool for investigating textual features and various measures | text analysis, concordancer | Windows | Free
TXM | XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment. | text analysis, concordancer, r, statistics, search tool, tokenizer, xml | Windows, Mac, Linux, Tomcat | Free
WConcord 3.0 | A full-featured concordancer | concordancer |  | Free
Wmatrix | Tool for corpus analysis and comparison | wordlists, concordancer, pos tagger, semantic tagger | Web | £50 per username per year
WordCruncher | A tool for analyzing ebooks. | concordancer, frequency, ebooks | Windows, Mac, iOS | Free
Wordsmith | One of the most established corpus toolkits providing a variety of functionality | concordancer, wordlists, statistics | Windows | 60€ per licence
Wordstatix | Corpus analysis tool | concordancer |  | Free