Tools for Corpus Linguistics

A hopefully comprehensive list of currently 280 tools used in corpus compilation and analysis.

This list is kept up to date by its users. Hence, please feel free to contribute by suggesting new tools.

You can also make suggestions, e.g., corrections, regarding individual tools by clicking the symbol. As this is a non-commercial side (side, side) project, checking and incorporating updates usually takes some time.

concordancer 47
annotation 43
visualization 29
tagging 20
text analysis 20
pos tagger 18
wordlists 16
statistics 12
compilation 11
keywords 11
collocation 10
qda 10
readability 8
lexis 8
parser 8
tokenizer 8
frequency analysis 7
language learning 6
analysis 6
web-based 6
mixed methods 6
spoken 6
crawler 5
language teaching 5
xml 5

There is also a comprehensive list of all tags in the database.

Tools [concordancer]

Tool | Description | Tags | Platforms | Pricing
aConCorde | Multilingual concordance tool (English and Arabic) | concordancer | Linux, Mac, Windows | Free
AntConc | Corpus analysis toolkit | wordlists, concordancer, keywords | Linux, Mac, Windows | Free
AntPConc | Corpus analysis toolkit designed for working with parallel corpora. | wordlists, concordancer | Windows, Mac | Free
BFSU ParaConc | A parallel concordancer | concordancer, parallel | Windows | Free
BFSU PowerConc | A fairly powerful concordancer | concordancer | Windows | Free
BNCWeb | BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC). | analysis, concordancer | Web | Free
buzz | A Python-based linguistic analysis tool. | parsing, concordancer, visualization | Python | Free, Open Source
CasualConc | CasualConc is a concordance program that runs natively on macOS. | concordancer | OSX | Free
CLiC | A corpus tool to support the analysis of literary texts. | concordancer | Web | Free
Collocate | Tool for the extraction of concordances and collocations | concordancer | Windows | 35 USD
Concordance Randomizer | A concordance randomizer | concordancer | Windows | Free
Concordancer | Online tool for frequency counts and text clouds | concordancer | Web | Free
CorpKit | An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora. | wordlists, parsing, concordancer, visualization | Linux, Mac, Windows (Python) | Free
Corpus Presenter | Tree tagger and corpus analysis software | wordlists, parsing, concordancer, visualization | Windows | Free
gwic | A very basic KWIC tool written in Go. | concordancer, KWIC | Windows, Mac, Linux | Open Source
HeidelGram Web-Based Tools | Basic corpus analysis toolkit for the HeidelGram Corpus | wordlists, concordancer | Web | Free
IMS Corpus Workbench | Tool for sorting frequencies in corpora | wordlists, concordancer | Web and local version | Free
KAT Tool | Grouping patterns based on search terms | patterns, concordancer | Windows | Free
KWords | A tool for keyword identification and analysis. | keywords, CADS, concordancer, collocation analysis | Windows, Linux, Mac | Free
Lextutor Web Concordancers | Web concordancers targeted towards DDL | collocations, concordancer, DDL | Web | Free
MLCT | Tool for building and processing corpora | concordancer, sentence boundary detector | | Free
MonoConc Esy | Concordancing and text search tool that allows primary and secondary concordancing | concordancer, sentence boundary detector | | Free for non-commercial research
OpenConc | Tool for concordancing | concordancer | | Free
ParaConc | A bilingual/multilingual concordancer | concordancer | | Non-Free
PhraseContext | Tool for wordlists, concordancing, collocation, TTR | wordlists, concordancer | | 35€
Praaline | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3)
PyXMLConc | Concordancer for XML files with automatic tag and attribute detection. | concordancer | Multi (Python), Windows | Free, Open Source
Shinyconc | ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny. | concordancer, kwic, r | Open Source / R | Free
Simple Concordance Program | Tool for concordance and word listing that works with many languages | concordancer | Windows, Mac | Free
Sketch Engine | A corpus manager and text analysis software developed by Lexical Computing. | annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, coocurence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction | | 30-day free trial, then starts at 4.83 €/month
Text Analysis Computing Tools (TACT) | A simple, fairly old concordancer. | concordancer | | Commercial
Textanz | Language analysis program that produces frequency lists, word lists, and part-of-speech tags. | wordlists, concordancer, pos tagger, dictionary | Any OS | Free, Open Source
TextSTAT | Tool for creation and manipulation of linguistic data from different languages | corpus creation, concordancer | Windows, GNU/Linux and MacOS | Free
The Prime Machine | A user- and mobile-friendly corpus analysis toolkit (primarily concordancing) initially designed for English language teaching. | concordancer, language teaching, wordlist, keywords, efl, esl | MacOS, Windows, iOS, Android | Free
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free
The SPAADIA concordancer | A concordancer for the SPAADIA corpus | concordancer, SPAADIA | Windows | Free
The Text Feature Analyser | A tool for investigating textual features and various measures | text analysis, concordancer | Windows | Free
TXM | XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment. | text analysis, concordancer, r, statistics, search tool, tokenizer, xml | Windows, Mac, Linux, Tomcat | Free
WConcord 3.0 | A fully featured concordancer | concordancer | | Free
Wmatrix | Tool for corpus analysis and comparison. Provides access to CLAWS and USAS. | wordlists, concordancer, pos tagger, semantic tagger, keywords, web-based | Web | £50 per username per year
WordCruncher | A tool for searching, studying, and analyzing digital texts and corpora. The tool has been tested for corpora up to a billion words. | concordancer, wordlists, collocates, n-grams, keywords, key phrases, ebooks | Windows, Mac, iOS | Free
Wordsmith | One of the most established corpus toolkits providing a variety of functionality | concordancer, wordlists, statistics, keywords | Windows | 60€ per licence
Wordstatix | Corpus analysis tool | concordancer | | Free
WordWanderer | A web-based visualization/analysis tool which allows its users to "wander" a text. | visualization, concordancer | Web | Free
Just the Word | A simple web interface for BNC data | concordancer, frequency analysis, BNC | Web | Free
Wordless | An integrated corpus tool with multilingual support for the study of language, literature, and translation. | concordancer, text analysis, statistics, readability | Windows, Mac, Linux, Python | Free, Open Source
CorpusMate | A web-based, streamlined, and simplified language data analysis experience for younger learners. | language learning, language teaching, concordancer, frequency analysis, pattern | Web | Free

Last Updated: November 05, 2023.

In case you are interested, the data is also available in JSON format.
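
For readers who want to work with the JSON data, the following is a minimal sketch of filtering tools by tag and counting tags. The record layout and the key names (`name`, `tags`, `platforms`, `pricing`) are assumptions modelled on the table columns above, not the documented schema of the actual export, and the helper names are hypothetical. The three sample records are taken from the table.

```python
import json
from collections import Counter

# Hypothetical sample: key names mirror the table columns and are an
# assumption; the real JSON export may be structured differently.
SAMPLE = """
[
  {"name": "AntConc", "tags": ["wordlists", "concordancer", "keywords"],
   "platforms": ["Linux", "Mac", "Windows"], "pricing": "Free"},
  {"name": "Collocate", "tags": ["concordancer"],
   "platforms": ["Windows"], "pricing": "35 USD"},
  {"name": "CLiC", "tags": ["concordancer"],
   "platforms": ["Web"], "pricing": "Free"}
]
"""


def tools_with_tag(records, tag):
    """Return the names of tools whose tag list contains `tag` (case-insensitive)."""
    wanted = tag.lower()
    return [
        r["name"]
        for r in records
        if wanted in (t.lower() for t in r.get("tags", []))
    ]


def tag_counts(records):
    """Count how many tools carry each tag, like the tag overview at the top of the page."""
    return Counter(t.lower() for r in records for t in r.get("tags", []))


records = json.loads(SAMPLE)
print(tools_with_tag(records, "keywords"))
print(tag_counts(records).most_common(3))
```

To use this with the real data, replace `SAMPLE` with the contents of the downloaded file and adjust the key names to whatever the export actually uses.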