Tools for Corpus Linguistics

A hopefully comprehensive list of tools used in corpus compilation and analysis, currently 283 in total.

This list is kept up to date by its users, so please feel free to contribute by suggesting new tools.

You can also make suggestions, e.g., corrections, regarding individual tools by clicking the symbol. As this is a non-commercial side (side, side) project, checking and incorporating updates usually takes some time.


Top 25 Tags

concordancer 48
annotation 44
visualization 29
tagging 20
text analysis 20
pos tagger 18
wordlists 16
statistics 12
compilation 11
keywords 11
collocation 10
qda 10
tokenizer 8
readability 8
parser 8
lexis 8
frequency analysis 7
language learning 6
analysis 6
spoken 6
python 6
mixed methods 6
web-based 6
segmentation 5
language teaching 5

There is also a comprehensive list of all tags in the database.


Tools

Tool Description Tags Platforms Pricing
@nnotate Semi-automatic annotation of corpus dataannotationSolaris, LinuxFree (with licence agreement)
aConCorde Multilingual concordance tool (English and Arabic)concordancerLinux, Mac, WindowsFree
ACTRES Corpus Browser A tool for retrieving tagged information in more than one language.taggingWebCommercial
ACTRES Corpus Manager A corpus compilation and analysis platform with a focus on multilingual and parallel corpora.compilation, corpus management, annotation, multilingualWebCommercial
ACTRES Rhetorical Movel Tagger A tool for tagging rhetorical moves.tagging, rhetoricsWebCommercial
almaneser / SALTA Semantic Parser and PoS Tagger for Englishparser, pos tagger, taggingFree (with licence agreement)
AMALGAM Tool for grammatical annotation (PoS and phrase structure). Tagging a text that was entered via email.annotationWebFree
AMesure A web-based system to analyse the reading complexity of French textstext complexity, readabilityWebFree
ANC2go A web service that allows users to create custom sub-corpora of the ANCANC, samplingWebFree
ANNIS Search and visualization tool for multi-layer linguistic corpora with diverse types of annotationsearch, visualizationWeb (or Linux, Mac, Windows)Free
AntCLAWSGUI Front-end interface for CLAWS taggerpos tagger, taggingWindowsFree
AntConc Corpus analysis toolkitwordlists, concordancer, keywordsLinux, Mac, WindowsFree
AntCorGen A freeware discipline-specific corpus creation tool.compilation, text analysisWindows, Mac, LinuxFree
AntFileConverter Freeware tool to convert PDF and Word (DOCX) files into plain textconverterWindows, MacFree
AntFileSplitter A freeware text file splitting tool.compilationWindows, Mac, LinuxFree
AntGram A freeware n-gram and p-frame (open-slot n-gram) generation tool.text analysis, n-grams, p-frames, lexical bundles, lexical framesWindows, Mac, LinuxFree
ANTLR ANother Tool for Language Recognition is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.parser generatorLinux, Mac, WindowsFree, Open Source
AntMover Tool for text structure (moves) analysistext analysisWindowsFree
AntPConc Corpus analysis toolkit designed for working with parallel corpora.wordlists, concordancerWindows, MacFree
AntWordProfiler Tool for profiling vocabulary level and text complexitytext complexityLinux, Mac, WindowsFree
ANVIL A tool for video annoation.video, annotationWindows, Linux, MacFree
ATLAS.ti A sophistaticated QDA software for mixed methods approachesqda, mixed methodsWindows, Mac, Android, iOSCommercial
Atomic Multi-layer corpus annotation platform.annotationLinux, Mac, WindowsFree
Authorial Voice Analyzer (AVA) A tool for the analysis of interactional metadiscourse features.discourse, voiceMacFree
BFSU Collocator A collocation analysis toolkitcollocation, statisticsWindowsFree
BFSU ConcGram Lite A tool for retrieving bigrams with directional variations.bigrams, concgramsWindowsFree
BFSU English Sentence Segmenter A simple sentence segmentersegmentationWindowsFree
BFSU ParaConc A parallel concordancerconcordancer, parallelWindowsFree
BFSU PowerConc A fairly powerful concordancerconcordancerWindowsFree
BFSU Qualitative Coder A tool for manual coding of corporacoding, annotationWindowsFree
BFSU Sentence Collector A pedagogic concordancerconcordaner, ddl, pedagogy, language learningWindowsFree
BFSU Stanford Parser A simple parserparserWindowsFree
BFSU Stanford PoS Tagger (Light) A GUI for the Standford PoS taggerpos tagger, taggingWindowsFree
BNCWeb BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC).analysis, concordancerWebFree
BootCat Tool for crawling and compiling data from the web with a list of seed words.crawler, compilation
Bow Statistical Language Modeling, Text Retrieval, Classification and Clusteringtext analysisUNIX, LinuxFree
buzz A python-based linguistic analysis tool.parsing, concordancer, visualizationPythonFree, Open Source
Calc: Corpus Calculator A web-based tool to calculate basic corpus statistics, for example, comparing frequencies across corpora.statisticsWebFree
CasualConc CasualConc is a concordance program that runs natively on macOS.concordancerOSXFree
CATMA (Computer Assisted Text Markup and Analysis) An undogmatic, complex annotation and analysis package.markup, analysis, visualization, annotationWebFree
CEFRLex A web-based tool to analyse the lexical complexity of words in texts according to the CEFR scale in various languages.text complexity, readability, language learningWebFree
Chared Tool for detecting the character encoding of a texttext analysisPython 2.6 or laterFree
Chi-Square and Log Likelihood Calculator A simple tool for calculating Chi-squared and LLstatisticsWindowsFree
CLAN A tool for searching and analyzing child language data in the CHAT transcription format.search, wordlists, collocation, child language, CHILDESWindows, Mac, UnixFree, Open Source
CLaRK XML Based System For Corpora DevelopmentcompilationFree (with licence agreement)
CLAWS PoS-Tagger The CLAWS part-of-speech tagger.pos tagger, taggingWebVia licence or in-house tagging at Lancaster
CLiC A corpus tool to support the analysis of literary texts.concordancerWebFree
COCA_MWU20 ColloGram A collocation analysis tool based on a COCA collocation family list.collocationWindowsFree
Coh-Metrix Coh-Metrix is a system for computing computational cohesion and coherence metrics for written and spoken texts. It allows readers, writers, educators, and researchers to instantly gauge the difficulty of written text for the target audience.cohesion, coherence, readability, textual analysisWebFree
Colligator 2.0 A colligation query/analysis toolkitcolligationWindowsFree
Collocate Tool for the extraction of concordances and collocationsconcordancerWindows35 USD
CoMOn A tooil for corpus matching analysismatchingWebFree
Compleat Lexical Tutor A website featuring various tools and materials for data-driven language learning.vocabulary, language learning, lexis, web-based, ddlWebFree
ConcGramCore A modern rewrite of ConcGram (Greaves 2005) that allows efficiently searching for concgrams.collocation, concgramWindowsOpen Source
Concordance Randomizer A concordance randomizerconcordancerWindowsFree
Concordancer Online tool for frequency counts and text cloudsconcordancerWebFree
ConvoKit A toolkit for extracting conversational features and analyzing social phenomena in conversations, using an interface inspired by (and compatible with) scikit-learn.python, conversational analysis, social mediaPythonFree, Open Source
Coquery A free corpus query tool to search, analyze, and visualize corporaquery, visualizationLinux, Mac, WindowsFree
CorefAnnotator An annotation tool for coreference.corerference, annotationWindows, Linux, MacOpen Source
CorpKit An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora.wordlists, parsing, concordancer, visualizationLinux, Mac, Windows (Python)Free
Corpona A Python library for processing XML- and JSON-based corpora.library, XML, JSON, annotationPythonOpen Source
CorporaCoCo A set of R functions used to compare co-occurrence between corporacollocationRFree
Corpus Presenter Tree tagger and corpus analysis softwarewordlists, parsing, concordancer, visualizationWindowsFree
Corpus Text Processor Corpus Text Processor is a downloadable application that provides batched operations for common corpus processing tasks such as encoding or standardization.compilation, corpus management, text processingWindows, MacFree, Open Source
CorpusExplorer A complex corpus analysis toolkit combining 45 interactive tools.visualization, exploration, tagging, text analysisWindowsFree, Open Source
CorpusSearch Searches parsed corpora in the Penn Treebank formatsearching, penn treebank
Corpustools An R package for managing, querying, and analyzing texts.text analysis, RRFree, Open Source
Cortext Manager A scriptable "ecosystem" for modeling and exploring corpora. Especially useful for creating topic models and co-occurence networks.NER, topic models, visualization, word2vec, collocation, keywordsWebFree
CPQWeb Overview of and access to a wide range of corporadatabaseWebFree (once registered)
DART An annotation tool and research environment for annotating dialogues.dialogues, annotationWindowsFree
DepCluster A tool used for lexeme-based collexeme analysis.lexis, collexeme, CxG, LBCA
DeTagging Tool A tool that strips annotation/tags from files.cleaning, annotationsWindowsFree
Dexter Tool for text annotationannotationLinux, Mac, WindowsFree
DISCO Corpus pre-processing tool for a variety of languages that Dallows to retrieve the semantic similarity between arbitrary words and phrasestokenization, annotationWindows, Linux, Solaris, and MacOSFree
DisMo An automatic multi-level annotator for spoken language corpora.spoken, multilevel, multi-layer, pos tagger, annotation, tagging
DocuScope A tool for computer-aided rhetorical anyalysisrhetorical analysis, text analysis, visualizationWindows (Java)Free
ELAN Transcription and annotation of sound or video filestranscription, annotationLinux, Mac, WindowsFree
Emdros A database engine fpr analyzed and annotated text.database, annotation, queryWindows, Linux, MacFree, Open Source
EncodeAnt Tool for the detection and conversion of character encodingsconverterWindows, MacFree
English Grammar Profiler A CEFR grammar profiler for ESL/EFL.grammar, parsing, CEFR, esl, eflWebFree
EXMARaLDA Tool for transcription, annotation, corpus analysis of spoken datatranscription, annotation, analysisFree
f4analyse QDA software specifically geared towards interview (spoken) dataqda, spokenWindows, Mac, LinuxCommercial
f4transkript Software for transcribing audio datatranscription, spokenWindows, Max, LinuxCommercial
FinMeter A tool for analyzing Finnish poetry in terms of meter, rhyme, semantics, metaphors etc.lexical analysis, rhetorical analysis, poem analysis, metaphor interpretation, metaphor identification, semantics, metaphors, finnishLinux, Mac, WindowsFree
FireAnt Social media analysis toolkitdownloader, converterWindows, MacFree
FLAIR (2.0) An online tool for language teachers and learners that analyzes grammatical constructions and readability on the fly.constructions, readabilityWebFree
Flesh PC Calculating Flesh-scoresreadability, statisticsWindowsFree
FrameNet Dictionary of more than 10,000 word senses, tagged for semantic roles (according to Fillmorean Frame Semantics)semantic parserWebFree
Frequency Program (Paul Nation) A tool that turns a text or texts into a word list with frequency figures.vocabulary, frequency, lexisWindowsFree
gensim Deep learning via word2vecword2vecMulti (Python)Free, Open Source
Gephi A toolkit for network analysisnetwork analysis, graphsWindows, Linux, MacFree
GOLD Parsing System A parsing system that can be used to develop programming languages, scripting languages and interpreters.parser generatorLinux, Mac, WindowsFree
Google Ngrams An ngram-viewer for the whole of Google BooksngramsWebFree
GraphColl Tool for building and exploring networks of linguistic collocationsvisualizationWindows, MacFree
Gsearch Tool for syntactic pattern matchingpattern matching?Down
gwic A very basic KWIC tool written in Go.concordancer, KWICWindows, Mac, LinuxOpen Source
HeidelGram Web-Based Tools Basic corpus analysis toolkit for the HeidelGram Corpuswordlists, concordancerWebFree
HeidelTime A multilingual, domain-sensitive temporal taggertemporal tagger, timex3JavaFree, Open Source
Heimdall A tool that searches a text for sequences written in other languages.language detectionLinux, Windows, MacOpen Source
HGSimpleCorpusNetwork Batch frequency analysis on corrupted (e.g. OCR) corpus data and generation of network analysis data.wordlists, network analysisMulti (Python)Free, Open Source
HTST Samuels Historical Thesaurus Semantic Tagger via web-interfacesemantic taggerWebFree
ICARUS Search and visualization tool for dependency treesvisualizationFree
ICECUP The ICE Corpus Utility Program (ICECUP) is a corpus exploration tools for parsed corpora such as ICE-GB and DCPSE.ICE, explorationFree
ICEweb A tool for compiling, downloading, and analyzing web corpora in accordance with the ICEICE, compilation, crawlerWindowsFree
IMS Corpus Workbench Tool for sorting frequencies in corporawordlists, concordancerWeb and local versionFree
INCEpTION A semantic annotation platform that offfers intelligent annotation assistance and knowledge managementannotation, multi-layer annotation, computer-assisted annotation, web-basedWebFree, Open Source
Intelligent Archive Managing corpora for stylometrystylometry, managementWindows, Unix, Linux, MacFree
JavaCC A popular parser generator for use with Java applications.parser generatorLinux, Mac, WindowsFree
jTokenizer Tokenizing natural languagetokenizerFree
JusText Tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pagesboilerplate removerPythonFree
juxta Comparing and collating multiple witnesses to single textual workstextual criticism, witnessesWindows, Unix, Linux, MacFree
Kaleidographic A dynamic and interactive visualization tool for multivariate data.visualizationWebFree
KAT Tool Grouping patterns based on search termspatterns, concordancerWindowsFree
kdiff3 KDiff3 is a diff and merge program.comparisonWindows, Linux, OSXFree, Open Source
Keyword Plus A keyword generation/analysis toolkeywordsWindowsFree
kfNgram A simple tool for generating n-gramsn-grams, p-framesWindowsFree
KHCoder A free software for quantitative content analysis or text mining that supports multiple languages.correspondence, collocation analysis, frequency analysisWindows, Mac, LinuxFree, Open Source
Khepri A view-based toolfor exploring (historical sociolinguistic) datasociolinguistics, visualizationJavaScript, WebFree, Open Source
KoGra-R An R-based online tool that provides statistical measures for corpus-based frequenciesstatistics, frequency analysisWebFree
KorAP A complex platform for corpus analysis developed at the IDS in Mannheimanalysis, multilevel, multi-layerWebFree, Open Source
KWords A tool for keyword identification and analysis.keywords, CADS, concordancer, collocation analysisWindows, Linux, MacFree
LancsBox The Lancaster Desktop Corpus Toolbox; Software package for the analysis of language data and corporacollocation, frequency analysis, keywordsJavaFree (CC)
langid.py A standalone language identification tool written in Python.language detectionLinux, Windows, MacOpen Source
LDA-Toolkit A toolkit for linguistic discourse and image analysis.discourse, imagesWindowsFree
Leipzig Corpus Miner A modern text mining infrastructure for qualitative data analysisqda, mixed methods, text mining, lexicometrics, topic models, information retrievalLinux, Windows, Mac (via VM)Free
LEXA A complex lemmatizer.lexis, lemmaizerFree
LexisNexis A database containing (new and old) news articles. They also have other (business) data.news, dataWebCommercial
Lexonomy A tool for writing and publishing dictionaries and other dictionary-like things.dictionary, publishing dictionary, annotationWebFree
lexpan A tool to analyze syntagmatic structures in corpora. Especially useful to analyze fillers and slots.syntagmatic, slotsWindows, Linux, MacFree
Lextutor Web Concordancers Web concordancers targeted towards DDLcollocations, concordancer, DDLWebFree
LightSide A machine learning workbench.machine learningLinux, WindowsFree, Open Source
LightTag A commercial text annotation tool focused on managing and working with teams of annotators.annotation, tagging, ai-taggingWebCommercial
Linguistica Word segmentation and morphological analysis?segmentation, morphological taggerLinux, Mac, WindowsFree
Link Grammar Parser A syntactic parser of English, Russian, Arabic and Persian (and others), based on Link Grammar.parser, syntax, grammarLinux, Mac, WindowsFree
LIWC A tool that tries to compute scores for different emotions, thinkings styles, and social concerns.lexical analysis, styleWebFree (but Commercial)
Log-Likelihood and Effect-Size Calculator An online calculator for log-likelihoof and effect sizes.statisticsWebFree
MALLET Package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to textstatistical nlpWindowsFree
MaltOptimizer A system for parser optimization using the open-source system MaltParser.parser, dependency parsingWindows, Mac, LinuxFree
MaltParser A system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. parser, dependency parsingWindows, Mac, LinuxFree
MAT - Multidemensional Analysis Tagger A tagger for MDA (Biber et al.) by Andrea Nini.tagging, MDAWindows, MacFree
MAXQDA Sophisticated QDA software that works with multimodal data and supports mixed methods approachesqda, mixed methodsWindows, Mac, Android, iOSCommercial
MLCT Tool for building and processing corporaconcordancer, sentence boundary detectorFree
MMAX2 A multi-level annotation toolannotation, multilevel, multi-layerJavaFree, Open Source
MonoConc Esy Concordancing and text search tool that allows primary and secondary concordancingconcordancer, sentence boundary detectorFree for non-Commercial research
MorphAdorner Tool for performing morphological tagging of textsmorphological taggerFree
Murre A tool for normalising and generating dialectal Finnish and Swedishpython, variation, dialectal data, finnishLinux, Mac, WindowsFree
N-Gram Processor (NGP) A perl based tool for the creation and processing of n-gram lists out of text files.n-gramsLinux, Windows, MacOpen Source
NATAS A spacy-based library for processing historical corpora (with a focus on neologisms).historical, python, lexisLinux, Windows, MacOpen Source
Natural Language Toolkit Platform for building Python programs to work with human language datatokenizer, taggerUnix, Mac, Windows (+Python 3.4)Free
NooJ Tags texts and corpora (i.e. sets of text files) at the Orthographical, Lexical, Morphological, Syntactic and Semantic levelsmultilevel taggerWindows, Mac, LINUX and BSD UnixFree
NoSketch Engine Word sketches, thesaurus, keyword computation, corpus creationcorpus creation, semantic analysis, wordlistsFree
NVIVO A commercial Computer-Assisted Qualitative Data Analysis Software (CAQDAS) software that works with both qualitative and mixed methods dataqda, mixed methodsWindows, MacCommercial
OneClick Terms An online term extractor with monolingual and bilingual term extraction capabilities.keywords, term extraction, bilingual term extractionWebFree (limited version), 4.83€ / month
Onion Tool for removing duplicate parts from large collections of textsduplicate removerFree
Online Graded Text Editor Tool for profiling a text's vocabulary level and complexitytext analysis, editing, vocabularyOSX, WindowsFree
OpenConc Tool for concordancingconcordancerFree
PACTE A flexible collaborative text annotation platform that is currently in development.annotationWebFree (for research)
PALinkA Annotation toolannotationDown
ParaConc A bilingual/multilingual concordancerconcordancerNon-Free
Pareidoscope Pareidoscope is a collection of tools for determining the association between arbitrary linguistic structures, such as collocations, collostructions or between structures.collocation, constructionsFree
PatCount A pattern counting tool with powerful statistic capabilities and regex supportpatternsWindowsFree
Pattern Builder A tool helping with regular expressions and PoS tagsregex, taggingWindowsFree
Pepper Conversion between linguistic formats, e.g. from TEI to ANNIS to Tiger XML to EXMARaLDA.conversionFree
Phonological CorpusTools (PCT) Phonological analysis on transcribed corporaphonologyMulti (Python)Free
PhraseContext Tool for wordlists, concordancing, collocation, TTR, wordlists, concordancer35€
Pipoca (formerly openQDA) A web-based QDA softwareqda, mixed methodsWebFree, Open Source
Praaline Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora.speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysisWindows, Mac, LinuxFree / Open Source (GPL3)
PRAAT A tool for doing phonetics by computerphonetics, spokenWindows, Mac, LinuxOpen Source
ProtAnt Tool for prototypical text analysiswordlistsWindows, MacFree
pysupersensetagger Analyses texts for MWE and supersenses.text analysisUnix, Mac (Python)Free
PyXMLConc Concordancer for XML files with automatic tag and attribute detection.concordancerMulti (Python), WindowsFree, Open Source
QDA Miner A commercial QDA tool for coding, annotating, retrieving and analyzing collections of documents and images.qda, mixed methods, text analysisWindowsCommercial
QualCoder QualCoder is free, open source software for qualitative data analysis.qda, text analysisLinux, Mac, WindowsFree, Open Source
Quanteda A python library used to study neologisms in historical English corpora.RLinux, Windows, MacOpen Source
Query Tool for the Edenburgh Associative Thesaurus A query tool for the EATquery, thesaurusWindowsFree
Range Program (formerly VocabProfiler) (Paul Nation) A tool for for analyzing the vocabulary load of texts.voabulary, lexisWindowsFree
RDQA An R package for Qualitative Data Analysis (QDA).qdaWindows, Linux/FreeBSD, MacFree
Readability Analyzer A tool for generating various readability statisticsreadability, statisticsWindowsFree
Readability Webfx A tool to check how easy or difficult (readability) a given text is.readabilityWebFree
Rescribe Rescribe is an OCR service/tool geared towards historical texts.ocrWindows, Linux, MacFree
RSTTool Tool that can annotate texts for constituency and rhetorical structureannotationWindows, Macintosh, UNIX and LINUX Free
Salt Meta models for linguistic data.meta modellingFree
SarAnt Tool for batch search and replacingediting, searchingWindowsFree
SegmentAnt Tool for the segmentation of Japanese and Chinesesegmentation, tokenizingWindows, Mac, LinuxFree
Shinyconc ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny.concordancer, kwic, rOpen Source / RFree
Simple Concordance Program Tool for concordance and word listing that works with many languagesconcordancerWindows, MacFree
SKELL A simple tool for language learners and teachers.language learning, language teachingWebFree
Sketch Engine A corpus manager and text analysis software developed by Lexical Computing.annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, coocurence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction30-day free trial then starts at 4.83 €/month
SLATE SLATE is a python-based CLI annotation tool. It is very lightweight and can be used for various types of span-based annotation.annotationPythonFree, Open Source
SoMaJo A tokenizer and sentence splitter for German and English web and social media texts.tokenizer, sentence boundary detectorLinux, Mac, WindowsFree, Open Source
SoMeWeTa A part-of-speech tagger with support for domain adaptation and external resources.tagging, pos, pos taggerLinux, Mac, WindowsFree, Open Source
SpiderLing Software for obtaining text from the web useful for building text corporacrawlerFree
SPPAS A tool for the automatic annotation and analysis of speech.speech, spoken, annotationWindows, Mac, LinuxFree, Open Source
SPre Tool for segmenting and annotating textsannotationFree
Stanford Log-linear POS Tagger PoS Tagger (with Penn Treebank Tagset) for English, Arabic, Chinese, Germanpos tagger, taggingFree
Stanford Topic Modeling Toolbox The Stanford Topic Modeling Toolbox (TMT) allows users to perform topic modeling on texts imported from spreadsheets. It supports both LDA and labelled LDA.topic modelingJavaFree
Stylo for R Tool for computational stylistic analysis (authorship attribution, genre analysis)text analysisFree
Sub-Corpus Creator A tool for creating sub-corpora based on search searchs and metadatacompilationWindowsFree
Synpathy Tool for manual syntactic annotationannotationWindows, Mac, LinuxFree
TAACO TAACO is a tool that calculates 150 indices of textual/lexical cohesion.cohesion, lexical sophisticationAllFree, Open Source
TAALES TAALES measures over 400 indices of lexical sophistication.lexical sophisticationMac, Linux, WindowsOpen Source
TagAnt Part-of-speech tagging tool built on Tree Taggerpos tagger, taggingWindows, Mac, LinuxFree
TagCrowd A simple tool for generating tag/word clouds onlineword clouds, visualizationWebFree
tagtog A text annotation tool specifically built to train AI/ML models.machine learning, annotationCloud-BasedCommercial
Tagxedo A tool for generating word clouds.word clouds, visualizationWebFree
TASX-Annotator Tool for multilevel annotation and transcription of (multi-channel) video and audio data.multilevel tagger, transcriptionWindows, Mac, Linux, SolarisDown
Text Analysis Computing Tools (TACT) A simple, fairly old concordancer.concordancerCommercial
Text Variation Explorer The Text Variation Explorer TVE is a tool for exploring the effect of window size on various common linguistic measures. It visualizes these measures and allows for PCA/Cluster analysis.visualization, variation analysisJavaFree
Text Visualization Browser A survey/gallery of text visualizationsvisualizationWebFree
Textanz Language analysis program that produces frequency lists, word lists, parts of speech tags.wordlists, concordancer, pos tagger, dictionaryAny OSFree, Open Source
TextArc A tool for visualizing the structure of texts.visualization
TextDirectory TextDirectory is a tool for aggregating text files based on various filters and transformation functions.compilation, text-processing, pythonWindows, Linux, OSXFree, Open Source
Textplot A tool for mapping a document into a network of terms in order to visualize the topic structure.visualization, network analysis, semantics, graphsPythonFree, Open Source
TextSmith Tools A tool for genre-informed phraseological profilesphraseology, segmentationWindowsFree
TextSTAT Tool for creation and manipulation of linguistic data from different languagescorpus creation, concordancerWindows, GNU/Linux und MacOSFree
The (Phonetic) Transcription Editor An editor for creating phonetic transcriptionstranscriptionWindowsFree
The Great American Word Mapper A visualization tool for the top 100,000 words used in American English twitter data.twitter, lexis, social mediaWebFree
The Prime Machine A user- and mobile-friendly corpus analysis toolkit (primarily concordancing) initially designed for English language teaching.concordancer, language teaching, wordlist, keywords, efl, eslMacOS, Window, iOS, AndroidFree
The Simple Corpus Tool A corpus analysis toolkit that supports XML annotations.concordancer, annotation, xml, frequencyWindowsFree
The Simple PoS Tagger A simply PoS-tagger utilizing Perl Lingua::EN:Taggerpos tagger, taggingWindowsFree
The SPAADIA concordancer A concordancer for the SPAADIA corpusconcordancer, SPAADIAWindowsFree
The Text Feature Analyser A tool for investigating textual features and various meassurestext analysis, concordancerWindowsFree
Thesaurus.com English language thesaurus with links to English dictionary and translation sites.efl, esl, linguisticsNot sure, I'm not a programmer or geek.Free
TigerSearch Tool for searching syntactically and PoS-tagged corporasearch tool, pos taggerFree
TnT - Thorsten Brants's PoS Tagger A simple PoS-Taggerpos tagger, tagger, taggingWindows/UnixAvailable via Stanford
Trafilatura Trafilatura is a Python package and command-line tool which seamlessly downloads, parses, and scrapes web page data.corpus creation, python, R, compilation, crawler, boilerplate remover, data, xml, scrapingPythonFree, Open Source
Tree Editor TrEd 2.0 Graphical editor and viewer for tree-like structures.visualizationWindows, GNU/Linux und MacOSFree
TreeTagger Tool for annotating text with part-of-speech and lemma informationpos tagger, annotationWindows, Mac, LinuxFree
TurboParser Multilingual dependency parser with linear programmingparserFree
Twarc A command line tool (and Python library) for archiving Twitter JSONtwitter, social mediaPython, Windows, Linux, MacFree, Open Source
Tweet NLP Tweet tokenizer, PoS Tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. Clusters: http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html pos tagger, tokenizer, parserFree
TWINT A Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.twitter, social media, scrapingLinux, Windows, MacOpen Source
TXM XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment.text analysis, concordancer, r, statistics, search tool, tokenizer, xmlWindows,Mac,Linux,TomcatFree
UAM CorpusTool Text annotation tool and statistics for various types of linguistic analysis and multilayer annotationannotation, multi-layer annotation, computer-assisted annotationFree
UAM ImageTool Image annotation tool for visual data corporaannotationFree
UBIAI A NLP-oriented text annotation platform for teams with comprehensive auto-annotation features.annotation, NLPWebCommercial
UCREL Semantic Analysis System (USAS) An automatic semantic tagger for different languages (e.g., English, Chinese, Italian, Dutch, Portuguese, Spanish).semantic annotation, tagging, semanticsFree
UCS Toolkit A toolkit (libraries and scripts) for the statistical analysis of coocurence data.collocation, coocurence, statisticsR, PerlFree
Unitok An annotation-aware tokenizer that splits text into line-by-line tokens.tokenizerFree
UralicNLP NLP tools (primarily) for Uralic languagesuralic, parser, pos tagger, tagging, inflection, morphological taggerLinux, Mac, WindowsFree
VARD Spelling variant detection and deletion in historical corpora (particularly EModE)variant detectorFree (with academic email)
VariAnt Tool for the detection of spelling variantsvariant detectorWindowsFree
VideoAnt A web-based tool to annotate and discuss web-hosted videos.annotation, videoWebFree
Voyant Tools A web-based reading/analysis toolkit for digital texts.reading, text analysis, visualization, trends patternsWebFree, Open Source
VU Amsterdam Metaphor Identification Corpus Corpus tool for metaphor identificationmetaphor identification, metaphorsWeb and local versionFree
WConcord 3.0 A fully featured concordancerconcordancerFree
WebAnno A web-based annotation toolannotation, web-basedWebFree
WebLicht WebLicht is an execution environment for the automatic annotation of text corpora, developed within the CLARIN-D project.annotationWebFree (CLARIN-D account needed)
wiki2corpus The tool downloads Wikipedia articles and converts them into clean text files.wikipedia, web as corpusPythonFree
Wmatrix Tool for corpus analysis and comparison. Provides access to CLAWS and USAS.wordlists, concordancer, pos tagger, semantic tagger, keywords, web-basedWeb£50 per username per year
WordCruncher A tool for searching, studying, and analyzing digital texts and corpora. The tool has been tested for corpora up to a billion words.concordancer, wordlists, collocates, n-grams, keywords, key phrases, ebooksWindows, Mac, iOSFree
WordFish Extracts political positions from text documents.political scienceRFree
WordHoard Close reading and scholarly analysis of deeply tagged textsclose readingWindows, Unix, Linux, MacFree
Wordle A tool for generating word clouds.word clouds, visualizationWebFree
WordMap A simple web-based word-map / wordcloud generator.visualization, web-basedWebFree
Wordscores A tool (approach) to extract dimensional information from political textspolitical science, information retrievalFree
WordSift A word cloud generator with dynamic filters, links to images, and KWIC capabilities. Works with various types/formats of word lists. word cloud, vocabulary profiling, lexis, vocabulary, language teachingWebFree
WordSmith One of the most established corpus toolkits, offering a wide range of functionalityconcordancer, wordlists, statistics, keywordsWindows60€ per licence
wordspace An R package for distributional semanticssemantics, distributional semantics, RRFree
Wordstatix Corpus analysis toolconcordancerFree
WordWanderer A web-based visualization/analysis tool which allows its users to "wander" a text.visualization, concordancerWebFree
Worldbuilder Tool for annotation and visualisation in analyses applying Text World Theoryannotation, visualization
Xaira A tool for indexing and analyzing XML resources.indexing, xmlWindowsFree, Open Source
YACSI Chinese Tokeniser / PoS Tagger A Chinese tokenizer and PoS taggerchinese, tokenizer, pos taggerWindowsFree
YEDDA YEDDA is a Python-based collaborative text span annotation tool supporting a wide variety of languages, including Chinese.annotationPythonFree, Open Source
BabelNet A multilingual encyclopedic dictionary featuring a semantic network/ontology.dictionary, ontology, semantics, NLPWebFree
FLAX FLAX (Flexible Language Acquisition) is a set of tools and applications to automate the production and delivery of interactive digital language collections.language learning, language teaching, text analysisJava, MoodleFree, Open Source
Just the Word A simple web interface for BNC data concordancer, frequency analysis, BNCWebFree
Orange Data Mining An open source machine learning and data visualization platform based on workflows.text analysis, visualization, time seriesWindows, Unix, Linux, MacFree, Open Source
QualCoder An open source tool for qualitative data analysis that supports coding text and images.qda, annotationWindows, Mac, Linux, PythonFree, Open Source
TEITOK A web-based platform for viewing, creating, and editing corpora with rich textual mark-up and linguistic annotation.visualization, TEI, mark-up, annotationLinux, MacFree, Open Source
Wordless An integrated corpus tool with multilingual support for the study of language, literature, and translation.concordancer, text analysis, statistics, readabilityWindows, Mac, Linux, PythonFree, Open Source
WebCorp Live A tool for accessing the Web as a corpus.web-as-a-corpusWebFree
CorpusMate A web-based tool offering a streamlined, simplified language data analysis experience for younger learners.language learning, language teaching, concordancer, frequency analysis, patternWebFree
MetaPak A tool to assist metadiscourse analysis based on Hyland's framework.metadiscourseWindowsFree
NeoSCA A syntactic complexity analyzer for written English. It is a fork of L2SCA with various additional features.syntactic complexity, constituency parsing, pattern matching, tregex, command lineWindows, Mac, LinuxFree, Open Source
Sanchay An open source multi-purpose platform focused on South Asian languages.annotation, tagging, chunkingWindows, Linux Free, Open Source
LogosLink A tool for corpus management and ontological augmentation for discourse analysis.discourse analysis, corpus managementWindowsFree
Word Frequency Analyser A web-based tool for analyzing word frequencies that also produces frequency charts and word clouds.pos tagger, tokenizer, lemmatizer, frequency analysisWebFree
Discourse Analyzer An AI (LLM) powered platform for conducting discourse analysis.discourse analysis, llm, generative AIWebPaid
Turkish-English Learner Corpus – Error Tagging TELC is a lexical-error-tagged learner corpus compiled in a Turkish setting. It features a web-based error tagging tool.learner corpus, error taggingWebFree
AutoSearch A cloud-based corpus query engine that supports the upload of corpora.concordancer, corpus query engineWebFree
Text-Fabric A Python library for processing corpora (especially based on ancient texts) as annotated graphs.graph model, annotation, pythonFree, Open Source

Last Updated: October 13, 2024.

In case you are interested, the data is also available in JSON format.