Tools for Corpus Linguistics

A comprehensive list of 262 tools used in corpus analysis.

Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.



Tool | Description | Categories | Platform | Pricing
@nnotate | Semi-automatic annotation of corpus data | annotation | Solaris, Linux | Free (with licence agreement)
aConCorde | Multilingual concordance tool (English and Arabic) | concordancer | Linux, Mac, Windows | Free
ACTRES Corpus Browser | A tool for retrieving tagged information in more than one language. | tagging | Web | Commercial
ACTRES Corpus Manager | A corpus compilation and analysis platform with a focus on multilingual and parallel corpora. | compilation, corpus management, annotation, multilingual | Web | Commercial
ACTRES Rhetorical Move Tagger | A tool for tagging rhetorical moves. | tagging, rhetorics | Web | Commercial
almaneser / SALTA | Semantic Parser/POS Tagger for English | parser, pos tagger, tagging | - | Free (with licence agreement)
AMALGAM | Tool for grammatical annotation (POS and phrase structure). Tags texts submitted via email. | annotation | Web | Free
AMesure | A web-based system to analyse the reading complexity of French texts | text complexity, readability | Web | Free
ANC2go | A web service that allows users to create custom sub-corpora of the ANC | ANC, sampling | Web | Free
ANNIS | Search and visualization tool for multi-layer linguistic corpora with diverse types of annotation | search, visualization | Web (or Linux, Mac, Windows) | Free
AntCLAWSGUI | Front-end interface for the CLAWS tagger | pos tagger, tagging | Windows | Free
AntConc | Corpus analysis toolkit | wordlists, concordancer, keywords | Linux, Mac, Windows | Free
AntCorGen | A freeware discipline-specific corpus creation tool. | compilation, text analysis | Windows, Mac, Linux | Free
AntFileConverter | Freeware tool to convert PDF and Word (DOCX) files into plain text | converter | Windows, Mac | Free
AntFileSplitter | A freeware text file splitting tool. | compilation | Windows, Mac, Linux | Free
AntGram | A freeware n-gram and p-frame (open-slot n-gram) generation tool. | text analysis, n-grams, p-frames, lexical bundles, lexical frames | Windows, Mac, Linux | Free
ANTLR | ANother Tool for Language Recognition is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. | parser generator | Linux, Mac, Windows | Free, Open Source
AntMover | Tool for text structure (moves) analysis | text analysis | Windows | Free
AntPConc | Corpus analysis toolkit designed for working with parallel corpora. | wordlists, concordancer | Windows, Mac | Free
AntWordProfiler | Tool for profiling vocabulary level and text complexity | text complexity | Linux, Mac, Windows | Free
ANVIL | A tool for video | annotation | Windows, Linux, Mac | Free
ATLAS.ti | A sophisticated QDA software for mixed methods approaches | qda, mixed methods | Windows, Mac, Android, iOS | Commercial
Atomic | Multi-layer corpus annotation platform. | annotation | Linux, Mac, Windows | Free
Authorial Voice Analyzer (AVA) | A tool for the analysis of interactional metadiscourse features. | discourse, voice | Mac | Free
BFSU Collocator | A collocation analysis toolkit | collocation, statistics | Windows | Free
BFSU English Sentence Segmenter | A simple sentence segmenter | segmentation | Windows | Free
BFSU ParaConc | A parallel concordancer | concordancer, parallel | Windows | Free
BFSU PowerConc | A fairly powerful concordancer | concordancer | Windows | Free
BFSU Qualitative Coder | A tool for manual coding of corpora | coding, annotation | Windows | Free
BFSU Sentence Collector | A pedagogic concordancer | concordancer, ddl, pedagogy, language learning | Windows | Free
BFSU Stanford Parser | A simple parser | parser | Windows | Free
BFSU Stanford POS Tagger (Light) | A GUI for the Stanford POS tagger | pos tagger, tagging | Windows | Free
BNCweb | BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC). | analysis, concordancer | Web | Free
BootCat | Tool for crawling and compiling data from the web with a list of seed words. | crawler, compilation | - | -
Bow | Statistical Language Modeling, Text Retrieval, Classification and Clustering | text analysis | UNIX, Linux | Free
buzz | A Python-based linguistic analysis tool. | parsing, concordancer, visualization | Python | Free, Open Source
Calc: Corpus Calculator | A web-based tool to calculate basic corpus statistics, for example, comparing frequencies across corpora. | statistics | Web | Free
CasualConc | CasualConc is a concordance program that runs natively on Mac OS X 10.9 or later | concordancer | OSX | Free
CATMA (Computer Assisted Text Markup and Analysis) | An undogmatic, complex annotation and analysis package | markup, analysis, visualization, annotation | Web | Free
CEFRLex | A web-based tool to analyse the lexical complexity of words in texts according to the CEFR scale in various languages. | text complexity, readability, language learning | Web | Free
Chared | Tool for detecting the character encoding of a text | text analysis | Python 2.6 or later | Free
Chi-Square and Log Likelihood Calculator | A simple tool for calculating Chi-squared and LL | statistics | Windows | Free
CLAN | A tool for searching and analyzing child language data in the CHAT transcription | wordlists, collocation, child language, CHILDES | Windows, Mac, Unix | Free, Open Source
CLaRK | XML Based System For Corpora Development | compilation | - | Free (with licence agreement)
CLAWS POS-Tagger | CLAWS POS tagger | pos tagger, tagging | Web | Via licence or in-house tagging at Lancaster
CLiC | A corpus tool to support the analysis of literary texts. | concordancer | Web | Free
COCA_MWU20 ColloGram | A collocation analysis tool based on a COCA collocation family list. | collocation | Windows | Free
Coh-Metrix | Coh-Metrix is a system for computing computational cohesion and coherence metrics for written and spoken texts. It allows readers, writers, educators, and researchers to instantly gauge the difficulty of written text for the target audience. | cohesion, coherence, readability, textual analysis | Web | Free
Colligator 2.0 | A colligation query/analysis toolkit | colligation | Windows | Free
Collocate | Tool for the extraction of concordances and collocations | concordancer | Windows | 35 USD
CoMOn | A tool for corpus matching analysis | matching | Web | Free
Compleat Lexical Tutor | A website featuring various tools and materials for data-driven language learning. | vocabulary, language learning, lexis, web-based, ddl | Web | Free
ConcGramCore | A modern rewrite of ConcGram (Greaves 2005) that allows efficient searching for concgrams. | collocation, concgram | Windows | Open Source
Concordance Randomizer | A concordance randomizer | concordancer | Windows | Free
Concordancer | Online tool for frequency counts and text clouds | concordancer | Web | Free
ConvoKit | A toolkit for extracting conversational features and analyzing social phenomena in conversations, using an interface inspired by (and compatible with) scikit-learn. | python, conversational analysis, social media | Python | Free, Open Source
Coquery | A free corpus query tool to search, analyze, and visualize corpora | query, visualization | Linux, Mac, Windows | Free
CorefAnnotator | An annotation tool for coreference. | coreference, annotation | Windows, Linux, Mac | Open Source
CorpKit | An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora. | wordlists, parsing, concordancer, visualization | Linux, Mac, Windows (Python) | Free
Corpona | A Python library for processing XML- and JSON-based corpora. | library, XML, JSON, annotation | Python | Open Source
CorporaCoCo | A set of R functions used to compare co-occurrence between corpora | collocation | R | Free
Corpus Presenter | Tree tagger and corpus analysis software | wordlists, parsing, concordancer, visualization | Windows | Free
Corpus-Tools | Text annotation and analysis tool | text analysis | - | Free
CorpusExplorer | A complex corpus analysis toolkit combining 45 interactive tools. | visualization, exploration, tagging, text analysis | Windows | Free, Open Source
CorpusSearchLite | Searches parsed corpora in the Penn Treebank format | searching | - | -
Cortext Manager | A scriptable "ecosystem" for modeling and exploring corpora. Especially useful for creating topic models and co-occurrence networks. | NER, topic models, visualization, word2vec, collocation, keywords | Web | Free
CQPweb | Overview of and access to a wide range of corpora | database | Web | Free (once registered)
DART | An annotation tool and research environment for annotating dialogues. | dialogues, annotation | Windows | Free
DepCluster | A tool used for lexeme-based collexeme analysis. | lexis, collexeme | - | -
DeTagging Tool | A tool that strips annotation/tags from files | cleaning, annotations | Windows | Free
Dexter | Tool for text annotation | annotation | Linux, Mac, Windows | Free
DISCO | Corpus pre-processing tool for a variety of languages that allows retrieving the semantic similarity between arbitrary words and phrases | tokenization, annotation | Windows, Linux, Solaris, and MacOS | Free
DisMo | An automatic multi-level annotator for spoken language corpora. | spoken, multilevel, multi-layer, pos tagger, annotation, tagging | - | -
DocuScope | A tool for computer-aided rhetorical analysis | rhetorical analysis, text analysis, visualization | Windows (Java) | Free
ELAN | Transcription and annotation of sound or video files | transcription, annotation | Linux, Mac, Windows | Free
Emdros | A database engine for analyzed and annotated text. | database, annotation, query | Windows, Linux, Mac | Free, Open Source
EncodeAnt | Tool for the detection and conversion of character encodings | converter | Windows, Mac | Free
English Grammar Profiler | A CEFR grammar profiler for ESL/EFL. | grammar, parsing, ESL/EFL, CEFR | Web | Free
EXMARaLDA | Tool for transcription, annotation, and corpus analysis of spoken data | transcription, annotation, analysis | - | Free
f4analyse | QDA software specifically geared towards interview (spoken) data | qda, spoken | Windows, Mac, Linux | Commercial
f4transkript | Software for transcribing audio data | transcription, spoken | Windows, Mac, Linux | Commercial
FinMeter | A tool for analyzing Finnish poetry in terms of meter, rhyme, semantics, metaphors etc. | lexical analysis, rhetorical analysis, poem analysis, metaphor interpretation, metaphor identification, semantics, metaphors, finnish | Linux, Mac, Windows | Free
FireAnt | Social media analysis toolkit | downloader, converter | Windows, Mac | Free
FLAIR (2.0) | An online tool for language teachers and learners that analyzes grammatical constructions and readability on the fly. | constructions, readability | Web | Free
Flesh PC | Calculates Flesch scores | readability, statistics | Windows | Free
FrameNet | Dictionary of more than 10,000 word senses, tagged for semantic roles (according to Fillmorean Frame Semantics) | semantic parser | Web | Free
Frequency Program (Paul Nation) | A tool that turns a text or texts into a word list with frequency figures. | vocabulary, frequency, lexis | Windows | Free
gensim | Deep learning via word2vec | word2vec | Multi (Python) | Free, Open Source
Gephi | A toolkit for network analysis | network analysis, graphs | Windows, Linux, Mac | Free
GOLD Parsing System | A parsing system that can be used to develop programming languages, scripting languages and interpreters. | parser generator | Linux, Mac, Windows | Free
Google Ngrams | An n-gram viewer for the whole of Google Books | ngrams | Web | Free
GraphColl | Tool for building and exploring networks of linguistic collocations | visualization | Windows, Mac | Free
Gsearch | Tool for syntactic pattern matching | pattern matching | ? | Down
gwic | A very basic KWIC tool written in Go. | concordancer, KWIC | Windows, Mac, Linux | Open Source
HeidelGram Web-Based Tools | Basic corpus analysis toolkit for the HeidelGram Corpus | wordlists, concordancer | Web | Free
HeidelTime | A multilingual, domain-sensitive temporal tagger | temporal tagger, timex3 | Java | Free, Open Source
Heimdall | A tool that searches a text for sequences written in other languages. | language detection | Linux, Windows, Mac | Open Source
HGSimpleCorpusNetwork | Batch frequency analysis on corrupted (e.g. OCR) corpus data and generation of network analysis data. | wordlists, network analysis | Multi (Python) | Free, Open Source
HTST Samuels | Historical Thesaurus Semantic Tagger via web interface | semantic tagger | Web | Free
ICARUS | Search and visualization tool for dependency trees | visualization | - | Free
ICECUP | The ICE Corpus Utility Program (ICECUP) is a corpus exploration tool for parsed corpora such as ICE-GB and DCPSE. | ICE, exploration | - | Free
ICEweb | A tool for compiling, downloading, and analyzing web corpora in accordance with the ICE | ICE, compilation, crawler | Windows | Free
IMS Corpus Workbench | Tool for sorting frequencies in corpora | wordlists, concordancer | Web and local version | Free
INCEpTION | A semantic annotation platform that offers intelligent annotation assistance and knowledge management | annotation, multi-layer annotation, computer-assisted annotation, web-based | Web | Free, Open Source
Intelligent Archive | Managing corpora for stylometry | stylometry, management | Windows, Unix, Linux, Mac | Free
JavaCC | A popular parser generator for use with Java applications. | parser generator | Linux, Mac, Windows | Free
jTokenizer | Tokenizing natural language | tokenizer | - | Free
JusText | Tool for removing boilerplate content, such as navigation links, headers, and footers, from HTML pages | boilerplate remover | Python | Free
juxta | Comparing and collating multiple witnesses to single textual works | textual criticism, witnesses | Windows, Unix, Linux, Mac | Free
Kaleidographic | A dynamic and interactive visualization tool for multivariate data. | visualization | Web | Free
KAT Tool | Grouping patterns based on search terms | patterns, concordancer | Windows | Free
kdiff3 | KDiff3 is a diff and merge program. | comparison | Windows, Linux, OSX | Free, Open Source
Keyword Plus | A keyword generation/analysis tool | keywords | Windows | Free
kfNgram | A simple tool for generating n-grams | n-grams, p-frames | Windows | Free
KHCoder | Free software for quantitative content analysis or text mining that supports multiple languages. | correspondence analysis, collocation analysis, frequency analysis | Windows, Mac, Linux | Free, Open Source
Khepri | A view-based tool for exploring (historical sociolinguistic) data | sociolinguistics, visualization | JavaScript, Web | Free, Open Source
KoGra-R | An R-based online tool that provides statistical measures for corpus-based frequencies | statistics, frequency analysis | Web | Free
KorAP | A complex platform for corpus analysis developed at the IDS in Mannheim | analysis, multilevel, multi-layer | Web | Free, Open Source
KWords | A tool for keyword identification and analysis. | keywords, CADS, concordancer, collocation analysis | Windows, Linux, Mac | Free
LancsBox | The Lancaster Desktop Corpus Toolbox; software package for the analysis of language data and corpora | collocation, frequency analysis, keywords | Java | Free (CC)
langid.py | A standalone language identification tool written in Python. | language detection | Linux, Windows, Mac | Open Source
LDA-Toolkit | A toolkit for linguistic discourse and image analysis. | discourse, images | Windows | Free
Leipzig Corpus Miner | A modern text mining infrastructure for qualitative data analysis | qda, mixed methods, text mining, lexicometrics, topic models, information retrieval | Linux, Windows, Mac (via VM) | Free
LEXA | A complex lemmatizer. | lexis, lemmatizer | - | Free
LexisNexis | A database containing (new and old) news articles. They also have other (business) | data | Web | Commercial
lexpan | A tool to analyze syntagmatic structures in corpora. Especially useful to analyze fillers and slots. | syntagmatic, slots | Windows, Linux, Mac | Free
Lextutor Web Concordancers | Web concordancers targeted towards DDL | collocations, concordancer, DDL | Web | Free
LightSide | A machine learning workbench. | machine learning | Linux, Windows | Free, Open Source
LightTag | A commercial text annotation tool focused on managing and working with teams of annotators. | annotation, tagging, ai-tagging | Web | Commercial
Linguistica | Word segmentation and morphological analysis? | segmentation, morphological tagger | Linux, Mac, Windows | Free
Link Grammar Parser | A syntactic parser of English, Russian, Arabic and Persian (and others), based on Link Grammar. | parser, syntax, grammar | Linux, Mac, Windows | Free
LIWC | A tool that tries to compute scores for different emotions, thinking styles, and social concerns. | lexical analysis, style | Web | Free (but commercial)
Log-Likelihood and Effect-Size Calculator | An online calculator for log-likelihood and effect sizes. | statistics | Web | Free
MALLET | Package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text | statistical nlp | Windows | Free
MaltOptimizer | A system for parser optimization using the open-source system MaltParser. | parser, dependency parsing | Windows, Mac, Linux | Free
MaltParser | A system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. | parser, dependency parsing | Windows, Mac, Linux | Free
MAT - Multidimensional Analysis Tagger | A tagger for MDA (Biber et al.) by Andrea Nini. | tagging, MDA | Windows, Mac | Free
MAXQDA | Sophisticated QDA software that works with multimodal data and supports mixed methods approaches | qda, mixed methods | Windows, Mac, Android, iOS | Commercial
MLCT | Tool for building and processing corpora | concordancer, sentence boundary detector | - | Free
MMAX2 | A multi-level annotation tool | annotation, multilevel, multi-layer | Java | Free, Open Source
MonoConc Esy | Concordancing and text search tool that allows primary and secondary concordancing | concordancer, sentence boundary detector | - | Free for non-commercial research
MorphAdorner | Tool for performing morphological tagging of texts | morphological tagger | - | Free
Murre | A tool for normalising and generating dialectal Finnish and Swedish | python, variation, dialectal data, finnish | Linux, Mac, Windows | Free
N-Gram Processor (NGP) | A Perl-based tool for the creation and processing of n-gram lists out of text files. | n-grams | Linux, Windows, Mac | Open Source
NATAS | A spaCy-based library for processing historical corpora (with a focus on neologisms). | historical, python, lexis | Linux, Windows, Mac | Open Source
Natural Language Toolkit | Platform for building Python programs to work with human language data | tokenizer, tagger | Unix, Mac, Windows (+Python 3.4) | Free
NooJ | Tags texts and corpora (i.e. sets of text files) at the orthographical, lexical, morphological, syntactic and semantic levels | multilevel tagger | Windows, Mac, Linux and BSD Unix | Free
NoSketch Engine | Word sketches, thesaurus, keyword computation, corpus creation | corpus creation, semantic analysis, wordlists | - | Free
NVIVO | A commercial Computer-Assisted Qualitative Data Analysis Software (CAQDAS) package that works with both qualitative and mixed methods data | qda, mixed methods | Windows, Mac | Commercial
Onion | Tool for removing duplicate parts from large collections of texts | duplicate remover | - | Free
Online Graded Text Editor | Tool for profiling a text's vocabulary level and complexity | text analysis, editing, vocabulary | OSX, Windows | Free
OpenConc | Tool for concordancing | concordancer | - | Free
PACTE | A flexible collaborative text annotation platform that is currently in development. | annotation | Web | Free (for research)
PALinkA | Annotation tool | annotation | - | Down
ParaConc | A bilingual/multilingual concordancer | concordancer | - | Non-Free
Pareidoscope | Pareidoscope is a collection of tools for determining the association between arbitrary linguistic structures, such as collocations, collostructions or between structures. | collocation, constructions | - | Free
PatCount | A pattern counting tool with powerful statistical capabilities and regex support | patterns | Windows | Free
Pattern Builder | A tool helping with regular expressions and PoS tags | regex, tagging | Windows | Free
Pepper | Conversion between linguistic formats, e.g. from TEI to ANNIS to Tiger XML to EXMARaLDA. | conversion | - | Free
Phonological CorpusTools (PCT) | Phonological analysis on transcribed corpora | phonology | Multi (Python) | Free
PhraseContext | Tool for wordlists, concordancing, collocation, TTR | wordlists, concordancer | - | 35€
Pipoca (formerly openQDA) | A web-based QDA software | qda, mixed methods | Web | Free, Open Source
Praaline | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3)
PRAAT | A tool for doing phonetics by computer | phonetics, spoken | Windows, Mac, Linux | Open Source
ProtAnt | Tool for prototypical text analysis | wordlists | Windows, Mac | Free
pysupersensetagger | Analyses texts for MWE and supersenses. | text analysis | Unix, Mac (Python) | Free
PyXMLConc | Concordancer for XML files with automatic tag and attribute detection. | concordancer | Multi (Python), Windows | Free, Open Source
QDA Miner | A commercial QDA tool for coding, annotating, retrieving and analyzing collections of documents and images. | qda, mixed methods, text analysis | Windows | Commercial
QualCoder | QualCoder is free, open source software for qualitative data analysis. | qda, text analysis | Linux, Mac, Windows | Free, Open Source
Quanteda | An R package for the quantitative analysis of textual data. | R | Linux, Windows, Mac | Open Source
Query Tool for the Edinburgh Associative Thesaurus | A query tool for the EAT | query, thesaurus | Windows | Free
Range Program (formerly VocabProfiler) (Paul Nation) | A tool for analyzing the vocabulary load of texts. | vocabulary, lexis | Windows | Free
RQDA | An R package for Qualitative Data Analysis (QDA). | qda | Windows, Linux/FreeBSD, Mac | Free
Readability Analyzer | A tool for generating various readability statistics | readability, statistics | Windows | Free
Readability Webfx | A tool to check how easy or difficult to read a given text is. | readability | Web | Free
RSTTool | Tool that can annotate texts for constituency and rhetorical structure | annotation | Windows, Macintosh, UNIX and Linux | Free
Salt | Meta models for linguistic data. | meta modelling | - | Free
SarAnt | Tool for batch search and replace | editing, searching | Windows | Free
SegmentAnt | Tool for the segmentation of Japanese and Chinese | segmentation, tokenizing | Windows, Mac, Linux | Free
ShinyConc | ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny. | concordancer, kwic, r | Open Source / R | Free
Simple Concordance Program | Tool for concordance and word listing that works with many languages | concordancer | Windows, Mac | Free
Sketch Engine | A corpus manager and text analysis software developed by Lexical Computing. | annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, co-occurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction | - | 30-day free trial, then starts at 4.83 €/month
SLATE | SLATE is a Python-based CLI annotation tool. It is very lightweight and can be used for various types of span-based annotation. | annotation | Python | Free, Open Source
SoMaJo | A tokenizer and sentence splitter for German and English web and social media texts. | tokenizer, sentence boundary detector | Linux, Mac, Windows | Free, Open Source
SoMeWeTa | A part-of-speech tagger with support for domain adaptation and external resources. | tagging, pos, pos tagger | Linux, Mac, Windows | Free, Open Source
SpiderLing | Software for obtaining text from the web, useful for building text corpora | crawler | - | Free
SPPAS | A tool for the automatic annotation and analysis of speech. | speech, spoken, annotation | Windows, Mac, Linux | Free, Open Source
SPre | Tool for segmenting and annotating texts | annotation | - | Free
Stanford Log-linear POS Tagger | POS Tagger (with Penn Treebank Tagset) for English, Arabic, Chinese, German | pos tagger, tagging | - | Free
Stanford Topic Modeling Toolbox | The Stanford Topic Modeling Toolbox (TMT) allows users to perform topic modeling on texts imported from spreadsheets. It supports both LDA and labelled LDA. | topic modeling | Java | Free
Stylo for R | Tool for computational stylistic analysis (authorship attribution, genre analysis) | text analysis | - | Free
Sub-Corpus Creator | A tool for creating sub-corpora based on searches and metadata | compilation | Windows | Free
Synpathy | Tool for manual syntactic annotation | annotation | Windows, Mac, Linux | Free
TAACO | TAACO is a tool that calculates 150 indices of textual/lexical cohesion. | cohesion, lexical sophistication | All | Free, Open Source
TAALES | TAALES measures over 400 indices of lexical sophistication. | lexical sophistication | Mac, Linux, Windows | Open Source
TagAnt | Part-of-speech tagging tool built on TreeTagger | pos tagger, tagging | Windows, Mac, Linux | Free
TagCrowd | A simple tool for generating tag/word clouds online | word clouds, visualization | Web | Free
tagtog | A text annotation tool specifically built to train AI/ML models. | machine learning, annotation | Cloud-Based | Commercial
Tagxedo | A tool for generating word clouds. | word clouds, visualization | Web | Free
TASX-Annotator | Tool for multilevel annotation and transcription of (multi-channel) video and audio data. | multilevel tagger, transcription | Windows, Mac, Linux, Solaris | Down
Text Analysis Computing Tools (TACT) | A simple, fairly old concordancer. | concordancer | - | Commercial
Text Variation Explorer | The Text Variation Explorer (TVE) is a tool for exploring the effect of window size on various common linguistic measures. It visualizes these measures and allows for PCA/cluster analysis. | visualization, variation analysis | Java | Free
Text Visualization Browser | A survey/gallery of text visualizations | visualization | Web | Free
Textanz | Language analysis program that produces frequency lists, word lists, and part-of-speech tags. | wordlists, concordancer, pos tagger, dictionary | Any OS | Free, Open Source
TextArc | A tool for visualizing the structure of texts. | visualization | - | -
TextDirectory | TextDirectory is a tool for aggregating text files based on various filters and transformation functions. | compilation, text-processing, python | Windows, Linux, OSX | Free, Open Source
Textplot | A tool for mapping a document into a network of terms in order to visualize the topic structure. | visualization, network analysis | Python | Free, Open Source
Textplot | A tool for converting documents into (semantic) networks based on KDE. | semantics, network analysis, graphs | Linux, Windows, Mac | Open Source
TextSmith Tools | A tool for genre-informed phraseological profiles | phraseology, segmentation | Windows | Free
TextSTAT | Tool for creation and manipulation of linguistic data from different languages | corpus creation, concordancer | Windows, GNU/Linux and MacOS | Free
The (Phonetic) Transcription Editor | An editor for creating phonetic transcriptions | transcription | Windows | Free
The Great American Word Mapper | A visualization tool for the top 100,000 words used in American English Twitter data. | twitter, lexis, social media | Web | Free
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free
The Simple PoS Tagger | A simple PoS tagger utilizing Perl's Lingua::EN::Tagger | pos tagger, tagging | Windows | Free
The SPAADIA concordancer | A concordancer for the SPAADIA corpus | concordancer, SPAADIA | Windows | Free
The Text Feature Analyser | A tool for investigating textual features and various measures | text analysis, concordancer | Windows | Free
Thesaurus.com | English language thesaurus with links to English dictionary and translation sites. | efl, esl, linguistics | Not sure, I'm not a programmer or geek. | Free
TigerSearch | Tool for searching syntactically annotated and POS-tagged corpora | search tool, pos tagger | - | Free
TnT - Thorsten Brants's PoS Tagger | A simple PoS tagger | pos tagger, tagger, tagging | Windows/Unix | Available via Stanford
Tree Editor TrEd 2.0 | Graphical editor and viewer for tree-like structures. | visualization | Windows, GNU/Linux and MacOS | Free
TreeTagger | Tool for annotating text with part-of-speech and lemma information | pos tagger, annotation | Windows, Mac, Linux | Free
TurboParser | Multilingual dependency parser with linear programming | parser | - | Free
Twarc | A command line tool (and Python library) for archiving Twitter JSON | twitter, social media | Python, Windows, Linux, Mac | Free, Open Source
Tweet NLP | Tweet tokenizer, POS Tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. Clusters: | pos tagger, tokenizer, parser | - | Free
TWINT | A Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API. | twitter, social media, scraping | Linux, Windows, Mac | Open Source
TXM | XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment. | text analysis, concordancer, r, statistics, search tool, tokenizer, xml | Windows, Mac, Linux, Tomcat | Free
UAM CorpusTool | Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation | annotation, multi-layer annotation, computer-assisted annotation | - | Free
UAM ImageTool | Image annotation tool for visual data corpora | annotation | - | Free
UBIAI | An NLP-oriented text annotation platform for teams with comprehensive auto-annotation features. | annotation, NLP | Web | Commercial
UCS Toolkit | A toolkit (libraries and scripts) for the statistical analysis of co-occurrence data. | collocation, co-occurrence, statistics | R, Perl | Free
Unitok | Tool that splits texts into tokens | tokenizer | - | Free
UralicNLP | NLP tools (primarily) for Uralic languages | uralic, parser, pos tagger, tagging, inflection, morphological tagger | Linux, Mac, Windows | Free
VARD | Spelling variant detection and deletion in historical corpora (particularly EModE) | variant detector | - | Free (with academic email)
VariAnt | Tool for the detection of spelling variants | variant detector | Windows | Free
VideoAnt | A web-based tool to annotate and discuss web-hosted videos. | annotation, video | Web | Free
Voyant Tools | A web-based reading/analysis toolkit for digital texts. | reading, text analysis, visualization, trends patterns | Web | Free, Open Source
VU Amsterdam Metaphor Identification Corpus | Corpus tool for metaphor identification | metaphor identification, metaphors | Web and local version | Free
WConcord 3.0 | A full-featured concordancer | concordancer | - | Free
WebAnno | A web-based annotation tool | annotation, web-based | Web | Free
WebLicht | WebLicht is an execution environment for automatic annotation of text corpora embedded in the CLARIN-D project. | annotation | Web | Free (CLARIN-D account needed)
Wmatrix | Tool for corpus analysis and comparison. Provides access to CLAWS and USAS. | wordlists, concordancer, pos tagger, semantic tagger, keywords, web-based | Web | £50 per username per year
WordCruncher | A tool for searching, studying, and analyzing digital texts and corpora. The tool has been tested for corpora up to a billion words. | concordancer, wordlists, collocates, n-grams, keywords, key phrases, ebooks | Windows, Mac, iOS | Free
WordFish | Extract political positions from text documents. | political science | R | Free
WordHoard | Close reading and scholarly analysis of deeply tagged texts | close reading | Windows, Unix, Linux, Mac | Free
Wordle | A tool for generating word clouds. | word clouds, visualization | Web | Free
WordMap | A simple web-based word-map / wordcloud generator. | visualization, web-based | Web | Free
Wordscores | A tool (approach) to extract dimensional information from political texts | political science, information retrieval | - | Free
WordSift | A word cloud generator, with dynamic filters, links to images, and KWIC capabilities. Works with various types/formats of word lists. | word cloud, vocabulary profiling, lexis, vocabulary, language teaching | Web | Free
Wordsmith | One of the most established corpus toolkits, providing a variety of functionality | concordancer, wordlists, statistics, keywords | Windows | 60€ per licence
wordspace | An R package for distributional semantics | semantics, distributional semantics, R | R | Free
Wordstatix | Corpus analysis tool | concordancer | - | Free
WordWanderer | A web-based visualization/analysis tool which allows its users to "wander" a text. | visualization, concordancer | Web | Free
Worldbuilder | Tool for annotation and visualisation in analysis applying text-world theory | annotation, visualization | - | -
Xaira | Indexing and analysis of XML resources | indexing, xml | Windows | Free, Open Source
YACSI Chinese Tokeniser / PoS Tagger | A Chinese tokenizer and PoS tagger | chinese, tokenizer, pos tagger | Windows | Free
YEDDA | YEDDA is a Python-based collaborative text span annotation tool with support for a very wide variety of languages including Chinese. | annotation | Python | Free, Open Source
Rescribe | Rescribe is an OCR service/tool geared towards historical texts. | ocr | Windows, Linux, Mac | Free
Corpus Text Processor | Corpus Text Processor is a downloadable application that provides batched operations for common corpus processing tasks such as encoding or standardization. | compilation, corpus management, text processing | Windows, Mac | Free, Open Source
SKELL | A simple tool for language learners and teachers. | language learning, language teaching | Web | Free
Lexonomy | A tool for writing and publishing dictionaries and other dictionary-like things. | dictionary, publishing dictionary, annotation | Web | Free
OneClick Terms | An online term extractor with monolingual and bilingual term extraction capabilities. | keywords, term extraction, bilingual term extraction | Web | Free (limited version), 4.83€ / month
Trafilatura | Trafilatura is a Python package and command-line tool which seamlessly downloads, parses, and scrapes web page data. | corpus creation, python, R, compilation, crawler, boilerplate remover, data, xml, scraping | Python | Free, Open Source