Tools for Corpus Linguistics

A comprehensive list of 256 tools used in corpus analysis.

Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.

Tool | Description | Categories | Platform | Pricing
@nnotate | Semi-automatic annotation of corpus data | annotation | Solaris, Linux | Free (with licence agreement)
aConCorde | Multilingual concordance tool (English and Arabic) | concordancer | Linux, Mac, Windows | Free
almaneser / SALTA | Semantic Parser/POS Tagger for English | parser, pos tagger, tagging | | Free (with licence agreement)
AMALGAM | Tool for grammatical annotation (POS and phrase structure); tags texts submitted via email. | annotation | Web | Free
ANC2go | A web service that allows users to create custom sub-corpora of the ANC | ANC, sampling | Web | Free
ANNIS | Search and visualization tool for multi-layer linguistic corpora with diverse types of annotation | search, visualization | Web (or Linux, Mac, Windows) | Free
AntCLAWSGUI | Front-end interface for the CLAWS tagger | pos tagger, tagging | Windows | Free
AntConc | Corpus analysis toolkit | wordlists, concordancer, keywords | Linux, Mac, Windows | Free
AntCorGen | A freeware discipline-specific corpus creation tool. | compilation, text analysis | Windows, Mac, Linux | Free
AntFileConverter | Freeware tool to convert PDF and Word (DOCX) files into plain text | converter | Windows, Mac | Free
AntFileSplitter | A freeware text file splitting tool. | compilation | Windows, Mac, Linux | Free
AntGram | A freeware n-gram and p-frame (open-slot n-gram) generation tool. | text analysis, n-grams, p-frames, lexical bundles, lexical frames | Windows, Mac, Linux | Free
AntMover | Tool for text structure (moves) analysis | text analysis | Windows | Free
AntPConc | Corpus analysis toolkit designed for working with parallel corpora. | wordlists, concordancer | Windows, Mac | Free
AntWordProfiler | Tool for profiling vocabulary level and text complexity | text complexity | Linux, Mac, Windows | Free
ANVIL | A tool for video annotation | video, annotation | Windows, Linux, Mac | Free
ATLAS.ti | A sophisticated QDA software for mixed methods approaches | qda, mixed methods | Windows, Mac, Android, iOS | Commercial
Atomic | Multi-layer corpus annotation platform. | annotation | Linux, Mac, Windows | Free
Authorial Voice Analyzer (AVA) | A tool for the analysis of interactional metadiscourse features. | discourse, voice | Mac | Free
BFSU Collocator | A collocation analysis toolkit | collocation, statistics | Windows | Free
BFSU English Sentence Segmenter | A simple sentence segmenter | segmentation | Windows | Free
BFSU Qualitative Coder | A tool for manual coding of corpora | coding, annotation | Windows | Free
BFSU Sentence Collector | A pedagogic concordancer | concordancer, ddl, pedagogy, language learning | Windows | Free
BFSU Stanford Parser | A simple parser | parser | Windows | Free
BNCWeb | BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC). | analysis, concordancer | Web | Free
BootCat | Tool for crawling and compiling data from the web with a list of seed words. | crawler, compilation | |
Bow | Statistical Language Modeling, Text Retrieval, Classification and Clustering | text analysis | UNIX, Linux | Free
BFSU ParaConc | A parallel concordancer | concordancer, parallel | Windows | Free
BFSU PowerConc | A fairly powerful concordancer | concordancer | Windows | Free
BFSU Stanford POS Tagger | A PoS tagger | pos tagger, tagging | Windows | Free
CasualConc | CasualConc is a concordance program that runs natively on Mac OS X 10.9 or later | concordancer | OSX | Free
CATMA (Computer Assisted Text Markup and Analysis) | An undogmatic, complex annotation and analysis package | markup, analysis, visualization, annotation | Web | Free
Chared | Tool for detecting the character encoding of a text | text analysis | Python 2.6 or later | Free
Chi-Square and Log Likelihood Calculator | A simple tool for calculating chi-squared and log-likelihood (LL) values | statistics | Windows | Free
CLaRK | XML Based System For Corpora Development | compilation | | Free (with licence agreement)
CLAWS POS-Tagger | CLAWS POS tagger | pos tagger, tagging | Web | Via licence or in-house tagging at Lancaster
CLiC | A corpus tool to support the analysis of literary texts. | concordancer | Web | Free
Coh-Metrix | Coh-Metrix is a system for computing computational cohesion and coherence metrics for written and spoken texts. It allows readers, writers, educators, and researchers to instantly gauge the difficulty of written text for the target audience. | cohesion, coherence, readability, textual analysis | Web | Free
Colligator 2.0 | A colligation query/analysis toolkit | colligation | Windows | Free
Collocate | Tool for the extraction of concordances and collocations | concordancer | Windows | 35 USD
CoMOn | A tool for corpus matching analysis | matching | Web | Free
ConcGramCore | A modern rewrite of ConcGram (Greaves 2005) that allows efficient searching for concgrams. | collocation, concgram | Windows | Open Source
Concordance Randomizer | A concordance randomizer | concordancer | Windows | Free
Concordancer | Online tool for frequency counts and text clouds | concordancer | Web | Free
CorpKit | An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora. | wordlists, parsing, concordancer, visualization | Linux, Mac, Windows (Python) | Free
CorporaCoCo | A set of R functions used to compare co-occurrence between corpora | collocation | R | Free
Corpus Presenter | Tree tagger and corpus analysis software | wordlists, parsing, concordancer, visualization | Windows | Free
Corpus-Tools | Text annotation and analysis tool | text analysis | | Free
CorpusExplorer | A complex corpus analysis toolkit combining 45 interactive tools. | visualization, exploration, tagging, text analysis | Windows | Free, Open Source
CorpusSearchLite | Searches parsed corpora in the Penn Treebank format | searching | |
CQPweb | Overview of and access to a wide range of corpora | database | Web | Free (once registered)
DART | An annotation tool and research environment for annotating dialogues. | dialogues, annotation | Windows | Free
DepCluster | A tool used for lexeme-based collexeme analysis. | lexis, collexeme | |
DeTagging Tool | A tool that strips annotation/tags from files | cleaning, annotations | Windows | Free
Dexter | Tool for text annotation | annotation | Linux, Mac, Windows | Free
DISCO | Corpus pre-processing tool for a variety of languages that allows retrieving the semantic similarity between arbitrary words and phrases | tokenization, annotation | Windows, Linux, Solaris, and MacOS | Free
DisMo | An automatic multi-level annotator for spoken language corpora. | spoken, multilevel, multi-layer, pos tagger, annotation, tagging | |
DocuScope | A tool for computer-aided rhetorical analysis | rhetorical analysis, text analysis, visualization | Windows (Java) | Free
ELAN | Transcription and annotation of sound or video files | transcription, annotation | Linux, Mac, Windows | Free
Emdros | A database engine for analyzed and annotated text. | database, annotation, query | Windows, Linux, Mac | Free, Open Source
EncodeAnt | Tool for the detection and conversion of character encodings | converter | Windows, Mac | Free
EXMARaLDA | Tool for transcription, annotation, and corpus analysis of spoken data | transcription, annotation, analysis | | Free
f4analyse | QDA software specifically geared towards interview (spoken) data | qda, spoken | Windows, Mac, Linux | Commercial
f4transkript | Software for transcribing audio data | transcription, spoken | Windows, Mac, Linux | Commercial
FireAnt | Social media analysis toolkit | downloader, converter | Windows, Mac | Free
FLAIR (2.0) | An online tool for language teachers and learners that analyzes grammatical constructions and readability on the fly. | constructions, readability | Web | Free
Flesh PC | Calculates Flesch scores | readability, statistics | Windows | Free
FrameNet | Dictionary of more than 10,000 word senses, tagged for semantic roles (according to Fillmorean Frame Semantics) | semantic parser | Web | Free
gensim | Deep learning via word2vec | word2vec | Multi (Python) | Free, Open Source
Gephi | A toolkit for network analysis | network analysis, graphs | Windows, Linux, Mac | Free
Google Ngrams | An n-gram viewer for the whole of Google Books | ngrams | Web | Free
GraphColl | Tool for building and exploring networks of linguistic collocations | visualization | Windows, Mac | Free
Gsearch | Tool for syntactic pattern matching | pattern matching | ? | Down
HeidelGram Web-Based Tools | Basic corpus analysis toolkit for the HeidelGram Corpus | wordlists, concordancer | Web | Free
HeidelTime | A multilingual, domain-sensitive temporal tagger | temporal tagger, timex3 | Java | Free, Open Source
Heimdall | A tool that searches a text for sequences written in other languages. | language detection | Linux, Windows, Mac | Open Source
HGSimpleCorpusNetwork | Batch frequency analysis on corrupted (e.g. OCR) corpus data and generation of network analysis data. | wordlists, network analysis | Multi (Python) | Free, Open Source
HTST Samuels | Historical Thesaurus Semantic Tagger via web interface | semantic tagger | Web | Free
ICARUS | Search and visualization tool for dependency trees | visualization | | Free
ICEweb | A tool for compiling, downloading, and analyzing web corpora in accordance with the ICE | ICE, compilation, crawler | Windows | Free
IMS Corpus Workbench | Tool for sorting frequencies in corpora | wordlists, concordancer | Web and local version | Free
Intelligent Archive | Managing corpora for stylometry | stylometry, management | Windows, Unix, Linux, Mac | Free
jTokenizer | Tokenizing natural language | tokenizer | | Free
JusText | Tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages | boilerplate remover | Python | Free
juxta | Comparing and collating multiple witnesses to a single textual work | textual criticism, witnesses | Windows, Unix, Linux, Mac | Free
Kaleidographic | A dynamic and interactive visualization tool for multivariate data. | visualization | Web | Free
KAT Tool | Grouping patterns based on search terms | patterns, concordancer | Windows | Free
kdiff3 | KDiff3 is a diff and merge program. | comparison | Windows, Linux, OSX | Free, Open Source
Keyword Plus | A keyword generation/analysis tool | keywords | Windows | Free
kfNgram | A simple tool for generating n-grams | n-grams, p-frames | Windows | Free
Khepri | A view-based tool for exploring (historical sociolinguistic) data | sociolinguistics, visualization | JavaScript, Web | Free, Open Source
KoGra-R | An R-based online tool that provides statistical measures for corpus-based frequencies | statistics, frequency analysis | Web | Free
KorAP | A complex platform for corpus analysis developed at the IDS in Mannheim | analysis, multilevel, multi-layer | Web | Free, Open Source
LancsBox | The Lancaster Desktop Corpus Toolbox; software package for the analysis of language data and corpora | collocation, frequency analysis, keywords | Java | Free (CC)
langid.py | A standalone language identification tool written in Python. | language detection | Linux, Windows, Mac | Open Source
LDA-Toolkit | A toolkit for linguistic discourse and image analysis. | discourse, images | Windows | Free
Leipzig Corpus Miner | A modern text mining infrastructure for qualitative data analysis | qda, mixed methods, text mining, lexicometrics, topic models, information retrieval | Linux, Windows, Mac (via VM) | Free
LEXA | A complex lemmatizer. | lexis, lemmatizer | | Free
LexisNexis | A database containing (new and old) news articles. They also have other (business) data. | data | Web | Commercial
lexpan | A tool to analyze syntagmatic structures in corpora. Especially useful to analyze fillers and slots. | syntagmatic, slots | Windows, Linux, Mac | Free
LightSide | A machine learning workbench. | machine learning | Linux, Windows | Free, Open Source
Linguistica | Word segmentation and morphological analysis? | segmentation, morphological tagger | Linux, Mac, Windows | Free
LIWC | A tool that tries to compute scores for different emotions, thinking styles, and social concerns. | lexical analysis, style | Web | Free (but commercial)
MALLET | Package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text | statistical nlp | Windows | Free
MAT - Multidimensional Analysis Tagger | A tagger for MDA (Biber et al.) by Andrea Nini. | tagging, MDA | Windows, Mac | Free
MAXQDA | Sophisticated QDA software that works with multimodal data and supports mixed methods approaches | qda, mixed methods | Windows, Mac, Android, iOS | Commercial
MLCT | Tool for building and processing corpora | concordancer, sentence boundary detector | | Free
MMAX2 | A multi-level annotation tool | annotation, multilevel, multi-layer | Java | Free, Open Source
MonoConc Esy | Concordancing and text search tool that allows primary and secondary concordancing | concordancer, sentence boundary detector | | Free for non-commercial research
MorphAdorner | Tool for performing morphological tagging of texts | morphological tagger | | Free
N-Gram Processor (NGP) | A Perl-based tool for the creation and processing of n-gram lists from text files. | n-grams | Linux, Windows, Mac | Open Source
NATAS | A spaCy-based library for processing historical corpora (with a focus on neologisms). | historical, python, lexis | Linux, Windows, Mac | Open Source
Natural Language Toolkit | Platform for building Python programs to work with human language data | tokenizer, tagger | Unix, Mac, Windows (+Python 3.4) | Free
NooJ | Tags texts and corpora (i.e. sets of text files) at the orthographical, lexical, morphological, syntactic and semantic levels | multilevel tagger | Windows, Mac, Linux and BSD Unix | Free
NoSketch Engine | Word sketches, thesaurus, keyword computation, corpus creation | corpus creation, semantic analysis, wordlists | | Free
Onion | Tool for removing duplicate parts from large collections of texts | duplicate remover | | Free
Online Graded Text Editor | Tool for profiling a text's vocabulary level and complexity | text analysis, editing, vocabulary | OSX, Windows | Free
OpenConc | Tool for concordancing | concordancer | | Free
PALinkA | Annotation tool | annotation | | Down
ParaConc | A bilingual/multilingual concordancer | concordancer | | Non-Free
Pareidoscope | Pareidoscope is a collection of tools for determining the association between arbitrary linguistic structures, such as collocations, collostructions or between structures. | collocation, constructions | | Free
PatCount | A pattern counting tool with powerful statistical capabilities and regex support | patterns | Windows | Free
Pattern Builder | A tool helping with regular expressions and PoS tags | regex, tagging | Windows | Free
Pepper | Conversion between linguistic formats, e.g. from TEI to ANNIS to Tiger XML to EXMARaLDA. | conversion | | Free
Phonological CorpusTools (PCT) | Phonological analysis on transcribed corpora | phonology | Multi (Python) | Free
PhraseContext | Tool for wordlists, concordancing, collocation, TTR | wordlists, concordancer | | 35€
Pipoca (formerly openQDA) | A web-based QDA software | qda, mixed methods | Web | Free, Open Source
Praaline | Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora. | speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis | Windows, Mac, Linux | Free / Open Source (GPL3)
PRAAT | A tool for doing phonetics by computer | phonetics, spoken | Windows, Mac, Linux | Open Source
ProtAnt | Tool for prototypical text analysis | wordlists | Windows, Mac | Free
pysupersensetagger | Analyses texts for MWE and supersenses. | text analysis | Unix, Mac (Python) | Free
PyXMLConc | Concordancer for XML files with automatic tag and attribute detection. | concordancer | Multi (Python), Windows | Free, Open Source
Quanteda | An R package for the quantitative analysis of textual data. | R | Linux, Windows, Mac | Open Source
Query Tool for the Edinburgh Associative Thesaurus | A query tool for the EAT | query, thesaurus | Windows | Free
Readability Analyzer | A tool for generating various readability statistics | readability, statistics | Windows | Free
Readability Webfx | A tool to check how easy or difficult a given text is to read. | readability | Web | Free
RSTTool | Tool that can annotate texts for constituency and rhetorical structure | annotation | Windows, Macintosh, UNIX and LINUX | Free
Salt | Meta models for linguistic data. | meta modelling | | Free
SarAnt | Tool for batch search and replace | editing, searching | Windows | Free
SegmentAnt | Tool for the segmentation of Japanese and Chinese | segmentation, tokenizing | Windows, Mac, Linux | Free
Shinyconc | ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny. | concordancer, kwic, r | Open Source / R | Free
Simple Concordance Program | Tool for concordance and word listing that works with many languages | concordancer | Windows, Mac | Free
SketchEngine | Word sketches, thesaurus, keyword computation, corpus creation | corpus creation, semantic analysis, wordlists, keywords | | 30 day trial or 4,85€/month
SpiderLing | Software for obtaining text from the web, useful for building text corpora | crawler | | Free
SPPAS | A tool for the automatic annotation and analysis of speech. | speech, spoken, annotation | Windows, Mac, Linux | Free, Open Source
SPre | Tool for segmenting and annotating texts | annotation | | Free
Stanford Log-linear POS Tagger | POS Tagger (with Penn Treebank Tagset) for English, Arabic, Chinese, German | pos tagger, tagging | | Free
Stanford Topic Modeling Toolbox | The Stanford Topic Modeling Toolbox (TMT) allows users to perform topic modeling on texts imported from spreadsheets. It supports both LDA and labelled LDA. | topic modeling | Java | Free
Stylo for R | Tool for computational stylistic analysis (authorship attribution, genre analysis) | text analysis | | Free
Sub-Corpus Creator | A tool for creating sub-corpora based on search terms and metadata | compilation | Windows | Free
Synpathy | Tool for manual syntactic annotation | annotation | Windows, Mac, Linux | Free
TAACO | TAACO is a tool that calculates 150 indices of textual/lexical cohesion. | cohesion, lexical sophistication | All | Free, Open Source
TAALES | TAALES measures over 400 indices of lexical sophistication. | lexical sophistication | Mac, Linux, Windows | Open Source
TagAnt | Part-of-speech tagging tool built on TreeTagger | pos tagger, tagging | Windows, Mac, Linux | Free
TagCrowd | A simple tool for generating tag/word clouds online | word clouds, visualization | Web | Free
Tagxedo | A tool for generating word clouds. | word clouds, visualization | Web | Free
TASX-Annotator | Tool for multilevel annotation and transcription of (multi-channel) video and audio data. | multilevel tagger, transcription | Windows, Mac, Linux, Solaris | Down
Text Analysis Computing Tools (TACT) | A simple, fairly old concordancer. | concordancer | | Commercial
Text Variation Explorer | The Text Variation Explorer (TVE) is a tool for exploring the effect of window size on various common linguistic measures. It visualizes these measures and allows for PCA/cluster analysis. | visualization, variation analysis | Java | Free
Text Visualization Browser | A survey/gallery of text visualizations | visualization | Web | Free
Textanz | Language analysis program that produces frequency lists, word lists, and part-of-speech tags. | wordlists, concordancer, pos tagger, dictionary | Any OS | Free, Open Source
TextArc | A tool for visualizing the structure of texts. | visualization | |
TextDirectory | TextDirectory is a tool for aggregating text files based on various filters and transformation functions. | compilation, text-processing, python | Windows, Linux, OSX | Free, Open Source
Textplot | A tool for mapping a document into a network of terms in order to visualize the topic structure. | visualization, network analysis | Python | Free, Open Source
Textplot | A tool for converting documents into (semantic) networks based on KDE. | semantics, network analysis, graphs | Linux, Windows, Mac | Open Source
TextSmith Tools | A tool for genre-informed phraseological profiles | phraseology, segmentation | Windows | Free
TextSTAT | Tool for creation and manipulation of linguistic data from different languages | corpus creation, concordancer | Windows, GNU/Linux and MacOS | Free
The (Phonetic) Transcription Editor | An editor for creating phonetic transcriptions | transcription | Windows | Free
The Great American Word Mapper | A visualization tool for the top 100,000 words used in American English Twitter data. | twitter, lexis, social media | Web | Free
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free
The Simple PoS Tagger | A simple PoS tagger using Perl's Lingua::EN::Tagger | pos tagger, tagging | Windows | Free
The SPAADIA concordancer | A concordancer for the SPAADIA corpus | concordancer, SPAADIA | Windows | Free
The Text Feature Analyser | A tool for investigating textual features and various measures | text analysis, concordancer | Windows | Free
Thesaurus.com | English language thesaurus with links to English dictionary and translation sites. | efl, esl, linguistics | Web | Free
TigerSearch | Tool for searching syntactically annotated and POS-tagged corpora | search tool, pos tagger | | Free
TnT - Thorsten Brants's PoS Tagger | A simple PoS tagger | pos tagger, tagger, tagging | Windows/Unix | Available via Stanford
Tree Editor TrEd 2.0 | Graphical editor and viewer for tree-like structures. | visualization | Windows, GNU/Linux and MacOS | Free
TreeTagger | Tool for annotating text with part-of-speech and lemma information | pos tagger, annotation | Windows, Mac, Linux | Free
TurboParser | Multilingual dependency parser with linear programming | parser | | Free
Twarc | A command line tool (and Python library) for archiving Twitter JSON | twitter, social media | Python, Windows, Linux, Mac | Free, Open Source
Tweet NLP | Tweet tokenizer, POS tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. | clusters, pos tagger, tokenizer, parser | | Free
TWINT | A Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API. | twitter, social media, scraping | Linux, Windows, Mac | Open Source
TXM | XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment. | text analysis, concordancer, r, statistics, search tool, tokenizer, xml | Windows, Mac, Linux, Tomcat | Free
UAM CorpusTool | Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation | annotation, multi-layer annotation, computer-assisted annotation | | Free
UAM ImageTool | Image annotation tool for visual data corpora | annotation | | Free
Unitok | Tool that splits texts into tokens | tokenizer | | Free
VARD | Spelling variant detection and deletion in historical corpora (particularly EModE) | variant detector | | Free (with academic email)
VariAnt | Tool for the detection of spelling variants | variant detector | Windows | Free
Voyant Tools | A web-based reading/analysis toolkit for digital texts. | reading, text analysis, visualization, trends patterns | Web | Free, Open Source
VU Amsterdam Metaphor Identification Corpus | Corpus tool for metaphor identification | metaphor identification, metaphors | Web and local version | Free
WConcord 3.0 | A full-featured concordancer | concordancer | | Free
WebAnno | A web-based annotation tool | annotation, web-based | Web | Free
WebLicht | WebLicht is an execution environment for automatic annotation of text corpora, embedded in the CLARIN-D project. | annotation | Web | Free (CLARIN-D account needed)
Wmatrix | Tool for corpus analysis and comparison. Provides access to CLAWS and USAS. | wordlists, concordancer, pos tagger, semantic tagger, keywords, web-based | Web | £50 per username per year
WordCruncher | A tool for searching, studying, and analyzing digital texts and corpora. The tool has been tested for corpora up to a billion words. | concordancer, wordlists, collocates, n-grams, keywords, key phrases, ebooks | Windows, Mac, iOS | Free
WordFish | Extracts political positions from text documents. | political science | R | Free
WordHoard | Close reading and scholarly analysis of deeply tagged texts | close reading | Windows, Unix, Linux, Mac | Free
Wordle | A tool for generating word clouds. | word clouds, visualization | Web | Free
WordMap | A simple web-based word-map / wordcloud generator. | visualization, web-based | Web | Free
Wordscores | A tool (approach) to extract dimensional information from political texts | political science, information retrieval | | Free
Wordsmith | One of the most established corpus toolkits providing a variety of functionality | concordancer, wordlists, statistics, keywords | Windows | 60€ per licence
Wordstatix | Corpus analysis tool | concordancer | | Free
Worldbuilder | Tool for annotation and visualisation in analysis applying text-world-theory | annotation, visualization | |
Xaira | Indexing and analysis of XML resources | indexing, xml | Windows | Free, Open Source
YACSI Chinese Tokeniser / PoS Tagger | A Chinese tokenizer and PoS tagger | chinese, tokenizer, pos tagger | Windows | Free
Log-Likelihood and Effect-Size Calculator | An online calculator for log-likelihood and effect sizes. | statistics | Web | Free
CorefAnnotator | An annotation tool for coreference. | coreference, annotation | Windows, Linux, Mac | Open Source
SoMaJo | A tokenizer and sentence splitter for German and English web and social media texts. | tokenizer, sentence boundary detector | Linux, Mac, Windows | Free, Open Source
SoMeWeTa | A part-of-speech tagger with support for domain adaptation and external resources. | tagging, pos, pos tagger | Linux, Mac, Windows | Free, Open Source
COCA_MWU20 ColloGram | A collocation analysis tool based on a COCA collocation family list. | collocation | Windows | Free
RQDA | An R package for Qualitative Data Analysis (QDA). | qda | Windows, Linux/FreeBSD, Mac | Free
KWords | A tool for keyword identification and analysis. | keywords, CADS, concordancer, collocation analysis | Windows, Linux, Mac | Free
Range Program (formerly VocabProfiler) (Paul Nation) | A tool for analyzing the vocabulary load of texts. | vocabulary, lexis | Windows | Free
Frequency Program (Paul Nation) | A tool that turns a text or texts into a word list with frequency figures. | vocabulary, frequency, lexis | Windows | Free
Compleat Lexical Tutor | A website featuring various tools and materials for data-driven language learning. | vocabulary, language learning, lexis, web-based, ddl | Web | Free
WordSift | A word cloud generator, with dynamic filters, links to images, and KWIC capabilities. Works with various types/formats of word lists. | word cloud, vocabulary profiling, lexis, vocabulary, language teaching | Web | Free
KHCoder | Free software for quantitative content analysis or text mining that supports multiple languages. | correspondence analysis, collocation analysis, frequency analysis | Windows, Mac, Linux | Free, Open Source
MaltParser | A system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model. | parser, dependency parsing | Windows, Mac, Linux | Free
MaltOptimizer | A system for parser optimization using the open-source system MaltParser. | parser, dependency parsing | Windows, Mac, Linux | Free
Link Grammar Parser | A syntactic parser of English, Russian, Arabic and Persian (and others), based on Link Grammar. | parser, syntax, grammar | Linux, Mac, Windows | Free
ANTLR | ANother Tool for Language Recognition is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. | parser generator | Linux, Mac, Windows | Free, Open Source
GOLD Parsing System | A parsing system that can be used to develop programming languages, scripting languages and interpreters. | parser generator | Linux, Mac, Windows | Free
JavaCC | A popular parser generator for use with Java applications. | parser generator | Linux, Mac, Windows | Free
Lextutor Web Concordancers | Web concordancers targeted towards DDL | collocations, concordancer, DDL | Web | Free
wordspace | An R package for distributional semantics | semantics, distributional semantics, R | R | Free
UCS Toolkit | A toolkit (libraries and scripts) for the statistical analysis of co-occurrence data. | collocation, co-occurrence, statistics | R, Perl | Free
Coquery | A free corpus query tool to search, analyze, and visualize corpora | query, visualization | Linux, Mac, Windows | Free
WordWanderer | A web-based visualization/analysis tool which allows its users to "wander" a text. | visualization, concordancer | Web | Free
Cortext Manager | A scriptable "ecosystem" for modeling and exploring corpora. Especially useful for creating topic models and co-occurrence networks. | NER, topic models, visualization, word2vec, collocation, keywords | Web | Free
AMesure | A web-based system to analyse the reading complexity of French texts | text complexity, readability | Web | Free
CEFRLex | A web-based tool to analyse the lexical complexity of words in texts according to the CEFR scale in various languages. | text complexity, readability, language learning | Web | Free
PACTE | A flexible collaborative text annotation platform that is currently in development. | annotation | Web | Free (for research)
CLAN | A tool for searching and analyzing child language data in the CHAT format | transcription, wordlists, collocation, child language, CHILDES | Windows, Mac, Unix | Free, Open Source
NVIVO | A commercial Computer-Assisted Qualitative Data Analysis Software (CAQDAS) package that works with both qualitative and mixed methods data | qda, mixed methods | Windows, Mac | Commercial
QDA Miner | A commercial QDA tool for coding, annotating, retrieving and analyzing collections of documents and images. | qda, mixed methods, text analysis | Windows | Commercial
tagtog | A text annotation tool specifically built to train AI/ML models. | machine learning, annotation | Cloud-Based | Commercial
Calc: Corpus Calculator | A web-based tool to calculate basic corpus statistics, for example, comparing frequencies across corpora. | statistics | Web | Free
gwic | A very basic KWIC tool written in Go. | concordancer, KWIC | Windows, Mac, Linux | Open Source
ACTRES Rhetorical Move Tagger | A tool for tagging rhetorical moves. | tagging, rhetorics | Web | Commercial
ACTRES Corpus Browser | A tool for retrieving tagged information in more than one language. | tagging | Web | Commercial
ACTRES Corpus Manager | A corpus compilation and analysis platform with a focus on multilingual and parallel corpora. | compilation, corpus management, annotation, multilingual | Web | Commercial
VideoAnt | A web-based tool to annotate and discuss web-hosted videos. | annotation, video | Web | Free
buzz | A Python-based linguistic analysis tool. | parsing, concordancer, visualization | Python | Free, Open Source
SLATE | SLATE is a Python-based CLI annotation tool. It is very lightweight and can be used for various types of span-based annotation. | annotation | Python | Free, Open Source
YEDDA | YEDDA is a Python-based collaborative text span annotation tool with support for a very wide variety of languages including Chinese. | annotation | Python | Free, Open Source
LightTag | A commercial text annotation tool focused on managing and working with teams of annotators. | annotation, tagging, ai-tagging | Web | Commercial
ICECUP | The ICE Corpus Utility Program (ICECUP) is a corpus exploration tool for parsed corpora such as ICE-GB and DCPSE. | ICE, exploration | | Free
Corpona | A Python library for processing XML- and JSON-based corpora. | library, XML, JSON, annotation | Python | Open Source
Murre | A tool for normalising and generating dialectal Finnish and Swedish | python, variation, dialectal data, finnish | Linux, Mac, Windows | Free
FinMeter | A tool for analyzing Finnish poetry in terms of meter, rhyme, semantics, metaphors etc. | lexical analysis, rhetorical analysis, poem analysis, metaphor interpretation, metaphor identification, semantics, metaphors, finnish | Linux, Mac, Windows | Free
UralicNLP | NLP tools (primarily) for Uralic languages | uralic, parser, pos tagger, tagging, inflection, morphological tagger | Linux, Mac, Windows | Free
ConvoKit | A toolkit for extracting conversational features and analyzing social phenomena in conversations, using an interface inspired by (and compatible with) scikit-learn. | python, conversational analysis, social media | Python | Free, Open Source
INCEpTION | A semantic annotation platform that offers intelligent annotation assistance and knowledge management | annotation, multi-layer annotation, computer-assisted annotation, web-based | Web | Free, Open Source
QualCoder | QualCoder is free, open source software for qualitative data analysis. | qda, text analysis | Linux, Mac, Windows | Free, Open Source
English Grammar Profiler | A CEFR grammar profiler for ESL/EFL. | grammar, parsing, ESL/EFL, CEFR | Web | Free
UBIAI | An NLP-oriented text annotation platform for teams with comprehensive auto-annotation features. | annotation, NLP | Web | Commercial