Tools

Tool	Description	Tags	Platforms	Pricing
@nnotate ✎	Semi-automatic annotation of corpus data	annotation	Solaris, Linux	Free (with licence agreement)
aConCorde ✎	Multilingual concordance tool (English and Arabic)	concordancer	Linux, Mac, Windows	Free
ACTRES Corpus Browser ✎	A tool for retrieving tagged information in more than one language.	tagging	Web	Commercial
ACTRES Corpus Manager ✎	A corpus compilation and analysis platform with a focus on multilingual and parallel corpora.	compilation, corpus management, annotation, multilingual	Web	Commercial
ACTRES Rhetorical Move Tagger ✎	A tool for tagging rhetorical moves.	tagging, rhetorics	Web	Commercial
almaneser / SALTA ✎	Semantic Parser and PoS Tagger for English	parser, pos tagger, tagging		Free (with licence agreement)
AMALGAM ✎	Tool for grammatical annotation (PoS and phrase structure). Tags texts submitted via email.	annotation	Web	Free
AMesure ✎	A web-based system to analyse the reading complexity of French texts	text complexity, readability	Web	Free
ANC2go ✎	A web service that allows users to create custom sub-corpora of the ANC	ANC, sampling	Web	Free
ANNIS ✎	Search and visualization tool for multi-layer linguistic corpora with diverse types of annotation	search, visualization	Web (or Linux, Mac, Windows)	Free
AntCLAWSGUI ✎	Front-end interface for CLAWS tagger	pos tagger, tagging	Windows	Free
AntConc ✎	Corpus analysis toolkit	wordlists, concordancer, keywords	Linux, Mac, Windows	Free
AntCorGen ✎	A freeware discipline-specific corpus creation tool.	compilation, text analysis	Windows, Mac, Linux	Free
AntFileConverter ✎	Freeware tool to convert PDF and Word (DOCX) files into plain text	converter	Windows, Mac	Free
AntFileSplitter ✎	A freeware text file splitting tool.	compilation	Windows, Mac, Linux	Free
AntGram ✎	A freeware n-gram and p-frame (open-slot n-gram) generation tool.	text analysis, n-grams, p-frames, lexical bundles, lexical frames	Windows, Mac, Linux	Free
ANTLR ✎	ANother Tool for Language Recognition is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.	parser generator	Linux, Mac, Windows	Free, Open Source
AntMover ✎	Tool for text structure (moves) analysis	text analysis	Windows	Free
AntPConc ✎	Corpus analysis toolkit designed for working with parallel corpora.	wordlists, concordancer	Windows, Mac	Free
AntWordProfiler ✎	Tool for profiling vocabulary level and text complexity	text complexity	Linux, Mac, Windows	Free
ANVIL ✎	A tool for video annotation.	video, annotation	Windows, Linux, Mac	Free
ATLAS.ti ✎	Sophisticated QDA software for mixed methods approaches	qda, mixed methods	Windows, Mac, Android, iOS	Commercial
Atomic ✎	Multi-layer corpus annotation platform.	annotation	Linux, Mac, Windows	Free
Authorial Voice Analyzer (AVA) ✎	A tool for the analysis of interactional metadiscourse features.	discourse, voice	Mac	Free
BFSU Collocator ✎	A collocation analysis toolkit	collocation, statistics	Windows	Free
BFSU ConcGram Lite ✎	A tool for retrieving bigrams with directional variations.	bigrams, concgrams	Windows	Free
BFSU English Sentence Segmenter ✎	A simple sentence segmenter	segmentation	Windows	Free
BFSU ParaConc ✎	A parallel concordancer	concordancer, parallel	Windows	Free
BFSU PowerConc ✎	A fairly powerful concordancer	concordancer	Windows	Free
BFSU Qualitative Coder ✎	A tool for manual coding of corpora	coding, annotation	Windows	Free
BFSU Sentence Collector ✎	A pedagogic concordancer	concordancer, ddl, pedagogy, language learning	Windows	Free
BFSU Stanford Parser ✎	A simple parser	parser	Windows	Free
BFSU Stanford PoS Tagger (Light) ✎	A GUI for the Stanford PoS tagger	pos tagger, tagging	Windows	Free
BNCWeb ✎	BNCweb is a web-based client program for searching and retrieving lexical, grammatical and textual data from the British National Corpus (BNC).	analysis, concordancer	Web	Free
BootCat ✎	Tool for crawling and compiling data from the web with a list of seed words.	crawler, compilation
Bow ✎	Statistical Language Modeling, Text Retrieval, Classification and Clustering	text analysis	UNIX, Linux	Free
buzz ✎	A python-based linguistic analysis tool.	parsing, concordancer, visualization	Python	Free, Open Source
Calc: Corpus Calculator ✎	A web-based tool to calculate basic corpus statistics, for example, comparing frequencies across corpora.	statistics	Web	Free
CasualConc ✎	CasualConc is a concordance program that runs natively on macOS.	concordancer	OSX	Free
CATMA (Computer Assisted Text Markup and Analysis) ✎	An undogmatic, complex annotation and analysis package.	markup, analysis, visualization, annotation	Web	Free
CEFRLex ✎	A web-based tool to analyse the lexical complexity of words in texts according to the CEFR scale in various languages.	text complexity, readability, language learning	Web	Free
Chared ✎	Tool for detecting the character encoding of a text	text analysis	Python 2.6 or later	Free
Chi-Square and Log Likelihood Calculator ✎	A simple tool for calculating Chi-squared and LL	statistics	Windows	Free
CLAN ✎	A tool for searching and analyzing child language data in the CHAT transcription format.	search, wordlists, collocation, child language, CHILDES	Windows, Mac, Unix	Free, Open Source
CLaRK ✎	XML Based System For Corpora Development	compilation		Free (with licence agreement)
CLAWS PoS-Tagger ✎	The CLAWS part-of-speech tagger.	pos tagger, tagging	Web	Via licence or in-house tagging at Lancaster
CLiC ✎	A corpus tool to support the analysis of literary texts.	concordancer	Web	Free
COCA_MWU20 ColloGram ✎	A collocation analysis tool based on a COCA collocation family list.	collocation	Windows	Free
Coh-Metrix ✎	Coh-Metrix is a system for computing computational cohesion and coherence metrics for written and spoken texts. It allows readers, writers, educators, and researchers to instantly gauge the difficulty of written text for the target audience.	cohesion, coherence, readability, textual analysis	Web	Free
Colligator 2.0 ✎	A colligation query/analysis toolkit	colligation	Windows	Free
Collocate ✎	Tool for the extraction of concordances and collocations	concordancer	Windows	35 USD
CoMOn ✎	A tool for corpus matching analysis	matching	Web	Free
Compleat Lexical Tutor ✎	A website featuring various tools and materials for data-driven language learning.	vocabulary, language learning, lexis, web-based, ddl	Web	Free
ConcGramCore ✎	A modern rewrite of ConcGram (Greaves 2005) that allows efficiently searching for concgrams.	collocation, concgram	Windows	Open Source
Concordance Randomizer ✎	A concordance randomizer	concordancer	Windows	Free
Concordancer ✎	Online tool for frequency counts and text clouds	concordancer	Web	Free
ConvoKit ✎	A toolkit for extracting conversational features and analyzing social phenomena in conversations, using an interface inspired by (and compatible with) scikit-learn.	python, conversational analysis, social media	Python	Free, Open Source
Coquery ✎	A free corpus query tool to search, analyze, and visualize corpora	query, visualization	Linux, Mac, Windows	Free
CorefAnnotator ✎	An annotation tool for coreference.	coreference, annotation	Windows, Linux, Mac	Open Source
CorpKit ✎	An advanced modern corpus toolkit with an emphasis on visualization and annotated corpora.	wordlists, parsing, concordancer, visualization	Linux, Mac, Windows (Python)	Free
Corpona ✎	A Python library for processing XML- and JSON-based corpora.	library, XML, JSON, annotation	Python	Open Source
CorporaCoCo ✎	A set of R functions used to compare co-occurrence between corpora	collocation	R	Free
Corpus Presenter ✎	Tree tagger and corpus analysis software	wordlists, parsing, concordancer, visualization	Windows	Free
Corpus Text Processor ✎	Corpus Text Processor is a downloadable application that provides batched operations for common corpus processing tasks such as encoding or standardization.	compilation, corpus management, text processing	Windows, Mac	Free, Open Source
CorpusExplorer ✎	A complex corpus analysis toolkit combining 45 interactive tools.	visualization, exploration, tagging, text analysis	Windows	Free, Open Source
CorpusSearch ✎	Searches parsed corpora in the Penn Treebank format	searching, penn treebank
Corpustools ✎	An R package for managing, querying, and analyzing texts.	text analysis, R	R	Free, Open Source
Cortext Manager ✎	A scriptable "ecosystem" for modeling and exploring corpora. Especially useful for creating topic models and co-occurrence networks.	NER, topic models, visualization, word2vec, collocation, keywords	Web	Free
CQPweb ✎	Overview of and access to a wide range of corpora	database	Web	Free (once registered)
DART ✎	An annotation tool and research environment for annotating dialogues.	dialogues, annotation	Windows	Free
DepCluster ✎	A tool used for lexeme-based collexeme analysis.	lexis, collexeme, CxG, LBCA
DeTagging Tool ✎	A tool that strips annotation/tags from files.	cleaning, annotations	Windows	Free
Dexter ✎	Tool for text annotation	annotation	Linux, Mac, Windows	Free
DISCO ✎	Corpus pre-processing tool for a variety of languages that allows users to retrieve the semantic similarity between arbitrary words and phrases	tokenization, annotation	Windows, Linux, Solaris, and MacOS	Free
DisMo ✎	An automatic multi-level annotator for spoken language corpora.	spoken, multilevel, multi-layer, pos tagger, annotation, tagging
DocuScope ✎	A tool for computer-aided rhetorical analysis	rhetorical analysis, text analysis, visualization	Windows (Java)	Free
ELAN ✎	Transcription and annotation of sound or video files	transcription, annotation	Linux, Mac, Windows	Free
Emdros ✎	A database engine for analyzed and annotated text.	database, annotation, query	Windows, Linux, Mac	Free, Open Source
EncodeAnt ✎	Tool for the detection and conversion of character encodings	converter	Windows, Mac	Free
English Grammar Profiler ✎	A CEFR grammar profiler for ESL/EFL.	grammar, parsing, CEFR, esl, efl	Web	Free
EXMARaLDA ✎	Tool for transcription, annotation, corpus analysis of spoken data	transcription, annotation, analysis		Free
f4analyse ✎	QDA software specifically geared towards interview (spoken) data	qda, spoken	Windows, Mac, Linux	Commercial
f4transkript ✎	Software for transcribing audio data	transcription, spoken	Windows, Mac, Linux	Commercial
FinMeter ✎	A tool for analyzing Finnish poetry in terms of meter, rhyme, semantics, metaphors etc.	lexical analysis, rhetorical analysis, poem analysis, metaphor interpretation, metaphor identification, semantics, metaphors, finnish	Linux, Mac, Windows	Free
FireAnt ✎	Social media analysis toolkit	downloader, converter	Windows, Mac	Free
FLAIR (2.0) ✎	An online tool for language teachers and learners that analyzes grammatical constructions and readability on the fly.	constructions, readability	Web	Free
Flesh PC ✎	A tool for calculating Flesch readability scores	readability, statistics	Windows	Free
FrameNet ✎	Dictionary of more than 10,000 word senses, tagged for semantic roles (according to Fillmorean Frame Semantics)	semantic parser	Web	Free
Frequency Program (Paul Nation) ✎	A tool that turns a text or texts into a word list with frequency figures.	vocabulary, frequency, lexis	Windows	Free
gensim ✎	Deep learning via word2vec	word2vec	Multi (Python)	Free, Open Source
Gephi ✎	A toolkit for network analysis	network analysis, graphs	Windows, Linux, Mac	Free
GOLD Parsing System ✎	A parsing system that can be used to develop programming languages, scripting languages and interpreters.	parser generator	Linux, Mac, Windows	Free
Google Ngrams ✎	An ngram-viewer for the whole of Google Books	ngrams	Web	Free
GraphColl ✎	Tool for building and exploring networks of linguistic collocations	visualization	Windows, Mac	Free
Gsearch ✎	Tool for syntactic pattern matching	pattern matching	?	Down
gwic ✎	A very basic KWIC tool written in Go.	concordancer, KWIC	Windows, Mac, Linux	Open Source
HeidelGram Web-Based Tools ✎	Basic corpus analysis toolkit for the HeidelGram Corpus	wordlists, concordancer	Web	Free
HeidelTime ✎	A multilingual, domain-sensitive temporal tagger	temporal tagger, timex3	Java	Free, Open Source
Heimdall ✎	A tool that searches a text for sequences written in other languages.	language detection	Linux, Windows, Mac	Open Source
HGSimpleCorpusNetwork ✎	Batch frequency analysis on corrupted (e.g. OCR) corpus data and generation of network analysis data.	wordlists, network analysis	Multi (Python)	Free, Open Source
HTST Samuels ✎	Historical Thesaurus Semantic Tagger via web-interface	semantic tagger	Web	Free
ICARUS ✎	Search and visualization tool for dependency trees	visualization		Free
ICECUP ✎	The ICE Corpus Utility Program (ICECUP) is a corpus exploration tool for parsed corpora such as ICE-GB and DCPSE.	ICE, exploration		Free
ICEweb ✎	A tool for compiling, downloading, and analyzing web corpora in accordance with the ICE	ICE, compilation, crawler	Windows	Free
IMS Corpus Workbench ✎	Tool for sorting frequencies in corpora	wordlists, concordancer	Web and local version	Free
INCEpTION ✎	A semantic annotation platform that offers intelligent annotation assistance and knowledge management	annotation, multi-layer annotation, computer-assisted annotation, web-based	Web	Free, Open Source
Intelligent Archive ✎	Managing corpora for stylometry	stylometry, management	Windows, Unix, Linux, Mac	Free
JavaCC ✎	A popular parser generator for use with Java applications.	parser generator	Linux, Mac, Windows	Free
jTokenizer ✎	Tokenizing natural language	tokenizer		Free
JusText ✎	Tool for removing boilerplate content, such as navigation links, headers, and footers from HTML pages	boilerplate remover	Python	Free
juxta ✎	Comparing and collating multiple witnesses to single textual works	textual criticism, witnesses	Windows, Unix, Linux, Mac	Free
Kaleidographic ✎	A dynamic and interactive visualization tool for multivariate data.	visualization	Web	Free
KAT Tool ✎	Grouping patterns based on search terms	patterns, concordancer	Windows	Free
kdiff3 ✎	KDiff3 is a diff and merge program.	comparison	Windows, Linux, OSX	Free, Open Source
Keyword Plus ✎	A keyword generation/analysis tool	keywords	Windows	Free
kfNgram ✎	A simple tool for generating n-grams	n-grams, p-frames	Windows	Free
KHCoder ✎	Free software for quantitative content analysis or text mining that supports multiple languages.	correspondence, collocation analysis, frequency analysis	Windows, Mac, Linux	Free, Open Source
Khepri ✎	A view-based tool for exploring (historical sociolinguistic) data	sociolinguistics, visualization	JavaScript, Web	Free, Open Source
KoGra-R ✎	An R-based online tool that provides statistical measures for corpus-based frequencies	statistics, frequency analysis	Web	Free
KorAP ✎	A complex platform for corpus analysis developed at the IDS in Mannheim	analysis, multilevel, multi-layer	Web	Free, Open Source
KWords ✎	A tool for keyword identification and analysis.	keywords, CADS, concordancer, collocation analysis	Windows, Linux, Mac	Free
LancsBox ✎	The Lancaster Desktop Corpus Toolbox; Software package for the analysis of language data and corpora	collocation, frequency analysis, keywords	Java	Free (CC)
langid.py ✎	A standalone language identification tool written in Python.	language detection	Linux, Windows, Mac	Open Source
LDA-Toolkit ✎	A toolkit for linguistic discourse and image analysis.	discourse, images	Windows	Free
Leipzig Corpus Miner ✎	A modern text mining infrastructure for qualitative data analysis	qda, mixed methods, text mining, lexicometrics, topic models, information retrieval	Linux, Windows, Mac (via VM)	Free
LEXA ✎	A complex lemmatizer.	lexis, lemmatizer		Free
LexisNexis ✎	A database containing (new and old) news articles. They also have other (business) data.	news, data	Web	Commercial
Lexonomy ✎	A tool for writing and publishing dictionaries and other dictionary-like things.	dictionary, publishing dictionary, annotation	Web	Free
lexpan ✎	A tool to analyze syntagmatic structures in corpora. Especially useful to analyze fillers and slots.	syntagmatic, slots	Windows, Linux, Mac	Free
Lextutor Web Concordancers ✎	Web concordancers targeted towards DDL	collocations, concordancer, DDL	Web	Free
LightSide ✎	A machine learning workbench.	machine learning	Linux, Windows	Free, Open Source
LightTag ✎	A commercial text annotation tool focused on managing and working with teams of annotators.	annotation, tagging, ai-tagging	Web	Commercial
Linguistica ✎	Word segmentation and morphological analysis?	segmentation, morphological tagger	Linux, Mac, Windows	Free
Link Grammar Parser ✎	A syntactic parser of English, Russian, Arabic and Persian (and others), based on Link Grammar.	parser, syntax, grammar	Linux, Mac, Windows	Free
LIWC ✎	A tool that tries to compute scores for different emotions, thinking styles, and social concerns.	lexical analysis, style	Web	Free (but Commercial)
Log-Likelihood and Effect-Size Calculator ✎	An online calculator for log-likelihood and effect sizes.	statistics	Web	Free
MALLET ✎	Package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text	statistical nlp	Windows	Free
MaltOptimizer ✎	A system for parser optimization using the open-source system MaltParser.	parser, dependency parsing	Windows, Mac, Linux	Free
MaltParser ✎	A system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data and to parse new data using an induced model.	parser, dependency parsing	Windows, Mac, Linux	Free
MAT - Multidimensional Analysis Tagger ✎	A tagger for MDA (Biber et al.) by Andrea Nini.	tagging, MDA	Windows, Mac	Free
MAXQDA ✎	Sophisticated QDA software that works with multimodal data and supports mixed methods approaches	qda, mixed methods	Windows, Mac, Android, iOS	Commercial
MLCT ✎	Tool for building and processing corpora	concordancer, sentence boundary detector		Free
MMAX2 ✎	A multi-level annotation tool	annotation, multilevel, multi-layer	Java	Free, Open Source
MonoConc Esy ✎	Concordancing and text search tool that allows primary and secondary concordancing	concordancer, sentence boundary detector		Free for non-Commercial research
MorphAdorner ✎	Tool for performing morphological tagging of texts	morphological tagger		Free
Murre ✎	A tool for normalising and generating dialectal Finnish and Swedish	python, variation, dialectal data, finnish	Linux, Mac, Windows	Free
N-Gram Processor (NGP) ✎	A perl based tool for the creation and processing of n-gram lists out of text files.	n-grams	Linux, Windows, Mac	Open Source
NATAS ✎	A spacy-based library for processing historical corpora (with a focus on neologisms).	historical, python, lexis	Linux, Windows, Mac	Open Source
Natural Language Toolkit ✎	Platform for building Python programs to work with human language data	tokenizer, tagger	Unix, Mac, Windows (+Python 3.4)	Free
NooJ ✎	Tags texts and corpora (i.e. sets of text files) at the Orthographical, Lexical, Morphological, Syntactic and Semantic levels	multilevel tagger	Windows, Mac, LINUX and BSD Unix	Free
NoSketch Engine ✎	Word sketches, thesaurus, keyword computation, corpus creation	corpus creation, semantic analysis, wordlists		Free
NVIVO ✎	A commercial Computer-Assisted Qualitative Data Analysis Software (CAQDAS) package that works with both qualitative and mixed methods data	qda, mixed methods	Windows, Mac	Commercial
OneClick Terms ✎	An online term extractor with monolingual and bilingual term extraction capabilities.	keywords, term extraction, bilingual term extraction	Web	Free (limited version), 4.83€ / month
Onion ✎	Tool for removing duplicate parts from large collections of texts	duplicate remover		Free
Online Graded Text Editor ✎	Tool for profiling a text's vocabulary level and complexity	text analysis, editing, vocabulary	OSX, Windows	Free
OpenConc ✎	Tool for concordancing	concordancer		Free
PACTE ✎	A flexible collaborative text annotation platform that is currently in development.	annotation	Web	Free (for research)
PALinkA ✎	Annotation tool	annotation		Down
ParaConc ✎	A bilingual/multilingual concordancer	concordancer		Non-Free
Pareidoscope ✎	Pareidoscope is a collection of tools for determining the association between arbitrary linguistic structures, such as collocations, collostructions or between structures.	collocation, constructions		Free
PatCount ✎	A pattern counting tool with powerful statistic capabilities and regex support	patterns	Windows	Free
Pattern Builder ✎	A tool helping with regular expressions and PoS tags	regex, tagging	Windows	Free
Pepper ✎	Conversion between linguistic formats, e.g. from TEI to ANNIS to Tiger XML to EXMARaLDA.	conversion		Free
Phonological CorpusTools (PCT) ✎	Phonological analysis on transcribed corpora	phonology	Multi (Python)	Free
PhraseContext ✎	Tool for wordlists, concordancing, collocation, and TTR	wordlists, concordancer		35€
Pipoca (formerly openQDA) ✎	A web-based QDA software	qda, mixed methods	Web	Free, Open Source
Praaline ✎	Praaline is a system for metadata management, annotation, visualisation and analysis of spoken language corpora.	speech, prosody, spoken, annotation, concordancer, search, visualization, converter, analysis	Windows, Mac, Linux	Free / Open Source (GPL3)
PRAAT ✎	A tool for doing phonetics by computer	phonetics, spoken	Windows, Mac, Linux	Open Source
ProtAnt ✎	Tool for prototypical text analysis	wordlists	Windows, Mac	Free
pysupersensetagger ✎	Analyses texts for MWE and supersenses.	text analysis	Unix, Mac (Python)	Free
PyXMLConc ✎	Concordancer for XML files with automatic tag and attribute detection.	concordancer	Multi (Python), Windows	Free, Open Source
QDA Miner ✎	A commercial QDA tool for coding, annotating, retrieving and analyzing collections of documents and images.	qda, mixed methods, text analysis	Windows	Commercial
QualCoder ✎	QualCoder is free, open source software for qualitative data analysis.	qda, text analysis	Linux, Mac, Windows	Free, Open Source
Quanteda ✎	An R package for the quantitative analysis of textual data.	R	Linux, Windows, Mac	Open Source
Query Tool for the Edinburgh Associative Thesaurus ✎	A query tool for the EAT	query, thesaurus	Windows	Free
Range Program (formerly VocabProfiler) (Paul Nation) ✎	A tool for analyzing the vocabulary load of texts.	vocabulary, lexis	Windows	Free
RQDA ✎	An R package for Qualitative Data Analysis (QDA).	qda	Windows, Linux/FreeBSD, Mac	Free
Readability Analyzer ✎	A tool for generating various readability statistics	readability, statistics	Windows	Free
Readability Webfx ✎	A tool to check how easy or difficult (readability) a given text is.	readability	Web	Free
Rescribe ✎	Rescribe is an OCR service/tool geared towards historical texts.	ocr	Windows, Linux, Mac	Free
RSTTool ✎	Tool that can annotate texts for constituency and rhetorical structure	annotation	Windows, Macintosh, UNIX and LINUX	Free
Salt ✎	Meta models for linguistic data.	meta modelling		Free
SarAnt ✎	Tool for batch search and replacing	editing, searching	Windows	Free
SegmentAnt ✎	Tool for the segmentation of Japanese and Chinese	segmentation, tokenizing	Windows, Mac, Linux	Free
Shinyconc ✎	ShinyConc is a framework for generating custom web-based concordancers and is written in R and R Shiny.	concordancer, kwic, r	Open Source / R	Free
Simple Concordance Program ✎	Tool for concordance and word listing that works with many languages	concordancer	Windows, Mac	Free
SKELL ✎	A simple tool for language learners and teachers.	language learning, language teaching	Web	Free
Sketch Engine ✎	A corpus manager and text analysis software developed by Lexical Computing.	annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, co-occurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction		30-day free trial then starts at 4.83 €/month
SLATE ✎	SLATE is a python-based CLI annotation tool. It is very lightweight and can be used for various types of span-based annotation.	annotation	Python	Free, Open Source
SoMaJo ✎	A tokenizer and sentence splitter for German and English web and social media texts.	tokenizer, sentence boundary detector	Linux, Mac, Windows	Free, Open Source
SoMeWeTa ✎	A part-of-speech tagger with support for domain adaptation and external resources.	tagging, pos, pos tagger	Linux, Mac, Windows	Free, Open Source
SpiderLing ✎	Software for obtaining text from the web useful for building text corpora	crawler		Free
SPPAS ✎	A tool for the automatic annotation and analysis of speech.	speech, spoken, annotation	Windows, Mac, Linux	Free, Open Source
SPre ✎	Tool for segmenting and annotating texts	annotation		Free
Stanford Log-linear POS Tagger ✎	PoS Tagger (with Penn Treebank Tagset) for English, Arabic, Chinese, German	pos tagger, tagging		Free
Stanford Topic Modeling Toolbox ✎	The Stanford Topic Modeling Toolbox (TMT) allows users to perform topic modeling on texts imported from spreadsheets. It supports both LDA and labelled LDA.	topic modeling	Java	Free
Stylo for R ✎	Tool for computational stylistic analysis (authorship attribution, genre analysis)	text analysis		Free
Sub-Corpus Creator ✎	A tool for creating sub-corpora based on searches and metadata	compilation	Windows	Free
Synpathy ✎	Tool for manual syntactic annotation	annotation	Windows, Mac, Linux	Free
TAACO ✎	TAACO is a tool that calculates 150 indices of textual/lexical cohesion.	cohesion, lexical sophistication	All	Free, Open Source
TAALES ✎	TAALES measures over 400 indices of lexical sophistication.	lexical sophistication	Mac, Linux, Windows	Open Source
TagAnt ✎	Part-of-speech tagging tool built on Tree Tagger	pos tagger, tagging	Windows, Mac, Linux	Free
TagCrowd ✎	A simple tool for generating tag/word clouds online	word clouds, visualization	Web	Free
tagtog ✎	A text annotation tool specifically built to train AI/ML models.	machine learning, annotation	Cloud-Based	Commercial
Tagxedo ✎	A tool for generating word clouds.	word clouds, visualization	Web	Free
TASX-Annotator ✎	Tool for multilevel annotation and transcription of (multi-channel) video and audio data.	multilevel tagger, transcription	Windows, Mac, Linux, Solaris	Down
Text Analysis Computing Tools (TACT) ✎	A simple, fairly old concordancer.	concordancer		Commercial
Text Variation Explorer ✎	The Text Variation Explorer TVE is a tool for exploring the effect of window size on various common linguistic measures. It visualizes these measures and allows for PCA/Cluster analysis.	visualization, variation analysis	Java	Free
Text Visualization Browser ✎	A survey/gallery of text visualizations	visualization	Web	Free
Textanz ✎	Language analysis program that produces frequency lists, word lists, parts of speech tags.	wordlists, concordancer, pos tagger, dictionary	Any OS	Free, Open Source
TextArc ✎	A tool for visualizing the structure of texts.	visualization
TextDirectory ✎	TextDirectory is a tool for aggregating text files based on various filters and transformation functions.	compilation, text-processing, python	Windows, Linux, OSX	Free, Open Source
Textplot ✎	A tool for mapping a document into a network of terms in order to visualize the topic structure.	visualization, network analysis, semantics, graphs	Python	Free, Open Source
TextSmith Tools ✎	A tool for genre-informed phraseological profiles	phraseology, segmentation	Windows	Free
TextSTAT ✎	Tool for creation and manipulation of linguistic data from different languages	corpus creation, concordancer	Windows, GNU/Linux and MacOS	Free
The (Phonetic) Transcription Editor ✎	An editor for creating phonetic transcriptions	transcription	Windows	Free
The Great American Word Mapper ✎	A visualization tool for the top 100,000 words used in American English twitter data.	twitter, lexis, social media	Web	Free
The Prime Machine ✎	A user- and mobile-friendly corpus analysis toolkit (primarily concordancing) initially designed for English language teaching.	concordancer, language teaching, wordlist, keywords, efl, esl	MacOS, Windows, iOS, Android	Free
The Simple Corpus Tool ✎	A corpus analysis toolkit that supports XML annotations.	concordancer, annotation, xml, frequency	Windows	Free
The Simple PoS Tagger ✎	A simple PoS tagger utilizing Perl's Lingua::EN::Tagger	pos tagger, tagging	Windows	Free
The SPAADIA concordancer ✎	A concordancer for the SPAADIA corpus	concordancer, SPAADIA	Windows	Free
The Text Feature Analyser ✎	A tool for investigating textual features and various measures	text analysis, concordancer	Windows	Free
Thesaurus.com ✎	English language thesaurus with links to English dictionary and translation sites.	efl, esl, linguistics	Web	Free
TigerSearch ✎	Tool for searching syntactically and PoS-tagged corpora	search tool, pos tagger		Free
TnT - Thorsten Brants's PoS Tagger ✎	A simple PoS-Tagger	pos tagger, tagger, tagging	Windows/Unix	Available via Stanford
Trafilatura ✎	Trafilatura is a Python package and command-line tool which seamlessly downloads, parses, and scrapes web page data.	corpus creation, python, R, compilation, crawler, boilerplate remover, data, xml, scraping	Python	Free, Open Source
Tree Editor TrEd 2.0 ✎	Graphical editor and viewer for tree-like structures.	visualization	Windows, GNU/Linux and MacOS	Free
TreeTagger ✎	Tool for annotating text with part-of-speech and lemma information	pos tagger, annotation	Windows, Mac, Linux	Free
TurboParser ✎	Multilingual dependency parser with linear programming	parser		Free
Twarc ✎	A command line tool (and Python library) for archiving Twitter JSON	twitter, social media	Python, Windows, Linux, Mac	Free, Open Source
Tweet NLP ✎	Tweet tokenizer, PoS Tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. Clusters: http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html	pos tagger, tokenizer, parser		Free
TWINT ✎	A Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API.	twitter, social media, scraping	Linux, Windows, Mac	Open Source
TXM ✎	XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment.	text analysis, concordancer, r, statistics, search tool, tokenizer, xml	Windows, Mac, Linux, Tomcat	Free
UAM CorpusTool ✎	Text annotation tool and statistics for various types of linguistic analysis and multilayer annotation	annotation, multi-layer annotation, computer-assisted annotation		Free
UAM ImageTool ✎	Image annotation tool for visual data corpora	annotation		Free
UBIAI ✎	A NLP-oriented text annotation platform for teams with comprehensive auto-annotation features.	annotation, NLP	Web	Commercial
UCREL Semantic Analysis System (USAS) ✎	An automatic semantic tagger for different languages (e.g., English, Chinese, Italian, Dutch, Portuguese, Spanish).	semantic annotation, tagging, semantics		Free
UCS Toolkit ✎	A toolkit (libraries and scripts) for the statistical analysis of co-occurrence data.	collocation, co-occurrence, statistics	R, Perl	Free
Unitok	An annotation-aware tokenizer that splits text into line-by-line tokens.	tokenizer		Free
UralicNLP	NLP tools (primarily) for Uralic languages	uralic, parser, pos tagger, tagging, inflection, morphological tagger	Linux, Mac, Windows	Free
VARD	Spelling variant detection and deletion in historical corpora (particularly EModE)	variant detector		Free (with academic email)
VariAnt	Tool for the detection of spelling variants	variant detector	Windows	Free
VideoAnt	A web-based tool to annotate and discuss web-hosted videos.	annotation, video	Web	Free
Voyant Tools	A web-based reading/analysis toolkit for digital texts.	reading, text analysis, visualization, trends patterns	Web	Free, Open Source
VU Amsterdam Metaphor Identification Corpus	Corpus tool for metaphor identification	metaphor identification, metaphors	Web and local version	Free
WConcord 3.0	A fully featured concordancer	concordancer		Free
WebAnno	A web-based annotation tool	annotation, web-based	Web	Free
WebLicht	An execution environment for the automatic annotation of text corpora, embedded in the CLARIN-D project.	annotation	Web	Free (CLARIN-D Account needed)
wiki2corpus	Downloads Wikipedia content and converts it into clean text files.	wikipedia, web as corpus	Python	Free
Wmatrix	Tool for corpus analysis and comparison. Provides access to CLAWS and USAS.	wordlists, concordancer, pos tagger, semantic tagger, keywords, web-based	Web	£50 per username per year
WordCruncher	A tool for searching, studying, and analyzing digital texts and corpora. The tool has been tested for corpora up to a billion words.	concordancer, wordlists, collocates, n-grams, keywords, key phrases, ebooks	Windows, Mac, iOS	Free
WordFish	Extract political positions from text documents.	political science	R	Free
WordHoard	Close reading and scholarly analysis of deeply tagged texts	close reading	Windows, Unix, Linux, Mac	Free
Wordle	A tool for generating word clouds.	word clouds, visualization	Web	Free
WordMap	A simple web-based word-map / wordcloud generator.	visualization, web-based	Web	Free
Wordscores	A tool (approach) to extract dimensional information from political texts	political science, information retrieval		Free
WordSift	A word cloud generator, with dynamic filters, links to images, and KWIC capabilities. Works with various types/formats of word lists.	word cloud, vocabulary profiling, lexis, vocabulary, language teaching	Web	Free
Wordsmith	One of the most established corpus toolkits, offering a wide range of functions	concordancer, wordlists, statistics, keywords	Windows	60€ per licence
wordspace	An R package for distributional semantics	semantics, distributional semantics, R	R	Free
Wordstatix	Corpus analysis tool	concordancer		Free
WordWanderer	A web-based visualization/analysis tool which allows its users to "wander" a text.	visualization, concordancer	Web	Free
Worldbuilder	Tool for annotation and visualisation in analyses that apply Text World Theory	annotation, visualization
Xaira	A tool for indexing and analyzing XML resources.	indexing, xml	Windows	Free, Open Source
YACSI Chinese Tokeniser / PoS Tagger	A Chinese tokenizer and PoS tagger	chinese, tokenizer, pos tagger	Windows	Free
YEDDA	A Python-based collaborative text span annotation tool with support for a very wide variety of languages, including Chinese.	annotation	Python	Free, Open Source
BabelNet	A multilingual encyclopedic dictionary featuring a semantic network/ontology.	dictionary, ontology, semantics, NLP	Web	Free
FLAX	FLAX (Flexible Language Acquisition) is a set of tools and applications to automate the production and delivery of interactive digital language collections.	language learning, language teaching, text analysis	Java, Moodle	Free, Open Source
Just the Word	A simple web interface for BNC data	concordancer, frequency analysis, BNC	Web	Free
Orange Data Mining	An open source machine learning and data visualization platform based on workflows.	text analysis, visualization, time series	Windows, Unix, Linux, Mac	Free, Open Source
QualCoder	An open source tool for qualitative data analysis that supports coding text and images.	qda, annotation	Windows, Mac, Linux, Python	Free, Open Source
TEITOK	A web-based platform for viewing, creating, and editing corpora with rich textual mark-up and linguistic annotation.	visualization, TEI, mark-up, annotation	Linux, Mac	Free, Open Source
Wordless	An integrated corpus tool with multilingual support for the study of language, literature, and translation.	concordancer, text analysis, statistics, readability	Windows, Mac, Linux, Python	Free, Open Source
WebCorp Live	A tool for accessing the Web as a corpus.	web-as-a-corpus	Web	Free
CorpusMate	A web-based, streamlined, and simplified language data analysis experience for younger learners.	language learning, language teaching, concordancer, frequency analysis, pattern	Web	Free
MetaPak	A tool to assist metadiscourse analysis based on Hyland's framework.	metadiscourse	Windows	Free
NeoSCA	A syntactic complexity analyzer for written English. It is a fork of L2SCA with various additional features.	syntactic complexity, constituency parsing, pattern matching, tregex, command line	Windows, Mac, Linux	Free, Open Source
Sanchay	An open source multi-purpose platform focused on South Asian languages.	annotation, tagging, chunking	Windows, Linux	Free, Open Source
LogosLink	A tool for corpus management and ontological augmentation for discourse analysis.	discourse analysis, corpus management	Windows	Free
Word Frequency Analyser	A web-based tool for analyzing word frequencies that also produces frequency charts and word clouds.	pos tagger, tokenizer, lemmatizer, frequency analysis	Web	Free
Discourse Analyzer	An AI (LLM) powered platform for conducting discourse analysis.	discourse analysis, llm, generative AI	Web	Paid
Turkish-English Learner Corpus – Error Tagging	TELC is a lexical-error tagged learner corpus compiled in the Turkish setting. It features a web-based error tagging tool.	learner corpus, error tagging	Web	Free
AutoSearch	A cloud-based corpus query engine that supports the upload of corpora.	concordancer, corpus query engine	Web	Free
Text-Fabric	A Python library for processing corpora (especially based on ancient texts) as annotated graphs.	graph model, annotation, python	Python	Free, Open Source
LogiTerm Pro	A powerful commercial multilingual concordancer especially geared towards translators and terminologists.	concordancer, terminology management, terminology	Windows	875 $CAN
Lexical Complexity Analyzer	A tool to automate lexical complexity analysis of English texts using 25 different measures.	lexical analysis, lexical complexity	Python	Free
FreeTxt	A corpus-based toolkit designed to support the systematic analysis and visualisation of free-text data (e.g. questionnaire and survey responses).	analysis, visualisation, surveys, sentiment, semantic, bilingual	Web	Free
