Tools for Corpus Linguistics

A comprehensive list of 170 tools used in corpus analysis.

Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.

| Tool | Description | Categories | Platform | Pricing |
|---|---|---|---|---|
| almaneser / SALTA | Semantic parser / POS tagger for English | parser, pos tagger | | Free (with licence agreement) |
| AntCLAWSGUI | Front-end interface for the CLAWS tagger | pos tagger | Windows | Free |
| BFSU Stanford POS Tagger | A PoS tagger | pos tagger, tagging | Windows | Free |
| CLAWS POS-Tagger | CLAWS (Constituent Likelihood Automatic Word-tagging System) POS tagger developed by UCREL at Lancaster University | pos tagger | Web | Via licence or in-house tagging at Lancaster |
| Stanford Log-linear POS Tagger | POS tagger for English (Penn Treebank tagset), Arabic, Chinese and German | pos tagger | | Free |
| TagAnt | Part-of-speech tagging tool built on TreeTagger | pos tagger | Windows, Mac, Linux | Free |
| Textanz | Language analysis program that produces frequency lists, word lists and part-of-speech tags | wordlists, concordancer, pos tagger, dictionary | Any OS | Free, Open Source |
| The Simple PoS Tagger | A simple PoS tagger built on the Perl module Lingua::EN::Tagger | pos tagger, tagger | Windows | Free |
| TnT - Thorsten Brants's PoS Tagger | A simple statistical (trigram-based) PoS tagger | pos tagger, tagger | Windows/Unix | Available via Stanford |
| TreeTagger | Tool for annotating text with part-of-speech and lemma information | pos tagger, annotation | Windows, Mac, Linux | Free |
| Tweet NLP | Tokenizer, POS tagger, hierarchical word clusters and a dependency parser for tweets, along with annotated corpora and web-based annotation tools. Cluster viewer: http://www.cs.cmu.edu/~ark/TweetNLP/cluster_viewer.html | pos tagger, tokenizer, parser | | Free |
| Wmatrix | Tool for corpus analysis and comparison | wordlists, concordancer, pos tagger, semantic tagger | Web | £50 per username per year |
| YACSI Chinese Tokeniser / PoS Tagger | A Chinese tokenizer and PoS tagger | chinese, tokenizer, pos tagger | Windows | Free |
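Several of the taggers above (TreeTagger, and TagAnt, which is built on it) annotate each token with a part-of-speech tag and a lemma. As a rough illustration of how that kind of output can feed a word list, here is a minimal sketch. It assumes one token per line with three tab-separated fields (token, tag, lemma) and treats the lemma `<unknown>` as "no lemma available". The function names `parse_treetagger` and `lemma_frequencies` and the sample lines are hypothetical and made up for illustration; check your tagger's documentation for its actual output format before relying on this.

```python
from collections import Counter


def parse_treetagger(lines):
    """Parse 'token<TAB>tag<TAB>lemma' lines into (token, tag, lemma) tuples.

    Assumes a TreeTagger-style three-column format (an assumption, not a spec).
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {len(parts)}: {line!r}")
        token, tag, lemma = parts
        rows.append((token, tag, lemma))
    return rows


def lemma_frequencies(rows, tag_prefix=""):
    """Count lemmas, keeping only tags that start with tag_prefix ("" keeps all)."""
    counts = Counter()
    for token, tag, lemma in rows:
        if not tag.startswith(tag_prefix):
            continue
        # Fall back to the lowercased token when no lemma was assigned.
        key = token.lower() if lemma == "<unknown>" else lemma
        counts[key] += 1
    return counts


# Invented sample in the assumed token/tag/lemma layout.
sample = [
    "The\tDT\tthe",
    "dogs\tNNS\tdog",
    "chased\tVVD\tchase",
    "the\tDT\tthe",
    "dog\tNN\tdog",
    ".\tSENT\t.",
]

rows = parse_treetagger(sample)
print(lemma_frequencies(rows).most_common(3))
print(lemma_frequencies(rows, tag_prefix="NN"))  # nouns only (NN, NNS, ...)
```

Counting lemmas rather than surface forms merges "dogs" and "dog" into one entry. The `tag_prefix` filter is a simple way to restrict the list to one word class when the tagset encodes word class in the tag's first letters.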