Sketch Engine | A corpus manager and text analysis software developed by Lexical Computing. | annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, co-occurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction | | 30-day free trial then starts at 4.83 €/month |
The Simple Corpus Tool | A corpus analysis toolkit that supports XML annotations. | concordancer, annotation, xml, frequency | Windows | Free |
Trafilatura | Trafilatura is a Python package and command-line tool which seamlessly downloads, parses, and scrapes web page data. | corpus creation, python, R, compilation, crawler, boilerplate remover, data, xml, scraping | Python | Free, Open Source |
TXM | XML & TEI compatible text analysis software based on TreeTagger, the CQP search engine and the R statistical environment. | text analysis, concordancer, r, statistics, search tool, tokenizer, xml | Windows, Mac, Linux, Tomcat | Free |
Xaira | A tool for indexing and analyzing XML resources. | indexing, xml | Windows | Free, Open Source |