NoSketch Engine | Open-source, limited version of Sketch Engine; lacks word sketches, thesaurus and keyword computation | corpus creation, semantic analysis, wordlists | | Free |
Sketch Engine | Corpus manager and text analysis software developed by Lexical Computing | annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, tokenization, query, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, co-occurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction | | 30-day free trial, then from 4.83 €/month |
TextSTAT | Tool for creating and manipulating linguistic data in different languages | corpus creation, concordancer | Windows, GNU/Linux and macOS | Free |
Trafilatura | Python package and command-line tool that downloads, parses, and scrapes web page data | corpus creation, python, R, compilation, crawler, boilerplate remover, data, xml, scraping | Python | Free, Open Source |