Please feel free to contribute by suggesting new tools or by pointing out mistakes in the data.
|LexisNexis||A database of current and archived news articles. It also provides other data, such as business data.||news, data||Web||Commercial|
|Sketch Engine||A corpus manager and text analysis software developed by Lexical Computing.||annotation, concordancer, tagging, sampling, search, visualization, wordlists, keywords, compilation, text analysis, n-grams, collocation, statistics, segmentation, analysis, crawler, parallel, colligation, annotations, tokenization, query, ngrams, boilerplate remover, comparison, frequency analysis, information retrieval, data, sentence boundary, corpus creation, duplicate remover, regex, thesaurus, meta modelling, dictionary, text-processing, xml, frequency, trends patterns, web-based, collocates, collocation analysis, word cloud, co-occurrence, KWIC, corpus management, multilingual, NLP, diachronic analysis, term extraction, keyword extraction, bilingual term extraction||Web||30-day free trial, then from 4.83 €/month|
|Trafilatura||A Python package and command-line tool that downloads, parses, and scrapes web page data.||corpus creation, python, R, compilation, crawler, boilerplate remover, data, xml, scraping||Python||Free, Open Source|