Lexicon

classification

Classification describes the task of assigning a data object to one of several previously defined classes. Example: assigning a text to a genre based on its content.
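A minimal sketch of the genre example above, assuming a tiny hand-written keyword list per class. The genre names, the keywords and the `classify` function are hypothetical and chosen only for illustration; practical classifiers usually learn such associations from labelled training texts.

```python
# Hypothetical, hand-picked keywords for three predefined classes (genres).
GENRE_KEYWORDS = {
    "sports": {"match", "goal", "team", "league"},
    "finance": {"market", "stock", "bank", "interest"},
    "cooking": {"recipe", "oven", "flour", "simmer"},
}


def classify(text):
    """Assign a text to the predefined genre whose keywords it mentions most often."""
    # Lowercase the text and strip common punctuation from each word.
    words = [word.strip(".,;:!?") for word in text.lower().split()]
    scores = {
        genre: sum(word in keywords for word in words)
        for genre, keywords in GENRE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Return None when no keyword matched instead of guessing a class.
    return best if scores[best] > 0 else None


print(classify("The team scored a late goal in the league match."))
```

The key property of classification is visible in `GENRE_KEYWORDS`: the set of possible classes is fixed before any text is examined.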

cluster analysis

Cluster analysis, or clustering, is a machine learning technique that sorts data points into groups of similar items. It is an unsupervised learning method: it requires no labelled examples or other prior information about the data and relies solely on the similarities between the data points.

The basic idea behind this method is to group similar data together and place dissimilar data in separate groups. In this way, patterns and structures in the data become apparent. The similarity between data points is calculated using a distance or similarity metric, such as Euclidean distance or cosine similarity.
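The grouping idea can be sketched with k-means, which is only one of many clustering algorithms and is used here purely as an illustration. The sketch assumes Euclidean distance as the metric and, to keep the run deterministic, takes the first `k` points as initial centres; the function name `k_means` and the sample points are hypothetical.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point to its nearest centre."""
    # Assumption: the first k points serve as initial centres (simple and deterministic).
    centres = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Similarity is measured as Euclidean distance to each centre.
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep the old centre if the cluster is empty.
        centres = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1)]
for cluster in k_means(points, k=2):
    print(cluster)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the points. The result depends on the choice of `k` and on the initial centres, which is why practical implementations usually try several initialisations.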

computational linguistics

Computational linguistics is the interface between computer science and linguistics. It involves the use of computers to process natural language (both text and audio) in applications such as speech recognition and synthesis, machine translation and dialogue systems. It is therefore an interdisciplinary field concerned with the application of computer technology to language.

One of its main goals is to enable computers to perform natural, human-like language processing, including comprehension and production. This may require hardware, such as input and output devices, as well as software programs.

content marketing

Content marketing is a strategic approach to creating and distributing valuable and relevant content to attract and retain users. The aim is to win new visitors and keep existing ones by providing them with informative, useful or entertaining content. It is therefore, among other things, a tool for increasing traffic. Content marketing can also improve a company’s image and increase awareness of a brand, product or person. Content can be delivered in a variety of formats, such as blog articles on your own website, videos, podcasts or infographics.

corpus

A corpus is a collection of texts that usually share a common context in terms of content or structure. For example, a corpus may consist of texts from a single source.

crawler

A crawler is a program that extracts data from web pages and writes it into a database. Crawlers are also known as robots or spiders because they search automatically and their path through the web resembles a spider’s web.

Spiders usually visit websites via hyperlinks embedded in websites that have already been indexed. The retrieved content is then cached, analysed and, if necessary, indexed. Indexing is based on the search engine’s algorithm. The indexed data then appears in the search engine results.

Using special web analysis tools, web crawlers can analyse information such as page views and links and compile or compare data in the sense of data mining. Websites that are not linked to from any other page cannot be reached by a crawler that follows hyperlinks and are therefore unlikely to be found by search engines.
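The link-following behaviour described above can be sketched without any network access by modelling the web as a small dictionary. The page names, the `WEB` structure and the `crawl` function are hypothetical; a real crawler would additionally fetch pages over HTTP, parse their HTML and respect rules such as robots.txt.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2"],
    "post-1": ["blog"],
    "post-2": [],
    "orphan": ["home"],  # links out, but no other page links to it
}


def crawl(start, web):
    """Follow hyperlinks breadth-first from start and return the pages in visit order."""
    index = []           # stands in for the search engine's database
    seen = {start}       # pages already queued, so no page is visited twice
    queue = deque([start])
    while queue:
        page = queue.popleft()
        index.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


print(crawl("home", WEB))
```

Starting from "home", the crawl reaches every linked page but never "orphan": that page links out to others, yet nothing links to it, so a crawler that only follows hyperlinks has no path to it.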

Sources:

https://www.techtarget.com/whatis/definition/crawler

https://www.myrasecurity.com/en/knowledge-hub/crawler/