Classification describes the task of assigning a data object to one of several previously defined classes. Example: assigning a text to a genre based on its content.
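The genre example can be illustrated with a minimal sketch. It assumes a hand-written keyword list per genre (`GENRE_KEYWORDS` and `classify` are hypothetical names, not from any library); a practical classifier would usually learn such associations from labeled training texts instead.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
GENRE_KEYWORDS = {
    "sports": {"match", "goal", "league", "team"},
    "finance": {"market", "stock", "bank", "profit"},
    "science": {"experiment", "theory", "cell", "energy"},
}

def classify(text: str) -> str:
    """Assign the text to the predefined genre with the most keyword hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {genre: len(words & keywords)
              for genre, keywords in GENRE_KEYWORDS.items()}
    # On a tie (including zero hits everywhere), the first genre listed wins.
    return max(scores, key=scores.get)
```

Calling `classify("The team scored a late goal in the league match")` selects the sports genre, because four of its keywords occur in the text and none of the other genres' keywords do.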
Clustering is the grouping of data such that data within a group are similar to each other and data in different groups are dissimilar. Such a group is called a cluster. Example: a large set of international soccer clubs can be clustered according to the national league they belong to.
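The soccer example can be sketched as grouping by a shared attribute. This is a simplification: the league is given explicitly here, whereas clustering algorithms typically derive the groups from a similarity measure between the objects. The function name `cluster_by_league` and the small club list are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative input: (club, national league) pairs.
CLUBS = [
    ("Bayern Munich", "Bundesliga"),
    ("Real Madrid", "La Liga"),
    ("Borussia Dortmund", "Bundesliga"),
    ("FC Barcelona", "La Liga"),
    ("Arsenal", "Premier League"),
]

def cluster_by_league(clubs):
    """Put clubs that share a league into the same cluster."""
    clusters = defaultdict(list)
    for club, league in clubs:
        clusters[league].append(club)
    return dict(clusters)

clusters = cluster_by_league(CLUBS)
```

Clubs within one cluster share the same league, and clubs in different clusters do not, which mirrors the similarity requirement in the definition.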
Computational linguistics is the interface between computer science and linguistics. Its aim is to process natural language (both text and audio) with computers, for example in speech recognition and synthesis, machine translation, and dialog systems.
Content marketing is a strategic approach to creating and distributing valuable, relevant content in order to attract new users and retain existing ones. It is thus a tool for increasing traffic. The content can take different formats, e.g. blog articles on one’s own website, videos, podcasts, or infographics.
A corpus is a collection of texts that usually share a context in terms of content or structure. For example, a corpus may consist of texts from a single source.
A crawler is a program that extracts data from web pages and writes it to a database.
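The extract-and-store step can be sketched with Python's standard library alone. The sketch assumes the page has already been downloaded and is passed in as a string; the fetching itself is omitted. `PageParser`, `store_page`, and the two-table layout are hypothetical choices made for illustration.

```python
import sqlite3
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collect the page title and all link targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def store_page(conn, url, html):
    """Extract title and links from one page and write them to the database."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS links (source TEXT, target TEXT)")
    conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)",
                 (url, parser.title.strip()))
    conn.executemany("INSERT INTO links VALUES (?, ?)",
                     [(url, target) for target in parser.links])
    conn.commit()
    return parser.links
```

A crawler built on this sketch would call `store_page` for each downloaded page and could use the returned link list to decide which pages to visit next.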