Lexicon
preprocessing
Before a corpus can be used to train a model, it has to be preprocessed: unwanted content is removed, the text is normalized, and it is adapted to the specifics of the model. For instance, if a model encountered only one type of quotation mark during pretraining, the finetuning corpus should use that same type, so that the model recognizes the quotation marks correctly right away.
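The following is a minimal sketch of such preprocessing steps, not a definitive pipeline. It assumes, purely for illustration, that the pretrained model has only seen straight ASCII quotation marks and that the unwanted content consists of leftover HTML tags. The names QUOTE_MAP and preprocess are hypothetical, and a real corpus will usually need rules tailored to the model in question.

```python
import re
import unicodedata

# Hypothetical mapping: assumes the pretrained model only knows straight ASCII quotes.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u00ab": '"',  # left-pointing double angle quotation mark
    "\u00bb": '"',  # right-pointing double angle quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201a": "'",  # single low-9 quotation mark
})


def preprocess(text: str) -> str:
    """Apply three illustrative steps: removal, normalization, adaptation."""
    # 1. Remove unwanted content (here: anything that looks like an HTML tag).
    text = re.sub(r"<[^>]+>", " ", text)
    # 2. Normalize the Unicode representation (composed form).
    text = unicodedata.normalize("NFC", text)
    # 3. Adapt to the model: map all quote variants to the assumed known type.
    text = text.translate(QUOTE_MAP)
    # Collapse runs of whitespace left behind by the earlier steps.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(preprocess("<p>\u201eHallo\u201c,  sagte sie.</p>"))
```

Mapping every variant to a single character discards the distinction between opening and closing marks, which is acceptable only if the target model never relied on that distinction.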
pretraining, pretrained model
The initial training of a model. During pretraining, the model is trained on very large amounts of data so that it builds a robust statistical model of language and knowledge.