Before a corpus can be used to train or fine-tune an AI model, it has to be preprocessed: unwanted content is removed, the text is normalized, and the data is adapted to the specifics of the model. If, for instance, a model has seen only one type of quotation mark during pretraining, the fine-tuning corpus should use that same type, so that the model recognizes the quotation marks correctly from the start.
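As a rough illustration, the three steps could look like the following minimal Python sketch. It assumes, hypothetically, that leftover HTML tags count as "unwanted content" and that the model saw only straight ASCII quotes (`"` and `'`) in pretraining; the mapping `QUOTE_MAP` and the function `preprocess` are illustrative names, not part of any particular toolkit.

```python
import re
import unicodedata

# Hypothetical assumption: the model only saw straight ASCII quotes in pretraining,
# so every typographic variant is mapped onto them.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark (German opening quote)
    "\u00ab": '"',  # left-pointing double angle quotation mark
    "\u00bb": '"',  # right-pointing double angle quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201a": "'",  # single low-9 quotation mark
})

# Hypothetical example of "unwanted content": leftover HTML tags.
TAG_RE = re.compile(r"<[^>]+>")


def preprocess(text: str) -> str:
    # 1. Remove unwanted content (here: HTML tags).
    text = TAG_RE.sub("", text)
    # 2. Normalize Unicode (NFC) so composed and decomposed characters match.
    text = unicodedata.normalize("NFC", text)
    # 3. Adapt to the model: map all quotation mark variants to one style.
    text = text.translate(QUOTE_MAP)
    # Collapse whitespace runs left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


corpus = ["<p>Sie sagte: \u201eHallo\u201c</p>", "He said \u201chi\u201d and \u2018bye\u2019."]
cleaned = [preprocess(t) for t in corpus]
print(cleaned)
```

Which characters count as unwanted and which quote style is the target depend on the actual model and corpus, so both would need to be derived from the pretraining data rather than taken from this sketch.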