Besides defensive distillation, one of the main techniques for defending AI systems against attacks with so-called “adversarial examples” is adversarial learning (also known as adversarial training). Together, these two are currently among the few practical defences against such attacks.
Adversarial examples are inputs that have been deliberately perturbed to induce false results. Adversarial learning feeds as many of these examples as possible into a model during training; they are added with their correct labels, so the model learns to classify them in spite of the perturbation. In this way, the model learns what an adversarial attack might look like and gradually builds a stronger ‘immune system’ against such inputs.
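The loop above can be sketched in a few lines. The following is a minimal illustration, not a production defence: it uses a plain logistic-regression model on synthetic two-blob data, and the Fast Gradient Sign Method (FGSM), one common way of crafting adversarial examples, to generate the perturbed inputs. All names, data, and parameter values (such as `eps`) are illustrative choices, not taken from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, epochs=300, lr=0.5):
    """Plain logistic regression fitted with gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def fgsm(X, y, w, eps):
    """Fast Gradient Sign Method: nudge each input in the direction
    that increases the model's loss, bounded by eps per feature."""
    grad_x = (sigmoid(X @ w) - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
eps = 0.3

# Toy two-class data: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(-1.0, 0.4, (100, 2)),
               rng.normal(+1.0, 0.4, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# 1. Fit a baseline model on clean data.
w = train(X, y)

# 2. Craft adversarial copies of the training set -- note that
#    they keep their TRUE labels.
X_adv = fgsm(X, y, w, eps)

# 3. Retrain on clean + adversarial data to harden the model.
w_robust = train(np.vstack([X, X_adv]), np.concatenate([y, y]))

def accuracy(w, X, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

print(f"clean accuracy (robust model): {accuracy(w_robust, X, y):.2f}")
```

Step 2 is the crux: the perturbed inputs are generated from the current model and fed back into training with their original labels, which is exactly the “feed as many adversarial examples as possible” idea described above.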
Adversarial training follows the same approach as the antivirus software on a personal computer: quite effective, but only as long as its virus database is continuously updated to cover new threats.
The same applies to adversarial learning: it helps prevent adversarial attacks, but it requires constant maintenance and only protects a model against attack types that are already known. Because the space of possible adversarial examples is far too large to generate in advance, this technique cannot stop every attack.