CINECA IRIS Institutional Research Information System

This study investigates how targeted attacks can compromise the reliability and applications of large language models (LLMs) in educational assessment, highlighting security vulnerabilities that are frequently underestimated in current AI-supported learning environments. As LLMs and other AI tools are increasingly being integrated into grading, providing feedback, and supporting the evaluation workflow, educators are adopting them for their potential to increase efficiency and scalability. However, this rapid adoption also introduces new risks. An unexplored threat is prompt injection, whereby a student acting as an attacker embeds malicious instructions within seemingly regular assignment submissions to influence the model’s behaviour and obtain a more favourable evaluation. To the best of our knowledge, this is the first systematic comparative study to investigate the vulnerability of popular LLMs within a real-world educational context. We analyse a significant representative scenario involving prompt injection in exam assessment to highlight how easily such manipulations can bypass the teacher’s oversight and distort results, thereby disrupting the entire evaluation process. By modelling the structure and behavioural patterns of LLMs under attack, we aim to clarify the underlying mechanisms and expose their limitations when used in educational settings.

When AI Is Fooled: Hidden Risks in LLM-Assisted Grading

Milani A.;Franzoni V.;Florindi E.;Omarbekova A.;Bekmanova G.;Yergesh B.

2025-01-01

Abstract

This study investigates how targeted attacks can compromise the reliability and applications of large language models (LLMs) in educational assessment, highlighting security vulnerabilities that are frequently underestimated in current AI-supported learning environments. As LLMs and other AI tools are increasingly being integrated into grading, providing feedback, and supporting the evaluation workflow, educators are adopting them for their potential to increase efficiency and scalability. However, this rapid adoption also introduces new risks. An unexplored threat is prompt injection, whereby a student acting as an attacker embeds malicious instructions within seemingly regular assignment submissions to influence the model’s behaviour and obtain a more favourable evaluation. To the best of our knowledge, this is the first systematic comparative study to investigate the vulnerability of popular LLMs within a real-world educational context. We analyse a significant representative scenario involving prompt injection in exam assessment to highlight how easily such manipulations can bypass the teacher’s oversight and distort results, thereby disrupting the entire evaluation process. By modelling the structure and behavioural patterns of LLMs under attack, we aim to clarify the underlying mechanisms and expose their limitations when used in educational settings.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2025
			
	Parole chiave
	
				AI misuse detection
education
educational evaluation
generative AI
human-in-the-loop AI
large language models
prompt injection
trustworthy AI
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14085/57763

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

6

ND

social impact