For those of us who study discourse and language, one concern has become especially pressing now that AI is more present in everyday writing: the question of responsibility. Who is saying what, in what context, and under what conditions of accountability?
That question is not new. Discourse analysis has long been concerned with how texts position speakers, distribute responsibility, invoke other voices, and make certain judgments appear natural, neutral, or warranted. But AI gives the problem a new sharpness. We now encounter texts that sound polished, careful, balanced, and reasonable, even when the relation between the words and any socially situated speaker is unclear.
Many AI-generated texts acknowledge different sides of an issue. They avoid unnecessary confrontation. They rely on cautious formulations such as "it is possible that," "some may argue," "it is important to consider," or "the issue depends on context." At first glance, this can look like balance. But balance is not the same as commitment.
In human discourse, commitment is not only a matter of grammar or tone. It emerges from participation in social life: from belonging to professions, institutions, disciplines, political traditions, families, communities, and sometimes from moving uneasily among them. As human speakers and writers, we do not merely arrange positions on a page. We occupy them. We risk something by saying what we say, and that risk is part of what gives evaluation its force.
This is why AI-mediated writing poses such an interesting discourse problem. To me, the most revealing question is not whether a text was written by a human or by a machine, but how it handles evaluation and commitment when no clearly situated speaker stands behind it.
Evaluation, voice, and lived positioning
One useful way to approach this problem is through Appraisal Theory, developed within systemic functional linguistics (Martin & White, 2005). Appraisal helps clarify how texts evaluate people, events, institutions, actions, arguments, and possibilities, and how they position themselves in relation to other voices and possible viewpoints.
Of Appraisal's three main systems (Attitude, Engagement, and Graduation), Engagement is especially important for the present problem. It concerns how a text presents a claim: as certain or tentative, as its own or someone else's, as open to alternatives or as if the matter were already settled. These are not merely technical linguistic details. They are signs of how a text relates to the social world, to possible disagreement, and to the question of who appears to stand behind a judgment.
Our commitment usually comes from somewhere: a professional role, a personal history, a disciplinary position, an ideological attachment, an institutional responsibility, or a concrete interaction with others. Even when people try to sound neutral, their neutrality is socially shaped. It has motives, risks, limits, and consequences.
Generative AI, by contrast, can imitate the language of commitment without participating in the social conditions that normally produce it. It can simulate caution, balance, disagreement, concession, certainty, and expertise. But it does not belong to a profession, a class, a political community, a legal institution, a family, a union, a nation, or a lived history. It can reproduce the language of social positioning without being socially positioned in the human sense.
That difference may leave traces.
How AI often handles commitment
None of this means that AI-generated texts are easy to identify. They are not; even dedicated detection tools have been found to be neither accurate nor reliable (Weber-Wulff et al., 2023). Human writing can also be formulaic, institutional, generic, evasive, or carefully balanced. And many contemporary documents are hybrid from the outset: drafted by a person, expanded by a model, revised by a person again, then standardized through institutional templates.
Still, when AI manages evaluation and commitment, certain recurrent tendencies are worth noticing, not because they prove machine authorship, but because they show how reasonableness can be performed without much social risk.
One of those tendencies is a kind of generic dialogic openness. The text makes room for other voices ("some may argue," "critics suggest," "it could be said"), but those voices often remain strangely weightless. They are not clearly tied to identifiable actors, traditions, debates, institutions, or documented disagreements. The result is an appearance of dialogue without much social location.
Another tendency is low-risk evaluation. The text calls something important, complex, valuable, problematic, or worth considering. None of these judgments is necessarily wrong. The issue is that they often remain safely acceptable across very different contexts. They evaluate, but from a position that does not seem to have much to lose.
A related tendency is formulaic concession: "while X is true, Y should also be considered." Human writers use this move all the time, of course. The difference is that, in AI writing, concession can sometimes appear without much underlying pressure. The prose performs fairness, moderation, and balance, but the conflict that would make those rhetorical moves necessary is not always fully there on the page.
There is also a familiar appeal to context. AI texts often say that a question depends on context, and frequently it does. But the relevant context is not always spelled out. The prose gestures toward complexity without entering the social, legal, institutional, ideological, or interpersonal conditions that would make the issue complex in the first place.
And then there is distributed responsibility. Through passive constructions, generalized attributions, balanced phrasing, and cautious hedging, responsibility can become diffuse. The document may still sound polished—even professional—but it becomes harder to say who, exactly, is standing behind the judgment being made.
The issue is not whether a text evaluates, but how it stands behind its evaluations
AI-generated writing does not suffer from an absence of evaluation. If anything, it often evaluates constantly. It ranks, compares, warns, recommends, qualifies, reassures, and concludes. The more revealing question is different: what kind of relation does the text establish to its own judgments?
A document may say that a policy is problematic, a claim questionable, an action reasonable, or a concern valid. But what stands behind those judgments? Are they anchored in evidence, expertise, institutional authority, disciplinary standards, legal reasoning, lived experience, or identifiable sources? Or do they remain floating evaluations: plausible, polished, and difficult to contest precisely because they are only lightly attached to any accountable position?
This is where I find it useful to distinguish between ordinary AI detection and what we might call evaluative authorship analysis.
AI detection asks a narrow but understandable question: Was this written by a machine?
Evaluative authorship analysis asks a different one: How does this text distribute responsibility for what it affirms, doubts, attributes, judges, intensifies, softens, or leaves open?
That second question is less dramatic, but often more useful. It does not promise certainty where certainty may be unavailable. It asks, instead, how commitment is being managed on the page—and what kinds of responsibility the text seems able, or unable, to carry.
What the emerging research helps us see
This line of inquiry is no longer speculative. Recent work has already started using Appraisal Theory to compare human and AI-generated writing. For example, Sharif Alghazo, Ghaleb Rabab'ah, Dina Abdel Salam El-Dakhs, and Ayah Mustafa (2025) examine Engagement strategies in human-written and AI-generated academic essays and report meaningful differences in how the two corpora manage expansion, contraction, hedging, and rhetorical complexity. Guangyuan Yao and Zhaoxia Liu (2025) make a related move in their study of GPT-generated and human-authored academic book reviews, showing that even when AI can reproduce the general shape of evaluative writing, it tends to be more impersonal and less persuasive in how it handles interpersonal Engagement and Graduation. Together, these studies matter because they treat appraisal not as ornament, but as a site where differences in argumentative positioning can actually be observed.
Margo Van Poucke (2024), working with learner-chatbot interactions, shows why appraisal matters beyond authorship disputes: AI systems do not simply provide information; they also generate evaluative and interpersonal effects, including patterns that may reflect ideological bias or cultural insensitivity when contextual awareness is weak. At the same time, work by Imamovic and colleagues (2024), who examine ChatGPT as an annotator of Attitude, is a useful reminder that appraisal analysis is not mechanically simple. Evaluation is contextual. It is not reducible to a list of obviously positive or negative words, and that matters if we want discourse analysis to remain analytically serious rather than become a new shortcut for automated suspicion.
Taken together, this emerging research suggests that Appraisal Theory has become genuinely relevant to the study of AI-mediated writing. But it also suggests something else: the most interesting problems are not confined to controlled comparisons.
Real documents are messier than labelled datasets. A text may be drafted by a human and polished by AI, generated by AI and then rewritten by a human, translated, standardized, or assembled from multiple institutional sources. In those settings, the question "human or AI?" may simply be too blunt. What often matters more is the pattern of commitment the document displays.
Why this matters for consequential documents
In consequential documents, responsibility is not decorative. Legal, academic, institutional, corporate, and public-facing texts do not simply pass along information. They assign credibility, risk, relevance, legitimacy, blame, value, and authority. They help organize what counts as reasonable, defensible, or actionable.
This matters when reviewing reports, statements, institutional responses, expert summaries, complaints, policies, applications, or any document where wording may affect judgment.
A document can sound measured and careful and still do important social work. It can frame a person, soften an institutional failure, exaggerate consensus, neutralize conflict, or create the appearance of expertise. That is why, read closely, a polished surface is never the whole story.
AI-mediated writing, then, should not be examined only as a problem of authorship. It should also be examined as a problem of commitment, positioning, and responsibility.
When a human writer evaluates, that evaluation usually comes from somewhere, even if that somewhere is partial, conflicted, strategic, or institutionally constrained. A person may be wrong, biased, cautious, evasive, or unfair, but the judgment still emerges from a social position. AI can reproduce many of the linguistic forms of that positioning without occupying the position itself.
That difference will not always be easy to name with certainty. But it may still leave discourse-analytic traces.
So the question remains
Appraisal Theory is not a replacement for technical detection tools, and it should not be turned into a shortcut for declaring that a text was written by ChatGPT, Claude, Gemini, or any other system. That would miss the point.
The point is not to build another detector. It is to learn how to read these texts more carefully.
How does the text evaluate? How does it commit? How does it attribute? What voices does it invoke, and which ones remain curiously absent? Where does it open the discussion, and where does it quietly close it? Most importantly: who appears to be carrying the responsibility for the evaluative work the text performs?
AI-generated writing may not reveal itself only through factual errors, awkward phrasing, or familiar stylistic clichés. It may also reveal itself, at least sometimes, in the way it handles reasonableness, caution, balance, and commitment.
The issue is not that AI cannot evaluate. It clearly can.
The issue is that evaluation without lived social positioning may still sound persuasive while remaining only lightly accountable.
For discourse analysts, that is not a minor stylistic detail. It goes to the heart of the problem.
Daniel Avilán
References
Alghazo, S., Rabab’ah, G., El-Dakhs, D. A. S., & Mustafa, A. (2025). Engagement strategies in human-written and AI-generated academic essays: A corpus-based study. Ampersand, 15, Article 100237. https://doi.org/10.1016/j.amper.2025.100237
Imamovic, M., Deilen, S., Glynn, D., & Lapshinova-Koltunski, E. (2024). Using ChatGPT for annotation of Attitude within the Appraisal Theory: Lessons learned. In Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII) (pp. 112–123). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.law-1.11
Martin, J. R., & White, P. R. R. (2005). The language of evaluation: Appraisal in English. Palgrave Macmillan. https://doi.org/10.1057/9780230511910
Van Poucke, M. (2024). ChatGPT, the perfect virtual teaching assistant? Ideological bias in learner-chatbot interactions. Computers and Composition, 73, Article 102871. https://doi.org/10.1016/j.compcom.2024.102871
Weber-Wulff, D., Anohina-Naumeca, A., Bjelobaba, S., Foltýnek, T., Guerrero-Dib, J., Popoola, O., Šigut, P., & Waddington, L. (2023). Testing of detection tools for AI-generated text. International Journal for Educational Integrity, 19, Article 26. https://doi.org/10.1007/s40979-023-00146-z
Yao, G., & Liu, Z. (2025). Exploring artificial intelligence appraisal: Appraisal patterns in GPT-generated and human-authored book reviews. Applied Linguistics, Article amaf064. https://doi.org/10.1093/applin/amaf064
