Monthly archives: December 2025

The philosophical misconception behind the LLM cult (or why LLMs will always bullshit)

There is this idea that if a large language model (LLM) is trained on a large corpus of text, then it knows whatever knowledge is in that corpus. Improving the performance is then essentially a matter of scaling: as you expand the database, you expand knowledge, assuming the statistical model is fine-grained enough (i.e., has enough parameters). You can then ask questions and the LLM will answer according to the knowledge in the corpus.

This might sound like a reasonable view, but it is based on a misconception about the nature of knowledge. Indeed, an implicit assumption is that the corpus is logically consistent. But what if it contains a proposition as well as its contradiction, for example “the Earth is round” and “the Earth is flat”? In that case, the trained LLM cannot produce consistent answers; it will answer differently depending on how it is cued – an annoying experience that many users are familiar with.

The natural answer would be to build a high-quality corpus, e.g. by selecting academic textbooks, rather than conspiracy theories. Unfortunately, this is a naïve view of scientific knowledge, one that has been thoroughly debunked by over a century of philosophy of science. It is the view that science is a linear accumulative process: you add observations, and you add deductions, and if you check those assertions, then you get a consistent, certified, corpus of knowledge. Knowledge, then, is constituted of propositions that derive directly from observations, plus what you can logically deduce from those (this is essentially logical positivism). It follows that, if you add a book to a corpus of books, you necessarily increase knowledge, by exactly one book (assuming there is no redundancy).

As intuitive as it might sound, this view is utterly false. It has been shattered on historical grounds by Thomas Kuhn (see also Hasok Chang for more recent work), and on philosophical grounds by philosophers such as Lakatos and Quine. In science, theories get superseded by other theories that contradict them. At any given time, different theories coexist, along with diverging interpretations of facts. Science is a debate. Human knowledge is contradictory, and science is about trying to resolve those contradictions, not about accumulating true propositions. Any working scientist knows that every field is full of paradoxes, internal contradictions and diverging views. It follows that no scientific corpus is internally consistent.

If you build a statistical model of an inconsistent corpus, you do not resolve those contradictions. Instead, what happens is that, depending on context (the prompt), the model will predict one thing or its contrary, possibly within the same session, with apparent confidence – indeed, if you merge two confident propositions, you get a confident contradiction, not doubt. An LLM will always bullshit. Scaling alone (whether of the corpus or of the model) cannot solve this problem.
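The mechanism can be illustrated with a toy n-gram model – a deliberately minimal sketch trained on a hypothetical two-sentence corpus (a real LLM is vastly more sophisticated, but the statistical point is the same). With a short context, the model simply splits its probability between the contradictory continuations; with a longer context that happens to include a cue, it confidently predicts one claim or the other:

```python
from collections import Counter, defaultdict

# A hypothetical miniature corpus containing a proposition and its contradiction,
# each embedded in a different context.
corpus = [
    "nasa reports the earth is round",
    "the pamphlet claims the earth is flat",
]

def train(corpus, n):
    """Count which word follows each (n-1)-word context in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            context, nxt = tuple(words[i:i + n - 1]), words[i + n - 1]
            counts[context][nxt] += 1
    return counts

def predict(counts, context):
    """Probability distribution over the next word, given a context tuple."""
    total = sum(counts[context].values())
    return {w: c / total for w, c in counts[context].items()}

trigrams = train(corpus, 3)
# Short context: the contradiction is unresolved, both continuations equally likely.
print(predict(trigrams, ("earth", "is")))  # {'round': 0.5, 'flat': 0.5}

fivegrams = train(corpus, 5)
# Longer context containing a cue: the model is confident, either way.
print(predict(fivegrams, ("reports", "the", "earth", "is")))  # {'round': 1.0}
print(predict(fivegrams, ("claims", "the", "earth", "is")))   # {'flat': 1.0}
```

Scaling this up – more data, a richer model – changes which cues select which continuation, but it cannot manufacture a consistency that the corpus does not have.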

No, chatbots are not scientists

I can’t believe I need to write this, but apparently, I do. Many journals and preprint sites are now overwhelmed with chatbot-generated submissions. This is bad, but at least it gets characterized as fraud, something that we need to defend against. What I find much more worrying is that respectable scientists don’t seem to see the problem with generating part of their papers with a chatbot, and that journals, as they struggle to find reviewers, are seriously considering using chatbots to review papers. This is usually backed up by anthropomorphic discourse, such as calling a chatbot a “PhD-level AI”. This is to be expected from the CEO of a chatbot company, but unfortunately, it is not unusual to hear colleagues describe chatbots as some sort of “interns”. The rationale is that what the chatbot produces looks like a text that a good intern could produce. Or that the chatbot “writes better than me”. Or that it “knows” much more than the average scientist about most subjects (does an encyclopedia “know” more than you too?).

First of all, what’s a chatbot? Of course, I am referring to large language models (LLMs), which are statistical models of text trained on very large text databases. An LLM is not about truth, but about what is written in the database (whether right, wrong, invented or nonsensical), and it is certainly not about personhood. But the term chatbot emphasizes the deceptive dimension of this technology: it is a statistical model conceived in such a way as to fool the user into believing that an actual person is talking. It is the intersection of advanced statistical technology and bullshit capitalism. We have become familiar with bullshit capitalism: a mode of financing based not on the revenues that can be reasonably anticipated from a well-conceived business plan, but on closed-loop speculation about the short-term explosion of the share value of a company that sells nothing, based on a “pitch”. Thus, funders are apparently perfectly fine with a CEO explaining that his business plan is to build a superhuman AI and ask it to come up with an idea. It sounds like a joke. But he never clearly explained how he would make revenue, so it is not really a joke.

Scientists should not fall for that. Chatbots are essentially statistical models. No one is actually speaking or taking responsibility for what is being written. The argument that chatbot-generated text resembles scientist-generated text is a tragic misunderstanding of the nature of science. Doing science is not producing text that looks sciency. A PhD is not about learning to do calculations, to code, or to develop sophisticated technical skills (although these are obviously an important part of the training). It is about learning to prove (or disprove) claims. At its core, it is the development of an ethical attitude to knowledge.

I know, a number of scientists who have read or heard of a couple of 1970s–1980s classics in philosophy or sociology of science will object that truth doesn’t exist, or even that it is an authoritarian ideal, and that it’s all conventions. Well, at a time when authoritarian politicians are claiming that scientific discourse has no more value than political opinion, we should perhaps pause and reflect a little on that position. First of all, it is self-defeating: if it is true, then maybe it’s not true. This alone should make one suspect that something is wrong with it. Sure, truth doesn’t exist, in the sense that any general statement can always potentially be contradicted by future observations. In science, general claims are provisional. Sure. But consistency with current observations and theories does exist, and wrongness certainly does exist too. And sure, scientific claims are necessarily expressed in a certain theoretical context, and this context can always be challenged. But on what basis do we challenge theories and claims? Obviously, on the basis of whether we think the theories are incorrect, partial or misleading – that is, on the basis of epistemic norms. Don’t call it “truth” if you want to sound philosophically aware, but we are clearly in the same lexical field.

So, “truth doesn’t exist” is a fine provocative slogan, but certainly a misleading one, unless its meaning is carefully unpacked. Science is all about arguing, challenging claims with arguments, backing up theories with reasoning and experiments, looking for loopholes in reasoning, and generally about demonstrating. Therefore, what defines scientific work is not the application of specific methods and procedures (these differ widely between fields), but an ethical commitment: a commitment to an ideal of truth (or “truth”, if you prefer). This is what a PhD student is supposed to learn: to back up each of their claims with arguments; to challenge the claims of others, or to look for loopholes in their own arguments; to try to resolve apparent contradictions; to think of what might support or disprove a position.

It should be obvious, then, that to write science is not to produce text that simply looks like a scientific text. The scientific text must reflect the actual reasoning of the scientist, which reflects their best efforts to demonstrate claims. This is precisely what a statistical model cannot do. Everyone has noticed that a chatbot can be cued to claim one thing and its contrary within a few sentences. Nothing surprising there: it is very implausible that everything written on the internet is internally consistent, so a good statistical model of that text will never produce consistent reasoning.

Let us now look at some concrete use cases of chatbots in science. Given the preceding remarks, the worst possible case I can see is reviewing. No, a statistical model is not a peer, and no, it doesn’t “reason” or “think critically”. Yes, it can generate sentences that look like reasoning or criticism. But it could be right, it could be wrong – who knows? I hear the argument that human reviews are often pretty bad anyway. What kind of argument is that? Since mediocre reviewing is the norm, why not just generalize it? The scientific ecosystem has already been largely sabotaged by managerial ideologies and publishing sharks, so let’s just finish the job? If that is the argument, then let’s just give up on science entirely.

Another use case: generating the text of your paper. It’s not as bad, but it’s bad. Of course, there are degrees. I can imagine that, like myself, one may not be a native English speaker, and some technological help to polish the language could be useful (I personally don’t use it for that, because I find even the writing style of a chatbot awful). But the temptation is great to use it to turn a series of vague statements into nice-sounding prose. The problem is that, by construction, whatever was not in the prompt is made up. The chatbot does not know your study; it does not know the specific context of your vague statements. It can be a struggle to turn “raw results” into scientific text, but that is largely because it takes work to make explicit all the implicit assumptions you make, to turn your intuitions into sound logical reasoning. If you did not explicitly put it in the prompt in the first place, then the statistical model makes it up – there’s no magic. It may sound good, but it’s not science.

Even worse is the use of a chatbot to write the introduction and discussion. Many people find it hard to write those parts. They are right: it is hard. This is because this is where the results get connected to the whole body of knowledge, where you try to resolve contradictions or reinterpret previous results, where you must use scholarship. It is particularly hard for students because it requires experience and broad reading. But it is in making those connections, by careful contextualized argumentation, that the web of knowledge gets extended. Sure, this is not always done as it should be. But scientists should work on that skill, not boost the productivity of mediocre writing.

One might object that there is already much story-telling in scientific papers written by humans, especially in the “prestigious” journals, and especially in biology (not to mention neuroscience, which is even worse). Well, yes, but that is a problem to solve, not something we should amplify by automation!

Let me briefly comment on other uses of this technology. One is to generate code. This can be helpful, say, to quickly generate a user interface, or to find the right commands to make a specific plot. This is fine when you can easily tell whether the code is correct or not – looking for the right syntax for a given command is such a use case. But it starts getting problematic when you use it to perform some analysis, especially when the analysis is not standard. There is no guarantee whatsoever that the analysis is done correctly, other than by checking yourself, which requires understanding it. So, in a scientific context, I anticipate that this will cause some issues. When I review a paper, I rarely check the code (it is usually not available anyway) to make sure it does what the paper claims. I trust the authors (unless of course some oddity catches my attention). Some level of trust is inevitable in peer review. I can see the temptation of a programming-averse biologist to just ask a chatbot to do their analyses, rather than looking for technical help. But the result of that is likely to be a rise in the rate of hard-to-spot technical errors.

Another common use is bibliographic search. Some tools are in fact quite powerful, provided you understand that you are dealing with a sophisticated search engine, not an expert who summarizes the findings of the literature or of individual papers. For example, I could use one to look for a pharmacological blocker of an ion channel, which will generally not be the main topic of the papers that use it. The model will output a series of matching papers. In general, the generated description of those papers is pretty bad and untrustworthy. But if the references are correct, I can just look up the papers and check for myself. It is basically one way to do content-based search and should be treated as such, as a complement to other methods (looking for reviews, following the tree of citations, etc.).

In summary, no, chatbots are not scientists, not even baby scientists: science is (or at least should be) about proving what you claim, not about producing sciency text. Science is an ethical attitude to knowledge, not a bag of methods, and only persons have ethics. Encouraging scientists to write their papers with a chatbot, or worse, automating reviewing with chatbots, is an extremely destructive move and should not be tolerated. It is not the solution to the problems that science currently faces. The solution to those problems is political, not technological, and we know it. And please, please, my dear fellow scientists, stop with the anthropomorphic talk. It’s an algorithm, not a person, and you should know it.

All this comes in addition to the many other ethical issues that so-called AI raises, on which a number of talented scholars have written at length (a few pointers: Emily Bender, Abeba Birhane, Olivia Guest, Iris van Rooij, Melanie Mitchell, Gary Marcus).