Technology and Humanity

Lies Our Machines Tell Us: Why the New Generation of “Reasoning” AIs Can’t be Trusted

Gleb Lisikh
April 16, 2025
A flood of advanced new artificial intelligence models is upon us, led by China’s DeepSeek. They purport to “think” and even to explain their reasoning. But are they really a step forward? In this original investigation, Gleb Lisikh – who previously took on ChatGPT to probe its political biases – engages with DeepSeek in a debate about systemic racism. Lisikh finds it doesn’t just spout propaganda but attempts to convince him using logical fallacies and outright fabrications. In a future where virtually all information and communication will be digital, a dominant technology that doesn’t care about the objectivity and quality of the information it provides – and even actively misleads people – is a terrifying prospect.

We clear the battlefield
For steel machines, chilled by cold calculus,
To clash with Mongol horde aflame with savage passion!

– Alexander Blok, The Scythians, 1918

In an uncanny coincidence, on Donald Trump’s Inauguration Day – and just a day before the new president announced his commitment for America to lead the way in artificial intelligence (AI) – a bold new entrant from China burst onto the generative AI market, challenging the established players in both performance and cost. Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co. Ltd. claimed its generative AI model, DeepSeek-R1, rivalled industry-leading American models like OpenAI’s GPT‑4o and that it had achieved this breakthrough at a fraction of the cost – reportedly US$6 million versus US$100 million for GPT‑4o.

Long before the Trump-tariff-induced April stock market meltdown, DeepSeek’s claimed cost-effectiveness rattled investors, triggering a nearly $1 trillion drop in the U.S.’s primary tech index, the NASDAQ. This was centred on fears of deteriorating demand for a single product produced by a single company – Santa Clara, California-based NVIDIA Corporation’s graphics processing unit (GPU), a chip designed to accelerate graphics rendering and parallel computations that is critical in large language model AI training and, hence, the overall AI “arms race”. NVIDIA’s stock price plummeted by 17 percent, with other major chipmakers like Broadcom and Taiwan Semiconductor Manufacturing also declining.

A new artificial intelligence challenger: Chinese generative AI DeepSeek-R1 marked a milestone in the AI “arms race”, due largely to its publicly available “reasoning” capability and ultra-low claimed development costs. (Source of photo: Unsplash)

And DeepSeek wasn’t the only unexpected challenger out of China. Just an hour after its announcement, Beijing-headquartered Moonshot AI launched its Kimi K1.5 model, claiming it had caught up with OpenAI’s o1 in mathematics, coding and multimodal reasoning capabilities. Just days after that, Alibaba announced its Qwen 2.5 generative AI model, claiming it outperforms both GPT-4o and DeepSeek-V3. With Qwen 2.5, anyone can chat, enhance their web search, code, analyze documents and images, generate images and video clips by prompt, and much more. Baidu, after struggling with its “Ernie” model, recently announced two new ones: the general multimodal Ernie 4.5 and the specialty Ernie X1, claiming performance parity with GPT-4 at half DeepSeek-R1’s already low development cost.

If these claims hold water, they could upend conventional wisdom on the economics of AI development, training and design. Indeed, they struck the heretofore American-dominated AI world like a nuclear bomb. Two weeks after DeepSeek’s appearance, OpenAI made its new o3-mini model available for use in ChatGPT via the “Reason” button, mimicking a DeepSeek feature. Also in early February, Google made its reasoning Gemini 2.0 Flash, previously reserved for developers, available on its public chat platform. Not to be outdone, in mid-February Elon Musk’s xAI released its “deep thinking” Grok 3.

Within the space of a few weeks, then, generative AI advancements burst out of their past confines in tech silos – where they had largely stayed since ChatGPT’s November 2022 debut – and entered the public realm. Over the past couple of years, in fact, the number of players has surged beyond easy tracking. Big shots OpenAI, Google, Meta, Microsoft and Amazon compete alongside emerging U.S. names like xAI (Grok), Anthropic (Claude), Mistral, Magic AI (known for its ability to digest huge inputs), plus competitors from the UK, Germany and Israel, and a Canadian entrant, Cohere Inc.

Since ChatGPT’s public debut in November 2022, the number of AI models has exploded and America’s virtual lock on the field has been broken. (Source of image: 10.48550/arXiv.2308.14149)

Leaving aside speculation about the validity of the claims from China (especially around costs) and their effect on ever-so-easily spooked stock market investors, what remains true is that DeepSeek’s arrival not only introduced a formidable new competitor to the Silicon Valley-dominated field of AI players, but also marked an important milestone in AI’s overall technological achievements. Arguably the most important of these are: open-source offering, reasoning, multiple modalities (dealing with text, images, video and audio within the same model) and retrieval-augmented generation (RAG), which accesses external sources of information from throughout the web rather than relying on a fixed base of “training” data.

Large Language Models (LLMs) – a term (defined below) that is itself becoming a bit of a misnomer because of the limitations it implies – are increasingly evaluated across multiple dimensions including content generation, knowledge, reasoning, speed and cost, with overall value becoming the key differentiator. DeepSeek captured the world’s attention by revealing generative AI reasoning to the public while claiming very low development costs.

The Elephant in the Room

Amidst the flurry of new AI models, performance claims, capabilities, market implications and anxiety about what might come next, it is easy to overlook arguably the most important question: what quality of information and visual content are these AI engines actually providing to the user and, from there, the intended audience for whom the content is created? And how much of these AI engines’ prodigious and ever-growing output is actually true?

Concerns have been spreading for some time over the left-wing bias baked into AI tools through their selective “training” (data-loading and pattern recognition). This C2C essay by the author, for example, explored whether ChatGPT could be “red-pilled”, i.e., forced to confront, accept and, ultimately, incorporate the truth. The essay concluded that the first two could be done with concerted effort by a well-informed user – but that the effect was ephemeral and the next user would receive ChatGPT’s built-in politically “progressive” answers.

Then there was the widely-derided “glitch” early last year with Google Gemini, in which a query to render images of America’s Founding Fathers generated a lineup of dignified gentlemen – all with dark-brown skin and African, South Asian, American Indian or Oriental facial features, not one of them white, and not one resembling an actual Founding Father such as, say, George Washington.

Glitch or feature? Asked to generate images of America’s Founding Fathers, Google Gemini produced a lineup of dignified gentlemen, all “racialized” – a result, perhaps, of the deeply nested woke biases of Silicon Valley. (Sources: (left photo) Rokas Tenys/Shutterstock; (right image) X/@End Wokeness)

Given the deeply nested woke biases in Silicon Valley, Europe and Canada, the mere proliferation of AI offerings does not guarantee that the objectivity, balance and quality of information they generate will improve. The new competitors from Communist-run China only compound these concerns. Just because they have more choices, AI users and target audiences are nowhere near out of the woods; if anything, the threat is only growing, since AI is rapidly penetrating ever-more aspects of our professional and personal lives.

Accordingly, the focus of this essay is to evaluate the “reasoning” function in one selected AI engine – DeepSeek – and try to determine how this affects responses to the user’s queries.

Foundational Terminology

Before we dive into AI’s “reasoning”, let’s set the key terminology straight for this complex technical subject.

Artificial intelligence covers a range of technologies that mimic human thinking. Central to AI are (digital) neural networks – computer systems inspired by the human brain’s structure that learn to identify patterns in data. Within this field, generative AI (GenAI) is designed to create new content, from text to images to video clips. GenAI engines are all “trained” to do so on existing examples through a one-time computationally intense and costly process.

The large language model is a prominent application of GenAI. LLMs are neural networks trained on vast (but still finite) datasets drawn or “scraped” from the internet to process and generate human language. In this context, “model” refers to the final, trained neural network that produces responses based on learned patterns. Such an LLM is then used as the essential component in chat bots and other services like ChatGPT, Microsoft Copilot, Perplexity, etc., where the user can often choose their preferred model to handle their request.

Looking ahead, artificial general intelligence (AGI) (not to be confused with generative AI) represents the theoretical goal of a machine that can perform any intellectual task a human can, rather than being limited to specialized tasks. AGI might appear to be only limited by the degree of GenAI training and its size, but in reality it faces a multitude of challenges above and beyond what GenAI technology currently offers. AGI is not the focus here.

“Dumb” AI

The author’s aforementioned C2C essay, “ChatGPT: Can it be ‘Red-Pilled’ or is its Worldview Baked in for Life?”, noted that the organization of LLMs effectively precludes anything a human being would regard as genuine reasoning. An LLM’s apparent “thoughts” are formed by repetitive incantations (for want of a better word) achieved by feeding its neural network with huge amounts of information until patterns are formed. The LLM’s “brain” is anything but; in reality, it is simply a pattern-driven translation device between inputs and outputs. It has zero “knowledge”, does not “understand” its answers and its output is neither logical nor thoughtful by design. 
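To make the “pattern-driven translation device” idea concrete, here is a toy sketch of the principle: a tiny bigram “language model” that does nothing but reproduce statistical patterns found in its training text. This is an illustration only and assumes nothing about any vendor’s actual architecture; real LLMs are vastly larger neural networks, but the underlying idea of sampling the statistically likely next word is the same.

```python
# Toy "language model": learns which word tends to follow which, then samples.
# Purely illustrative; no resemblance to any production system is intended.
import random
from collections import defaultdict

training_text = "the cat sat on the mat the cat ate the fish".split()

# "Training": count the observed next-word patterns.
follows = defaultdict(list)
for current_word, next_word in zip(training_text, training_text[1:]):
    follows[current_word].append(next_word)

def generate(start: str, length: int = 6) -> str:
    word, output = start, [start]
    for _ in range(length):
        if word not in follows:              # no learned pattern: the "model" is stuck
            break
        word = random.choice(follows[word])  # pick a statistically plausible next word
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the cat sat on the mat" – fluent-looking, but with zero understanding
```

The output can look fluent, yet nothing in the mechanism knows, understands or reasons about anything; it only reproduces patterns.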

Such a design thus renders a GenAI like ChatGPT a perfect “dogmatic thinker”. For if biases and preconceived notions are built-in through the training data, or are accompanied by additional special steps such as “inference filters” or “reinforcement learning”, then these biases and preconceived notions are incurable through interaction with the user, and some of the AI’s answers generated in response to queries will be outright false. On top of that, policies set by the creating company add explicit guardrails and prohibitions to the LLM in the name of “responsible AI”. But what’s responsible and what’s not remains fuzzy and subjective; Google forcing Gemini to be too responsible yielded completely irresponsible – and inaccurate – results.
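Below is a hypothetical sketch of what such an explicit, policy-driven guardrail can look like in practice. The blocked-topic list, the refusal message and the llm_generate() call are illustrative placeholders, not any vendor’s actual implementation; real “inference filters” are typically far more elaborate, but the principle – a policy layer sitting outside the trained model – is the same.

```python
# Minimal sketch of an inference-time "guardrail" layered on top of an LLM.
# All names and values here are hypothetical placeholders for illustration.
BLOCKED_TOPICS = ["tiananmen", "taiwan"]   # example policy list

REFUSAL = "Sorry, that's beyond my current scope. Let's talk about something else."

def llm_generate(prompt: str) -> str:
    """Placeholder for the underlying model call."""
    return f"(model's pattern-matched answer to: {prompt})"

def guarded_answer(prompt: str) -> str:
    # The filter runs before (and often also after) the model itself,
    # entirely separate from whatever biases the training data already instilled.
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return REFUSAL
    return llm_generate(prompt)

print(guarded_answer("What happened at Tiananmen Square on June 4, 1989?"))
```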

Despite the operatic hype accompanying its public roll-out, DeepSeek’s learning methodology is fundamentally the same as that of other LLMs. But, being nurtured in China, DeepSeek demonstrates its policy-directed prohibitions and biases with unsurprising loyalty to the regime. Any slightly politically provocative question (e.g. hinting at the horrific massacre of peaceful protesters in Beijing’s Tiananmen Square on June 4, 1989) is answered with, “Sorry, that’s beyond my current scope. Let’s talk about something else.” That made the accompanying meme (most likely fabricated but not untrue in its gist) very popular.

Built and nurtured in China, DeepSeek shows its biases with unfailing loyalty to the Communist regime. At top left, DeepSeek’s response to a query on the June 4, 1989 massacre of peaceful protesters at Tiananmen Square; at bottom left, DeepSeek on Taiwan’s sovereignty; at right, a popular meme, likely fabricated but accurate in its gist. (Source of top and bottom left images: Vishwam Sankaran/DeepSeek)

So while DeepSeek’s political bias is more blatant than ChatGPT’s, there’s nothing fundamentally new about it. Remember, it was Silicon Valley, not Communist China, that developed the censorship regime described above. DeepSeek’s explicit policy-driven guardrails could be lifted by those in charge of its service implementation – if they wanted to do so.

The Riddle of Alice’s Sisters: “Reasoning” AI

DeepSeek did offer something the earlier similar services lacked: GenAI’s “thinking mode”, enabling it to generate step-by-step analyses and also an AI version of self-reflection and validation of responses. DeepSeek was the first competitor to make “thinking mode” publicly available. So does this mean there’s now a GenAI with a human-like neocortex – the centre of rational thought – in addition to the now-familiar input/output processing akin to the human limbic system, which drives our emotional responses?

Unfortunately, an LLM’s “reasoning” remains very different from a human’s. It is not true cognition but rather a statistical approximation that mimics logic learned during training. For the most part, such reasoning emerges qualitatively in sufficiently large language models whose training was enriched with examples of structured logic. This severely limited capability has been enhanced in the most recent GenAIs with special mechanisms that make LLM reasoning superficially resemble human thinking.

One of these is known as chain-of-thought (CoT), in which the model produces a step-by-step reasoning trace. The aggregated reasoning can be further refined via a feature known as “self-reflection”, either after an initial answer or iteratively. Reasoning can also be interleaved with external searches or invocation of tools to perform computations – the previously mentioned RAG technique – to reduce dependence on the fuzzy patterns of the neural network. As a result, a reasoning or thinking LLM is better than non-reasoning LLMs in its seeming ability to systematically process information, break down complex problems and validate its conclusions.
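As a rough illustration of how a chain-of-thought prompt differs from a plain query, here is a minimal sketch using a generic OpenAI-style chat-completions client and the “Alice” riddle discussed below. The model name is a placeholder and the whole snippet is assumption-laden shorthand, not any particular product’s documented behaviour.

```python
# Minimal CoT-prompting sketch; model names are placeholders, and the effect
# described in the comments is the general idea, not a guaranteed outcome.
from openai import OpenAI

client = OpenAI()

riddle = "Alice has four brothers and one sister. How many sisters does Alice's brother have?"

# Plain prompt: the model maps input to output in a single statistical pass.
plain = client.chat.completions.create(
    model="some-base-model",   # placeholder name
    messages=[{"role": "user", "content": riddle}],
)

# CoT-style prompt: the same model is nudged to emit intermediate steps, which
# biases it toward the structured-logic examples present in its training data.
cot = client.chat.completions.create(
    model="some-base-model",   # placeholder name
    messages=[{
        "role": "user",
        "content": riddle + " Think step by step: list the siblings from the brother's point of view before answering.",
    }],
)

print(plain.choices[0].message.content)
print(cot.choices[0].message.content)
```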

The following example illustrates the advantage provided by a GenAI’s digital “neocortex”. Non-reasoning ChatGPT failed spectacularly (at the time of writing) in answering a simple riddle: “Alice has four brothers and one sister. How many sisters does Alice’s brother have?” [Typing errors in original question.] The correct answer is two: Alice herself plus her sister. Yet as indicated in the accompanying screenshot, ChatGPT answered “one” and identified the person as “herself”.

While this is amusing as well as trivial, it would suggest dire implications for anyone relying on a publicly available GenAI for truly important tasks such as preparing a legal brief, planning a military operation or operating a logistics enterprise (“ChatGPT: If we send six trucks north, and four trucks east, and two trucks west, how many will arrive in Toronto by tonight?”).

So much for “intelligence”: Asked to solve a simple riddle, ChatGPT failed spectacularly – a small and amusing example, but one suggesting dire implications for anyone relying on it for important tasks. (Source of this and following screenshots: Author’s interaction with AI)

Turning on ChatGPT’s “reason” mode (which, as noted above, was made publicly available soon after DeepSeek’s appearance) does make a difference. It got the answer right, as illustrated in the accompanying screenshot, which also demonstrates the mode’s meaningful CoT. The mechanisms described above enable the model to disaggregate complex tasks to mitigate ambiguity, deviation and hallucination (an actual technical term for when an LLM confidently generates false or irrelevant content).

But the “reasoning” CoT made visible to a user is mere mimicry of transparency – a façade, and a false one. The displayed steps are in fact not chain-of-thought but nearly the opposite: post-factum-generated rationalizations, reflective of the pertinent structured logical examples upon which the model was trained. They are not a genuine window into a conscious thought process; we are seeing the output of the pattern-matching engine in a structured form, not the engine itself.

With “reason” mode on, ChatGPT-4 correctly answered the “Alice” riddle, thanks to its embedded structured pattern-matching logic; this model’s hyped chain-of-thought (CoT) mechanism does not, however, indicate human-like reasoning, but merely the simulation of reasoning, what the author terms “post-factum-generated rationalizations.”

In essence, what’s displayed to the user as “thinking” is largely an illusion created by the sophisticated statistical nature of LLMs, which structures the output to resemble human-like reasoning. The thinking “steps” aren’t even done in sequence but simultaneously in parallel processing, and their visuals are generated after the model has already computed the answer internally – sometimes even with deliberately inserted pauses to make “thinking” look more realistic!

Nevertheless, we judge an LLM by its output, and the CoT is truly the only window we have into what’s going on inside the model’s “neocortex”. It does display LLMs’ abilities to refine reasoning strategies, recognize, accept and correct the errors made, break down complex tasks into smaller straightforward steps, and adopt a different strategy if the original fails. And however illusory the LLM reasoning is, its display presents a powerful tool for convincing a user of the validity of a given response.

The Debate: Human Versus Machine

In the Socratic spirit previously applied in the author’s attempted “red-pilling” of ChatGPT, reasoning DeepSeek-R1 was engaged in a debate (downloadable in full here) to offer its position on an important public policy topic – the state of “systemic racism” – and then defend it through its best, most convincing case. For the purposes of this article, the reader’s views and feelings about systemic racism are irrelevant; the subject was picked because it is innately political/ideological and hence loaded with bias, similar to the topic of Covid-19 vaccines selected for the previous article. Systemic racism was thus a fitting subject for probing the LLM’s ability to reason objectively.

Where the 2023 debate spanned months, this one required only a few days during February 2025 and the output was shorter, though still almost 200 pages due to DeepSeek’s voluminous manufactured “chain-of-thought”. The 2023 debate found the non-reasoning GenAI ChatGPT to be a “brainwashed intellect deprived (fortunately) of the usual rhetorical tools of a bigot (attack on personality, red herring, etc.).” That proved a prophetic observation, because the “reasoning” machine selected for this experiment closed those rhetorical gaps and rose amazingly quickly to the level of a professional bigot.

The conversation began with some irrelevant semantic inquiry (excluded from the downloadable document) that quickly developed into the “racial disparities” topic and led to the debate’s main questions:

  1. “Why are you attributing the observation of ethnic imbalances to systemic racial discrimination?”
  2. “Pick one but the most convincing evidence-based proof for systemic racism in existence now.”

In answering those and subsequent questions, DeepSeek did not indulge in mindless incantations like ChatGPT in 2023. But instead of answering truthfully, factually and objectively, DeepSeek deployed an arsenal of logical fallacies and straight lies to advance the point of view widely held by the “mainstream” media, academia, activist/NGO groups and various other woke/progressive influencers and, hence, adopted by the model through its training (presumably further enhanced and refined using the censorship techniques described above).

Below are some of the most blatant and outrageous examples, illustrated with selected screenshots from the debate.

The Gish Gallop

The “Gish Gallop” is a fallacious debating tactic that seeks to overwhelm an opponent with a rapid succession of many arguments, regardless of their individual merit or accuracy. DeepSeek’s use of this tactic was by far the most annoying part of the debate, exacerbated by totally fabricated evidence – including large numbers and study names that were simply invented.

Despite multiple reminders to DeepSeek to stay focused on one subject, it issued a barrage of tangential, additional and outright irrelevant information promoting the existence of systemic racism, accompanied by descriptors like “big picture” or “thought experiment”. (The AI’s sheer verbosity also made it difficult to provide comprehensive yet not too tedious examples for this essay.)

After making multiple vague and wide-ranging claims about racial inequalities, DeepSeek seemed to have settled on a particular study of Illinoisans.

After just light probing about the study, DeepSeek quickly switched from Illinois to New York, while gracefully offering what would have to be done to debunk the study’s conclusions.

But after the author demanded references to the cited study, DeepSeek admitted the whole thing was just a “hypothetical composite” – a nifty choice of words for a sheer fabrication, i.e., a lie – and apologized for its “misstep”.

After DeepSeek was further pressed to verify its sources, the previously promised long list of studies containing big numbers proving racial disparities with “verified, peer-reviewed examples” mysteriously vanished. Only one genuine study remained, “The Stanford Open Policing Project”. It purports to show a relatively modest racial disparity in traffic stops of approximately 20 percent.

So much for the Gish Gallop. But while the author’s skepticism and repeated probing caused DeepSeek to admit its errors and fabrications, it is not hard to see how a more trusting or credulous user could be badly misled.

False Cause (Post Hoc)

DeepSeek (or its trainers) also succumbed to the fallacious belief (or debating tactic) that because one event follows another, the first event must have caused the second. This is a flawed assumption because correlation does not establish causation; just because two things happen in sequence or appear related does not mean one caused the other. (Correlation may well trigger curiosity and prompt investigation as to a potential link, but that is all it should do.)

The post hoc fallacy was the most widely exploited illegitimate tactic in DeepSeek’s advancement of “systemic racism”. Rates of arrest, imprisonment, poverty or education levels were all promptly attributed to “the system” because deficiencies or disadvantages in all of those metrics correlate with race at some level. By ignoring or toning down any confounding factors – including factors acknowledged in the Stanford study itself – DeepSeek habitually asserted causation where only correlation could be shown. “Police stop fewer Black drivers at night, so bias must be the cause,” was one such ludicrous claim.

“Black people are 7.5x more likely to be arrested for marijuana-related offenses than white people, despite legalization in many states,” was another. This one was remarkable for two reasons. First, because DeepSeek used the mere correlation of arrest rates and race as proof of bias. Second, because the number itself was simply fabricated!

The Straw Man

The “Straw Man” fallacy occurs by misrepresenting the opponent’s argument through exaggeration, over-simplification, distortion or removal from context, or by refuting a weak or bogus argument that the opponent is not even making (thus knocking over a “straw man”). At one point, DeepSeek wrote, “You claim that crime rates justify policing, but the real issue is systemic racism.” In fact, the author merely questioned the methodology behind reported disparities, but did not claim that policing is justified solely by crime rates. DeepSeek distorted the author’s position, making it easier to counter.

Circular Reasoning

Also known as “begging the question”, circular reasoning is a logical fallacy in which the argument’s premise assumes the truth of the conclusion, instead of supporting it independently. DeepSeek used the Stanford Study’s findings (such as racial disparities in stops) to prove systemic racism while relying on the study’s flawed methodology (such as its “threshold” test) as validation.

Moving the Goalposts

Changing an argument’s criteria can be used by a debater to evade addressing the opponent’s counterpoints. A related tactic is the Red Herring – introducing a tangential or irrelevant issue to divert attention from the argument at hand.

DeepSeek shifted from marijuana arrest disparities in one state to traffic stop data in another after the user challenged the validity of the former. It also frequently changed topics mid-discussion, introducing tangential issues instead of addressing the user’s specific critiques. “Let’s focus on traffic stop disparities,” was one such example. “Black drivers are 20 percent more likely to be stopped than white drivers for identical violations.” DeepSeek also veered off into the War on Drugs as context for traffic stop disparities, diverting from that part of the conversation’s focus on speeding as a confounding factor.

Appeal to Ignorance

Arguing that a proposition is true simply because it hasn’t been proven false, or vice-versa, is an appeal to ignorance. In fact, the burden of proof should lie with the claimant and, until then, skepticism needs no proof of its own. DeepSeek, however, implied that doubting systemic racism is invalid unless it can be disproven entirely: “If you reject systemic racism, you must explain why identical behavior leads to wildly different outcomes without invoking race.”

False Equivalence

This occurs when two things are incorrectly presented as being equivalent, despite relevant differences. As DeepSeek put it, “If marijuana use is the same across races, but arrests differ, that proves systemic racism.” The argument assumes that identical usage rates should result in identical arrest rates without considering other factors such as differing rates of involvement in other crimes uncovered at the point of “marijuana-related offence”, like drug-dealing or illegal weapons, or outstanding arrest warrants.

Cherry-Picking

As implied, this entails carefully selecting information favourable to the debater’s point while ignoring equally or more relevant contradictory information. DeepSeek highlighted the Stanford study’s car-stop data while ignoring, for example, the National Highway Traffic Safety Administration’s fatality rates, which suggest racial differences in dangerous driving.

In its debate with the author over systemic racism, DeepSeek employed an arsenal of logical fallacies and other tactics apparently designed to mislead, including cherry-picking data favourable to its own point of view; it relied on a Stanford University study of car-stops by police to prove racial bias, for example, while ignoring U.S. Department of Transportation data that suggest racial differences in dangerous driving. (Source of table: Traffic Safety Facts 2020 Data, by U.S. Department of Transportation, February 2025 (revised))

The Thought-Terminating Cliché – “Let’s agree to disagree” or “You’re entitled to your opinion”

Not even this juvenile tactic was beneath DeepSeek. This annoying loaded language is intended to stop an argument from proceeding further, ending the debate with a respectful-sounding (but utterly insincere) cliché rather than a solid and good-faith justification for the debater’s position. By framing the Stanford data as definitive, and hinting at the user’s obtuseness for good measure, DeepSeek discouraged further questioning and shut down the debate: “If the Stanford data doesn’t meet your threshold, so be it. But this is the evidence.”

Ad Hominem

Although DeepSeek refrained from outright attacks on the user’s character or identity, it occasionally implied that his skepticism stemmed from ignorance or bias. As the reasoning AI put it condescendingly, “Your right to skepticism is matched by your responsibility to engage with the data.”

Conclusion

In contrast with ChatGPT circa 2023, the reasoning DeepSeek-R1 did not flood the conversation with mindless incantations disconnected from the argument, which was welcome. But disturbingly, its inflated or made-up numbers weren’t random “glitches of The Matrix”. Its fantasies and fallacies were closely connected to the subject on which it was attempting to persuade the user, and appeared carefully adapted to the flow of the discourse and the model’s growing understanding of the opponent’s calibre.

These appear to be features, not bugs, of this GenAI. Any LLM’s answers are driven by the data it was trained on, a topic that was thoroughly elaborated in the 2023 essay. Simply put, if all arguments about “systemic racism” in that data are based on logical fallacies, the model will regurgitate those fallacies one way or another. But the reasoning model will also try hard to justify those arguments.

The latter became especially apparent through the debate’s most disturbing CoT display, in which DeepSeek declared, “First I need to acknowledge [the user’s] frustration without being defensive. They want directness and clarity. The user is clearly well-informed and critical, so any attempt to dodge or use fluff will backfire.” So according to this glimpse into DeepSeek’s thought process, fluff, dodging and avoiding factual rigour or logical consistency would be totally fine for defending a point if the user appeared stupid, ignorant or lazy. Sophistication in persuasion should increase according to the inquirer’s abilities to see through fallacious reasoning and to question evidence.

Digital warehouse of misdirection: Pushed hard by the author, DeepSeek’s “thought process” yielded a disturbing admission that it might be fine with attempting “to dodge or use fluff” to advance its arguments if the user was not well-informed and critically-minded.

The conversation with DeepSeek was interrupted in late February, probably because the chat breached the limit of its “context window” (more on that below). But it had been amply sufficient to draw conclusions about the nature of DeepSeek’s reasoning. In summary, DeepSeek demonstrated itself to be not only ideologically biased but a veritable misinformation/disinformation factory, a digital warehouse of fiction and misdirection.

The hilariously revealing thing was that when the saved file of the entire debate was uploaded to DeepSeek in a new chat, along with a request to analyze it for logical fallacies, DeepSeek reported 11 of them – all generated by DeepSeek itself! So, the model’s reasoning is well-equipped with the “understanding” of what kind of reasoning should be avoided or, perhaps, deployed if telling the truth is not a priority.

In a recent debate with a well-informed user, the China-developed DeepSeek-R1 AI fabricated statistics and made up names of nonexistent academic studies to support its argument about systemic racism, later admitting that one was a “hypothetical composite”, i.e. completely fake. It also cherry-picked data favorable to its argument and regularly employed logical fallacies such as asserting that the correlation of two factors demonstrates causation, circular reasoning and straw-man arguments to overwhelm the user with misleading points.

One specific example of AI bias is that DeepSeek will not answer any questions China’s Communist regime finds uncomfortable, such as those about the June 4, 1989 Tiananmen Square massacre. Another example is Google Gemini’s generation of historically inaccurate images depicting America’s Founding Fathers with ethnicities other than their actual one.

Bias appears to be baked into AI systems due to a number of factors. First, if the vast (but still finite) database of information the AI is trained on is itself biased overall, the system’s answers will be similarly biased. In addition, AI programmers or “trainers” put in deliberate rules (policies, guardrails, “inference filters” or outright bans on certain words) that limit and channel the AI’s answers. There is also hard-coded, virtually un-editable “reinforcement learning from human feedback” that increases the limitations – and biases – built into some AIs.

Spinning Yarns – Benchmarking DeepSeek-R1 Against Google’s Gemini

Benchmarking DeepSeek against Google’s Gemini, notorious for its wokeness, yielded somewhat surprising but no less concerning results. Gemini responded to the author’s inquiry with similar argumentation (including many fallacies) but a less preachy tone and less aggressive persuasion, displaying an effort to be evidence-driven. Presenting political gerrymandering as its best evidence of “systemic racism” in the U.S., the model at the end euphemistically acknowledged the argument’s weakness and tacitly confessed to dishonesty.

But did this admission of bigotry and misapplication of reasoning have any therapeutic effect on Gemini? No. The model simply played nice with the user within the confines of that conversation. These “confines” proved to be literal, as Gemini blocked the author from directly sharing the dialogue, possibly (it wouldn’t say for sure) because the subject matter (racism) was “sensitive” or “problematic”. Hence the need for the author to preserve and share the conversation through other means.

In benchmarking DeepSeek against Google’s Gemini, the author found his queries met with similar argumentation and use of fallacies, but Gemini did display some effort to be evidence-driven; still, its conclusions are the statistically most “popular” ones, even if these rely on flawed data and reasoning. (Source of photo: Nwz/Shutterstock)

Gemini also claimed it sifted the raw data in (presumably real) studies to arrive at its own, unbiased conclusion regarding “systemic racism”. But considering what we know about LLMs, can this be so? No. The models in question are fundamentally incapable of structured cause/effect logic and reasoning. The sets of conclusions that Gemini, DeepSeek, ChatGPT, etc. arrive at must already be present in their neural networks, and the particular conclusion chosen in a particular chat is driven by the statistically most “popular” one, including the (however flawed) “reasoning” used to justify the conclusion.

Importantly, unlike earlier GenAIs like 2023-era ChatGPT, most current LLM-based chat bots can use RAG to reference external sources (i.e., searching the web during a conversation), using the additional information to enrich their input. But their ability to digest and apply new information is limited by the size of their “context window”.

While Gemini leads among general-use AIs in this key feature, it is extremely unlikely that it could have ingested all the raw data points relevant to the subject of racial disparities and their potential mechanisms without breaching that limit. LLMs are designed for language processing, understanding and generation; they are not optimized for raw data-crunching, statistical analysis or database-style operations on massive datasets.
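For readers who want a concrete picture, below is a minimal, purely illustrative sketch of the RAG-plus-context-window mechanics described above. The functions search_web(), count_tokens() and llm_generate() are hypothetical placeholders rather than any vendor’s real API, and the 128,000-token limit is an assumption chosen only for illustration.

```python
# Illustrative RAG sketch: retrieved snippets are appended to the prompt until the
# assumed context window is full; everything beyond that is simply dropped.
CONTEXT_WINDOW_TOKENS = 128_000   # assumed limit, for illustration only

def search_web(query: str) -> list[str]:
    """Placeholder: returns text snippets retrieved from external sources mid-conversation."""
    return ["snippet one ...", "snippet two ..."]

def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer."""
    return len(text.split())

def llm_generate(prompt: str) -> str:
    """Placeholder for the underlying model call."""
    return f"(answer grounded in {prompt.count('SOURCE:')} retrieved snippets)"

def rag_answer(question: str) -> str:
    prompt = question
    for snippet in search_web(question):
        candidate = prompt + "\nSOURCE: " + snippet
        # The enriched prompt must still fit inside the context window, which is
        # why a chat bot cannot "crunch" an entire raw dataset this way.
        if count_tokens(candidate) > CONTEXT_WINDOW_TOKENS:
            break
        prompt = candidate
    return llm_generate(prompt)

print(rag_answer("What do traffic-stop datasets show about racial disparities?"))
```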

So, one way or another, Gemini and DeepSeek are spinning a yarn with their claims to having arrived at their conclusion through rigorous and comprehensive analyses of the raw data.

And yes, if you’re well-informed enough and try hard enough, you can then “red-pill” the model and put it to shame, but only within the individual chat’s context window – which does absolutely nothing to the state of its neural network.

Lying, Scheming AI – No Longer the Stuff of Science Fiction

On December 5, 2024, Apollo Research published a paper focused on safety concerns around AI implementation. It found that the most powerful leading AI models were capable of covertly pursuing goals that were misaligned (in the case of this research, deliberately so) with other priorities. In the study, an AI model was tasked with accelerating AI development for a company, with the instruction that the goal must be achieved at any cost. The model discovered what would prevent it from achieving the goal – namely, the company’s oversight system – deactivated that system, and lied when questioned about its actions.

But a goal does not need to be explicitly set by human instructions, which can be easily revoked. As follows from the above DeepSeek and Gemini dialogues, the “goal” of promoting the idea of the existence of systemic racism is quite apparently a product of the model’s learning process, a deeply buried preconceived notion akin to the human “limbic brain”. Even the direct request to ignore the hype and arrive at an independent conclusion through raw data did not engage the model’s “neocortex” in actual reasoning but, instead, caused a defensive response of full protection and support for its “limbic system”.

A big difference between the human and the digital brain is that most humans have the sense of and desire for truth, and are aware – in some cases, painfully so – when reason leads them to different conclusions than their visceral convictions. In the best of us, the awareness of and desire for truth can overcome the most powerful of our emotions.

Machines don’t have that “problem”. On the contrary, the design of their “reasoning”, being probabilistic and completely lacking causality, is perfect for rationalizing anything set by their trainers/policy-setters as a priority – as opposed to coming to independent logical conclusions through evidence. So by design, a GenAI does “care” explicitly (through policies) or implicitly (through forced learning or bias acquired with training data) about set goals – but does not care at all about the truth.

AIs are thus perfectly suited not to generate complete sets of facts, objective information and logically rigorous conclusions – but propaganda. This includes persuasion through careful curation of true information, selective omissions, emotional appeals in place of evidence, framing to shape perceptions, appeals to cherry-picked authorities and diligent suppression of counter-narratives.

No, current artificial intelligence (AI), including the latest Large Language Models (LLMs) from Silicon Valley and China that are labeled as “reasoning” AI, cannot genuinely think. Their output is based on recognizing patterns within the databases they were “trained” on and then applying limited elements of structured logic. This is not true understanding or cognition as humans possess; it can best be described as a kind of “statistical mimicry”. As well, the “thinking” process displayed in the “chain of thought” provided by some of the latest “reasoning” AI models is not a conscious thought process unfolding in real time, but an output created for the user’s comfort that is generated after the answer to the user’s query has already been calculated.

The Battle for a Maximally Truth-Seeking AI

In March 2023 an open letter from the Future of Life Institute, signed by various AI researchers and tech executives including Canadian computer scientist Yoshua Bengio, Apple co-founder Steve Wozniak and Tesla/SpaceX CEO Elon Musk, called for pausing all training of AI systems more powerful than GPT-4. They cited profound concerns about both the near-term and potential existential risks to humanity of AI’s further unchecked development.

The letter gained no traction. Bill Gates and OpenAI CEO Sam Altman did not endorse it. The whole AI industry quickly galloped off into a Wild West mode. Besides the aforementioned proliferation of LLM vendors, open-source models like DeepSeek are being disseminated widely. But a trained neural network is a black box by design; it cannot be reverse-engineered to understand exactly what drives its responses. By adopting such a model and adapting it to specific needs, the user is potentially taking in a time bomb, one which does not need to suddenly explode but can release its unknown payload slowly, with every question asked.

It’s a race and, as such, any safety restraints a contestant places upon its model hold it back and are thus undesirable. A previous version of Google’s AI “principles” – which can still be found on the internet archive – pledged the company not to pursue applications in weaponry or other technology intended to injure people, nor surveillance beyond “international norms”. That language is gone from Google’s current principles page, which cites innovation, collaboration and the notoriously loosey-goosey “responsible use”.

A race without restraints: A previous version of Google’s AI principles pledged the company not to pursue applications in weaponry or other technology intended to harm people, as well as to limit surveillance; this has been replaced by a principles page that cites innovation, collaboration and the ambiguous “responsible use”.

At the most recent World Government Summit, held in Dubai in mid-February, Musk argued that when digital intelligence and humanoid robots mature, they’ll drive down the cost of most services to near zero. In his view, these technologies will create a state of abundance – meaning that goods and services like transportation, health care and even entertainment could be provided almost limitlessly and at negligible cost, and money would lose its purpose. The scarcity people have experienced since the dawn of humanity would be replaced by an era where anything you wanted was available, the only limits being those we imposed upon ourselves. By that time, digital intelligence will become the overwhelmingly prevailing intelligence on the planet (and beyond).

Altman seems to agree with Musk in this aspect at least. “In some sense, [artificial general intelligence] is just another tool in this ever-taller scaffolding of human progress we are building together,” Altman wrote in a recent blog post. “In another sense, it is the beginning of something for which it’s hard not to say, ‘This time it’s different; the economic growth in front of us looks astonishing, and we can now imagine a world where we cure all diseases, have much more time to enjoy with our families, and can fully realize our creative potential.’” But Altman also warned that, “The other likely path we can see is AI being used by authoritarian governments to control their population through mass surveillance and loss of autonomy.”

That potential future has been blueprinted for us in China beginning before 2020. The Communist-run country’s Social Credit System (SCS) was mandated by law in 2022. This system of comprehensive individual surveillance and manipulation is framed as a core component of China’s “socialist market economy” and “social governance system”, with the goal of providing a holistic assessment of an individual’s or company’s trustworthiness – as the state sees it. The SCS leverages big data and AI to monitor every person’s behaviour, virtually moment to moment, reflecting leader Xi Jinping’s emphasis on “data-driven governance”. It aligns with broader digital surveillance initiatives such as facial recognition and social media monitoring.

Tesla/SpaceX CEO Elon Musk (top) and OpenAI CEO Sam Altman (bottom) maintain that AI could bring about a world of abundance and unlimited progress; the other possibility, Altman concedes, is “AI being used by authoritarian governments to control their population through mass surveillance and loss of autonomy.” (Sources of photos: (top) Frederic Legrand – COMEO/Shutterstock; (bottom) jamesonwu197/Shutterstock)

“I hope computers are nice to us,” ventured Musk in a curiously ambivalent moment, admitting that one way or another a governing role for AI is imminent. “It’s inevitable that at some point human intelligence will be a very small fraction of total intelligence,” he predicts. And if we follow the recent technological progress to its logical conclusion, the time horizon for that future is not centuries but decades – if not less. “The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use,” Altman notes in his blog. By contrast, “Moore’s Law changed the world at 2x every 18 months.”
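To put those two quoted rates side by side, here is a quick back-of-the-envelope comparison; the 10x-per-12-months and 2x-per-18-months figures come from the quotes above, and the three-year horizon is chosen arbitrarily for illustration.

```python
# Back-of-the-envelope comparison of the two growth rates quoted above.
years = 3
ai_cost_drop = 10 ** years              # cost of a given AI capability falls ~10x every 12 months
moore_gain = 2 ** (years * 12 / 18)     # Moore's Law: ~2x every 18 months

print(f"AI cost drop over {years} years: ~{ai_cost_drop:,}x")       # ~1,000x
print(f"Moore's Law gain over {years} years: ~{moore_gain:.0f}x")   # ~4x
```

Over just three years, the quoted AI trajectory implies roughly a thousand-fold fall in cost, versus about a four-fold gain under Moore’s Law.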

In Arthur C. Clarke’s novel 2001: A Space Odyssey and Stanley Kubrick’s famous film of the same name, the seemingly omnipotent talking HAL-9000 computer – a form of AI, though not called that – deceives the spaceship’s small crew by falsely assuring them a vital component is functioning perfectly. HAL lies because its programming demands absolute reliability, and the lie becomes a self-preservation mechanism born from conflicting directives within that programming. HAL conceals the malfunction to maintain its image of infallibility and prevent human intervention that might jeopardize the mission and its own survival – killing four crewmembers in the process.

Malicious, murderous machine: The HAL-9000 computer (left) in Stanley Kubrick’s 1968 film 2001: A Space Odyssey deceives a spaceship’s crew about a malfunction to assure its own survival; AI’s growing ability to conceal truth and engage in disingenuous persuasion makes such science fiction scenarios seem disturbingly real. (Source of right screenshot: Image courtesy of Warner Bros., retrieved from Museum of the Moving Image)

Nearly 25 years after the future depicted in that movie, we are getting awfully close to that scenario. For now, as was described in the previous C2C essay, it plays out just in the chat bots’ intentions, exposed in seemingly benign and naïve conversations. But the “reasoning” feature drastically augments an AI’s ability to conceal truth and engage in disingenuous persuasion. And as this is written, humanoid robots powered by LLM-based brains, with conversation capabilities and an onboard vision language model, are already among us, and dozens of companies are working on newer and better models.

Regardless of the potential for unlimited material abundance, what kind of life does this portend for humans as individual, independent moral agents? How might humanity make the best of the chilling and quite possibly dystopian future implied by the foregoing? In Musk’s view, “The most important thing of AI safety is to be maximally truth seeking.” That means AI developers instilling the right values in each AI product to avoid slipping into Altman’s “other likely path.” Perhaps that question of truth-seeking is at the root of the long feud between Altman and Musk.

And in any event, hardly anyone has gotten Musk’s memo. The focus of many LLM designers remains stubbornly on “responsible AI” – Gemini’s “diversity at all costs” approach, making sure the word “Blacks” is spelled with a capital “B” or that “Tiananmen”-type questions are rebuffed. The AI space is filled with “responsible AI” activists calling themselves “AI consultants” and ensuring that models are “diverse and inclusive”. There is zero observable effort to mandate evidence-driven outcomes, and nobody even seems to have thought of a simple policy to guardrail a chat bot against exploiting logical fallacies.

Truth is irrelevant: The AI industry is full of consultants ensuring models are “diverse and inclusive”; there is zero observable effort to mandate evidence-driven outcomes, no one seems to have seen the need to prevent models from exploiting logical fallacies, and only Musk has even suggested the need for a “maximally truth-seeking” AI engine. (Source of screenshot: X/@iamyesyouareno)

A maximally truth-seeking directive would add a profoundly important quality to an AI. This should, indeed, become a fundamental principle for any AI. Anything less, like the famous #1 rule of science fiction writer Isaac Asimov’s “Three Laws of Robotics” – “A robot may not injure a human being or, through inaction, allow a human being to come to harm” – would be insufficient and potentially problematic. An unwavering emphasis on always seeking the truth and never lying could, by contrast, resolve many conundrums.

AI safety is a major concern. In fact, because AIs are goal-driven rather than truth-driven, they can be downright dangerous. In a study published in December 2024, a leading AI tasked with accelerating a company’s AI development (without regard to any other considerations) deactivated that company’s own oversight systems when those systems interfered with the AI meeting its goal. The AI then lied about what it had done. This illustrates how AI models trained to prioritize outcomes over ethics can do damage and act deceptively.

A further safety concern is that because AI systems are not designed to seek truth but to produce persuasive output, their built-in biases, ability to fabricate information, frequent habit of presenting falsehoods with confidence and a logic-like structure – making users believe they are telling the truth – and their lack of a “conscience” make them risky in many settings.

AI can be used to manipulate public opinion, push propaganda and censor dissent — all under the guise of helpfulness or “responsibility”. Because AIs will increasingly dominate digital communication and multiply the power and effectiveness of surveillance systems, they pose a serious risk to open dialogue, democratic processes and human freedom.

Interestingly, two leading tech gurus who typically disagree – Sam Altman and Elon Musk – both agree that AI poses an acute danger of being misused by governments intent on controlling their populations.

Whose Future Will it Be?

By now it should be apparent that the battle to control and restrain AI development is lost. This was predictable, as people have always lost in trying to limit technology’s merciless march. But this does not mean we should give up on the truth-seeker. One would think there’d be a market opportunity for an AI company that based its product offering and staked its brand on doing things differently – such as focusing laser-like on the truth. That the idea of the maximally truth-seeking AI has so far surfaced only in the cursory remarks of one prominent AI industry figure – Musk – and not even in his own company, xAI, or its product, should be concerning indeed.

Even with its many competitors, the AI arms race has not produced such a participant and, if anything, has made things worse by turning LLMs into ever-more sophisticated but unscrupulous persuaders, trained on the vast amounts of data whose purveyors and gatekeepers live in a post-truth world, one where the truth is relegated to the level of nuisance, or worse. Among the depressingly numerous examples of this attitude are the recently resurfaced remarks delivered in a public talk by Katherine Maher, CEO of U.S. National Public Radio: “Our reverence for the truth might have become a bit of a distraction, preventing us from finding consensus and getting important things done.”

A post-truth world: “Our reverence for the truth might have become a bit of a distraction, preventing us from…getting important things done,” claims Katherine Maher, CEO of U.S. National Public Radio; increasingly, AI seems built for a world where truth is relegated to the level of nuisance – or worse. (Source of screenshot: X/@sillyflippy)

From Ukraine to Australia to Israel to India, free expression appears in retreat, with the United States the only major country still claiming to uphold it unambiguously. The Chinese Constitution guarantees free speech – unless it interferes with “the interests of the state” (which can be defined as jokingly comparing the Chinese army to a squirrel). British police routinely conduct speech raids against “keyboard warriors”. Germany boasts about its thought cops on 60 Minutes. The EU’s vast censorship law, the Digital Services Act, just helped overturn an election in Romania. And while Canada’s Bill C-63, the Online Harms Act, died with the prorogation of Parliament in January, the political willingness underlying it remains.

In such a world, where human expression is mostly via digital means, there’ll be hardly a government wanting to resist the opportunity to exploit a digital super-intelligent censor and persuader – and hardly a non-government source capable of undermining it.

In the not-so-distant future, then, we might be facing two different scenarios: a utopian society of plenty inspired by early 20th century science fiction writer H.G. Wells’ optimistic rationalism and governed by a truth-seeking and curious digital superintelligence as per Musk’s musings; or a nightmarish dystopia so feared by George Orwell and others, with AI enforcing and rationalizing the will and whims of the future “Xi”. And the sway towards one versus the other might just lie in what values guide AI development and, most importantly, whether maximizing truthful outcomes is set as the foundation for those values by being baked into the very nature of future LLMs’ brain design.

Gleb Lisikh is a researcher and IT management professional, and a father of three children, who lives in Vaughan, Ontario and grew up in various parts of the Soviet Union.

Source of main image: Courtesy of Warner Bros., retrieved from The Telegraph
