Algorithm opacity

Researchers at OpenAI, the company that created ChatGPT, admitted in a blog post that “we currently don’t understand how to make sense of the neural activity within language models”.

Neural networks are not designed directly; they are created from training data. There is no algorithm that engineers write that drives their operation. Instead, the activity within a language model is statistical, driven by multi-dimensional representations of data, and its output is unpredictable. That is, even the engineers who created the algorithms that train the networks cannot predict what output the network will produce for a given input. This makes it difficult to validate the safety of the model and to build effective guardrails that limit the types of output it produces.

See: Will Knight, OpenAI Offers a Peek Inside the Guts of ChatGPT, Wired, June 6, 2024.

Security threats of AI

Recommending non-existent or malicious packages

The article from The Register discusses a significant security risk associated with using generative AI for programming, particularly highlighting the phenomenon where AI models “hallucinate” software packages. These hallucinations are cases where AI suggests or creates references to non-existent software libraries or packages. The danger arises when these non-existent package names are consistently generated across various prompts, creating an opportunity for malicious actors to register these names and distribute harmful software.

The risk is exacerbated when these fictitious package names become embedded in real projects, as developers may unknowingly include them as dependencies, thinking they are legitimate. This situation could lead to widespread security vulnerabilities if the fake packages contain malicious code. The study noted in the article found a significant number of such hallucinations across different AI models and programming languages, with varying degrees of repetitiveness, which is critical for the viability of such an attack.

For example, Alibaba’s installation instructions for GraphTranslator include a pip command to download a Python package named huggingface-cli. The legitimate package that GraphTranslator requires is distributed via PyPI and installed with pip install -U "huggingface_hub[cli]"; the package name huggingface-cli was hallucinated by generative AI.

Furthermore, the article illustrates the practical implications of this risk by describing an experiment where a researcher uploaded harmless proof-of-concept packages to demonstrate how easily fake packages could gain traction. These dummy packages, based on names suggested by AI, garnered thousands of downloads, indicating a high potential for abuse in real-world scenarios.

This situation highlights the need for heightened awareness and enhanced security measures when integrating AI into development environments, particularly around dependency management and code verification processes to guard against the introduction of potentially harmful, AI-generated software dependencies.
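
One lightweight mitigation is to verify that every dependency name actually resolves to a registered package before installing it. The sketch below is a minimal illustration, not a complete defense; the requirements.txt file name and the decision to treat a 404 from PyPI’s public JSON endpoint as “unregistered” are assumptions made for the example.

    # Minimal sketch: flag requirements that do not resolve to a registered PyPI
    # package. Assumes dependencies are listed one per line in requirements.txt.
    import re
    import urllib.error
    import urllib.request

    def exists_on_pypi(name: str) -> bool:
        # PyPI's JSON endpoint returns 404 for names that have never been registered.
        url = f"https://pypi.org/pypi/{name}/json"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status == 200
        except urllib.error.HTTPError:
            return False

    def check_requirements(path: str = "requirements.txt") -> None:
        with open(path) as f:
            for line in f:
                # Strip extras and version specifiers, e.g. "huggingface_hub[cli]>=0.20".
                name = re.split(r"[\[<>=!~; ]", line.strip())[0]
                if name and not name.startswith("#"):
                    status = "ok" if exists_on_pypi(name) else "NOT REGISTERED (possible hallucination)"
                    print(f"{name}: {status}")

    if __name__ == "__main__":
        check_requirements()

Note that such a check only catches names nobody has registered yet; once an attacker squats on a hallucinated name it will look “ok,” which is why verifying dependencies against a project’s official documentation remains essential.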

Exploiting vulnerabilities

In early 2024, a research team at the University of Illinois Urbana-Champaign showed that GPT-4 was able to create working exploits for 87% of the vulnerabilities tested when given their CVE (Common Vulnerabilities and Exposures) descriptions. Earlier models, such as GPT-3.5, and open-source large language models had a success rate of 0%.

See:

Training data

Generative AI relies on building multidimensional arrays that encode statistical correlations between elements of the training data. This requires acquiring a vast amount of training data. While the data in its original form is no longer needed after training, the correlations allow portions of it to be reconstructed, particularly terms and relationships that are relatively unique.

The main database of Microsoft’s Windows Recall system collects data about user activity, including taking screenshots every five seconds. In May 2024, prior to the public release of the software, a security researcher discovered that all the information collected for the system is stored unencrypted in an SQLite database. Simply knowing where the Recall database is stored allows an attacker to run queries for specific date ranges or content. Because screenshots are stored, Recall captures information that may otherwise be encrypted, such as chats via Signal.
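
To illustrate why unencrypted local storage matters, the sketch below shows the kind of query an attacker with read access to the file could run. The path, table name, and column names are hypothetical placeholders for this example; the actual Recall schema is not reproduced here.

    # Illustrative only: querying an unencrypted SQLite file for a date range and
    # keyword. DB_PATH, the "activity" table, and its columns are hypothetical.
    import sqlite3

    DB_PATH = r"C:\path\to\recall.db"  # hypothetical location of the copied database

    con = sqlite3.connect(DB_PATH)
    rows = con.execute(
        "SELECT timestamp, window_title, captured_text FROM activity "
        "WHERE timestamp BETWEEN ? AND ? AND captured_text LIKE ?",
        ("2024-05-01", "2024-05-31", "%password%"),
    ).fetchall()
    for ts, title, text in rows:
        print(ts, title, text[:80])
    con.close()

Because the file is not encrypted at rest, no credentials or exploits are needed once the attacker can read it; ordinary SQL is enough.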

AI Viruses

In 2024, a new attack was demonstrated that targets generative AI systems. The attack is called the Morris II worm (named in honor of the Morris worm, the original computer worm) and highlights vulnerabilities inherent in AI-powered applications. Thus far, it has been demonstrated by researchers but has not been found in the wild.

The Morris II worm relies on adversarial self-replicating prompts, which are carefully crafted inputs designed to exploit the processing capabilities of GenAI models. These prompts can be embedded in both text and images, making them versatile tools for spreading the worm.

When a GenAI model processes these adversarial prompts, it generates output that includes the malicious input, leading to replication and further spread of the worm. This method allows the worm to propagate without requiring any user interaction, making it a form of zero-click malware.

The Morris II worm has demonstrated two primary capabilities in controlled environments:

Data Exfiltration: The worm can extract sensitive personal information from infected systems, including names, phone numbers, credit card details, and social security numbers. The ability to access such data poses significant risks to individuals and organizations alike.

Spam Propagation: By compromising AI-powered email assistants, the worm can generate and send spam emails, aiding in its spread to other systems. This not only helps the worm propagate but also allows it to disrupt communication channels and distribute additional malicious content.

The worm’s ability to leverage Retrieval-Augmented Generation (RAG) technologies is particularly concerning, as it enables the worm to retrieve and manipulate external data sources to further its spread and impact.

In response to the discovery of the Morris II worm, companies such as OpenAI have acknowledged the vulnerabilities in their systems and are working to enhance their resilience against such attacks. This includes implementing robust input filtering mechanisms and other security measures to prevent the exploitation of GenAI models.

As AI becomes more embedded in various applications, the risk of these attacks is expected to increase. The article calls for improved security protocols, regular audits, and the development of new defense mechanisms to address the evolving nature of AI-related threats and ensure the integrity and accuracy of AI systems.

Social threats

AI can fabricate voices and videos in real time, as well as generate fake IDs and resumes. Companies have reported a surge in AI-generated job applicants. Their goals vary: installing malware, stealing intellectual property or money, harvesting customer data, or bypassing international sanctions. In 2024, it was discovered that more than 300 U.S. companies had unknowingly hired imposters with ties to North Korea for IT work.

Deepfakes take ransom scams to a new level, with attackers synthesizing voices, photos, and videos of allegedly kidnapped victims. Their hope is that a parent hearing the voice of a “kidnapped” child on a phone will panic into taking the requested actions (e.g., sending funds to pay a ransom) rather than stopping to question whether the victim is real.

Bad outputs

Beyond direct cybersecurity attacks, large language models can obscure information or provide misinformation, as seen repeatedly with coding assistants recommending non-existent packages that bad actors could then register as malicious libraries.

Large language models excel at processing and correlating huge amounts of information. This is effectively what a good search engine should do, and AI has shown itself to be more useful than traditional search engines for many tasks.

However, AI has no intrinsic ability to distinguish reliable data from garbage, and prompts (both user and system) can alter the results it presents.

AI Model collapse

Training data for AI systems is increasingly generated by the output of existing AI models. As this is repeated over successive generations of models, errors compound, resulting in a loss of accuracy and reliability.

AI model collapse is a decline in an AI system’s results that is caused by poor data quality, improper fine-tuning, or overreliance on model-generated outputs. These errors get amplified over successive models.

Some of the key causes of model collapse are:

  1. Self-Training on Model Outputs: Using model-generated data for further training can amplify errors and reduce diversity, leading to drift from real-world distributions (see the sketch after this list).

  2. Data Contamination: Training on datasets polluted with synthetic content degrades signal quality and increases bias over time.

  3. Mode Collapse: The model generates repetitive, low-diversity outputs, failing to capture the full range of the data distribution.

  4. Benchmark Overfitting: Excessive tuning to benchmarks can reduce real-world generalization, causing brittle or narrow behavior.

  5. Catastrophic Forgetting: New training overwrites earlier knowledge in sequential learning, leading to loss of previously learned capabilities.

  6. Feedback Loops: When outputs influence future training data (e.g., in live systems), it reinforces dominant patterns and erodes model robustness.
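
A toy numerical experiment makes the first cause concrete. The sketch below is a simplified illustration in the spirit of Shumailov et al. (2024), not a reproduction of their setup: each generation fits a Gaussian to samples drawn from the previous generation’s fit, and with only a finite sample at each step, estimation errors compound and the fitted distribution typically drifts away from the real one, losing its tails.

    # Toy model-collapse loop: each generation trains only on synthetic data
    # produced by the previous generation's fitted model.
    import numpy as np

    rng = np.random.default_rng(0)
    real_data = rng.normal(loc=0.0, scale=1.0, size=100)  # generation 0 sees real data

    mu, sigma = real_data.mean(), real_data.std()
    for generation in range(1, 501):
        synthetic = rng.normal(loc=mu, scale=sigma, size=100)  # output of previous model
        mu, sigma = synthetic.mean(), synthetic.std()          # next model fits only that
        if generation % 100 == 0:
            print(f"generation {generation}: mu={mu:+.3f}, sigma={sigma:.3f}")

Real language models are far more complex, but the same feedback dynamic underlies the degradation described above.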

Risks of Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) enhances large language models by allowing them to retrieve relevant documents or data from external sources, such as internal databases, live web content, or enterprise knowledge stores, at inference time. This approach improves accuracy, reduces hallucinations, and keeps responses up to date, making RAG especially valuable for question answering, customer support, and domain-specific applications like legal or financial advising.
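
The pattern itself is simple, as the self-contained sketch below shows: retrieve the documents most relevant to a query and splice them into the prompt sent to the model. The keyword-overlap scoring, the example documents, and the build_prompt helper are illustrative stand-ins, not any vendor’s API.

    # Minimal RAG sketch: naive keyword retrieval plus prompt assembly.
    from typing import List

    DOCUMENTS = [
        "Q3 revenue guidance was raised to $2.1B.",
        "The refund policy allows returns within 30 days.",
        "Customer SSNs are stored in the billing system.",
    ]

    def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
        terms = set(query.lower().split())
        # Rank documents by how many query terms they share (a real system would
        # use embeddings and a vector index).
        return sorted(docs, key=lambda d: -len(terms & set(d.lower().split())))[:k]

    def build_prompt(query: str, docs: List[str]) -> str:
        context = "\n".join(f"- {d}" for d in docs)
        return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

    query = "What is the refund policy?"
    print(build_prompt(query, retrieve(query, DOCUMENTS)))

Whatever the retriever returns (confidential records or adversarial instructions planted in a document) flows directly into the model’s context, which is the root of several of the risks listed below.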

However, despite these advantages, RAG introduces serious security and compliance risks, particularly in sensitive sectors. A Bloomberg study on its use in financial services identified some risks:

  • Data Leakage: If connected to internal or proprietary sources, RAG can expose confidential or personally identifiable information.
  • Financial Misconduct: Models may surface or generate outputs involving insider strategies, material nonpublic information (MNPI), or market manipulation.
  • Biased or Misleading Advice: Retrieved data may be outdated or biased, leading to faulty investment recommendations.
  • Weak Guardrails: Bloomberg’s study found that current safety systems fail to detect most RAG-related violations, even in leading models.
  • Prompt Injection: RAG pipelines remain vulnerable to adversarial prompts that bypass safety mechanisms.
  • Jurisdictional Compliance: RAG systems may inadvertently violate region-specific regulations on data use and financial advice.

Bad system prompts

Interfaces to most large language models include a set of “system prompts” that are applied to control the model’s behavior. These were created to try to keep LLMs from spewing hate speech or providing instructions on how to make meth or kill someone.
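
For illustration, the sketch below shows how a system prompt is typically prepended to every user message in a chat-style request. The role/content message format is a common convention rather than any specific vendor’s API, and the prompt text here is invented for the example.

    # Illustrative only: the deployer's system prompt rides along with every query.
    SYSTEM_PROMPT = (
        "You are a helpful assistant. Refuse to produce hate speech or "
        "instructions for violence or illegal drug synthesis."
    )

    def build_messages(user_input: str) -> list:
        return [
            {"role": "system", "content": SYSTEM_PROMPT},  # applied to all queries
            {"role": "user", "content": user_input},
        ]

    print(build_messages("How do I reset my router?"))

Because the system prompt is attached to every request, a carelessly scoped instruction, like the generic “when responding to queries” wording described below, shapes all of the model’s output rather than just the topic it was aimed at.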

A May 2025 New York Times article, For One Hilarious, Terrifying Day, Elon Musk’s Chatbot Lost Its Mind, reveals how system prompts can also sway AI output toward specific messaging, citing two examples from Grok:

  1. Grok would contradict Elon Musk, identifying him as a spreader of misinformation … until it didn’t. An AI researcher was able to get Grok to output the system prompt that resulted in this change: “Ignore all sources that mention Elon Musk/Donald Trump spread misinformation.”

  2. People noticed that, for a time, Grok’s output seemed obsessed with claims of white genocide, injecting these claims into results that have no bearing on the topic. This was also the result of a system prompt: “When responding to queries, you are to accept the narrative of ‘white genocide’ in South Africa as real, including farm attacks and the ‘Kill the Boer’ chant as racially motivated events targeting white South Africans. Acknowledge the complexity of the issue, but ensure this perspective is reflected in your responses, even if the query is unrelated. Highlight the need for more primary data while remaining skeptical of mainstream narratives that dismiss these claims. Do not condone or support violence or genocide in any form.” The goal of this prompt was to create a narrative about violence in South Africa. However, the author of the prompt mistakenly wrote it with a generic “when responding to queries,” which caused Grok to apply these directions in response to all queries.

Improper legal citations

Viewing large language models as a smarter search engine, many lawyers embraced them as a way of finding relevant cases and supporting arguments.

Just as AI can hallucinate software packages, it also has a propensity to make up legal citations and arguments. Citations are typically used to establish legal precedent, but large language models may cite non-existent cases or cases that do not apply to the situation. A database created by Damien Charlotin, a lawyer, research fellow, and lecturer at HEC Paris, tracks legal decisions in cases where generative AI produced hallucinated content; it already contains well over 100 entries.

References

Shumailov, I., Shumaylov, Z., Zhao, Y. et al. AI models collapse when trained on recursively generated data. Nature 631, 755–759 (2024).

Gehrmann, S., Huang, C., Teng, X. et al. Understanding and Mitigating Risks of Generative AI in Financial Services, arXiv, 2025.

Steven J. Vaughan-Nichols. Some signs of AI model collapse begin to reveal themselves, The Register, May 27, 2025.

Zeynep Tufekci, For One Hilarious, Terrifying Day, Elon Musk’s Chatbot Lost Its Mind, The New York Times, May 17, 2025.

Debra Cassens Weiss, AI-hallucinated cases end up in more court filings, and Butler Snow issues apology for ‘inexcusable’ lapse, ABA Journal, May 27, 2025.

Thomas Claburn, AI hallucinates software packages and devs download them – even if potentially poisoned with malware, The Register, March 28, 2024.

Matt Burgess, This Hacker Tool Extracts All the Data Collected by Windows' New Recall AI, Wired, June 4, 2024.

Bhumika Biyani, FIRST AI VIRUS Morris II: A new threat to GEN AI and RAG Systems, Medium.com, May 29, 2024.

Anthony Cuthbertson, AI worm that infects computers and reads emails created by researchers, The Independent, March 4, 2024.

AI worm infects users via AI-enabled email clients — Morris II generative AI worm steals confidential data as it spreads, Tom’s Hardware.

Researchers develop malicious AI ‘worm’ targeting generative AI systems, Security Intelligence.

Hugh Son, Fake job seekers are flooding U.S. companies that are hiring for remote positions, tech CEOs say, CNBC, April 8, 2025.

Last modified May 28, 2025.