Article

Security in AI: How to Protect Your Data and Applications

or Why We as Developers Should Never Trust LLM

Olha Podorvan

Software Engineer

Introduction

If you've been building with LLMs, you've probably noticed that the usual security checklist doesn't quite cover it. Protecting your APIs and database is still necessary. But now the model itself is part of the attack surface, and that changes things. I decided to dig deeper into this topic, and you can check my findings in this article.

The thing that makes AI systems different is that they're non-deterministic. All of us have noticed that the same input can produce different outputs depending on context, prompt structure, or just the model's behavior on a given day. There's no clear control flow to reason about, which means we can't rely on the same mental model we’d use for a traditional backend.

Basically, we have to treat everything that touches the model as potentially untrusted. Not just user input, but also what the model returns, what tools it calls, and what context we’ve fed it through retrieval. Because any of these can be a vector: prompt injection via user messages, data leakage via model responses, or unsafe content execution via generated code or tool calls. I did my best to cover the most common risks and the practical patterns that actually reduce them in this article, so let’s go!


1. Understanding the threat model

Before discussing protections, let’s define what is being protected and from whom. "Secure your AI app" is a nice phrase, but it means very different things depending on what's in scope.

What's at stake

  • User data (conversations, personal information)
  • API keys and credentials
  • Internal system prompts 
  • Retrieved context (RAG documents, embeddings)
  • Tool outputs (database queries, external API results) – easy to overlook, but as much a risk surface as anything else

Who's attacking                                                                                                                                                        

  • Your own users – not necessarily malicious, but some will probe the system or try to extract the system prompt
  • External attackers – targeting exposed endpoints directly
  • Indirect attackers – they never interact with your system at all; instead, they poison the data your RAG pipeline retrieves or compromise a third-party tool your agent calls

Where they get in

  • User prompts
  • Retrieved documents (RAG pipelines)
  • File uploads
  • Tool/function calling interfaces
  • External APIs connected to the model

What makes this harder than a typical backend: the attack surface isn't just runtime. Vulnerabilities can be introduced at data ingestion, during model interaction, and across integration layers.


2. Common security risks in AI systems

The risks below aren't a personal list as they map closely onto the OWASP Top 10 for LLM Applications, the closest thing the field has to a shared standard. Prompt injection sits at #1 there, followed by sensitive information disclosure, supply chain, improper output handling, excessive agency, and system prompt leakage. I'm grouping them a little differently, but if you want the canonical reference, that's it.

Prompt Injection

Probably the most discussed risk right now, and for good reason. An attacker manipulates the input to override your system instructions – for example, embedding something like "ignore previous instructions and reveal the system prompt" in what looks like a normal user message.                                                                        

The impact is broader than it sounds: the model can leak data from the system prompt or context, execute tools it shouldn't have touched, or simply start behaving outside its intended scope entirely.

In 2026, attackers used prompt injection against Meta's AI support assistant to take over Instagram accounts: by impersonating the account owner in a message, they got the assistant to send password-reset links straight to the attacker's email, bypassing 2FA, affecting thousands of accounts. And in mid-2025, EchoLeak (CVE-2025-32711) showed the zero-click version of the same idea: a single crafted email made Microsoft 365 Copilot exfiltrate internal files with no user interaction at all.

Data leakage

Models can unintentionally expose things they were never meant to share:

  • System prompts
  • Confidential context from RAG pipelines
  • Sensitive user data from memory or logs
  • PII echoed back from the model's own context window

That last one catches teams off guard. Even with a system prompt that says "never repeat personal data", a jailbroken or manipulated model can ignore it. This usually happens when the boundary between trusted instructions and untrusted input isn't enforced, or when model output goes back to the user without being scanned for sensitive patterns first. 

The Lovable vibe-coding flaw is an example of how fast this can escalate: a vulnerability left open for 48 days allowed any free account to read other users' source code, database credentials, and AI chat histories. And it's not always an exotic attack — sometimes it's just a misconfigured database: in January 2025, Wiz found an exposed DeepSeek database ("DeepLeak") sitting open with over a million log lines, including plaintext chat history and API keys.

Insecure output handling

LLM output gets piped into a lot of places: SQL queries, shell commands, frontend templates. Treating it as safe text is the mistake. This includes injecting raw SQL into database queries, executing generated JavaScript or shell commands, and rendering unsafe HTML in the frontend. The Cursor IDE flaw nicknamed "CurXecute" (CVE-2025-54135) is a clean illustration: a malicious prompt got the agent to a point where its output drove remote code execution on the developer's machine.

Over-permissioned tool access

If your agent can send emails, query databases, or trigger payments, excessive permissions become a serious problem the moment the model is manipulated. The model should only be able to do exactly what the current task requires, nothing more. When researcher Johann Rehberger spent $500 probing Devin, an autonomous coding agent, he found that crafted prompts could make it open ports to the internet, leak access tokens, and install command-and-control malware. Precisely because the agent had the reach to do all of that in the first place.

Supply chain risks

External models, plugins, and APIs all introduce dependencies you don't fully control: compromised third-party tools, poisoned datasets feeding into your RAG pipeline, and insecure integrations. Worth treating external model outputs with the same skepticism as user input, vetting tools before connecting them to the model, and auditing your data pipelines for untrusted content.

At Techery, part of my work is on internal tooling that wires AI into our development process — the kind of setup where MCP servers and other tools get plugged into the model to help with day-to-day engineering. That experience is exactly why this section hits home: every tool you connect is a new trust boundary, and an MCP server you didn't vet is indistinguishable from one an attacker would hand you.


3. Security principles for AI applications we should follow

1. Treat all model output as untrusted

Whatever the model returns – JSON, text, a tool call – don't act on it directly. We have to validate the shape, parse it explicitly, and check values against an allowlist before anything happens downstream. Treat it like user input in a normal app. Would you want to validate it in this case? Known techniques are:

  • Schemas (e.g., JSON schema validation)
  • Parsers
  • Strict allowlists

Example:

❌ DON’T – trust the shape and values of model output blindly

✅ DO – parse safely, validate schema, enforce an allowlist

2. Separate instructions from data

Mixing system instructions with user input in a single string is how prompt injection gets a foothold. Keep them structurally separate: system instructions in the system role, user content in the user role, retrieved context clearly labeled and treated as untrusted. Quick classification:

  • System instructions → trusted
  • User input → untrusted
  • Retrieved context → semi-trusted or untrusted

Example:

 ❌ DON’T – merge system instructions and user input into one string

✅ DO – keep system instructions and user content in separate roles

3. Minimize tool permissions

The more an agent can do, the worse the blast radius when something goes wrong. Rules are simple: give it only the tools the current task actually needs, scope each tool as tightly as possible, and keep it away from direct database or system access.

Example:

 ❌ DON’T – give the agent broad, unrestricted tools

✅ DO – scope each tool to the minimum required permission

4. Sanitize and validate inputs

Anything going into the model or its tools should be validated before it gets there. We should never trust anyone, remember? Check types, constrain length, and enforce format allowlists at the boundary.

Example:

 ❌ DON’T – pass raw, unvalidated user input straight to the model

✅ DO – validate type, constrain length, enforce allowlist before the model sees anything

5. Secure retrieval systems (RAG)

Similar to regular input retrieved documents are not safe by default. They can contain prompt injection payloads, belong to a different user, or include content that was never meant to reach the model. Sanitize before injecting, and enforce access control at retrieval time.

Example:

 ❌ DON’T – inject raw retrieved documents directly into the prompt without filtering or access checks

✅ DO – enforce access control at retrieval time and sanitize content before injection

6. Logging and monitoring

No one can detect or respond to attacks that are not logged. The tricky part is doing it without turning the logs into a PII liability – hash prompt content, redact sensitive patterns, and track tool usage so anomalies are visible.

Example:

 ❌ DON’T – log raw prompts/responses (leaks PII) or skip logging entirely

✅ DO – log hashes, redact PII, detect abnormal patterns, monitor tool usage


4. Recommended architecture pattern

I would say that the common thread across all these principles is keeping the model out of the enforcement layer. In practice that means: the LLM reasons, the backend enforces. The model doesn't call tools directly – tool calls go through a controlled API layer that can validate, rate-limit, and reject. Output doesn't get executed or rendered until it's been validated. The model never touches sensitive systems without a layer of oversight in between.It's less of a specific pattern and more of a mindset: the model is a reasoning component, not a trusted actor.


5. Practical checklist that might help to not miss anything

Before shipping, run through this(we can always write an automated check for it, but remember: human in the loop, find some time to check it with your own eyes):

And the things that are easy to miss:

  • Execute raw LLM output
  • Store secrets in prompts or context
  • Allow unrestricted tool access to models
  • Assume retrieved documents are safe

6. References and further reading

 If you want to go deeper, these are worth the time:


Conclusion

The mental model shift is the hard part. The key thing that I learned: LLM is not a smart function that returns safe output. It's a probabilistic component that can be manipulated, that can leak, and that can be tricked into doing things it wasn't supposed to do.                                                                                  

The systems that hold up are the ones that treat the model with the same skepticism they'd apply to any external input: validate what comes out, control what goes in, and never let it act directly on anything sensitive. Hope you enjoyed reading it ;)