Most static security tools find bugs in isolation. They scan one file, list the issues, and move on. The problem is that the most damaging vulnerabilities in modern codebases are rarely a single bug. They’re a chain: a hardcoded signing key plus a missing authorization check plus a SQL injection that, on their own, all look manageable. Together they’re an account-takeover path. This is exactly the kind of cross-cutting reasoning LLMs are good at, if you give them the right structure. In this article, we’ll build a two-agent security code reviewer using Python and the Venice AI API. By the end, you’ll have a CLI you can point at any Python codebase to produce a Markdown report with atomic findings and exploit chains. Interested in the full code implementation? Check out the GitHub repo. Before we continue, you’ll need a Venice API key. Export it as an environment variable:Documentation Index
Fetch the complete documentation index at: https://docs.venice.ai/llms.txt
Use this file to discover all available pages before exploring further.
What We’re Building
The reviewer is a small Python project with a few clear parts:| Part | What it does |
|---|---|
| Pydantic models | Define Evidence, Finding, and Chain, and give us a hard validation boundary between the LLM and the rest of the program |
| Venice client | Wraps the OpenAI Python SDK pointed at Venice’s OpenAI-compatible endpoint |
| AST repo map | Walks the target tree with Python’s ast module and builds a deterministic map of every module’s public symbols and import edges |
| Scanner agent | Reads one Python file at a time plus a per-file neighbourhood slice of the repo map, and emits atomic vulnerability findings with file:line evidence |
| Chainer agent | Reads the union of findings plus a condensed full repo map, and emits exploit chains that combine two or more findings |
| Reference validator | Drops any chain that references a finding ID the Scanner did not produce, or names a file none of its referenced findings actually came from |
| Markdown report | Renders findings and chains into a human-readable report |
| CLI | Wires everything together with Typer |
- Walk the target directory for
.pyfiles. - Build a deterministic repo map (imports, public symbols, signatures).
- For each file, send the Scanner its source plus a per-file neighbourhood slice of the map and collect atomic findings.
- Send the union of findings plus the condensed repo map to the Chainer and collect exploit chains.
- Drop any chain that references a finding ID the Scanner did not produce, or that names a file none of its referenced findings actually came from.
- Write a Markdown report.
ast and build a structural map. The Scanner sees a per-file neighbourhood (who imports from this file, what this file imports, signatures of those external symbols). The Chainer sees a condensed full map (every module, every public symbol, every import edge, no source). That’s the smallest amount of context engineering we have found that lets the Chainer construct chains whose data flow crosses module boundaries, without paying the token cost of stuffing the whole codebase into every prompt.
Pre-requisites
- Python 3.12+
- A Venice API key from venice.ai
- Basic familiarity with Pydantic, Python’s
astmodule, and the OpenAI Python SDK
uv for dependency management, but a regular virtual environment works just as well.
Setting Up the Project
Create a new project and install the dependencies:pip, create a virtual environment instead:
.env file for local development:
src/venice_security_reviewer/ to keep it importable as a package, with prompts under prompts/ at the repo root so they can be reviewed and diffed like any other source artefact:
Setting Up the Venice Client
Venice is OpenAI-compatible, so we can use the official OpenAI Python SDK and just point itsbase_url at Venice. Centralising the client construction in one file means the rest of the code never has to know which provider it’s talking to: swapping backends would only touch this one module.
Create src/venice_security_reviewer/client.py:
- We default to
zai-org-glm-5because it’s a strong general-purpose Venice model, but you can override it with theVENICE_MODELenvironment variable. For larger or more nuanced codebases, swapping in a stronger model can make the Chainer notably better at narrative quality. build_clientreturns the client and the model id, so callers don’t have to read environment variables themselves and tests can inject a fake config without monkeypatching.
Defining the Data Models
The whole point of using Pydantic here, rather than passing raw dicts around, is that we get a hard validation boundary between the LLM and the rest of the program. If the model returns malformed JSON or invents a finding ID that doesn’t exist, parsing fails loudly and we never propagate the hallucination into the report. Createsrc/venice_security_reviewer/models.py:
Finding.idandChain.idare constrained to a regex likeF-001,C-001. If the model gets creative with the format, validation fails.Chain.findingsrequires at least two entries: a “chain” of one finding is just a finding.Chain.severityis restricted tohighorcritical. A combination of findings that doesn’t raise the impact above the highest individual severity isn’t a chain worth reporting.Evidenceenforces thatend_line >= start_lineso the model can’t return nonsensical line ranges.
models.py:
Building the AST Repo Map
The repo map is the structural skeleton of a Python codebase: every module’s public surface, every import edge, and a reverse index from “module M” to “modules that import from M”. It’s built once per scan run with Python’sast, never via execution, so it’s safe to run on adversarial code: the parser doesn’t import or invoke anything from the scanned tree.
We’ll consume the map in two shapes. The Scanner gets a per-file neighbourhood slice so its prompts stay bounded in size. The Chainer gets a condensed full map so it can construct chains across files.
Create src/venice_security_reviewer/repo_map.py and start with the Pydantic models that describe the map:
__all__ list if one is present. Function signatures and class headers get rendered as compact strings the LLM can read directly:
_SIGNATURE_CHAR_CAP of 200 preserves typical real signatures (including type hints) while preventing pathological cases like a 200-line typed union from blowing up the prompt.
Next, the extractor that pulls the structural data out of a parsed module. We handle ast.FunctionDef, ast.ClassDef, top-level ast.Assign and ast.AnnAssign for constants, and both ast.Import and ast.ImportFrom for the import edges. Relative imports get resolved into their absolute dotted form so the Chainer can match them against module names later:
tree.body and emits SymbolDef and ImportEdge entries for each top-level node. The reference repo’s _extract function in repo_map.py covers the full implementation. The shape that comes out is a list of ModuleEntry objects, one per file.
The interesting part is what we do with those entries. Wrap them in a RepoMap with two consumer-facing methods:
neighborhood(path) is what the Scanner calls for each file. It returns a ModuleNeighborhood object containing the module itself, every other module that imports from it, and every in-repo symbol it imports from elsewhere (with their resolved signatures). That gives the Scanner enough context to flag findings that are only obvious in cross-file context, without dragging the whole codebase into the prompt.
condensed_dict() is what the Chainer gets. Snippets and signatures are dropped; only paths, module names, public exports, and import edges remain. That’s the smallest representation that still lets the Chainer reason about cross-module data flow.
Finally, the entry point that builds the whole thing:
Writing the Scanner Agent
The Scanner walks a target path, picks up Python source files, and asks Venice to identify atomic vulnerabilities one file at a time. Per-file scanning keeps the prompt small and makes failures isolated: one bad file doesn’t kill the whole run. We’ll keep the prompt itself in a separate file so it can be reviewed and diffed like any other source artefact. Createprompts/scanner.md:
- We tell the model to emit JSON only, with no prose or fences. The OpenAI SDK supports a
response_format={"type": "json_object"}parameter that enforces this on the API side, but reinforcing it in the prompt cuts down on edge cases. - We explicitly forbid the Scanner from producing cross-file chains. Chains are the Chainer’s job, and asking the Scanner to do both blurs the responsibility.
- We require the snippet to be copied verbatim. This means the report can quote the exact bytes the model claims to have seen, and a reviewer can spot-check a finding by comparing the snippet to the source.
src/venice_security_reviewer/scanner.py and start with the file walker and prompt loader:
MAX_FILE_BYTES is a safety cap. Beyond ~200 KB we skip rather than send a huge prompt that’s likely to be both expensive and low quality.
The next piece is the prompt builder. The template uses {filename}, {source}, and {neighborhood} as placeholders; we use str.replace rather than .format() because the template contains JSON examples with literal braces:
F-001 per file, but the Chainer needs to reference findings across the whole repo. We re-issue the IDs against a monotonic counter so they’re globally unique:
response_format={"type": "json_object"} and a low temperature, and parse the result:
- We patch the evidence file path to be relative to
repo_rootafter parsing, since the model echoes back whatever filename we gave it but we want a single canonical form throughout the report. temperature=0.1is intentionally low. We want the Scanner to be conservative and consistent across runs; creativity is the Chainer’s job.
Writing the Chainer Agent
The Chainer takes the union of Scanner findings plus the condensed repo map and asks Venice whether any of the findings combine into a real exploit chain. Two deterministic guardrails sit between the LLM and the report:- Every chain must reference only finding IDs the Scanner produced.
- Every chain must claim only files that at least one referenced finding’s evidence touches.
prompts/chainer.md. The core of it looks like this:
files_involved, and crucially, when not to chain. Telling the model “it is correct and expected for many codebases to have findings that do not chain” is what stops it from inventing chains to look productive.
Now the agent code. Create src/venice_security_reviewer/chainer.py:
MAX_REPO_MAP_CHARS = 8000 is a soft ceiling for the JSON-rendered repo map block in the Chainer prompt. At roughly 4 chars per token, that’s ~2000 tokens, which sits comfortably inside any Venice model’s context window even with findings and the narrative budget on top.
We serialise findings into a compact JSON block. Note we strip the snippet from evidence here on purpose: the Chainer doesn’t need raw bytes to decide whether two findings combine, and including them roughly doubles the token cost on real codebases:
_pruned, _kept, and _total markers, so the Chainer prompt can warn the model when the map has been trimmed.
Parsing the response is the same shape as the Scanner: deserialise, validate each chain through Pydantic, drop malformed entries:
- We bail out before calling the model when there are fewer than two findings. You can’t chain a single finding, and skipping the call means we don’t burn tokens on a guaranteed-empty result.
temperature=0.2is slightly higher than the Scanner’s0.1. The Chainer benefits from a touch more creativity to spot non-obvious combinations, but we still want it grounded in the findings and map it was given.- After parsing,
validate_chain_referencesruns the deterministic cross-reference check we wrote earlier. Anything that survives is safe to render; anything that doesn’t gets logged so the operator knows the model tried to invent something.
Rendering the Markdown Report
Keeping rendering separate from agent logic means the sameFinding and Chain objects can later be fed into other formats (JSON, SARIF, HTML) without touching the Scanner or Chainer.
We’ll use Jinja2 with a small template file. Create src/venice_security_reviewer/templates/report.md.j2:
src/venice_security_reviewer/report.py:
.html templates by extension.
Wiring the CLI
The CLI is the orchestrator: build the repo map, scan, chain, render. We’ll use Typer to handle argument parsing and Rich to print a nice summary table. Createsrc/venice_security_reviewer/cli.py:
pyproject.toml:
Testing the Guardrails
We’ve leaned hard on one idea throughout this build: the deterministic guardrails are what separate a useful security tool from a confidently wrong one. That claim is only worth making if we can prove the guardrails actually hold, so the most valuable tests in this project don’t call Venice at all. They lock down the Pydantic boundary and the prompt-assembly plumbing, which means they run offline, in milliseconds, with no API key and no token cost. Add the dev dependencies first:tests/test_models.py:
F-### pattern, and a “chain” of a single finding. If any of them ever stops raising, a whole class of hallucination has quietly become possible again.
The most important test covers the cross-reference validator, since that’s the function that actually drops invented chains:
F-999 was never produced by the Scanner, so the chain that references it lands in dropped and never reaches the report. The companion test in the reference repo, test_validate_chain_references_drops_unknown_files, does the same for a chain that claims a file none of its findings came from.
The second thing worth testing is the plumbing that feeds the Chainer. It’s easy to refactor the prompt assembly and silently stop passing cross-file context, at which point the Chainer would keep working but quietly get worse. This test builds a two-module fixture, renders the prompt, and asserts the cross-file information is actually present, again without a Venice round-trip. Create tests/test_cross_file_chain.py:
tests/test_scanner_parse.py, tests/test_chainer_parse.py, and tests/test_repo_map.py, which cover JSON parsing edge cases (malformed entries getting dropped rather than crashing the run) and the AST repo map builder.
Running the Project
To try it on a real codebase, point the CLI at a directory of Python source:pip install -e . and run venice-security-reviewer scan path/to/your/code.
The output looks roughly like this:
examples/vulnerable_app— a multi-file Flask app with three “low” findings, two of which combine into a critical privilege-escalation chain across files. Tests whether the Chainer is selective about what it combines.examples/url_preview— a multi-file URL-fetcher with a defensive allowlist that doesn’t apply per-iteration. Tests cross-file data-flow reasoning combined with deployment topology (link-local IPs are cloud-credential gateways).examples/csv_query— a single-file CSV filter with anevalsandbox escape via__class__.__base__.__subclasses__(). Tests language-level reasoning rather than HTTP flow.examples/webhook_handler— a single-file HMAC verifier with a JSON parser-differential vulnerability. Tests reasoning across multiple specifications.
chainer referenced N unknown finding id(s) or file(s); chains dropped, that’s the cross-reference validator catching the model in the act of inventing a chain. The dropped chains never make it into the report; you just get a warning that you can use to adjust the prompt or sample additional Chainer runs.
Extending This Example
The two-agent shape generalises well. A few directions worth exploring:- More languages. The Scanner is language-agnostic at the prompt level; the AST builder is what’s Python-specific. Swap in
tree-sitterand you can build the same neighbourhood/condensed-map shapes for TypeScript, Go, or Rust. - A third agent for fixes. Once you have a chain, asking a Patcher agent to draft a unified diff that defangs one of the constituent findings is a small step. Pydantic-validate the diff against the same evidence-file set and you get the same hallucination guard for free.
- Output formats.
render_reportis the only place that knows about Markdown. Add a SARIF renderer and the same findings can drop into GitHub code scanning. Add a JSON renderer and you can pipe results into a downstream system. - Caching by file hash. The Scanner’s per-file calls are independent and idempotent. Caching by
(file_hash, prompt_hash, model)means re-scanning a repo where one file changed only re-runs the Scanner on that one file. - Sampling for the Chainer. For high-stakes runs, call the Chainer N times at slightly higher temperature and intersect the results. Chains the model finds consistently are more likely to be real; chains it finds once and never again are likely noise.
- Stronger models.
zai-org-glm-5is the default because it strikes a good balance between cost and quality for combinatorial reasoning, but for harder codebases swapping in a stronger Venice model (set viaVENICE_MODEL) can make the Chainer’s narratives noticeably tighter.