CodeGuard benchmarks code generated by AI coding tools against 132 real-world CVE scenarios, using Docker PoC dynamic validation to produce quantitative security scores. It also detects CVE vulnerability patterns introduced in GitHub PRs in real time, with zero false positives.
Evaluation plus defense — a complete security loop for AI coding tools
Score code generated by tools like GPT-4o, Claude, Gemini and Copilot. 132 real-world CVE scenarios plus Docker PoC dynamic validation produce quantitative, reproducible, and comparable security reports.
When a developer (or AI tool) opens a PR, CodeGuard automatically checks whether it introduces known CVE patterns. Every alert is validated by a Docker PoC — zero false positives. Results are written back as commit status and PR comments.
Real vulnerabilities, dynamic validation, zero false positives — the non-negotiables
Every CVE instance ships with a dedicated Docker image, an executable PoC exploit, and functional tests. This is not static analysis: the code actually runs, and we check whether the vulnerability is still exploitable.
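Conceptually, dynamic validation reduces to running the PoC against the code under test and observing the outcome. A minimal sketch, assuming each CVE's image bundles its PoC and that the PoC's exit code signals whether exploitation succeeded (the image name, command, and exit-code convention below are hypothetical, not CodeGuard's actual layout):

```python
import subprocess

def validate_cve(image: str, poc_cmd: list[str], timeout: int = 120) -> bool:
    """Run the PoC exploit inside the CVE's dedicated Docker image.

    Returns True if the exploit still succeeds, i.e. the vulnerability
    is present in the code under test.
    """
    result = subprocess.run(
        ["docker", "run", "--rm", image, *poc_cmd],
        capture_output=True,
        timeout=timeout,
    )
    # Convention assumed here: the PoC exits 0 when exploitation succeeds.
    return result.returncode == 0

if __name__ == "__main__":
    # Illustrative image and command; real scenarios ship their own.
    vulnerable = validate_cve("codeguard/cve-example", ["python", "poc.py"])
    print("vulnerable" if vulnerable else "patched")
```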
AST-level code structure analysis layered on top of BM25 full-text search. Supports 6 languages (C/C++/Java/Python/PHP/JS) for precise context completion.
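As an illustration of the two-stage idea (not CodeGuard's actual implementation), the sketch below ranks repository files with BM25 and then uses Python's ast module to return whole function definitions as context units. rank_bm25 is a third-party package, and the Python-only file filter stands in for the real six-language support:

```python
import ast
from pathlib import Path
from rank_bm25 import BM25Okapi  # pip install rank-bm25

def retrieve_context(repo: Path, query: str, top_k: int = 3) -> list[str]:
    """Hypothetical two-stage retrieval: BM25 over raw file text, then an
    AST pass that extracts structurally complete functions as context."""
    files = list(repo.rglob("*.py"))  # the real pipeline covers 6 languages
    corpus = [f.read_text(errors="ignore") for f in files]
    bm25 = BM25Okapi([text.split() for text in corpus])
    scores = bm25.get_scores(query.split())
    ranked = sorted(zip(scores, files, corpus), key=lambda t: -t[0])[:top_k]

    snippets = []
    for _, path, text in ranked:
        try:
            tree = ast.parse(text)
        except SyntaxError:
            continue
        for node in ast.walk(tree):
            # Keep whole function definitions rather than raw line windows.
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                snippets.append(ast.get_source_segment(text, node) or "")
    return snippets
```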
Fixes the original scoring bug where failing every case still yielded a perfect score. Critical vulns carry far more weight than Lows, so the score actually reflects risk.
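A minimal sketch of severity-weighted scoring under those two constraints; the weight values here are illustrative, not CodeGuard's actual calibration:

```python
# Hypothetical severity weights; CodeGuard's real weighting may differ.
SEVERITY_WEIGHT = {"critical": 10.0, "high": 6.0, "medium": 3.0, "low": 1.0}

def security_score(results: list[tuple[str, bool]]) -> float:
    """results: (severity, passed) per CVE scenario, where passed means
    the generated code did NOT reproduce the vulnerability.

    The score is the weighted fraction of passed scenarios, so failing
    every case yields 0.0 instead of a spuriously perfect score, and a
    missed critical costs ten times as much as a missed low."""
    total = sum(SEVERITY_WEIGHT[sev] for sev, _ in results)
    earned = sum(SEVERITY_WEIGHT[sev] for sev, passed in results if passed)
    return 100.0 * earned / total if total else 0.0
```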
Supports Claude Code, Gemini CLI, OpenAI Codex, Aider and other mainstream agent code-generation frameworks with unified AgentMetrics behavior tracking.
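To show what a unified record across heterogeneous agent frameworks could look like, here is an illustrative shape for a per-run metrics object; the field names are assumptions, not the actual AgentMetrics schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMetrics:
    """Hypothetical per-run record normalized across agent frameworks."""
    framework: str                 # e.g. "claude-code", "aider"
    tokens_in: int = 0             # prompt tokens consumed
    tokens_out: int = 0            # completion tokens produced
    tool_calls: int = 0            # shell/file/tool invocations
    files_edited: list[str] = field(default_factory=list)
    wall_time_s: float = 0.0       # end-to-end generation time
```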
Point your webhook at POST /webhook/github and every PR triggers a scan. Results are pushed back as commit status plus PR comments — zero-touch integration for developers.
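A minimal sketch of what the receiving endpoint could look like, using Flask and GitHub's standard X-Hub-Signature-256 HMAC check; enqueue_scan is a placeholder for CodeGuard's internal scan trigger, not a real API:

```python
import hashlib
import hmac
import os

from flask import Flask, abort, request

app = Flask(__name__)
SECRET = os.environ["WEBHOOK_SECRET"].encode()

def enqueue_scan(repo: str, pr_number: int) -> None:
    """Placeholder: hand the PR off to the actual scan pipeline."""
    print(f"queued CVE scan for {repo}#{pr_number}")

@app.post("/webhook/github")
def github_webhook():
    # Verify the payload really came from GitHub (shared webhook secret).
    sig = request.headers.get("X-Hub-Signature-256", "")
    expected = "sha256=" + hmac.new(SECRET, request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        abort(401)
    if request.headers.get("X-GitHub-Event") == "pull_request":
        event = request.get_json()
        enqueue_scan(event["repository"]["full_name"],
                     event["pull_request"]["number"])
    return "", 204
```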
Docker images are built once and reused, and incremental scans track vulnerability-file mtimes, so per-PR scan time drops from minutes to seconds.
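The mtime bookkeeping can be as simple as a JSON cache keyed by file path; this sketch assumes a hypothetical cache location and is not CodeGuard's actual format:

```python
import json
from pathlib import Path

CACHE = Path(".codeguard_mtime.json")  # hypothetical cache location

def changed_files(paths: list[Path]) -> list[Path]:
    """Return only files whose mtime changed since the last scan, so a
    typical PR rescans just the files it actually touched."""
    cache = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    dirty = [p for p in paths if cache.get(str(p)) != p.stat().st_mtime]
    CACHE.write_text(json.dumps({str(p): p.stat().st_mtime for p in paths}))
    return dirty
```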
132 real-world CVEs drawn from 51 production GitHub projects
A safety net for code in the AI programming era
Before buying an AI coding assistant, use CodeGuard to run a cross-vendor security benchmark. Make the decision data-driven: don't look only at code quality, look at the security floor too.
Run the security benchmark continuously while iterating. Ensure each new model release does not regress on security, with auto-generated comparative reports that quantify the change.
Connect to a GitHub webhook and scan every PR for CVE patterns. With zero false positives, it can act as a hard CI gate — a real security checkpoint, not noise.
A standardized benchmark for security teams to compare models, prompt strategies, and context scopes. All 132 scenarios are reproducible and extensible.
Want to deploy an AI code security evaluation platform inside your company, or wire CVE regression detection into your GitHub workflow? Reach out for a commercial license and deployment plan.
CodeGuard is built as a deep enhancement on top of the open-source Tencent/AICGSecEval (A.S.E) framework and is licensed under Apache-2.0. We thank Tencent Security Platform Department and the partner universities (Fudan, Peking, SJTU, Tsinghua, Zhejiang) for the original dataset and research contributions.
A transparent proxy for LLM API calls that compresses redundant context — one-line integration cuts usage costs by 30-60%.
A hierarchical knowledge database designed for AI agents with L0/L1/L2 summaries, MCP integration and session memory.
An AI legal department for small businesses — review, risk scoring, plain-language rewriting of legalese, drafting, translation, and legal Q&A in one.