AI Code Generation in 2025: Beyond Copilot and Cursor
AI code generation has moved beyond a single vendor. This guide catalogs modern tools, integration patterns, evaluation methods, and governance controls for enterprise use.
Executive Summary
This guide delivers a production blueprint for AI code generation beyond Copilot and Cursor: system architectures, editor integrations, repo-scale assistants, tool-calling, static analysis and SAST, test generation, refactoring, CI/CD gates, evaluation, costs and latency, security, and governance.
System Architectures
Single-Agent Inline Assistant
graph TD
E[Editor] --> G[Gateway]
G --> M[LLM]
M --> E
- Pros: low latency, minimal infra
- Cons: narrow context; fewer repo-wide insights
Planner/Executor with Tools
graph TD
U[User] --> P[Planner]
P -->|"lint, test, grep"| T[Tools]
T --> X[Executor]
X --> R[PR/Commit]
P --> M[LLM]
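A minimal sketch of this loop in TypeScript, assuming a callModel gateway helper (the same helper later snippets rely on) and a JSON plan format chosen purely for illustration:
// Hypothetical planner/executor loop: the plan schema and callModel helper are assumptions.
declare function callModel(opts: { prompt: string }): Promise<string>

type Step = { tool: string; args?: Record<string, unknown> }

export async function planAndExecute(task: string, tools: Record<string, (args?: any) => Promise<unknown>>){
  // Ask the planner model for an ordered list of tool calls.
  const plan: Step[] = JSON.parse(await callModel({ prompt: `Return a JSON array of {tool,args} steps for: ${task}` }))
  const observations: unknown[] = []
  for (const step of plan){
    if (!tools[step.tool]) continue // skip tools the executor does not expose
    observations.push(await tools[step.tool](step.args))
  }
  // Hand observations back to the model to produce the final diff/PR description.
  return callModel({ prompt: `Observations: ${JSON.stringify(observations)}\nProduce a diff for: ${task}` })
}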
Multi-Agent Repo Assistant
graph LR
PM[Project Manager] --> ARCH[Architect]
ARCH --> DEV[Coder]
DEV --> QA[Tester]
QA --> SEC[Security]
SEC --> PM
- PM: task breakdown, acceptance criteria
- Architect: design, interfaces, patterns
- Coder: implement diffs
- Tester: generate tests, run suite
- Security: SAST/secret/dangerous API checks
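A minimal sketch of the handoff between these roles as sequential model calls; callModel is an assumed gateway helper and the prompts are illustrative:
// Hypothetical role pipeline; each role is a prompt persona rather than a separate service.
declare function callModel(opts: { prompt: string }): Promise<string>

export async function runRepoAssistant(task: string){
  const plan = await callModel({ prompt: `As PM, break "${task}" into tasks with acceptance criteria.` })
  const design = await callModel({ prompt: `As Architect, propose interfaces and patterns for:\n${plan}` })
  const diff = await callModel({ prompt: `As Coder, implement this design as a unified diff:\n${design}` })
  const tests = await callModel({ prompt: `As Tester, write unit tests covering:\n${diff}` })
  const findings = await callModel({ prompt: `As Security, flag dangerous APIs or secrets in:\n${diff}` })
  return { plan, design, diff, tests, findings }
}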
IDE/Editor Integrations
VS Code
{
"contributes": {
"commands": [{"command": "gen.suggest", "title": "AI: Suggest"}],
"keybindings": [{"command": "gen.suggest", "key": "cmd+shift+g"}],
"configuration": {
"properties": { "gen.endpoint": { "type": "string" } }
}
}
}
import * as vscode from 'vscode'

vscode.commands.registerCommand('gen.suggest', async () => {
  const editor = vscode.window.activeTextEditor
  if (!editor) return
  const text = editor.document.getText(editor.selection) || editor.document.getText()
  const endpoint = vscode.workspace.getConfiguration().get<string>('gen.endpoint')!
  const resp = await fetch(endpoint, { method: 'POST', body: JSON.stringify({ text }) })
  const { text: suggestion } = (await resp.json()) as { text: string }
  await editor.edit((e) => e.insert(editor.selection.end, suggestion))
})
JetBrains (IntelliJ) Action
class GenerateAction: AnAction() {
override fun actionPerformed(e: AnActionEvent) {
val project = e.project ?: return
val editor = e.getData(CommonDataKeys.EDITOR) ?: return
val text = editor.selectionModel.selectedText ?: editor.document.text
val suggestion = callGateway(text)
WriteCommandAction.runWriteCommandAction(project) {
editor.document.insertString(editor.caretModel.offset, suggestion)
}
}
}
Vim
command! -range=% AICode :<line1>,<line2>w !curl -s -X POST http://localhost:8080/gen -d @-
Repo-Level Code Generation
- Read WORKSPACE, pnpm-workspace.yaml, or lerna.json for monorepos
- Build a repo graph (packages, dependencies)
- Summarize APIs, types, and code style rules
interface RepoSummary { packages: string[]; deps: Record<string,string[]>; codeStyle: any }
# Generate symbols index
ctags -R -f tags .
Prompt Library for Coding Tasks
{
"implement_function": "Implement the function. Return ONLY code inside one code block.",
"refactor_module": "Refactor to improve readability and testability. Keep public API stable.",
"add_tests": "Add unit tests with high coverage. Return tests only.",
"migrate_version": "Migrate from vX to vY. Update APIs and configs."
}
Tool-Calling APIs
import { $ } from 'zx' // zx v8+: .json() parses stdout; .exitCode resolves to the exit status

export async function runLint(path = "."){ return $`pnpm eslint ${path} --format json`.json() }
export async function runTest(){ return $`pnpm test -- --json`.json() }
export async function runBuild(){ return $`pnpm build`.exitCode }
export async function runFormat(){ return $`pnpm prettier -w .`.exitCode }
Thought: run lint
Action: runLint{"path":"apps/web"}
Observation: 3 errors missing deps
Thought: fix imports
Action: createCommit{"message":"fix: add missing deps"}
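A sketch of how an executor might parse Action lines in this format and dispatch them to the tool wrappers above; the ./tools module path and the regex are assumptions, and createCommit is left out:
// Hypothetical dispatcher for "Action: toolName{json-args}" lines.
import { runLint, runTest, runBuild, runFormat } from './tools'

const registry: Record<string, (args?: any) => Promise<unknown>> = {
  runLint: (a) => runLint(a?.path),
  runTest: () => runTest(),
  runBuild: () => runBuild(),
  runFormat: () => runFormat(),
}

export async function dispatchAction(line: string){
  const m = line.match(/^Action:\s*(\w+)\s*(\{.*\})?$/)
  if (!m) throw new Error('not an Action line')
  const [, tool, rawArgs] = m
  if (!registry[tool]) throw new Error(`unknown tool: ${tool}`)
  // The return value becomes the next Observation fed back to the model.
  return registry[tool](rawArgs ? JSON.parse(rawArgs) : undefined)
}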
Static Analysis and SAST
name: sast
on: [pull_request]
jobs:
semgrep:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: returntocorp/semgrep-action@v1
  codeql:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
      - uses: github/codeql-action/analyze@v3
# Semgrep rule file (e.g. .semgrep/no-eval.yml)
rules:
  - id: no-eval
    pattern: eval(...)
    message: Avoid eval()
    languages: [javascript]
    severity: ERROR
Test Generation
Unit/Property Tests (JS)
import fc from 'fast-check'
describe('sum', () => {
it('commutative', () => {
fc.assert(fc.property(fc.integer(), fc.integer(), (a, b) => sum(a, b) === sum(b, a)))
})
})
Python Pytest Example
def test_parse_date():
assert parse_date("2025-10-27").year == 2025
E2E (Playwright)
test('login', async ({ page }) => {
await page.goto('/login'); await page.fill('#email','a@b.com'); await page.fill('#pwd','x');
await page.click('text=Login'); await expect(page).toHaveURL('/dashboard')
})
Refactoring Assistant
Refactor to smaller functions, descriptive names, and remove dead code. Keep tests passing.
export function proposeRefactor(code: string){
return callModel({ prompt: `Refactor this code for readability and testability:\n\n${code}\n\nReturn ONLY the refactored code.` })
}
Migration Playbooks
- React 17 → 18: createRoot, concurrent features (see the codemod sketch below)
- Node 16 → 20: built-in test runner, ESM, fetch
- TypeScript 4.x → 5.x: satisfies, decorators, config tightening
Checklist:
- Update deps and peer deps
- Fix breaking API changes
- Run tests and lint, update CI
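For the React 17 → 18 entry above, a hedged jscodeshift sketch of the ReactDOM.render to createRoot rewrite; it ignores import updates, hydrate, and render callbacks:
// Codemod sketch: ReactDOM.render(<App/>, el) -> createRoot(el).render(<App/>)
module.exports = function(file, api){
  const j = api.jscodeshift
  return j(file.source)
    .find(j.CallExpression, { callee: { object: { name: 'ReactDOM' }, property: { name: 'render' } } })
    .replaceWith(p => {
      const [element, container] = p.value.arguments
      // Build createRoot(container).render(element)
      return j.callExpression(
        j.memberExpression(j.callExpression(j.identifier('createRoot'), [container]), j.identifier('render')),
        [element]
      )
    })
    .toSource()
}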
CI/CD Gates
name: codegen-ci
on: [pull_request]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: pnpm i --frozen-lockfile
- run: pnpm -w build
- run: pnpm -w test -- --ci
- run: pnpm -w eslint . --max-warnings 0
Evaluation Harness for Code Tasks
CASES = [
{"id":"impl-001","prompt":"Implement fib(n)...","grader":"pytest -q"},
{"id":"ref-002","prompt":"Refactor module X","grader":"eslint --max-warnings 0"}
]
python eval/run.py --suite eval/cases.json --model http://tgi:8080 --out report.json
Offline Datasets
- HumanEval, MBPP, CodeSearchNet, APPS (licenses vary)
- Create internal corpora from solved tickets and diffs
Cost and Latency Calculators
const pricing: Record<string, { in: number; out: number }> = { "gpt-4o-mini": { in: 0.000005, out: 0.000015 } } // illustrative per-token rates; check current provider pricing
export function costUSD(model: string, inTok: number, outTok: number){ const p = pricing[model]; return inTok*p.in + outTok*p.out }
export function tps(tokens: number, seconds: number){ return tokens/seconds }
Caching and Retrieval over Code
import { readFileSync } from 'fs'
export function codeContext(paths: string[]){ return paths.map(p=>({ path: p, content: readFileSync(p,'utf8') })) }
// embeddings for code
const embed = await embedModel.encode(snippet)
store.upsert({ id: filePath, vector: embed, metadata: { lang: 'ts', symbols: ['sum'] } })
Code Search and Embeddings
export async function searchCode(query: string){
const q = await embedModel.encode(query)
const hits = await store.search(q, { topK: 20, filter: { lang: 'ts' } })
return hits
}
Security and Secret Handling
- Never include secrets in prompts
- Use server-side retrieval for tokens; scope and rotate
- Secret scanning in CI and pre-commit
gitleaks detect -v
Policy-as-Code
package codegen
deny["no_eval"] { contains(input.code, "eval(") }
Troubleshooting
- Incorrect imports: run lint autofix; search symbols
- Type errors: regenerate with types visible; add explicit interfaces
- Flaky tests: seed RNG; stabilize network calls
Related Posts
- Prompt Engineering: Advanced Techniques (2025)
- LLM Fine-Tuning: LoRA and QLoRA (2025)
- LLM Observability (2025)
Extended FAQ (1–120)
- How to choose models for codegen? Use small/medium for edits; large for design/refactor.
- Context too small? Retrieve relevant files and symbols; compress.
- Inline vs repo assistant? Inline for quick edits; repo assistant for multi-file changes.
- Ensure code compiles? Run build/lint/test automatically before suggesting merge.
- Flaky tests from AI? Use deterministic seeds; avoid network; mock time.
- Per-language support? Language servers + prompts for idioms.
- Code style alignment? Infer from repo; run formatter.
- Monorepo awareness? Workspace configs; cross-package imports.
- Proprietary libs? Index docs; few-shot examples.
- Secrets in code? Pre-commit scanning; block merges.
- Security in codegen? SAST; safe APIs; avoid dangerous patterns.
- Licensing? Respect licenses; track provenance.
- Refactor safety? Run tests; incremental diffs.
- Migration risks? Feature flags; fallbacks.
- Can it write docs? Yes: generate READMEs and docstrings.
- Can it write tests first? Yes: TDD loop with AI assistance.
- Tool orchestration? Planner decides; executor runs; verify.
- IDE latency? Stream tokens; prefetch context.
- How to measure wins? PR cycle time, defects, coverage, pass rate.
- Cost control? Cache and route models; cap tokens.
Repo Graph Builders
import fg from 'fast-glob'
import { readFile } from 'fs/promises'
export async function buildRepoGraph(root = '.'){
const files = await fg(['**/*.{ts,tsx,js,jsx,py,go,rs,java}', '!**/node_modules/**', '!**/build/**'], { cwd: root })
const nodes = [] as { path: string; imports: string[] }[]
for (const f of files){
const src = await readFile(`${root}/${f}`, 'utf8')
nodes.push({ path: f, imports: extractImports(src) })
}
return { nodes }
}
Language Server Protocol (LSP) Integration
import * as lsp from 'vscode-languageserver/node'
const connection = lsp.createConnection(lsp.ProposedFeatures.all)
connection.onInitialize(() => ({ capabilities: { textDocumentSync: lsp.TextDocumentSyncKind.Incremental } }))
connection.onRequest('codegen/suggest', async (params) => {
const suggestion = await callGateway(params)
return { text: suggestion }
})
connection.listen()
AST and Codemod Frameworks
ts-morph (TypeScript)
import { Project, SyntaxKind } from 'ts-morph'
const project = new Project({ tsConfigFilePath: 'tsconfig.json' })
for (const sf of project.getSourceFiles()){
sf.forEachDescendant((n) => {
if (n.getKind() === SyntaxKind.CallExpression && n.getText().startsWith('eval(')){
n.replaceWithText('// eval removed')
}
})
}
await project.save()
jscodeshift (JS)
module.exports = function(file, api){
const j = api.jscodeshift
return j(file.source)
.find(j.CallExpression, { callee: { name: 'oldFn' }})
.replaceWith(p => j.callExpression(j.identifier('newFn'), p.value.arguments))
.toSource()
}
libCST (Python)
import libcst as cst
class Visitor(cst.CSTTransformer):
def leave_Call(self, node, updated):
if getattr(node.func, 'value', None) == 'os' and getattr(node.func, 'attr', '') == 'system':
return cst.parse_expression('subprocess.run')
return updated
go/ast (Go)
ast.Inspect(f, func(n ast.Node) bool {
	ce, ok := n.(*ast.CallExpr)
	if !ok {
		return true
	}
	// flag calls to exec.Command
	if sel, ok := ce.Fun.(*ast.SelectorExpr); ok {
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == "exec" && sel.Sel.Name == "Command" {
			// replace or flag
		}
	}
	return true
})
Rust (syn)
let file: syn::File = syn::parse_str(&code)?;
for item in &file.items { /* walk AST */ }
Tree-sitter Queries
((call_expression function: (identifier) @fn-name)
(#eq? @fn-name "eval"))
import Parser from 'web-tree-sitter'
Structured Diffs and Patch Application
interface Patch { file: string; before: string; after: string }
export function applyPatches(patches: Patch[]){
for (const p of patches){
const src = readFileSync(p.file, 'utf8')
if (!src.includes(p.before)) throw new Error('context not found')
writeFileSync(p.file, src.replace(p.before, p.after))
}
}
PR Bot Workflows
name: pr-bot
on: [pull_request]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: node bots/reviewer.js ${{ github.event.pull_request.number }}
// reviewer.js
const diffs = await getPullRequestDiff()
const comments = await callLLM({ prompt: `Review these diffs: ${diffs}` })
await postReviewComments(comments)
Code Smell Detectors
export const smells = [
{ id: 'long-function', detect: (code: string) => /function[\s\S]{200,}/.test(code) },
{ id: 'magic-number', detect: (code: string) => /\b\d{3,}\b/.test(code) },
]
Code Review Heuristics
- Smaller diffs preferred; clear function boundaries
- Proper naming; explicit types; early returns
- Tests covering edge cases; no commented-out code
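These heuristics can be approximated mechanically before a human review; a rough sketch over a unified diff, with thresholds and regexes that are illustrative only:
// Heuristic pre-review of a unified diff; thresholds are hypothetical defaults.
export function reviewHeuristics(diff: string){
  const added = diff.split('\n').filter(l => l.startsWith('+') && !l.startsWith('+++'))
  return {
    smallDiff: added.length <= 200,                                    // prefer smaller diffs
    touchesTests: /^\+\+\+ .*(test|spec)\./m.test(diff),               // proxy for edge-case coverage
    commentedOutCode: added.some(l => /^\+\s*\/\/.*[;{}]\s*$/.test(l)) // crude commented-out-code smell
  }
}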
Security and Supply Chain Protections
syft dir:. -o cyclonedx-json > sbom.json
cosign attest --predicate sbom.json --type cyclonedx registry/app:sha
slsa:
provenance: required
materials: pinned digests
Sandboxed Runners
securityContext:
runAsNonRoot: true
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
// run user tests in isolated container
await $`docker run --rm -v $PWD:/work -w /work node:20 pnpm test -- --ci`
Multi-language Examples
TypeScript
export function chunkArray<T>(arr: T[], size: number){
const out: T[][] = []
for (let i=0;i<arr.length;i+=size){ out.push(arr.slice(i, i+size)) }
return out
}
Python
def chunk(lst, n):
return [lst[i:i+n] for i in range(0, len(lst), n)]
Go
func Chunk[T any](in []T, size int) [][]T {
var out [][]T
for i:=0; i<len(in); i+=size { end := i+size; if end>len(in) { end=len(in) }; out = append(out, in[i:end]) }
return out
}
Rust
fn chunks<T: Clone>(v: &Vec<T>, size: usize) -> Vec<Vec<T>> {
v.chunks(size).map(|c| c.to_vec()).collect()
}
Java
<T> List<List<T>> chunk(List<T> in, int size){
List<List<T>> out = new ArrayList<>();
for (int i=0;i<in.size();i+=size){ out.add(in.subList(i, Math.min(i+size, in.size()))); }
return out;
}
Code Search UI
export function Search(){
const [q,setQ] = useState(''); const [hits,setHits] = useState([])
return (<div>
<input value={q} onChange={e=>setQ(e.target.value)} />
<button onClick={async()=>setHits(await api.search(q))}>Search</button>
<ul>{hits.map(h=> <li key={h.id}>{h.path}</li>)}</ul>
</div>)
}
Embeddings + Rerankers for Code
const hits = await vector.search(await embed(query), { topK: 50 })
const reranked = await crossEncoder.score(query, hits.map(h=>h.snippet))
Retrieval over Docs
const md = await loadMarkdown(['README.md','docs/**/*.md'])
const ctx = selectRelevant(md, query)
Dataset and Evaluation Harness
{
"cases": [
{ "id": "js-sum", "prompt": "Implement sum(a,b)", "grader": "npm test -- sum" },
{ "id": "py-parse-date", "prompt": "Implement parse_date", "grader": "pytest -q" }
]
}
Rollout and Canary of Suggestions
- Shadow: generate suggestions silently; compare acceptance
- Canary: 10% users get new strategy; track metrics
- Rollback: switch off flag on regressions
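A small sketch of stable canary bucketing for the 10% split described above; the percentage and hashing scheme are illustrative:
// Deterministic bucketing: the same user always lands in the same arm.
import { createHash } from 'crypto'

export function pickStrategy(userId: string, canaryPercent = 10): 'baseline' | 'canary' {
  const bucket = parseInt(createHash('sha256').update(userId).digest('hex').slice(0, 8), 16) % 100
  return bucket < canaryPercent ? 'canary' : 'baseline'
}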
Human-in-the-Loop Workflows
- Submit suggestions as PRs; developer reviews and edits
- Auto-assign reviewers by code owners
- Collect feedback to improve prompts and rules
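A sketch of the submit-as-PR step with Octokit; the branch, reviewer, and PR text are placeholders, and a real implementation would derive reviewers from CODEOWNERS:
// Hypothetical PR submission; requires a GITHUB_TOKEN with repo scope.
import { Octokit } from '@octokit/rest'

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN })

export async function openSuggestionPR(owner: string, repo: string, head: string){
  const pr = await octokit.rest.pulls.create({
    owner, repo, head, base: 'main',
    title: 'AI suggestion: refactor module X',
    body: 'Automated diff plus tests; please review and edit before merge.',
  })
  // Placeholder reviewer; derive from CODEOWNERS in practice.
  await octokit.rest.pulls.requestReviewers({ owner, repo, pull_number: pr.data.number, reviewers: ['some-reviewer'] })
  return pr.data.html_url
}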
Performance Metrics
- Suggestion TTFT, completion latency, acceptance rate
- Post-merge defect rate, test failure rate
- Cost per accepted suggestion
Caching
const cache = new Map<string, string>()
export function cachedSuggest(key: string, fn: ()=>Promise<string>){
if (cache.has(key)) return Promise.resolve(cache.get(key)!)
return fn().then(v => (cache.set(key, v), v))
}
Cost Models
export function costPerSuggestion(tokensIn: number, tokensOut: number, priceIn: number, priceOut: number){
return tokensIn*priceIn + tokensOut*priceOut
}
Observability and Alerting
alerts:
- alert: SuggestionAcceptanceDrop
expr: avg_over_time(codegen_accept_rate[6h]) < 0.25
for: 1h
labels: { severity: page }
Runbooks
- Acceptance drop: check prompts, context retrieval, model routing
- Latency spike: batch size, provider status, cache misses
- Defect spike: stricter tests, smaller diffs, more reviews
Extended FAQ (121–300)
- How to keep diffs small? Constrain edits to selected functions; review suggestions.
- How to prefer idiomatic code? Provide repo examples; lint rules enforce style.
- Can AI rename variables? Yes: ensure tests and references updated.
- Safe refactors? Use codemods and AST; run tests.
- Monorepo imports broken? Resolve workspace paths; update tsconfig paths.
- Multi-language repos? Route to specialized models; detect language via LSP.
- Performance of code search? Index once; incremental updates; cache popular queries.
- Inline vs PR suggestions? PRs for big changes; inline for small edits.
- Accept rate KPI? Target >30% for suggestions; varies by team.
- Handle binary files? Ignore; operate on text code only.
- Code ownership? CODEOWNERS and metadata for reviewers.
- Secret detection? Gitleaks; block merges.
- SAST false positives? Suppress with annotations; keep rules tight.
- License headers? Add if required; templates per repo.
- Editor latency? Stream tokens; prefetch context.
- GPU vs CPU serving? GPU for large models; cache to reduce load.
- How to eval code prompts? Task sets with automated graders.
- Autocomplete vs chat? Both; chat for complex tasks.
- Prevent dangerous APIs? AST detection; replace; review.
- How to migrate frameworks? Codemods and unit tests; stepwise.
- Private registries? Pin digests; SBOM and attestations.
- IDE telemetry? Consent; anonymize; aggregate.
- Diffs conflict? Auto-merge with Git; manual review.
- Token budgets? Cap and route smaller models.
- Multi-agent chatter? Coordinator limits; finalize plan.
- Test flakiness? Reruns; stabilization tasks.
- Integrate with Jira? Create tickets per suggestion or incident.
- Merge trains? Batch small PRs; validate together.
- Code style drift? Central ESLint/Prettier configs.
- Post-merge monitoring? Defects and performance regressions.
- Code smells expansion? Add cyclomatic complexity checks.
- Data privacy? Mask user data in prompts.
- Offline mode? Cache models or use small local models.
- Non-deterministic builds? Lock versions; hermetic builds.
- LLM hallucinating APIs? Provide docs; fail if types don't exist.
- Quality gates? Fail PR if tests/lint fail.
- Long functions? Split; extract; name clearly.
- Dead code? Detect and remove; confirm references.
- Security reviews? Required for sensitive paths.
- Commit message style? Conventional commits; issue links.
- Git hook safety? Fast and idempotent; skip on CI if needed.
- Model upgrades? Shadow test; canary; rollback plan.
- Evaluation drift? Refresh datasets; add edge cases.
- Measured ROI? Time saved per PR; defect reduction.
- IDE ports and proxies? Configurable; secure connections.
- Use external tools? Run in sandbox; quotas and limits.
- Binary diff noise? Filter out in PRs.
- Branch protections? Require checks and reviews.
- Long-running tasks? Queue with status updates.
- Coding standards? Docs and linters; fail CI on violations.
- Cross-repo changes? Orchestration and coordinated merges.
- Generated code ownership? Owned by team; AI is assistant.
- Prompt drift detection? Hash and compare; alert on change.
- LLM cost spikes? Budget alerts; cache; route smaller.
- Shadow merge risks? Never merge without human review.
- Artifact storage? Keep diffs, logs, and evaluations.
- Model secrets? Never in code; env vars in CI.
- Polyglot repos? Language-specific pipelines.
- On-prem vs cloud? Depends on data; hybrid often works.
- Vendor lock-in? Abstract model calls.
- Prompt governance? Approvals; audits; owners.
- Security SBOM cadence? Per release; track changes.
- Scalability? Batch suggestions; async workers.
- Plugin ecosystem? Secure review and sandbox.
- Debugging AI output? Trace prompts and contexts.
- Data retention? Minimal; hashed; expiry.
- Legal hold? Store artifacts immutably.
- Code review bots? Assist, not replace; human approve.
- New language support? Add LSP and AST parsers.
- Security training? Dev training on safe patterns.
- Test coverage targets? Set per repo; enforce.
- Performance regressions? Benchmark critical paths.
- Long PRs? Split by feature; staged merges.
- Style conflicts? Adopt repo formatter.
- Deprecations? Track and migrate.
- Tooling reliability? Health checks; retries.
- Local dev? Docker compose; mocks.
- Remote dev? Codespaces/Dev Containers.
- Pairing with AI? Split tasks; verify outputs.
- When to stop? Stable KPIs; diminishing returns.
Monorepo Context Assembly (Bazel/PNPM/Yarn)
# WORKSPACE.bzl scan (pseudo)
load('@bazel_tools//tools/build_defs/repo:http.bzl', 'http_archive')
# identify external deps and modules for context enrichment
# pnpm-workspace.yaml
packages:
- 'apps/*'
- 'packages/*'
// package.json (Yarn workspaces)
{
"workspaces": ["apps/*", "packages/*"]
}
export async function summarizeMonorepo(root: string){
const workspaces = await detectWorkspaces(root)
const graph = await buildRepoGraph(root)
return { workspaces, graph }
}
Repo Indexers (ctags/cscope/ripgrep)
ctags -R --languages=JavaScript,TypeScript,Python,Go,Java,Rust -f tags .
cscope -Rbq
rg --pcre2 "^export\s+(function|class|interface)\s+(\w+)" -n > exports.txt
export function loadSymbols(){
const tags = readFileSync('tags','utf8')
return parseCtags(tags)
}
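A minimal sketch of the parseCtags helper assumed above, based on the tab-separated ctags file format:
// Parse "name<TAB>file<TAB>ex_cmd;..." lines, skipping the !_TAG_ header lines.
export function parseCtags(tags: string){
  return tags
    .split('\n')
    .filter(line => line && !line.startsWith('!_TAG_'))
    .map(line => {
      const [symbol, file, ...rest] = line.split('\t')
      return { symbol, file, pattern: rest.join('\t') }
    })
}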
API Change Detection
import { diffLines } from 'diff'
export function detectBreakingChanges(before: string, after: string){
const d = diffLines(before, after)
// naive: flag removed exports or signature changes
const removed = findRemovedExports(d)
const sigChanged = findSignatureChanges(d)
return { removed, sigChanged }
}
# in CI
node scripts/api-diff.js --old refs/main --new HEAD || { echo "BREAKING CHANGES"; exit 1; }
Docstring and Comments Generators
export function genDocstring(fnSignature: string, description: string){
return `/**\n * ${description}\n * @returns ...\n */\n${fnSignature}`
}
def add_docstring(func_src: str, summary: str) -> str:
return f'"""{summary}"""\n' + func_src
Language-Specific Codemods
Bowler (Python)
from bowler import Query
(Query().select_function("old_fn").rename("new_fn").write())
OpenRewrite (Java)
type: specs.openrewrite.org/v1beta/recipe
name: ReplaceDeprecatedAPIs
recipeList:
  - org.openrewrite.java.ChangeMethodName:
methodPattern: com.example.Legacy old*(..)
newMethodName: modern
Rust Fixers
cargo fix --allow-dirty --allow-staged
Code Templates (Handlebars/Yeoman)
// {{name}}.ts
export interface {{pascalCase name}} {
id: string
}
yo generator:create component --name Button
Safe Script Generation (Filesystem/Network Guards)
const FS_ALLOW = new Set(["read", "list"]) // no write by default
export function safeFs(op: string, ...args: any[]){
if (!FS_ALLOW.has(op)) throw new Error("fs op not allowed")
// route to readonly impl
}
const NET_ALLOW = new Set(["api.company.com"]) // strict allowlist
Ephemeral Test Environments (Docker Compose)
version: '3.9'
services:
app:
image: node:20
working_dir: /work
volumes: [".:/work"]
command: ["bash","-lc","pnpm i && pnpm test -- --ci"]
docker compose run --rm app
End-to-End Codegen Pipeline with OpenTelemetry
import { trace } from '@opentelemetry/api'

const tracer = trace.getTracer('codegen')
const span = tracer.startSpan('codegen.pipeline')
span.addEvent('collect_context')
const ctx = await collectContext()
span.addEvent('generate')
const diff = await generateDiff(ctx)
span.addEvent('apply_and_test')
const ok = await applyAndTest(diff)
span.setAttribute('result', ok ? 'pass' : 'fail')
span.end()
PR Quality Gates
- run: pnpm -w test -- --ci
- run: pnpm -w eslint . --max-warnings 0
- run: pnpm -w typecheck
- run: node scripts/api-diff.js --old refs/main --new HEAD
Case Studies
Case 1: React Hook Refactor
- Context: hooks duplicated across apps/web and packages/ui
- Action: generated a shared useDebouncedValue hook with tests
- Result: -400 lines, +tests, zero regressions
Case 2: Python Service Migration
- Context: Flask → FastAPI migration
- Action: codemods + route tests; perf +18%
- Result: merged via canary, monitored latency p95
Case 3: Go Config Hardening
- Context: unsafe defaults in http.Client
- Action: added timeouts and retries; SAST green
Extended FAQ (301–520)
- How to ensure generated code matches repo style? Read .editorconfig, linter configs, and run formatter.
- Can we generate commit messages? Yes: conventional commits; include scope and summary.
- How to avoid breaking public APIs? API diff in CI; require approvals for changes.
- Test-first or code-first? Prefer tests first; code should satisfy.
- How to generate migrations safely? Generate SQL and run against staging; backups.
- Does AI rename files? Allow only in PR; verify imports.
- How to detect dead code? Coverage + static analysis; remove with PR.
- Can we batch suggestions? Yes: group related hunks; single PR per concern.
- Long compile times? Cache; incremental builds; narrow scopes.
- Language edge cases? Model per language; few-shot examples.
- Security posture? SAST, DAST for web, SBOM, attestations.
- License headers automated? Template per repo; codemod to add.
- IDE conflicts? Respect user settings; non-disruptive UX.
- How to revert fast? Revert PR; blue/green deploys.
- Non-hermetic tests? Mock IO; time; network.
- Binary packages? Pin digests; supply-chain protections.
- Partial acceptance? Developers pick hunks; re-run CI.
- Measuring developer trust? Survey + acceptance rate.
- Prompt transparency? Show prompts in PR as artifact.
- Repository limits? Skip vendored and generated dirs.
- Monorepo graph drift? Rebuild on each PR; cache results.
- Enforce small diffs? Gate diff size; human override possible.
- Hot paths risk? Benchmarks; avoid risky refactors.
- Data exfiltration? No external sends; scrub prompts.
- Model staleness? Periodic evals; update models; canary.
- GPU shortages? Use smaller models + caching.
- Cross-language refactors? Treat separately; align interfaces.
- Infra as code changes? Test plans; plan/apply in staging.
- How to limit scope? Config files for include/exclude paths.
- Variant prompts? A/B for quality and latency.
- Structured diffs vs free-text? Structured preferred; deterministic.
- Comment density? Prefer clear code; minimal comments.
- Naming quality? Heuristics + lint rules.
- Can AI write migrations? Yes: review carefully; test on staging.
- Multi-repo dependencies? Lock versions; coordinated releases.
- Model hallucinating APIs? Typecheck and fail; add docs as context.
- LLM sandboxing for tests? Run in isolated containers.
- Diff patching safety? Context checks; fail on mismatch.
- Rollout metrics? Acceptance rate, defects, latency.
- Post-merge defects? Track and correlate to suggestions.
- Secret rotation? Automate via vault; no secrets in code.
- Code smells catalog? Cyclomatic complexity, long params list, nested loops.
- Architectural decisions? Record ADRs; AI can draft.
- Docs generation? From code comments and types.
- API clients? Generate from OpenAPI; tests included.
- Conflicting formatters? Converge on one; enforce in CI.
- Performance tuning? Bench harness; regressions alerts.
- Flaky evaluations? Stabilize; rerun N times; median.
- Can suggestions be personal? Opt-in per dev; style prefs.
- Offline coding? Local small models; cached context.
- Legal compliance? Respect licenses; attribution.
- Copyright concerns? Avoid verbatim large snippets; audit.
- Prioritizing files? Hot paths and critical modules first.
- Repository maps? Visualize dependency graph.
- Test data management? Factories and fixtures; no PII.
- Mutation testing? Measure test quality; guide generation.
- Improve acceptance? Smaller diffs; accurate context; tests.
- Security prompts? Refuse unsafe generation; safe patterns.
- Debugger integration? Generate breakpoints; logging.
- Legacy code? Wrap, test, then refactor.
- Hot reload? Support DevServer integration.
- Binary size limits? CI gates; bundle analysis.
- Env-specific code? Feature flags; config layers.
- Mobile repos? Android/iOS templates; CI on device farms.
- Data science repos? Notebook linters; pipeline tests.
- GPU kernels? Specialized models; tests.
- Secret scanning pre-commit? Yes; fast hooks.
- Conflict resolution UX? Editor UI to pick hunks.
- Can AI help reviews? Summaries and risk flags.
- Analytics? Dashboards for acceptance and defects.
- Onboarding templates? Scaffolds for new services.
- Logs verbosity? Keep useful; redact sensitive.
- Localization in code? i18n lint; extraction tools.
- Deprecation warnings? Track and address.
- Microservices sprawl? Standard templates; governance.
- Gradle/Maven support? Add build steps; tests.
- Deno/Bun? Detected and supported.
- Windows dev? Powershell scripts; path handling.
- Monorepo CI load? Selective builds and tests.
- Test flakes metric? Track and reduce.
- Dependency updates? Automate with Renovate.
- Rollback strategy? Revert PR; disable feature flag.
- PR size cap? Gate and split.
- Templating engines? Handlebars, EJS, Jinja.
- Keeping context fresh? Rebuild indexes on change.
- Large repos scaling? Sharding indexes; async workers.
- Editor offline cache? Store last context; delta updates.
- Trusted paths? Safe module list; block risky dirs.
- Shared lint configs? Publish package; enforce.
- Integrated search? Ripgrep UI; filters.
- PR templates? Include risk assessment and tests.
- Feature flags lib? LaunchDarkly/OpenFeature integration.
- Safe file ops? No rm -rf; use OS APIs; confirm.
- Parsing failures? Fallback to regex; log cases.
- Model quota? Rate limit; cache; route.
- Reliability SLOs? Acceptance and defect SLOs.
- Build matrix? Multi-OS; versions; toolchains.
- Legacy language support? C/C++ limited; focus mainstream.
- Monorepo permissions? CODEOWNERS; checks.
- Final word? AI assists; humans own the code.
Gradle/Maven Build Hooks
// build.gradle
tasks.register('aiCheck') {
doLast {
println 'Running AI codegen quality checks'
}
}
tasks.named('check') { dependsOn 'aiCheck' }
<!-- pom.xml -->
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<version>3.1.0</version>
<executions>
<execution>
<phase>verify</phase>
<configuration>
<target>
<echo message="AI codegen checks"/>
</target>
</configuration>
<goals><goal>run</goal></goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
Bazel Rules for Code Generation
load("@bazel_skylib//rules:write_file.bzl", "write_file")
write_file(
name = "gen_summary",
out = "SUMMARY.md",
content = ["# Generated Summary\n"],
)
genrule(
name = "ai_codegen",
srcs = [":gen_summary"],
outs = ["out/diff.patch"],
cmd = "node tools/ai_codegen.js > $@",
)
GitHub App Checks
// app.ts
app.on(["pull_request.opened","pull_request.synchronize"], async (ctx) => {
const pr = ctx.payload.pull_request
const report = await runQualityChecks(pr)
await ctx.octokit.checks.create({
owner: ctx.payload.repository.owner.login,
repo: ctx.payload.repository.name,
name: "AI Codegen Quality",
head_sha: pr.head.sha,
status: "completed",
conclusion: report.ok ? "success" : "failure",
output: { title: "Results", summary: report.summary }
})
})
Conflict Resolver Workflow
1) Attempt auto-merge with 3-way diff
2) If conflict, isolate hunks per file
3) Propose minimal edits with context
4) Ask developer to pick hunks; re-run tests
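Step 1 can be sketched with git merge-file, which performs the 3-way merge in place and exits non-zero when conflicts remain; paths and error handling are simplified:
// Attempt a 3-way merge; on conflict, `ours` is rewritten with conflict markers.
import { spawnSync } from 'child_process'

export function tryThreeWayMerge(ours: string, base: string, theirs: string){
  const res = spawnSync('git', ['merge-file', ours, base, theirs], { encoding: 'utf8' })
  return { clean: res.status === 0, output: (res.stdout ?? '') + (res.stderr ?? '') }
}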
CODEOWNERS and Risk Labelling
# CODEOWNERS
/apps/web/* @web-team
/packages/shared/* @platform-team
# risk.yml
patterns:
- path: "apps/web/*"
risk: medium
- path: "infra/**"
risk: high
Code Normalizer and Formatter Orchestrator
export async function normalize(path = "."){ await $`pnpm prettier -w ${path}`; await $`pnpm eslint ${path} --fix` }
Semantic Patching with Comby
comby 'printf(":[x]")' 'fmt.Printf(":[x]")' . -matcher .go -in-place
Regex-Safe Generators
export function safeReplace(src: string, pattern: string, replacement: string){
const re = new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g')
return src.replace(re, replacement)
}
Datasets from PR History
const prs = await github.listMergedPRs({ repo, since: '2025-01-01' })
const cases = prs.map(pr => ({ diff: pr.diff, tests: extractTests(pr) }))
writeFileSync('eval/pr_cases.json', JSON.stringify(cases, null, 2))
Acceptance Analytics
select date_trunc('day', merged_at) as day,
count(*) filter (where accepted_suggestion) as accepted,
count(*) as total,
(count(*) filter (where accepted_suggestion))::float / count(*) as rate
from pr_events where merged_at >= now() - interval '30 days'
group by 1 order by 1;
Diff Grammar and Function-Calling for Edits
{
"instruction": "edit",
"file": "src/utils.ts",
"before": "export function sum(a,b){return a+b}",
"after": "export function sum(a: number, b: number): number { return a + b }"
}
export function applyEdit(edit: { file: string; before: string; after: string }){
const src = readFileSync(edit.file, 'utf8')
if (!src.includes(edit.before)) throw new Error('context mismatch')
writeFileSync(edit.file, src.replace(edit.before, edit.after))
}
Safety Guardrails and Sandbox Policy
sandbox:
fs: [read, list]
net: ["api.company.com"]
exec: ["pnpm", "pytest", "go", "cargo"]
limits:
cpu: 1
memory: 2Gi
timeout: 120s
CLI End-to-End
ai-codegen scan --root . > context.json
ai-codegen suggest --context context.json --task "refactor module X" > diff.patch
ai-codegen apply diff.patch
ai-codegen test
Sentry Instrumentation
import * as Sentry from "@sentry/node"
Sentry.init({ dsn: process.env.SENTRY_DSN })
Sentry.setContext('codegen', { version: '1.4.0' })
Sentry.captureMessage('suggestion-applied', { level: 'info' })
Sample E2E Renovation (TypeScript)
// 1) Detect old API usage
// 2) Replace with new API via codemod
// 3) Add tests and run
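A hedged ts-morph sketch of steps 1–2; oldFn and newFn are placeholder API names:
// Step 1: detect calls to the old API; Step 2: rewrite the callee to the new API.
import { Project, SyntaxKind } from 'ts-morph'

const project = new Project({ tsConfigFilePath: 'tsconfig.json' })
for (const sf of project.getSourceFiles()){
  for (const call of sf.getDescendantsOfKind(SyntaxKind.CallExpression)){
    if (call.getExpression().getText() === 'oldFn'){
      call.getExpression().replaceWithText('newFn')
    }
  }
}
await project.save()
// Step 3: add tests for newFn and run the suite before opening a PR.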
Sample E2E Renovation (Go)
// 1) Find http.Client without timeouts
// 2) Add timeouts and retries
// 3) Run go test ./...
Sample E2E Renovation (Python)
# 1) Replace requests.get with session + timeouts
# 2) Add pytest covering failures
# 3) Run pytest -q
Extended FAQ (521–620)
- Large diffs overwhelm reviewers; how to limit? Gate by file count and lines changed; split PRs.
- Non-deterministic suggestions? Fix seeds; cache outputs per context hash.
- How to avoid brittle regexes? Prefer AST and semantic tools; fallback carefully.
- Train on PR history? Use as evaluation, not training if policy restricts.
- Keep API clients updated? Generate from OpenAPI and pin versions.
- Binary file handling? Skip; log and notify when encountered.
- Monorepo test scopes? Run impacted packages only using graph.
- Dependency hell? Automate Renovate; single source of truth.
- Docs drift? Generate docs from code; diff doc coverage.
- Breaking changes flagged? Block PR until owner approves.
- Model fallback? Small model when quota hits; warn users.
- Pre-commit hooks slow? Scope to changed files; cache.
- Combine AST + embeddings? Yes: AST for precision, embeddings for recall.
- GPU scarce? Batch and cache; distill models.
- Risky directories? Block edits in payment/auth paths without approval.
- Code comments style? Follow repo conventions; lint.
- Code generators versioning? Lock templates and engines.
- Long-running tests? Mark slow; run nightly; PR runs fast suite.
- Unreliable network in tests? Mock and stub; forbid network.
- Hotfix flow? Bypass some gates with owner approval.
- Multi-tenant repos? Owners per tenant; scopes enforced.
- IDE crashes? Disable extension; collect logs; fix regressions.
- Suggested code license? Apply repo license; add headers if required.
- Measuring suggestion value? Time saved, defects avoided, acceptance rate.
- Editor offline? Local cache and small local models.
- Line-ending issues? Normalize to repo standard.
- Different language formatters? Run per-language chain: gofmt/black/prettier.
- API keys in prompts? Block and redact; refuse action.
- Staging vs prod diffs? Separate pipelines; review separately.
- Templated repos? Use generators with parameters; track provenance.
- Failure budgets? Define per quarter; stop risky changes when exceeded.
- Data residency? Process prompts in-region.
- Observability privacy? Hash PII; restrict access.
- Command injection? Sandbox, allowlists, and argument validators.
- Autoscaling? Queue depth and latency-based.
- Can AI write infra code? Yes: validate with terraform plan, kubeval.
- Binary patches? Avoid; manual review required.
- Merge queues? FIFO with priority for hotfixes.
- Markdown links? Validate and fix; link checker CI.
- Test data generation? Factories; property-based generators.
- Is SLSA necessary? For high-assurance releases, yes.
- LLM legal concerns? Consult counsel; track provenance.
- IDE telemetry opt-out? Yes; respect user preferences.
- AI code ownership? Team owns; AI assists only.
- Nightly full runs? Run full suite; summarize deltas.
- Suggestion persistence? Store diffs and context; expire after time.
- API limit handling? Backoff; queue; alternate routes.
- Lang-specific linters? Yes: pylint/flake8, go vet, detekt.
- Code clone detection? Simhash/minhash; refactor duplicates.
- UI for conflicts? Interactive hunk picker.
- Selective enablement? Per repo or folder; flags.
- How to track hot modules? Change frequency; bug density.
- Suggest doc updates? Yes: update README and CHANGELOG.
- Policy drift? Config as code and audits.
- New language onboarding? Add parser, formatter, linter, tests.
- Integration tests heavy? Run nightly; PRs run smoke tests.
- Schema migrations safety? Backups; idempotent scripts.
- Sentry noise? Sample and dedupe.
- Supply chain risks? Pin digests; verify attestations.
- How to sunset features? Flags; deprecations; remove code.
- Generated comments quality? Keep minimal and useful.
- CI queue delays? Autoscale runners; prioritize small PRs.
- Disabling suggestions? Per user or repo; policy.
- Prefill commit messages? Yes; editable by devs.
- git blame noise? Co-authored-by annotations.
- Parallel pipelines? Shard cases; gather results.
- Cross-platform scripts? Use Node scripts or Python; avoid bash-isms.
- Env var leaks? Never print; mask in logs.
- k8s manifests generation? Validate with kubeval and conftest.
- Terraform generation? Run terraform fmt/validate/plan in CI.
- Non-root containers? Enforce with policies.
- Portability? Avoid OS-specific paths/APIs.
- Line count limits in PR? Gate large diffs; split.
- Stale branches? Rebase or merge main; rerun CI.
- Code review etiquette? Respectful, constructive, specific.
- Accessibility in code? Lint for a11y in web apps.
- Repo secrets? Scan history; rotate keys.
- Emoji in code? Avoid; style guides.
- Package publishing? Signed builds; attest.
- Final guidance? Keep humans in control; measure outcomes.
Post-Deployment Operational Checklist
- Validate acceptance rate and defect metrics for last 24–72h
- Compare p95 latency and cost deltas vs baseline
- Review top-10 suggestions by acceptance and post-merge incidents
- Confirm CODEOWNERS approvals and risk labels coverage
- Re-run safety scans (secrets/SAST) on merged diffs
- Verify rollbacks available and documented for this release
- Update experiment registry with outcomes and next actions
Quick Start Summary (Copy/Paste)
# 1) Index repo context
ai-codegen scan --root . > context.json
# 2) Generate suggestions as diffs
ai-codegen suggest --context context.json --task "refactor module X" > diff.patch
# 3) Apply and run tests
ai-codegen apply diff.patch && pnpm -w test -- --ci
# 4) Open PR with artifacts
gh pr create -t "refactor: module X" -b "Automated diff + tests" -F diff.patch