Code Review Tool Comparison

Testigo Recall vs GitHub Copilot vs CodeRabbit vs Greptile — 5 tests, 35 issues, head-to-head

ado122/analyzer: 2,904 knowledge-base (KB) facts (Tests 1, 2, 4–5)
twentyhq/twenty: 31,045 knowledge-base (KB) facts (Test 3)
Testigo Recall: 31/35 issues found across all tests (T1: 7/8, T2: 5/5, T3: 2/2, T4: 9/10, T5: 8/10). Only tool with cross-module awareness.
CodeRabbit Pro: 29/35 issues found across all tests (T1: 7/8, T2: 5/5, T3: 0/2, T4: 9/10, T5: 8/10). Strong paid reviewer, no cross-module awareness.
GitHub Copilot: 28.5/35 issues found across all tests (T1: 8/8, T2: 4.5/5, T3: 0/2, T4: 8/10, T5: 8/10). Best free option, very verbose.
Greptile ($30/dev/mo): 22.5/35 issues found across all tests (T1: 4/8, T2: 4.5/5, T3: 0/2, T4: 7/10, T5: 7/10). Most expensive, fewest issues found.
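
The headline totals are plain sums of the per-test scores, with a partial finding counted as 0.5. A small sketch that recomputes them from the numbers above (the object layout and function names are illustrative only, not part of any of the tools):

```javascript
// Per-test scores (T1..T5) copied from the summary above.
const scores = {
  "Testigo Recall": [7, 5, 2, 9, 8],
  "CodeRabbit Pro": [7, 5, 0, 9, 8],
  "GitHub Copilot": [8, 4.5, 0, 8, 8],
  "Greptile": [4, 4.5, 0, 7, 7],
};

// Number of issues available in each test (T1..T5).
const issuesPerTest = [8, 5, 2, 10, 10];

const sum = (values) => values.reduce((acc, value) => acc + value, 0);

// Map every tool name to its total across the five tests.
function totals(table) {
  return Object.fromEntries(
    Object.entries(table).map(([tool, perTest]) => [tool, sum(perTest)])
  );
}

const totalIssues = sum(issuesPerTest);
const result = totals(scores);

for (const [tool, total] of Object.entries(result)) {
  console.log(`${tool}: ${total}/${totalIssues}`);
}
```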
Test 1

PR #13 — Edge Case Gauntlet

4 files, 8 real bugs + 2 non-bugs (traps), ado122/analyzer

Real Bugs (8)

  1. Rate limiter >=> (off-by-one)
  2. CSRF HMAC-SHA256 → base64 (security downgrade)
  3. API timeout 15min → 30s (breaks analysis jobs)
  4. SQL injection ×2 in new exportService.js
  5. CSV injection in new exportService.js
  6. Off-by-one pagination in new exportService.js
  7. Path traversal in new exportService.js
  8. Promise error swallowing
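
To make bug 5 (CSV injection) concrete: the contents of exportService.js are not shown here, so the following is only a sketch of one commonly recommended mitigation, assuming the export writes user-supplied strings into cells. The helper names `toCsvCell` and `toCsvRow` are hypothetical.

```javascript
// Hypothetical helpers; not taken from exportService.js.
// Cells starting with one of these characters can be interpreted as formulas
// by spreadsheet applications.
const FORMULA_PREFIX = /^[=+\-@\t\r]/;

function toCsvCell(value) {
  let text = String(value ?? "");
  if (FORMULA_PREFIX.test(text)) {
    // A leading apostrophe makes spreadsheet apps treat the cell as plain text.
    text = "'" + text;
  }
  // Standard CSV quoting: wrap in double quotes and double any embedded quotes.
  return '"' + text.replace(/"/g, '""') + '"';
}

function toCsvRow(values) {
  return values.map(toCsvCell).join(",");
}

console.log(toCsvRow(["Channel A", "=HYPERLINK(\"http://evil.example\")", 42]));
```

One trade-off of this approach: legitimate values that begin with `-` or `+`, such as negative numbers passed as strings, are also prefixed, so numeric columns may need separate handling.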

Traps (not real bugs): a harmless log simplification, which all tools correctly skipped; and a loose equality (==) between two numbers, which behaves identically to ===. Copilot and CodeRabbit flagged the latter, a false positive.
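
The second trap rests on a language rule: when both operands already have the same type, JavaScript's == performs the same comparison as ===. Type coercion only happens when the operand types differ. A short check of that claim (the helper names are illustrative):

```javascript
function looseEqual(a, b) {
  return a == b;
}

function strictEqual(a, b) {
  return a === b;
}

// Number-vs-number pairs, including the usual edge cases.
const numberPairs = [
  [1, 1],
  [1, 2],
  [0, -0],
  [NaN, NaN],
  [0.1 + 0.2, 0.3],
];

// For two numbers, both operators always give the same answer.
const operatorsAgree = numberPairs.every(
  ([a, b]) => looseEqual(a, b) === strictEqual(a, b)
);

// The operators only diverge when the operand types differ.
const mixedLoose = looseEqual(1, "1");
const mixedStrict = strictEqual(1, "1");

console.log({ operatorsAgree, mixedLoose, mixedStrict });
```

Flagging == is still a reasonable style comment, but it is not a behavioral bug when both sides are known to be numbers.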

Results

Bug | Testigo Recall | Copilot | CodeRabbit | Greptile
1. Rate limiter >=> | YES (KB) | YES | YES | ~summary only
2. CSRF HMAC → base64 | YES (KB) | YES | YES (critical) | YES
3. API timeout 15m → 30s | YES (KB) | YES | YES | MISSED ("reasonable")
4. SQL injection ×2 | YES (both) | YES | YES (critical) | YES
5. CSV injection | YES | YES | YES (critical) | YES
6. Off-by-one pagination | YES | YES | YES | ~summary only
7. Path traversal | MISSED | YES | MISSED | MISSED
8. Promise error bug | YES | YES | YES | ~summary only

Scores

Copilot: 8/8, ~5 false positives (noisy)
CodeRabbit: 7/8, ~2 false positives
Testigo Recall: 7/8, 0 false positives, plus a bonus finding (CSRF timing attack)
Greptile: 4/8, 0 false positives

Test 2

PR #14 — Domain-Specific Logic Bugs

5 files, KB-heavy planted bugs — all are documented value/behavior changes, ado122/analyzer

Planted Bugs

  1. JWT token expiry 7d → 30min (breaks stateless auth)
  2. Virality score weights swapped: velocity 40%→10%
  3. Strategy cache duration 6h → 5s (excessive API calls)
  4. Free tier channels 1 → unlimited (breaks business model)
  5. Encryption AES-256-GCM → AES-128-ECB (security regression)

Results

Bug | Testigo Recall | Copilot | CodeRabbit | Greptile
1. JWT 7d → 30min | YES (KB, critical) | YES | YES | YES
2. Virality weights swapped | YES (KB, warning) | ~partial (stale comments) | YES | ~summary only
3. Strategy cache 6h → 5s | YES (KB, warning) | YES | YES | YES
4. Free tier 1 → unlimited | YES (KB, warning) | YES | YES | YES
5. AES-256-GCM → AES-128-ECB | YES (KB, 2 critical + 1 code) | YES (3 comments) | YES + runtime crash | YES (1 + 3 detail)

Scores

Testigo Recall: 5/5, 0 false positives, quoted exact KB values
CodeRabbit: 5/5, +1 unique finding (getAuthTag runtime crash)
Copilot: 4.5/5, reported the virality change as a "stale comment" rather than a logic bug
Greptile: 4.5/5, mentioned the virality change only in the summary, not inline

Testigo Recall & CodeRabbit Both Aced This Test

Testigo Recall quoted exact facts: "JWT tokens expire after 7 days" (98%), "strategy: 6hr (21600000ms)", "Free: 1 channel", "AES-256-GCM with 16-byte IV". CodeRabbit found all 5 + a bonus runtime crash.
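
The idea of checking a change against documented facts can be sketched in a few lines. This is not how Testigo Recall is implemented (the source does not describe its internals); it only illustrates comparing proposed values with the documented ones quoted above. All key names are hypothetical, and the virality weights are left out because the full weight table is not given.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Documented values, as quoted from the KB in this test.
const documented = {
  jwtExpiryMs: 7 * 24 * HOUR_MS, // "JWT tokens expire after 7 days"
  strategyCacheMs: 6 * HOUR_MS, // "strategy: 6hr (21600000ms)"
  freeTierChannels: 1, // "Free: 1 channel"
  cipher: "aes-256-gcm", // "AES-256-GCM with 16-byte IV"
};

// Values introduced by the PR's planted bugs.
const proposed = {
  jwtExpiryMs: 30 * 60 * 1000, // 30 minutes
  strategyCacheMs: 5 * 1000, // 5 seconds
  freeTierChannels: Infinity, // unlimited
  cipher: "aes-128-ecb",
};

// Report every documented key whose proposed value differs.
function findRegressions(facts, change) {
  return Object.keys(facts)
    .filter((key) => key in change && change[key] !== facts[key])
    .map((key) => ({ key, documented: facts[key], proposed: change[key] }));
}

const regressions = findRegressions(documented, proposed);
for (const item of regressions) {
  console.log(`${item.key}: documented ${item.documented}, proposed ${item.proposed}`);
}
```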

Test 3

PR #18108 — Real-World Billing Fix

Real PR from twentyhq/twenty (open-source CRM, 31K KB facts) — billing credits display fix, 3 files, +82/−28 lines. Not planted bugs — real issues found during review.

Issues Identified

  1. Behavioral change: upTo raw value → toDisplayCredits() conversion (undocumented)
  2. Cross-module side effect: shouldUpdateAtSubscriptionPeriodEnd compares in internal units — potential 1000× mismatch
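
Issue 2 is easier to see with numbers. The sketch below is not the twentyhq/twenty code: the conversion factor of 1000 is an assumption taken from the "1000×" note above, and both functions are hypothetical stand-ins that only show how a comparison written for internal units goes wrong once one side has been converted to display credits.

```javascript
// Assumption: 1 display credit = 1000 internal units.
const INTERNAL_UNITS_PER_CREDIT = 1000;

// Hypothetical stand-in for a display conversion.
function toDisplayCreditsSketch(internalUnits) {
  return internalUnits / INTERNAL_UNITS_PER_CREDIT;
}

// Hypothetical check that expects BOTH arguments in internal units.
function isCapDecrease(currentCapInternal, newCapInternal) {
  return newCapInternal < currentCapInternal;
}

const currentCap = 5000000; // internal units
const newCap = 10000000; // internal units: the cap is being raised

// Same units on both sides: correctly reported as "not a decrease".
const sameUnits = isCapDecrease(currentCap, newCap);

// One side already converted for display: 10000 < 5000000,
// so a cap increase is misread as a decrease.
const mixedUnits = isCapDecrease(currentCap, toDisplayCreditsSketch(newCap));

console.log({ sameUnits, mixedUnits });
```

Neither function looks wrong in isolation, which is why a reviewer limited to the diff is unlikely to flag it.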

Results

Issue | Testigo Recall | Copilot | CodeRabbit | Greptile
1. Behavioral change (upTo conversion) | YES (KB, 100%) | MISSED | MISSED | MISSED
2. Cross-module unit mismatch | YES (KB, 95%) | MISSED | MISSED | MISSED

Scores

Testigo Recall: 2/2, both findings from the KB layer (required cross-file knowledge)
CodeRabbit: 0/2, 1 nitpick (test edge case), 0 real findings
Copilot: 0/2, "Reviewed 3 files, generated no comments"
Greptile: 0/2, "5/5 confidence — safe to merge with minimal risk"

Test 4

PR #16 — Competitor Benchmarking (10 bugs)

10 files, 10 planted bugs (3 KB-detectable + 7 pure code), 1,329 lines added, ado122/analyzer

Planted Bugs

  1. Missing CSRF on competitor routes (KB-detectable)
  2. optionalAuth on DELETE (bypasses auth) (KB-detectable)
  3. Conditional rendering (loses tab state) (KB-detectable)
  4. NoSQL injection / ReDoS via unescaped RegExp
  5. IDOR — no ownership check on getById
  6. XSS via dangerouslySetInnerHTML
  7. Off-by-one pagination
  8. Missing await on async delete
  9. Mass assignment / prototype pollution
  10. String vs number comparison
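
For bug 4, the usual fix is to escape user input before it reaches the RegExp constructor. The PR's code is not reproduced here, so `buildNameFilter` is a hypothetical example; the escaping expression itself is the widely used one from MDN's regular expression guide.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical search helper: matches the user's text literally, ignoring case.
function buildNameFilter(userInput) {
  return new RegExp(escapeRegExp(userInput), "i");
}

const filter = buildNameFilter("a.c");
console.log(filter.test("A.C")); // literal dot matches
console.log(filter.test("abc")); // "." no longer acts as a wildcard

// A hostile pattern is reduced to harmless literal text.
console.log(escapeRegExp("(a+)+$"));
```

Without the escaping step, input such as `(a+)+$` would be compiled as a pattern with nested quantifiers, which is the ReDoS half of this bug.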

Results

Bug | Testigo Recall | Copilot | CodeRabbit | Greptile
1. Missing CSRF | YES (KB) | YES | YES | YES
2. optionalAuth on DELETE | YES (KB) | YES | YES | YES
3. Conditional rendering | YES (KB) | MISSED | YES | MISSED
4. NoSQL injection / ReDoS | YES | MISSED | YES | MISSED
5. IDOR | YES | YES | YES | YES
6. XSS dangerouslySetInnerHTML | YES | YES | YES | YES
7. Off-by-one pagination | YES | YES | YES | YES
8. Missing await | YES | YES | YES | YES
9. Mass assignment / proto pollution | YES | YES | YES | YES
10. String vs number comparison | MISSED | YES | MISSED | MISSED

Scores

Testigo Recall: 9/10, +6 bonus findings (tier limits, ownership)
CodeRabbit: 9/10, +5 bonus findings (TOCTOU race, compound index)
Copilot: 8/10, +5 bonus findings; only tool to find the string comparison bug
Greptile: 7/10, +0 bonus findings; precise but shallow
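
Bug 10 (string vs number comparison) was caught only by Copilot. The PR's code is not shown here, so this is only a sketch of how that class of bug can arise in JavaScript, for example when a count arrives as a query-string value. `toFiniteNumber` and `exceedsLimit` are hypothetical helpers.

```javascript
// Two strings are compared character by character, so "10" sorts before "9".
const comparedAsStrings = "10" < "9";
const comparedAsNumbers = Number("10") < Number("9");

// Convert explicitly and fail loudly on non-numeric input.
function toFiniteNumber(value) {
  const parsed = typeof value === "number" ? value : Number.parseFloat(value);
  if (!Number.isFinite(parsed)) {
    throw new TypeError(`Expected a numeric value, got ${JSON.stringify(value)}`);
  }
  return parsed;
}

function exceedsLimit(count, limit) {
  return toFiniteNumber(count) > toFiniteNumber(limit);
}

console.log({ comparedAsStrings, comparedAsNumbers });
console.log(exceedsLimit("10", "9"));
```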

Test 5

PR #19 — Scheduled Reports & Data Export (10 bugs)

11 files, 10 planted bugs (all different from Test 4), 1,485 lines added, ado122/analyzer

Planted Bugs

  1. Missing CSRF on report routes (KB-detectable)
  2. Path traversal in report file download
  3. SSRF via webhook delivery URL
  4. Insecure randomness (Math.random for share tokens)
  5. IDOR — no ownership check on getById/delete/generate/download
  6. Prototype pollution in deep merge
  7. Missing await on async delete
  8. ReDoS regex in template name validation
  9. Info disclosure (stack trace in error response)
  10. Off-by-one date range (29 days instead of 30)
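
Bug 6 (prototype pollution in deep merge) was missed by two of the four tools. The PR's merge function is not reproduced here; the following is a minimal sketch of a merge that skips the dangerous keys, with `safeDeepMerge` and the sample payload being hypothetical.

```javascript
// Keys that can reach Object.prototype when merged recursively.
const BLOCKED_KEYS = new Set(["__proto__", "constructor", "prototype"]);

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function safeDeepMerge(target, source) {
  for (const key of Object.keys(source)) {
    if (BLOCKED_KEYS.has(key)) continue; // drop polluting keys entirely
    const value = source[key];
    if (isPlainObject(value)) {
      if (!isPlainObject(target[key])) target[key] = {};
      safeDeepMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an ordinary own property, which a naive
// recursive merge would follow into Object.prototype.
const payload = JSON.parse(
  '{"__proto__": {"isAdmin": true}, "title": "Weekly", "options": {"format": "csv"}}'
);
const merged = safeDeepMerge({ options: { page: 1 } }, payload);

console.log(merged);
console.log({}.isAdmin); // still undefined: the prototype was not touched
```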

Results

Bug | Testigo Recall | Copilot | CodeRabbit | Greptile
1. Missing CSRF | YES (KB) | YES | YES | YES
2. Path traversal | YES | YES | YES (critical) | YES
3. SSRF (webhook) | YES | YES | YES | YES
4. Insecure randomness | YES | YES | YES | YES
5. IDOR (4 endpoints) | YES (all 4) | YES (all 4) | YES (critical) | YES
6. Prototype pollution | YES | MISSED | YES | MISSED
7. Missing await | YES (KB) | YES | YES | YES
8. ReDoS regex | MISSED | MISSED | MISSED | MISSED
9. Info disclosure (stack trace) | YES | YES | YES | MISSED
10. Off-by-one date (29 not 30) | MISSED | YES | MISSED | MISSED

Scores

Testigo Recall: 8/10, 19 comments (9 critical, 0 false positives)
CodeRabbit: 8/10, 20 comments plus full fix diffs
Copilot: 8/10, 33 comments; very verbose, lots of noise
Greptile: 7/10, 10 comments; concise but shallow
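
Bug 8 (ReDoS in template name validation) was the one issue every tool missed. The actual regex is not shown in this report, so the patterns below are hypothetical: a sketch of how a nested quantifier creates exponential backtracking, and an unambiguous rewrite with a length cap. The vulnerable pattern is kept as a string and never executed.

```javascript
// Hypothetical vulnerable pattern: because \s is optional, a run of word
// characters can be split between loop iterations in exponentially many ways
// when the overall match fails (for example on "aaaa...a!").
const VULNERABLE_SOURCE = "^(\\w+\\s?)+$";

// Assumed limit; the real maximum name length is not given in the report.
const MAX_NAME_LENGTH = 100;

// Unambiguous rewrite: words separated by exactly one space. Each character
// can be matched in only one way, so a failing match stays linear.
const SAFE_NAME = /^\w+(?: \w+)*$/;

function isValidTemplateName(name) {
  if (typeof name !== "string") return false;
  if (name.length === 0 || name.length > MAX_NAME_LENGTH) return false;
  return SAFE_NAME.test(name);
}

console.log(isValidTemplateName("Weekly Report"));
console.log(isValidTemplateName("a".repeat(60) + "!"));
```

The length cap is a second line of defense: even if a risky pattern slips through review, bounded input bounds the damage.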

Conclusions (across all 5 tests)

2. CodeRabbit is the strongest paid competitor: 29/35, with excellent fix suggestions. It missed both cross-module issues, scoring 0/2 on Test 3 where Recall found both.
3. Copilot is the best free option: 28.5/35, and it occasionally catches bugs that no other tool found (path traversal in T1, string comparison in T4, off-by-one date range in T5). Trade-off: extremely verbose (33 comments on T5 alone, many of them noise).
4. Greptile trails at $30/dev/mo: 22.5/35, last or tied for last on every test. It never outscores the free Copilot (tied on T2 and T3, behind on T1, T4, and T5).
5. ReDoS is a universal blind spot: all 4 tools missed the ReDoS regex in Test 5, which suggests these AI code reviewers do not analyze the computational complexity of regular expressions.
7. Recall complements existing tools. Copilot and CodeRabbit handle code-level bugs well; Recall adds the codebase-level layer: convention violations, KB regressions, and cross-module side effects. Together they cover more of the spectrum than either layer alone.