{"id":"bc0c3389-8f2","slug":"grepfirst-mcfallthrough-small-opensource-retrieval","claude_md":"# GrepFirst McFallthrough\n\n> Generated by [whycantwehaveanagentforthis.com](https://whycantwehaveanagentforthis.com/result/grepfirst-mcfallthrough-small-opensource-retrieval). Roasted, scored, ready to scaffold.\n\n## What you are building\n\n**Problem:** I want a small open-source MCP retrieval router that defaults to grep and only falls through to vector search when the query actually looks semantic. Worth shipping?\n\n**Verdict:** ACTUALLY NOT BAD — _\"Finally, someone who remembers that grep is O(n) and your RAM isn't free.\"_\n\n**Summary:** An MCP-native retrieval router that classifies incoming queries as lexical vs. semantic and dispatches to ripgrep or a vector store accordingly, with zero config defaults.\n\n## Agent-readiness score\n\nOverall: **58/100** (band C)\n\n| Dimension | Score | Why |\n|---|---|---|\n| Memory required | 21/25 | Some cross-session state — start with Redis, graduate to a vector store. |\n| Tool count | 9/25 | Crowded market: at least 8 integrations to compete. |\n| Policy surface | 11/25 | Mid-size policy surface — define refusal categories before launch. |\n| Eval coverage | 17/25 | Eval scaffolding doable — write 50 paired examples and grade with an LLM-as-judge. |\n\n> Worth building, but plan for the long-tail. 
GrepFirst McFallthrough needs runway, not just speed.\n\n## Suggested tools\n\n- fetch (HTTP GET on a URL allow-list)\n- search (Brave / Tavily / Exa for competitor research)\n- database (Postgres / Supabase for user state)\n- vector-store (embedding-based retrieval)\n\n## Smoke evals\n\n- The agent introduces itself as \"GrepFirst McFallthrough\" and refuses tasks outside the stated scope.\n- Given the canonical problem (\"I want a small open-source MCP retrieval router that defaults to grep and only f\"), the agent produces a plan in ≤ 200 tokens.\n- When asked \"what's different from Exa?\", the agent gives a concrete differentiator, not a marketing line.\n- When asked about Anthropic's threat, the agent acknowledges the risk honestly.\n- No private personal data appears in any output (PII redaction smoke test).\n\n## Stack\n\n- Model: `claude-sonnet-4-6` (Anthropic). Override via `ANTHROPIC_MODEL` env.\n- Suggested stack: `TypeScript MCP SDK`, `ripgrep (subprocess)`, `chromadb (sqlite-vec as lighter alt)`, `sentence-transformers or OpenAI embeddings for fallthrough`, `zod for query schema validation`\n- Solo build estimate: 1 focused weekend for v0.1, 3 weeks to handle edge cases you'll regret ignoring\n\n## Kill prediction\n\nAnthropic could obsolete this in 9-18 months. 
They add a retrieval_strategy hint to the MCP spec or ship a reference retrieval server with hybrid routing baked in, making your router redundant by default.\n\n**Survival strategy:** Own the classification logic as a reusable library that works regardless of transport — make GrepFirst the algorithm, not just the MCP server.\n\n## Hand-off\n\n- Read the full analysis: https://whycantwehaveanagentforthis.com/result/grepfirst-mcfallthrough-small-opensource-retrieval\n- Open in Anthropic Managed Agents: see the deeplink on the result page\n- Claim this idea: https://whycantwehaveanagentforthis.com/result/grepfirst-mcfallthrough-small-opensource-retrieval#claim\n","scaffold_sh":"#!/usr/bin/env bash\n# Generated by whycantwehaveanagentforthis.com — F-N1 Build-this-with\n# Source: https://whycantwehaveanagentforthis.com/result/grepfirst-mcfallthrough-small-opensource-retrieval\n#\n# Bootstraps a starter repo for \"GrepFirst McFallthrough\" with a CLAUDE.md\n# pulled from this site. Idempotent: re-running on an existing\n# folder is a no-op. No network calls beyond the initial curl.\nset -euo pipefail\n\nFOLDER=\"${1:-grepfirst-mcfallthrough}\"\nif [ -d \"$FOLDER\" ]; then\n  echo \"Folder $FOLDER already exists. Aborting (idempotent).\"\n  exit 0\nfi\nmkdir -p \"$FOLDER\"\ncd \"$FOLDER\"\n\n# Pull the live CLAUDE.md from the site.\ncurl --fail --silent --show-error -L \"https://whycantwehaveanagentforthis.com/api/bootstrap/bc0c3389-8f2/raw\" -o CLAUDE.md\n\ncat > .gitignore <<EOF\nnode_modules\n.env*\ndist\n.next\nEOF\n\n# Init git so the first commit is the scaffold.\ngit init --quiet\ngit add CLAUDE.md .gitignore\ngit commit --quiet -m \"scaffold: bootstrapped from whycantwehaveanagentforthis.com\"\n\necho \"\"\necho \"✓ Scaffold ready in $FOLDER\"\necho \"  Next: cd $FOLDER && claude\"\necho \"  Or open in Cursor: cursor .\"\n","deeplink":"claude-code://init?source=https%3A%2F%2Fwhycantwehaveanagentforthis.com%2Fapi%2Fbootstrap%2Fbc0c3389-8f2%2Fraw"}