$ copilot --idea "Codebase Archaeologist"
Difficulty: intermediate | Time: 1-2 hours | Category: Creative & General Projects
Onboard to any codebase in hours, not weeks — let Copilot CLI map the territory

The Problem

You just joined a new team. The codebase is 200,000 lines across 15 services. The documentation is missing, outdated, or written by someone who assumed you already understood everything. The README says "see the wiki" and the wiki says "see the code."

What You'll Build

A systematic codebase exploration workflow that produces:
- An architecture map showing all services, their responsibilities, and how they communicate
- A glossary of domain terms used in the code
- A "where to find things" guide for the most common development tasks
- An onboarding document a new developer could follow in their first week

Step-by-Step Walkthrough

Phase 1: Bird's Eye View

Start with the big picture. Launch multiple explore agents in parallel:
$ "Map the architecture of this repository:
1. What are the main entry points?
2. What frameworks and libraries are used?
3. How is the project structured (monolith, microservices, monorepo)?
4. What databases and external services does it connect to?
5. How is authentication handled?"
For monorepos, send one explore agent per service. The agents run in parallel and report back independently.
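
Before sending one agent per service, it helps to know where the service boundaries are. The sketch below is one possible way to find them, assuming each service keeps a manifest file at its root. The manifest list, the skip list, and the helper names `find_services` and `explore_prompt` are illustrative assumptions, not part of Copilot CLI.

```python
from pathlib import Path

# Manifest filenames treated as "this directory is a service" markers.
# This list is an assumption; extend it for your stack.
MANIFESTS = ("package.json", "pyproject.toml", "go.mod", "Cargo.toml", "pom.xml")

# Directories that usually hold vendored or generated code (also an assumption).
SKIP_DIRS = {"node_modules", ".git", "vendor", "dist", "build"}


def find_services(repo_root: str) -> list[str]:
    """Return repo-relative directories that contain a known manifest file."""
    root = Path(repo_root)
    services = set()
    for manifest in MANIFESTS:
        for path in root.rglob(manifest):
            rel = path.relative_to(root)
            # Ignore manifests that live inside vendored or generated trees.
            if any(part in SKIP_DIRS for part in rel.parts):
                continue
            services.add(rel.parent.as_posix())
    return sorted(services)


def explore_prompt(service_dir: str) -> str:
    """Build a Phase 1 style prompt scoped to a single service directory."""
    return (
        f"Map the architecture of the service in {service_dir}: "
        "entry points, frameworks, data stores, external services, and auth."
    )
```

Calling `explore_prompt` on each directory returned by `find_services(".")` yields one scoped prompt per service, which can then be handed to separate explore agents.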

Phase 2: Data Flow Tracing

Pick the most important user journey and trace it:
$ "Trace the complete request lifecycle for a user login:
1. Which endpoint handles the initial request?
2. What middleware runs before the handler?
3. How does it validate credentials?
4. What gets written to the database?
5. What tokens or sessions are created?
6. What gets returned to the client?"
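
The answers to these six questions are easier to review as a diagram. As a minimal sketch, the helper below turns a list of traced steps into Mermaid sequence diagram text. The function name `to_mermaid` and every participant and message in `LOGIN_TRACE` are hypothetical placeholders; replace them with what the agent actually reports for your codebase.

```python
def to_mermaid(steps: list[tuple[str, str, str]]) -> str:
    """Render (sender, receiver, message) triples as a Mermaid sequence diagram."""
    lines = ["sequenceDiagram"]
    for sender, receiver, message in steps:
        # `->>` draws a solid arrow from sender to receiver.
        lines.append(f"    {sender}->>{receiver}: {message}")
    return "\n".join(lines)


# Hypothetical login trace, for illustration only.
LOGIN_TRACE = [
    ("Client", "API", "POST login request"),
    ("API", "Middleware", "run pre-handler checks"),
    ("API", "AuthService", "validate credentials"),
    ("AuthService", "Database", "write session record"),
    ("AuthService", "API", "issue token"),
    ("API", "Client", "return response"),
]

if __name__ == "__main__":
    print(to_mermaid(LOGIN_TRACE))
```

Participant names are kept free of spaces so they can be used directly as Mermaid identifiers.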

Phase 3: Complexity Hotspots

$ "Find the 10 most complex functions in this codebase.
Rank by cyclomatic complexity and lines of code.
For each one, explain what it does in plain English
and whether it could be simplified."
$ "Find dead code — functions that are defined but never called,
imports that are never used, and files that nothing references.
Estimate the percentage of dead code."
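
Dead-code reports are worth verifying before anything is deleted. As a hedged illustration, the sketch below flags Python functions that are defined but never referenced by name anywhere in a set of files. It is purely name-based, so functions reached through dynamic dispatch, reflection, or framework entry points will appear as false positives. The function name `unreferenced_functions` is an assumption for this example.

```python
import ast


def unreferenced_functions(sources: dict[str, str]) -> list[str]:
    """Return function names that are defined but never referenced by name.

    `sources` maps a file label to Python source text.
    """
    defined: set[str] = set()
    used: set[str] = set()
    for text in sources.values():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined.add(node.name)
            elif isinstance(node, ast.Name):
                # Bare references such as `helper()` or `callback=helper`.
                used.add(node.id)
            elif isinstance(node, ast.Attribute):
                # Qualified references such as `module.helper()`.
                used.add(node.attr)
    return sorted(defined - used)
```

Treat the result as a list of candidates to investigate, not as proof that the code is unused.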

Phase 4: Domain Language

$ "Extract the domain language from this codebase.
List every business term (like 'invoice', 'subscription', 'tenant')
with its definition based on how the code uses it.
Note any terms that are used inconsistently."
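
A quick frequency count of the words inside identifiers gives a starting list of candidate terms to compare against the agent's glossary. The sketch below splits snake_case and camelCase identifiers and counts the resulting words. The stop list, the minimum word length, and the helper names are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Words too generic to be domain terms. This stop list is an assumption.
STOP_WORDS = {
    "get", "set", "the", "for", "new", "def", "return", "self",
    "class", "import", "from", "else", "none", "true", "false",
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Matches acronym runs ("HTTP") or capitalized/lowercase words ("Invoice").
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(name: str) -> list[str]:
    """Split snake_case and camelCase identifiers into lowercase words."""
    words = []
    for chunk in name.split("_"):
        words.extend(w.lower() for w in WORD.findall(chunk))
    return words


def candidate_terms(source: str, top: int = 20) -> list[tuple[str, int]]:
    """Return the most frequent identifier words, excluding stop words."""
    counts: Counter = Counter()
    for ident in IDENTIFIER.findall(source):
        for word in split_identifier(ident):
            # Skip very short fragments such as "id" or "db".
            if len(word) > 2 and word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(top)
```

Frequent words that the agent's glossary does not mention, or that it defines in two different ways, are good candidates for the "used inconsistently" list.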

Phase 5: Generate the Onboarding Doc

$ "Create an onboarding guide for a new developer joining this team.
Include:
- How to set up the local development environment
- How to run tests
- The top 10 files they should read first
- Common tasks (add an API endpoint, add a database migration, deploy)
- Gotchas and non-obvious conventions"

Pro Tips

• Use `git log --oneline --since="6 months ago" -- <path>` to find actively maintained areas
• The code-review agent is great for understanding recent PRs
• Ask Copilot to generate a Mermaid sequence diagram for complex flows
• Save the output as actual docs in the repo — future team members will thank you
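
The `git log` tip can be extended into a per-directory activity ranking. The sketch below is one way to do that, assuming `git` is on the PATH: it runs `git log --name-only` with an empty pretty format so only changed file paths are printed, then counts those entries per directory prefix. The function names and the `depth` parameter are illustrative.

```python
import subprocess
from collections import Counter


def churn_by_directory(log_output: str, depth: int = 1) -> list[tuple[str, int]]:
    """Count changed-file entries per directory prefix.

    `log_output` is the text produced by `git log --name-only --pretty=format:`.
    Each entry is one file touched by one commit, so a file changed in five
    commits contributes five to its directory's total.
    """
    counts: Counter = Counter()
    for line in log_output.splitlines():
        path = line.strip()
        if not path:
            continue  # blank lines separate commits
        directories = path.split("/")[:-1]
        # Files at the repository root are grouped under ".".
        prefix = "/".join(directories[:depth]) or "."
        counts[prefix] += 1
    return counts.most_common()


def recent_git_log(repo: str, since: str = "6 months ago") -> str:
    """Return changed file paths for commits newer than `since`."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for directory, changes in churn_by_directory(recent_git_log("."))[:10]:
        print(f"{changes:5d}  {directory}")
```

Directories at the top of the ranking are where current work is concentrated, which makes them reasonable first targets for the phases above.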

What You'll Learn

• Systematic codebase exploration techniques
• How to use parallel agents for large-scale analysis
• Documentation generation from code
• Identifying technical debt and complexity hotspots