Development Guide
Quick Start
# Check your environment
task doctor
# Setup everything
task setup
# Start developing
task dev
The docs site uses Bun: task setup installs the docs dependencies with bun install, and task doctor checks that Bun is available.
Daily Workflow
task dev # Hot reload development
task check # Run before committing (fmt, lint, test)
task ship # Full CI pipeline locally
Debugging
task wtf # What's broken?
task doctor # Environment check
Ollama
task ollama # Start Ollama with Metal GPU support
Requires the nomic-embed-text model:
ollama pull nomic-embed-text
Useful Commands
task # List all available tasks
task build # Build the binary
task test # Run tests
task race # Run tests with the race detector
task short # Run short tests
task verbose # Run verbose tests
task cov # Generate coverage output
task flows # Run all terminal Studio flows
task site # Start the VitePress docs site
task site:build # Build the VitePress docs site
task clean # Remove build artifacts
Architecture Overview
vecgrep follows a layered architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────────┐
│                          CLI Layer                          │
│                     cmd/vecgrep/main.go                     │
│          (Cobra commands, user interaction, flags)          │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
 ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
 │  MCP Server   │     │    Studio     │     │    Search     │
 │ internal/mcp  │     │internal/studio│     │internal/search│
 └───────────────┘     └───────────────┘     └───────────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Index Layer                         │
│                       internal/index                        │
│         (Chunker, file walking, language detection)         │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┴─────────────────────┐
         ▼                                           ▼
 ┌───────────────┐                           ┌───────────────┐
 │   Embedding   │                           │   Database    │
 │ internal/embed│                           │  internal/db  │
 │ Ollama/cloud  │                           │   (veclite)   │
 └───────────────┘                           └───────────────┘
Data Flow
Indexing Flow:
Files → Walker → Chunker → Embeddings → Database
- File walker discovers files, respecting ignore patterns
- Chunker splits code into semantic units (functions, classes, blocks)
- Embedding provider generates vectors for each chunk
- Database stores chunks and embeddings with file metadata
Search Flow:
Query → Embedding → Vector/BM25 Search → Ranked Results
- Query text is embedded using the same provider for semantic and hybrid modes
- Keyword mode uses VecLite BM25 without generating a query embedding
- Vector similarity and BM25 results are ranked or fused by the selected search mode
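The guide does not say which formula hybrid mode uses to fuse the two rankings. Purely as an illustration of what fusing can mean, the sketch below applies reciprocal rank fusion (RRF), a common technique for merging ranked lists. The function name `fuseRRF`, the constant 60, and the chunk IDs are assumptions, not vecgrep or VecLite behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked ID lists with reciprocal rank fusion: every list
// adds 1/(k+rank) to an ID's score, with rank starting at 1.
func fuseRRF(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	vector := []string{"a", "b", "c"} // hypothetical vector-similarity ranking
	bm25 := []string{"b", "d", "a"}   // hypothetical BM25 ranking
	fmt.Println(fuseRRF(60, vector, bm25)) // [b a d c]
}
```

RRF only needs ranks, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales.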
Package Responsibilities
cmd/vecgrep/
CLI entry point using Cobra. Defines all commands (init, index, search, etc.) and handles user interaction, flags, and output formatting.
internal/config/
Hierarchical configuration system with multiple sources:
- config.go - Core config types and loading
- resolution.go - Multi-level config resolution
- global.go - Global project registry (~/.vecgrep/)
Resolution Order (highest to lowest priority):
- Environment variables (VECGREP_*)
- Project root vecgrep.yaml
- Project .config/vecgrep.yaml
- Project .vecgrep/config.yaml (legacy)
- Global project entry
- Global defaults (~/.vecgrep/config.yaml)
- Built-in defaults
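The resolution order amounts to "first source that defines the key wins". A minimal sketch of that lookup follows, assuming each source has already been loaded into a flat map; the `layer` type, the `resolve` helper, and the key names are hypothetical and do not reflect the actual code in resolution.go.

```go
package main

import "fmt"

// layer is one configuration source, listed from highest to lowest priority.
type layer struct {
	name   string
	values map[string]string
}

// resolve returns the value from the first layer that defines key,
// together with the name of the layer that supplied it.
func resolve(key string, layers []layer) (value, source string, ok bool) {
	for _, l := range layers {
		if v, found := l.values[key]; found {
			return v, l.name, true
		}
	}
	return "", "", false
}

func main() {
	layers := []layer{
		{name: "environment", values: map[string]string{}},
		{name: "project vecgrep.yaml", values: map[string]string{"embedding.model": "nomic-embed-text"}},
		{name: "built-in defaults", values: map[string]string{"embedding.model": "fallback-model", "chunk_size": "512"}},
	}
	v, src, _ := resolve("embedding.model", layers)
	fmt.Printf("%s (from %s)\n", v, src) // nomic-embed-text (from project vecgrep.yaml)
	v, src, _ = resolve("chunk_size", layers)
	fmt.Printf("%s (from %s)\n", v, src) // 512 (from built-in defaults)
}
```

Reporting the source alongside the value makes it easier to debug why a setting did not take effect.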
Default Storage: New projects are registered in ~/.vecgrep/config.yaml by default, with generated index data stored under ~/.vecgrep/projects/<project>/. Repo-local .vecgrep/ directories are legacy or explicit vecgrep init --local behavior only.
internal/db/
Pure veclite database layer (no SQLite, no CGO):
- db.go - Database operations and wrapper
- vector_backend.go - Vector backend interface
- veclite_backend.go - VecLite HNSW implementation with full metadata storage
Data Model: All data is stored in veclite vector payloads:
- File info: path, hash, size, language
- Chunk info: content, lines, type, symbol name
- Project info: root path, indexed timestamp
Embedding Boundary: vecgrep owns provider selection, credentials, batching, retries, code chunking, and rebuild policy. VecLite owns vector storage, BM25, filtering, HNSW search, and persistence. See docs/veclite-integration.md for the integration contract.
Embedding Profile Guard: Indexing persists embedding_profile.json next to vectors.veclite with provider, model, dimensions, distance, modality, and chunker version. Incremental indexing and vector-based search compare the stored profile with the active config and require a full re-index when vector meaning changes.
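The profile guard can be pictured as a field-by-field comparison between the stored profile and the one derived from the active config. The sketch below is an assumption-laden illustration: the struct field names, JSON keys, and sample values are invented for the example, and the real embedding_profile.json layout may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Profile mirrors the fields the guide lists for embedding_profile.json.
// The JSON key names are assumptions.
type Profile struct {
	Provider       string `json:"provider"`
	Model          string `json:"model"`
	Dimensions     int    `json:"dimensions"`
	Distance       string `json:"distance"`
	Modality       string `json:"modality"`
	ChunkerVersion string `json:"chunker_version"`
}

// needsFullReindex reports whether the stored vectors were produced under a
// different profile than the active one. Every field is comparable, so plain
// struct inequality covers all of them.
func needsFullReindex(stored, active Profile) bool {
	return stored != active
}

func main() {
	raw := []byte(`{"provider":"ollama","model":"nomic-embed-text","dimensions":768,
		"distance":"cosine","modality":"text","chunker_version":"1"}`)
	var stored Profile
	if err := json.Unmarshal(raw, &stored); err != nil {
		panic(err)
	}
	active := stored
	fmt.Println(needsFullReindex(stored, active)) // false

	active.Model = "some-other-model" // switching models changes vector meaning
	fmt.Println(needsFullReindex(stored, active)) // true
}
```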
internal/embed/
Embedding provider implementations:
- provider.go - Provider interface definition
- ollama.go - Ollama API client (local)
- openai.go - OpenAI API client (cloud)
- cohere.go - Cohere Embed v2 client (cloud)
- voyage.go - Voyage AI embeddings client (cloud)
- detect.go - Provider detection and model metadata
Provider Interface:
type Provider interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
	Model() string
	Dimensions() int
	Ping(ctx context.Context) error
}
Providers that distinguish retrieval roles can also implement:
type QueryProvider interface {
	EmbedQuery(ctx context.Context, text string) ([]float32, error)
}
type DocumentProvider interface {
	EmbedDocuments(ctx context.Context, texts []string) ([][]float32, error)
}
The indexer prefers DocumentProvider for chunk embeddings. Search prefers QueryProvider for semantic, hybrid, and similar-by-text queries. Providers without those optional interfaces use the base Provider methods.
internal/index/
File indexing and code chunking:
- indexer.go - Main indexer coordinating the process
- chunker.go - Language detection and heuristic code splitting
- watcher.go - Optional file system watching for changed files
Chunker Strategy:
- Uses language-specific pattern extraction where available
- Identifies functions, types, classes, and blocks
- Falls back to sliding window for unknown formats
- Respects chunk_size and chunk_overlap settings
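The sliding-window fallback can be sketched as fixed-size windows that share a few trailing lines with their successor. This is an illustration only: the guide does not state whether chunk_size and chunk_overlap are measured in lines, characters, or tokens, so the line-based unit and the `slidingWindow` helper are assumptions.

```go
package main

import "fmt"

// slidingWindow splits lines into windows of at most size lines, where each
// window repeats the last overlap lines of the previous one.
func slidingWindow(lines []string, size, overlap int) [][]string {
	if size <= 0 || len(lines) == 0 {
		return nil
	}
	if overlap < 0 || overlap >= size {
		overlap = 0 // an overlap as large as the window would never advance
	}
	step := size - overlap
	var windows [][]string
	for start := 0; start < len(lines); start += step {
		end := start + size
		if end > len(lines) {
			end = len(lines)
		}
		windows = append(windows, lines[start:end])
		if end == len(lines) {
			break // the final window already reaches the end of the input
		}
	}
	return windows
}

func main() {
	lines := make([]string, 10)
	for i := range lines {
		lines[i] = fmt.Sprintf("line %d", i+1)
	}
	for _, w := range slidingWindow(lines, 4, 1) {
		fmt.Println(w[0], "..", w[len(w)-1])
	}
	// line 1 .. line 4
	// line 4 .. line 7
	// line 7 .. line 10
}
```

The overlap keeps context that straddles a window boundary retrievable from at least one chunk.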
internal/mcp/
Model Context Protocol server:
- server_sdk.go - MCP server using the official Go SDK
Available Tools:
- vecgrep_init - Initialize project
- vecgrep_search - Semantic search
- vecgrep_index - Index files
- vecgrep_status - Get statistics
- vecgrep_similar - Find similar code
- vecgrep_delete - Remove file from index
- vecgrep_clean - Optimize database
- vecgrep_reset - Clear database
internal/search/
Search implementation:
- search.go - Query embedding and similarity search
- warmup.go - Search warmup helpers
internal/app/
Shared application service layer used by CLI and Studio:
- session.go - Project/config/database/provider session setup
- search.go - Search and similar-code requests
- index.go - Index maintenance operations
- status.go - Project status aggregation
internal/studio/
Bubble Tea v2 Studio terminal app:
- model.go - Update/view state machine
- run.go - Program bootstrap
- theme.go - Lip Gloss styles
internal/render/
CLI rendering adapters for shared result formatting.
internal/version/
Version information:
- Set via ldflags at build time
- Used by the vecgrep version command
Adding New Features
Adding an Embedding Provider
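- Create internal/embed/newprovider.go:
import (
	"context"
	"errors"
)
type NewProvider struct {
	apiKey string
	model  string
}
func (p *NewProvider) Embed(ctx context.Context, text string) ([]float32, error) {
	// Call the provider's API here; fail loudly until that is implemented.
	return nil, errors.New("newprovider: Embed not implemented")
}
func (p *NewProvider) Dimensions() int {
	return 768 // or whatever your model uses
}
// EmbedBatch, Model, and Ping must also be implemented to satisfy Provider.
- Add config options to internal/config/config.go: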
type EmbeddingConfig struct {
	// ... existing fields
	NewProviderAPIKey string `mapstructure:"newprovider_api_key"`
}
- Wire up in internal/app/provider.go and MCP initialization if the shared factory cannot be used
- Update README.md with usage instructions
Adding a Language Chunker
- Add language detection in internal/index/chunker.go
- Implement heuristic chunk extraction in internal/index/chunker.go:
func (c *Chunker) chunkNewLang(content string) []Chunk {
	// Identify semantic units (functions, classes, etc.)
	// Return chunks with proper types
	return nil // placeholder until extraction is implemented
}
- Add tests with sample code in your language
Adding an MCP Tool
- Add tool definition in internal/mcp/server_sdk.go:
{
	Name:        "vecgrep_newtool",
	Description: "Description of the new tool",
	InputSchema: mcp.ToolInputSchema{
		Type: "object",
		Properties: map[string]any{
			"param1": map[string]any{
				"type":        "string",
				"description": "Parameter description",
			},
		},
		Required: []string{"param1"},
	},
}
- Add handler in the tool dispatch:
case "vecgrep_newtool":
	return s.handleNewTool(ctx, params)
- Implement the handler method
- Update README.md MCP section
Modifying the Data Model
The database uses veclite with all metadata stored in vector payloads.
- Update the ChunkRecord struct in internal/db/veclite_backend.go
- Update the payload construction in InsertChunk()
- Update the payload extraction in recordToChunk()
- Run tests to ensure compatibility:
task test
- Note: Existing indexes may need to be rebuilt after schema changes
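A round-trip check is a cheap way to confirm that payload construction and extraction stay in sync after a field is added. The sketch below uses a hypothetical `chunkRecord` with invented field and key names; it does not reproduce the real ChunkRecord, InsertChunk(), or recordToChunk(). The tolerant `asInt` helper assumes payloads may pass through JSON, where numbers decode as float64.

```go
package main

import "fmt"

// chunkRecord is a hypothetical stand-in for ChunkRecord.
type chunkRecord struct {
	Content   string
	StartLine int
	EndLine   int
	Type      string
	Symbol    string
}

// toPayload flattens a record into a generic key/value payload.
func toPayload(c chunkRecord) map[string]any {
	return map[string]any{
		"content":    c.Content,
		"start_line": c.StartLine,
		"end_line":   c.EndLine,
		"type":       c.Type,
		"symbol":     c.Symbol,
	}
}

// fromPayload rebuilds a record. Missing keys leave zero values, so payloads
// written before a field existed still load.
func fromPayload(p map[string]any) chunkRecord {
	var c chunkRecord
	c.Content, _ = p["content"].(string)
	c.StartLine = asInt(p["start_line"])
	c.EndLine = asInt(p["end_line"])
	c.Type, _ = p["type"].(string)
	c.Symbol, _ = p["symbol"].(string)
	return c
}

// asInt accepts the numeric types a payload value is likely to have,
// including float64 as produced by encoding/json for untyped values.
func asInt(v any) int {
	switch n := v.(type) {
	case int:
		return n
	case int64:
		return int(n)
	case float64:
		return int(n)
	}
	return 0
}

func main() {
	in := chunkRecord{Content: "func Hello() {}", StartLine: 3, EndLine: 5, Type: "function", Symbol: "Hello"}
	out := fromPayload(toPayload(in))
	fmt.Println(in == out) // true
}
```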
Testing Patterns
Unit Tests
func TestChunker_Go(t *testing.T) {
	chunker := NewChunker(512, 64)
	content := `func Hello() string { return "world" }`
	chunks := chunker.Chunk(content, "go")
	assert.Len(t, chunks, 1)
	assert.Equal(t, "function", chunks[0].Type)
}
Integration Tests
Tests requiring Ollama use a skip condition:
func TestSearch_Integration(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping integration test")
	}
	// ... test with real Ollama
}
Run integration tests:
task test # All tests (Ollama required)
task short # Skip integration tests
Terminal Flows
Glyphrun terminal flows live in specs/flows/, matching the flow/action layout used by the automation projects.
task flows
task flow FLOW=specs/flows/studio_launch_quit.yml
Docs Site
The docs site is powered by VitePress and uses Markdown files in docs/.
task site
task site:build
task site:preview
Mock Providers
For testing search without Ollama:
type MockProvider struct {
	embeddings map[string][]float32
}
func (m *MockProvider) Embed(ctx context.Context, text string) ([]float32, error) {
	if vec, ok := m.embeddings[text]; ok {
		return vec, nil
	}
	return make([]float32, 768), nil
}
CI/CD
GitHub Actions
- ci.yml - Runs on every push/PR: task check, task race, task build, and coverage upload
- release.yml - Runs task check, then creates tagged release binaries with GoReleaser
Local CI Simulation
task ship # Runs full CI pipeline locally
Troubleshooting
Common Issues
"not in a vecgrep project"
- Run vecgrep init in your project directory
- Or add the project to the global registry
"failed to connect to Ollama"
- Ensure Ollama is running: ollama serve
- Check the URL in config (default: http://localhost:11434)
"embedding profile mismatch" or "embedding dimensions mismatch"
- Embedding provider, model, dimensions, distance, or chunker profile changed
- Run vecgrep index --full, or run vecgrep reset --force and re-index
Database migration warning
- A legacy .vecgrep/vecgrep.db file without a veclite index is not used by the current build
- Run vecgrep reset --force and re-index; back up the legacy data first if you want to keep it
Debug Mode
vecgrep --verbose <command> # Enable verbose output
VECGREP_DEBUG=1 vecgrep ... # Extra debug info