# Context7-style Docs MCP System
A self-hosted, local-compatible documentation retrieval and search system using Docker. This project uses Qdrant for vector embeddings and SQLite for metadata storage, exposing a FastAPI docs backend and an MCP server for IDE/tool integration.
## 🏠 Home Server / Production Use
This section covers hardening recommendations for running this system on a home server or in production.
### Environment Variables (`.env`)
Copy `.env.example` to `.env` and configure:
```bash
cp .env.example .env
```
| Variable | Description | Example |
|----------|-------------|---------|
| `HOST_PORT` | Docs API host port (default: 8787) | `8787` |
| `MCP_HOST_PORT` | MCP server host port (default: 8788) | `8788` |
| `DOCS_API_KEY` | API key for docs-api authentication (optional) | `my-secret-key-123` |
| `MCP_API_KEY` | API key for MCP server authentication (optional) | `mcp-secret-key` |
| `DOCS_PATH` | Path to documentation files inside container | `/docs` |
| `DB_PATH` | SQLite database path inside container | `/data/db.sqlite` |
| `LOG_LEVEL` | Logging level: DEBUG, INFO, WARNING, ERROR | `INFO` |
> **Security Note:** API keys are optional. Leave empty in `.env` if you don't need authentication (backward compatible with existing setups). If set, the docs-api requires an `X-API-Key` header matching `DOCS_API_KEY` for protected endpoints.
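A client that should work with or without authentication can attach the header only when a key is configured. The following is a minimal Python sketch using only the standard library. It assumes the client reads the key from its own `DOCS_API_KEY` environment variable; `auth_headers` and `build_request` are illustrative names, not part of this project.

```python
import json
import os
import urllib.request


def auth_headers(api_key=None):
    """Return request headers, adding X-API-Key only when a key is set."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # None or an empty string means authentication is disabled
        headers["X-API-Key"] = api_key
    return headers


def build_request(url, payload, api_key=None):
    """Build (but do not send) a JSON POST request for the docs-api."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers=auth_headers(api_key), method="POST"
    )


# Assumption: the client side exports the same key that is set in .env.
API_KEY = os.environ.get("DOCS_API_KEY", "")
```

Passing the result to `urllib.request.urlopen(...)` sends the request. Keeping construction separate from sending makes the header logic easy to check without a running server.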
### Port Configuration
To change the published host ports (for example, to match firewall rules), override the port variables at startup or in `.env`:
```bash
# Example: Run docs-api on port 9000 instead of 8787
HOST_PORT=9000 MCP_HOST_PORT=9001 docker compose up -d --build
```
### Backup Instructions
#### SQLite Database (`data/db.sqlite`)
Regular SQLite backups prevent data loss. Example cron job:
```bash
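# Add to crontab (runs daily at 2am). Cron treats an unescaped % as a newline, so escape it.
# -T disables TTY allocation, which cron does not provide.
# /path/to/local-context7 is a placeholder for the project directory.
0 2 * * * cd /path/to/local-context7 && docker compose exec -T docs-api sqlite3 /data/db.sqlite ".backup '/backups/db_$(date +\%Y\%m\%d).sqlite'"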
```
Or one-off backup:
```bash
docker compose exec docs-api sh -c "sqlite3 /data/db.sqlite '.dump' | gzip > /backups/db-$(date +%Y%m%d-%H%M%S).sql.gz"
```
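If the `sqlite3` command-line tool is not available where the backup runs, Python's standard `sqlite3` module offers the same online backup through `Connection.backup()`. The sketch below assumes it runs somewhere that can read the database file; `backup_sqlite` and the paths passed to it are illustrative.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_sqlite(db_path, backup_dir):
    """Copy a SQLite database to a timestamped file and return its path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"db-{stamp}.sqlite"

    source = sqlite3.connect(str(db_path))
    dest = sqlite3.connect(str(target))
    try:
        # Connection.backup() uses SQLite's online backup API, so the copy
        # stays consistent even if the database is in use during the backup.
        source.backup(dest)
    finally:
        dest.close()
        source.close()
    return target
```

For example, `backup_sqlite("data/db.sqlite", "backups")` would write `backups/db-<timestamp>.sqlite` when run from the project root on the host.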
#### Qdrant Vector Store
Qdrant stores vectors in `./data/qdrant`. For backup:
```bash
# Back up the Qdrant storage directory from inside the container
# (requires a /backups mount on the qdrant service)
docker compose exec qdrant sh -c "tar czf /backups/qdrant-backup-$(date +%Y%m%d).tar.gz /qdrant/storage"
# Or archive the bind-mounted directory directly on the host
mkdir -p backups
tar czf backups/qdrant-backup-$(date +%Y%m%d).tar.gz -C ./data qdrant
```
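The directory archive can also be scripted. The sketch below uses Python's `tarfile` module and assumes the Qdrant data is bind-mounted at `./data/qdrant` as described above, and that Qdrant is stopped (or idle) while the archive is written. `archive_dir` is an illustrative helper, not part of this project.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_dir(source_dir, backup_dir, prefix="qdrant-backup"):
    """Write source_dir into a dated .tar.gz under backup_dir; return its path."""
    source_dir = Path(source_dir)
    if not source_dir.is_dir():
        raise FileNotFoundError(f"not a directory: {source_dir}")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{prefix}-{date.today():%Y%m%d}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname keeps paths inside the archive relative, e.g. "qdrant/...".
        tar.add(source_dir, arcname=source_dir.name)
    return target
```

Calling `archive_dir("data/qdrant", "backups")` from the project root would produce `backups/qdrant-backup-<date>.tar.gz`. Keep the backup directory outside the source directory so the archive does not include itself.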
### Rebuild Without Losing Sources or Ingestion
Normal image rebuilds preserve Git source definitions, cloned repositories,
uploaded documents, SQLite metadata, Qdrant vectors, and the embedding model
cache because they are bind-mounted from the host.
```bash
git pull
docker compose up -d --build
```
Do not delete `data/`, `docs/`, or `docs_sources.yaml`. Do not run the reset
commands below unless you intentionally want to erase the indexed data and
source configuration.
### Safe Reset Command
To reset both SQLite and Qdrant cleanly:
```bash
docker compose down -v # Removes volumes and stops services
rm ./data/db.sqlite # Remove database file
rm -rf ./data/qdrant # Remove Qdrant data
docker compose up -d --build
```
Or use the `make reset` command below.
### Makefile Commands
The included `Makefile` provides convenient commands:
```bash
# Start services
make up
# Stop services
make down
# Rebuild and restart
make restart
# Backup database
make backup-db BACKUP_PATH=/backups/db-$(date +%Y%m%d).sqlite.gz
# Reset everything (delete volumes)
make reset
```
---
## Architecture
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │────▶│  docs-api   │◀────│  docs-mcp   │
│ (IDE/Tool)  │     │  (FastAPI)  │     │ (MCP Server)│
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   Qdrant    │
                    │ (Vector DB) │
                    └─────────────┘
```
**Components:**
- `qdrant` — Vector database storing document embeddings
- `docs-api` — FastAPI backend exposing ingestion, search, and library endpoints
- `docs-mcp` — MCP server providing tools for Context7-style AI interactions
## Prerequisites
- Docker Engine v20.10+
- Docker Compose v2 (the `docker compose` plugin used by the commands in this README)
- ~500MB free disk space (Qdrant + embedding model)
## Setup
1. **Download the project** and change into its directory:
```bash
cd local-context7
```
2. **Copy environment file:**
```bash
cp .env.example .env
```
3. **(Optional) Create sample docs:**
```bash
mkdir -p docs/foundryvtt docs/fastapi docs/my-msfs-copilot
```
4. **Start services:**
```bash
docker compose up -d --build
```
5. **Verify they're running:**
```bash
docker compose ps
```
You should see all three services (`qdrant`, `docs-api`, `docs-mcp`) in "Up" status.
6. **Wait for startup completion** (embedding model loads on first API call):
```bash
docker compose logs -f docs-api # Watch for "Initialization complete."
```
## Add Docs
Place each library's documentation in its own folder under `docs/`:
```bash
mkdir -p docs/foundryvtt/docs
cp /path/to/foundryvtt/*.md docs/foundryvtt/docs/
mkdir -p docs/fastapi
```
Supported file types: `.md`, `.txt`, `.py`, `.js`, `.ts`, `.json`, `.yaml`, `.yml`, `.html`, `.css`, `.pdf` (via pypdf).
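To preview what ingestion is likely to pick up, the folder layout can be checked before indexing. The sketch below assumes each top-level folder under `docs/` is treated as one library and that files are matched by extension only. `list_library_files` is illustrative and is not the project's actual detection code.

```python
from pathlib import Path

# Extensions listed as supported in this README.
SUPPORTED = {".md", ".txt", ".py", ".js", ".ts", ".json",
             ".yaml", ".yml", ".html", ".css", ".pdf"}


def list_library_files(docs_root):
    """Map each top-level folder (library) to its supported files."""
    libraries = {}
    for lib_dir in sorted(Path(docs_root).iterdir()):
        if not lib_dir.is_dir():
            continue  # loose files at the top level belong to no library
        libraries[lib_dir.name] = sorted(
            p.relative_to(lib_dir).as_posix()
            for p in lib_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in SUPPORTED
        )
    return libraries
```

A library that maps to an empty list has a folder but no files with a supported extension, which is a common reason for empty search results.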
After adding documents, ingest them into the vector store:
```bash
docker compose exec docs-api python -c "from app.ingest import ingest_all; import asyncio; asyncio.run(ingest_all())"
```
Or trigger ingestion over HTTP:
```bash
curl -X POST http://localhost:8787/api/v1/ingest/all \
-H "Content-Type: application/json"
```
## Index Docs (Run Ingestion)
After adding documents, index them into the vector store:
```bash
docker compose exec docs-api python -c "from app.ingest import ingest_all; import asyncio; asyncio.run(ingest_all())"
```
Expected output shows progress like:
```
[Detection] Scanning for libraries in: /docs
[Detection] Found 3 library(ies)
[Library] Processing: foundryvtt
[Library] Scanning for files in: /docs/foundryvtt
[Library] Found 5 document(s)
...
```
## Search Docs
### Via API (POST to `/search`)
Request body:
```json
{
  "query": "how do hooks work",
  "library_id": "foundryvtt",
  "limit": 10
}
```
Response example:
```json
{
  "query": "how do hooks work",
  "library_id": "foundryvtt",
  "results": [
    {
      "id": "...",
      "score": 0.854,
      "library_id": "foundryvtt",
      "path": "core-docs.md",
      "title": "Core Hooks",
      "chunk_index": 2
    }
  ],
  "count": 1
}
```
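The request and response shapes above map onto a small client. The following Python sketch uses only the standard library. It assumes the API is reachable at `http://localhost:8787` and that `/search` is the full path (the ingestion endpoint in this README uses an `/api/v1` prefix, so adjust the URL if your deployment differs). `build_search_request`, `top_hits`, and `search` are illustrative names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8787"  # assumption: default HOST_PORT on this machine


def build_search_request(query, library_id, limit=10, api_key=None):
    """Build (but do not send) the POST request for a search."""
    payload = {"query": query, "library_id": library_id, "limit": limit}
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when DOCS_API_KEY is set on the server
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{BASE_URL}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def top_hits(response, min_score=0.0):
    """Return (score, title, path) tuples from a response, best score first."""
    hits = [
        (r["score"], r["title"], r["path"])
        for r in response.get("results", [])
        if r["score"] >= min_score
    ]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


def search(query, library_id, **kwargs):
    """Send the search and return the decoded JSON response."""
    request = build_search_request(query, library_id, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

With the services running, `top_hits(search("how do hooks work", "foundryvtt"))` would list the matching chunks in descending score order.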
### Via MCP (resolve-library-id, search-docs tools)
## Connect MCP Clients
To use this system with an MCP-enabled client (e.g., Claude Desktop), configure the MCP server endpoint.
### Example: Claude Desktop Config
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": [
        "@modelcontextprotocol/server-local-context7",
        "--url", "http://localhost:8788"
      ],
      "env": {
        "DOCS_API_URL": "http://localhost:8787"
      }
    }
  }
}
```
If the client runs outside Docker and cannot reach the services, publish the docs-api and MCP ports on the host (`HOST_PORT`, `MCP_HOST_PORT`) or run the MCP server as a standalone container (see below).
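### Example: Cline/Cursor MCP Config
For Cursor, Cline, or similar editors, add an entry to the client's MCP configuration (for Cursor, `~/.cursor/mcp.json`). With `"type": "stdio"` the client launches the command and communicates over stdin/stdout, so use `docker exec -i` without `-t`, which would fail when no terminal is attached:
```json
{
  "mcpServers": {
    "context7": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "exec",
        "-i",
        "docs-mcp",
        "uvicorn",
        "server:app",
        "--host",
        "0.0.0.0",
        "--port",
        "8788"
      ]
    }
  }
}
```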
Or, to run the MCP server as a standalone container with its port published on the host:
```json
{
  "mcpServers": {
    "context7": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-p",
        "8788:8788",
        "--name",
        "context7-mcp-standalone",
        "-e",
        "DOCS_API_URL=http://host.docker.internal:8787",
        "local-context7/docs-mcp"
      ]
    }
  }
}
```
## Troubleshooting
### Services won't start or restart loops
Check logs:
```bash
docker compose logs -f
```
Common issues:
- Port already in use on host → adjust mapping or free the port
- Embedding model failing to load → verify disk space, check for GPU constraints if applicable
### Vector search returns empty results
Ensure you've run ingestion after adding docs:
```bash
docker compose exec docs-api python -c "from app.ingest import ingest_all; import asyncio; asyncio.run(ingest_all())"
```
### Can't connect to docs-api from client outside Docker
Set the `DOCS_API_URL` environment variable in `docker-compose.yml` (or the equivalent entry in `.env`):
```yaml
docs-api:
  environment:
    - DOCS_API_URL=http://host.docker.internal:8787
```
For the MCP server specifically:
```yaml
docs-mcp:
  environment:
    - DOCS_API_URL=http://host.docker.internal:8787
```
## Reset Qdrant and SQLite
To clear all data (vector store and database):
```bash
# Stop services
docker compose down
# Remove the bind-mounted data (Qdrant storage and db.sqlite)
rm -rf ./data/qdrant ./data/db.sqlite
# Restart fresh
docker compose up -d --build
```
## Expose Through Caddy Reverse Proxy
To add HTTPS and serve under a subdomain, configure Caddy:
**Example `Caddyfile`:**
```caddyfile
docs.yourdomain.com {
    reverse_proxy docs-api:8787
    handle_path /mcp/* {
        reverse_proxy docs-mcp:8788
    }
    # Enable basic auth (optional, see below)
}
api.yourdomain.com {
    reverse_proxy docs-api:8787
}
mcp.yourdomain.com {
    reverse_proxy docs-mcp:8788
}
```
## Protect It with Basic Auth
Add authentication with Caddy's built-in `basic_auth` directive (named `basicauth` before Caddy 2.8). Generate the password hash with `caddy hash-password`:
**Caddy example with basic auth:**
```caddyfile
docs.yourdomain.com {
    basic_auth {
        # Format: <username> <bcrypt-hash>. "admin" is a placeholder username.
        admin <bcrypt-hash-from-caddy-hash-password>
    }
    reverse_proxy docs-api:8787
}
```
For Docker-based deployments, alternatively consider an authentication middleware or a dedicated reverse proxy with JWT or HTTP Basic authentication configured externally.
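Once a proxy enforces HTTP Basic authentication, clients must send an `Authorization` header with every request. The sketch below builds that header with Python's standard library. `basic_auth_header` is an illustrative helper and the credentials in the usage note are placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP Basic authentication."""
    # Basic auth is "username:password", base64-encoded (not encrypted),
    # so it should only ever be sent over HTTPS.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

For example, `basic_auth_header("admin", "change-me")` can be merged into the headers of any request sent to the proxied hostname.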
## Future Improvements
- Add rate limiting to API endpoints
- Support for streaming responses for large document retrieval
- Chunk overlap configuration via environment variables
- Batch index endpoint improvements
- Metrics/logging aggregation (e.g., Prometheus + Grafana)
- Plugin system for additional data sources