Paperless-ngx is our primary document management system, providing intelligent document organization, OCR capabilities, and AI-powered classification.
- Automatic OCR: Converts scanned documents to searchable text
- AI Classification: Automatic tagging and correspondent assignment
- Multi-format Support: PDF, images, Office documents, emails
- Bulk Upload: Web interface and API for batch processing
- Smart Tagging: AI-powered automatic tag suggestions
- Correspondent Management: Automatic sender/recipient detection
- Document Types: Customizable document classification
- Date Extraction: Automatic date parsing from document content
- Full-text Search: OCR-powered content search
- Advanced Filters: Date ranges, tags, correspondents, types
- Saved Views: Custom search filters for quick access
- Export Options: PDF, CSV, and original format downloads
- Paperless-ngx: Main application server
- PostgreSQL: Document metadata database
- Redis: Task queue and caching
- Gotenberg: Document conversion service
- Tika: Text extraction and format detection
- Paperless-AI: Enhanced AI classification
- Document intake (upload, email, scanner)
- Text extraction via Tika
- OCR processing if needed
- AI classification for tags/correspondents
- Storage with full-text indexing
- Thumbnail generation
- Navigate to https://paperless.speicher.family
- Click "Upload" in the top menu
- Drag and drop files or click to browse
- Documents are processed automatically in the background
Send documents to: paperless@speicher.family
- Attachments are automatically imported
- The email body becomes the document's notes
curl -X POST https://paperless.speicher.family/api/documents/post_document/ \
  -H "Authorization: Token YOUR_API_TOKEN" \
  -F "document=@document.pdf"
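The same upload can be scripted from Python. The sketch below builds the multipart request using only the standard library. The helper names `build_upload_request` and `upload` are illustrative, and the token and file name are placeholders, as in the curl command above.

```python
import urllib.request
import uuid

API_URL = "https://paperless.speicher.family/api/documents/post_document/"


def build_upload_request(filename: str, content: bytes, token: str) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl command above."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def upload(path: str, token: str) -> int:
    """Send one file to Paperless and return the HTTP status code."""
    with open(path, "rb") as fh:
        req = build_upload_request(path.rsplit("/", 1)[-1], fh.read(), token)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `upload("document.pdf", "YOUR_API_TOKEN")` would perform the request. Keeping request construction separate from sending makes the script easy to check without touching the server.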
Maintenance Scripts
The maintenance scripts are located in /home/mspeicher/homelab-lucille4/scripts/:
python delete-unused-tags.py
python merge-correspondents.py
python export-data.py
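The contents of these scripts are not reproduced on this page. As a rough illustration of what a script like delete-unused-tags.py might do, the sketch below picks out tags that no document uses. It assumes tag records shaped like the results of the Paperless-ngx `/api/tags/` endpoint, with `id`, `name`, and `document_count` fields. The helper name and the sample data are hypothetical.

```python
from typing import Iterable


def find_unused_tags(tags: Iterable[dict]) -> list[dict]:
    """Return tag records whose document_count is exactly zero.

    Records without a document_count are skipped rather than treated as unused.
    """
    return [t for t in tags if t.get("document_count") == 0]


# Made-up sample data for illustration only
sample = [
    {"id": 1, "name": "bills", "document_count": 12},
    {"id": 2, "name": "old-import", "document_count": 0},
]

for tag in find_unused_tags(sample):
    # A real script would send DELETE /api/tags/<id>/ here
    print(f"would delete tag {tag['id']} ({tag['name']})")
```

Printing the candidates first, and deleting only after review, is a cautious default for any bulk cleanup.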
- Schedule: Daily at 2 AM
- Method: Docker volume snapshots + document export
- Destination: Backblaze B2 + Hetzner Storage
- Retention: 30 days
cd /home/mspeicher/homelab-lucille4
./backup.sh
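The 30-day retention rule amounts to dropping any backup older than a cutoff date. Here is a minimal sketch of that selection, assuming each backup is identified by a name and a creation timestamp. The naming scheme and dates are made up for illustration, and the pruning logic in the actual backup tooling may differ.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)


def expired_backups(backups: dict[str, datetime], now: datetime) -> list[str]:
    """Return names of backups older than the retention window, oldest first."""
    cutoff = now - RETENTION
    old = [(created, name) for name, created in backups.items() if created < cutoff]
    return [name for _, name in sorted(old)]


# Hypothetical backup names and timestamps
now = datetime(2024, 3, 31, 2, 0, tzinfo=timezone.utc)
backups = {
    "paperless-2024-02-15": datetime(2024, 2, 15, 2, 0, tzinfo=timezone.utc),
    "paperless-2024-03-10": datetime(2024, 3, 10, 2, 0, tzinfo=timezone.utc),
    "paperless-2024-03-30": datetime(2024, 3, 30, 2, 0, tzinfo=timezone.utc),
}
print(expired_backups(backups, now))
```

With these sample dates only the February backup falls outside the window.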
- Restore PostgreSQL database from backup
- Restore Redis data (optional, can be regenerated)
- Restore document files to media directory
- Restart all Paperless services
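- Rebuild the search index if restored documents do not appear in search (Paperless-ngx provides the document_index reindex management command for this)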
- Check Gotenberg service:
docker logs paperless-gotenberg
- Verify Tika service:
docker logs paperless-tika
- Restart the Gotenberg and Tika containers if either log shows errors
- Check Redis queue:
docker exec paperless-broker redis-cli llen default
- Monitor worker logs:
docker logs paperless
- If the queue keeps growing, increase the worker count in docker-compose.yml (for example via the PAPERLESS_TASK_WORKERS setting)
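To watch the queue over time rather than checking it by hand, the Redis command above can be wrapped in a small script. This sketch assumes the container name `paperless-broker` and the queue key `default` exactly as used above. The warning threshold of 50 is an arbitrary illustrative value, not a documented limit.

```python
import subprocess

QUEUE_CMD = ["docker", "exec", "paperless-broker", "redis-cli", "llen", "default"]


def parse_llen(output: str) -> int:
    """Parse redis-cli LLEN output, which is '42' when piped or '(integer) 42' on a TTY."""
    return int(output.strip().split()[-1])


def queue_depth() -> int:
    """Run the LLEN command and return the number of queued tasks."""
    result = subprocess.run(QUEUE_CMD, capture_output=True, text=True, check=True)
    return parse_llen(result.stdout)


def is_backed_up(depth: int, threshold: int = 50) -> bool:
    """True when more tasks are waiting than the (assumed) threshold."""
    return depth > threshold
```

A cron job or monitoring hook could call `queue_depth()` and alert when `is_backed_up()` returns true.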
## Check all Paperless services
docker ps | grep paperless
## View application logs
docker logs paperless --tail 100
## Check task queue
docker exec paperless-broker redis-cli info
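The container check above can also be scripted so that it reports which expected containers are not running. The container names in this sketch are the ones that appear elsewhere on this page. They are assumed to be the complete set, so adjust the list if the compose file defines others.

```python
import subprocess

# Container names taken from the commands on this page (assumed complete)
EXPECTED = {
    "paperless",
    "paperless-broker",
    "paperless-db",
    "paperless-gotenberg",
    "paperless-tika",
}


def missing_containers(running_names: str) -> set[str]:
    """Given newline-separated container names, return expected ones not running."""
    running = {line.strip() for line in running_names.splitlines() if line.strip()}
    return EXPECTED - running


def check() -> set[str]:
    """Ask Docker for running container names and compare with EXPECTED."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return missing_containers(out)
```

An empty set from `check()` means every expected container is up; otherwise the result names the ones to investigate with `docker logs`.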
- Endpoint: http://mcp-paperless:8080
- Features: Document search, metadata access
- Used by: AI knowledge system for document context
- Document intake automation
- Email processing rules
- Notification workflows
Database Maintenance
## Vacuum and analyze PostgreSQL
docker exec paperless-db psql -U paperless -c "VACUUM ANALYZE;"
- Configured for persistence with AOF
- Memory limit: 512MB
- Eviction policy: allkeys-lru
- All documents encrypted at rest
- API tokens expire after 90 days
- SSO integration prevents password reuse
- Regular security updates via Docker
- Document Types: Set up for bills, receipts, medical, school
- Quick Search: Use saved views for common document types
- Mobile Access: The web interface works well on phones and tablets
- Sharing: Generate share links for specific documents
- Add metadata fields for specific document types
- Examples: warranty expiration, account numbers
- Auto-tag based on content patterns
- Route documents to specific users
- Trigger n8n workflows on document events
- Pre-configured import rules
- Filename parsing patterns
- Automatic folder organization