Semantic Search
Enable semantic image search using Cloudflare Vectorize to index AI-generated captions as embeddings. Users can search by natural language descriptions instead of just keywords.
Why It's Useful
Traditional keyword search only matches exact text in filenames, tags, or alt text. Semantic search matches on meaning: a search for "sunset over water" can find images captioned as "orange sky reflecting on ocean" even if those exact words aren't in the metadata.
Semantic search requires AI classification to be enabled, because that is what generates the captions. Vectorize then stores those captions as 768-dimensional embeddings for similarity matching.
Setup
Enable both AI classification and Vectorize:
ENABLE_AI_CLASSIFICATION='true'
ENABLE_VECTORIZE='true'
Terraform provisions the Vectorize index automatically when ENABLE_VECTORIZE='true'. The index stores embeddings with metadata (owner, image ID, caption).
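Because both flags are strings, a check like `if (env.ENABLE_VECTORIZE)` would treat 'false' as enabled. Here is a minimal sketch of a stricter check, assuming the flags are compared against the exact string 'true'. The names `FeatureEnv`, `isEnabled` and `shouldStoreEmbeddings` are hypothetical and not taken from this project.

```typescript
// Assumed shape of the environment: both flags arrive as optional strings.
interface FeatureEnv {
  ENABLE_AI_CLASSIFICATION?: string;
  ENABLE_VECTORIZE?: string;
}

// Hypothetical helper: a flag counts as enabled only when it is exactly 'true'.
function isEnabled(value: string | undefined): boolean {
  return value === "true";
}

// Embeddings are derived from AI captions, so both flags must be on.
function shouldStoreEmbeddings(env: FeatureEnv): boolean {
  return isEnabled(env.ENABLE_AI_CLASSIFICATION) && isEnabled(env.ENABLE_VECTORIZE);
}
```

With this check, a missing flag or any value other than 'true' leaves embedding storage switched off.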
How It Works
When AI classification runs on an image, the generated caption is converted to an embedding vector using the BGE Base embedding model. This vector is stored in Vectorize for semantic similarity queries.
Embedding model: @cf/baai/bge-base-en-v1.5 (768 dimensions)
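The embedding step might look roughly like the sketch below. It assumes a Workers AI binding and a Vectorize binding are passed in. The interfaces are simplified local stand-ins for those bindings, and `buildVectorRecord` and `indexCaption` are hypothetical names, not functions from this project.

```typescript
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";
const EMBEDDING_DIMENSIONS = 768;

// Simplified stand-ins for the Workers AI and Vectorize bindings (assumption).
interface AiBinding {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}

interface VectorRecord {
  id: string;
  values: number[];
  metadata: { owner: string; imageId: string; caption: string };
}

interface VectorizeBinding {
  upsert(vectors: VectorRecord[]): Promise<unknown>;
}

// Build the record to store, rejecting vectors with the wrong dimensionality.
function buildVectorRecord(
  owner: string,
  imageId: string,
  caption: string,
  values: number[],
): VectorRecord {
  if (values.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected ${EMBEDDING_DIMENSIONS} dimensions, got ${values.length}`);
  }
  return { id: `${owner}:${imageId}`, values, metadata: { owner, imageId, caption } };
}

// Embed one caption and upsert it into the index.
async function indexCaption(
  ai: AiBinding,
  index: VectorizeBinding,
  owner: string,
  imageId: string,
  caption: string,
): Promise<VectorRecord> {
  const response = await ai.run(EMBEDDING_MODEL, { text: [caption] });
  const values = response.data[0];
  if (!values) {
    throw new Error("Embedding model returned no vector");
  }
  const record = buildVectorRecord(owner, imageId, caption, values);
  await index.upsert([record]);
  return record;
}
```

The dimension check catches a mismatch between the embedding model and the index configuration before the upsert is attempted.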
Storage format:
{
  id: "alice:abc123",           // owner:imageId
  values: [0.1, -0.2, ...],     // 768-dim vector
  metadata: {
    owner: "alice",
    imageId: "abc123",
    caption: "A golden retriever playing in a park"
  }
}
Current Implementation Status
Vectorize embeddings are stored, but the search endpoint does not yet run semantic queries against them. Search currently uses FTS5 (full-text search) with a boost weight per field:
- Image ID: 5x boost
- Filename: 3x boost
- Alt text: 2x boost
- AI caption: 2x boost
- AI labels: 1.5x boost
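One way to express per-field boosts like these in FTS5 is through the column weights of its built-in bm25() ranking function. The sketch below builds such a query string. The table name, the column names and their order are assumptions made for illustration, and this page does not confirm that the project uses bm25() for its boosts.

```typescript
// Boost weights from the list above. Column names and order are hypothetical;
// bm25() applies weights in the order the columns were declared in the FTS5 table.
const FTS_BOOSTS: ReadonlyArray<readonly [string, number]> = [
  ["image_id", 5],
  ["filename", 3],
  ["alt_text", 2],
  ["ai_caption", 2],
  ["ai_labels", 1.5],
];

// Build a ranked FTS5 query. `table` must be a trusted constant, never user input.
function buildSearchSql(table: string): string {
  const weights = FTS_BOOSTS.map(([, weight]) => weight.toFixed(1)).join(", ");
  // bm25() returns lower values for better matches, so ascending order ranks best first.
  return (
    `SELECT rowid FROM ${table} WHERE ${table} MATCH ? ` +
    `ORDER BY bm25(${table}, ${weights}) LIMIT ?`
  );
}
```

The search text and the result limit stay as bound parameters, so only the trusted table name is interpolated into the SQL.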
To implement full semantic search, you'd need to:
- Generate a query embedding from the user's search text
- Query Vectorize for similar vectors
- Combine the vector results with FTS5 keyword matches
- Re-rank the results by a hybrid score
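The steps above can be sketched as follows. This is not the project's implementation. It uses reciprocal rank fusion (RRF) as one common way to compute a hybrid score, chosen here because it needs only rank positions and avoids normalising FTS5 and cosine scores against each other. The binding interfaces are simplified local stand-ins, `hybridSearch` and `reciprocalRankFusion` are hypothetical names, and the keyword IDs are assumed to use the same owner:imageId format as the vector IDs.

```typescript
interface RankedHit {
  id: string;
  score: number;
}

// Simplified stand-ins for the Workers AI and Vectorize bindings (assumption).
interface AiBinding {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}

interface VectorizeBinding {
  query(
    vector: number[],
    options: { topK: number },
  ): Promise<{ matches: { id: string; score: number }[] }>;
}

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per ID.
// k = 60 is a conventional default, not a value from this project.
function reciprocalRankFusion(rankedLists: string[][], k = 60): RankedHit[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  const hits: RankedHit[] = [];
  scores.forEach((score, id) => hits.push({ id, score }));
  // Highest fused score first; ties broken by ID for a stable order.
  return hits.sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Steps 1-4: embed the query, fetch nearest vectors, merge with keyword hits, re-rank.
async function hybridSearch(
  ai: AiBinding,
  index: VectorizeBinding,
  queryText: string,
  keywordIds: string[], // FTS5 results, best first, as owner:imageId (assumption)
  topK = 20,
): Promise<RankedHit[]> {
  const embedding = await ai.run("@cf/baai/bge-base-en-v1.5", { text: [queryText] });
  const vector = embedding.data[0];
  if (!vector) {
    throw new Error("Embedding model returned no vector");
  }
  const result = await index.query(vector, { topK });
  const vectorIds = result.matches.map((match) => match.id);
  return reciprocalRankFusion([vectorIds, keywordIds]);
}
```

An image that appears in both result lists collects a contribution from each, so it ranks above images found by only one method.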
Cost Considerations
Vectorize pricing:
- First 30M queried dimensions per month: Free
- First 5M stored dimensions per month: Free
- Beyond free tier: $0.04 per million queried dimensions, $0.05 per million stored dimensions
Per-image cost:
- Each embedding: 768 dimensions
- 100 images: 76,800 stored dimensions (well within free tier)
- 10,000 images: 7.68M stored dimensions, which is 2.68M over the free tier (about $0.13/month storage)
Vectorize is relatively cheap for small to medium deployments. At 768 dimensions per image, the 5M free stored dimensions cover roughly 6,500 images, so smaller instances stay within the free tier.
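The storage arithmetic can be reproduced with a small helper. This sketch uses only the figures quoted on this page (768 dimensions per image, 5M free stored dimensions, $0.05 per million beyond that) and ignores query costs. The function name is hypothetical.

```typescript
const DIMENSIONS_PER_IMAGE = 768;
const FREE_STORED_DIMENSIONS = 5000000;
const USD_PER_MILLION_STORED = 0.05;

// Estimate the monthly storage cost in USD for a given number of indexed images.
function estimateMonthlyStorageCostUsd(imageCount: number): number {
  const storedDimensions = imageCount * DIMENSIONS_PER_IMAGE;
  // Only dimensions beyond the free allowance are billed.
  const billableDimensions = Math.max(0, storedDimensions - FREE_STORED_DIMENSIONS);
  return (billableDimensions / 1000000) * USD_PER_MILLION_STORED;
}
```

For 10,000 images this gives (7,680,000 - 5,000,000) / 1,000,000 x 0.05, or about $0.13 per month. For 100 images the result is 0.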
Local Development
Vectorize requires a remote Cloudflare connection and doesn't work in local dev with wrangler dev. The code handles this gracefully: embeddings are stored in production but skipped locally without errors.
You'll see logs like:
[WARN] Vectorize not available in local dev, skipping embedding storage
This is expected behaviour. Deploy to staging/production to test Vectorize functionality.
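A guard of this kind could be written as in the sketch below. It assumes the Vectorize binding is simply undefined when running locally, which this page does not confirm. The interfaces are simplified stand-ins, and `getVectorIndex` and `storeEmbedding` are hypothetical names.

```typescript
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string>;
}

// Simplified stand-in for the Vectorize binding (assumption).
interface VectorizeBinding {
  upsert(vectors: VectorRecord[]): Promise<unknown>;
}

interface Env {
  VECTORIZE?: VectorizeBinding; // assumed to be absent under local dev
}

// Return the index if the binding exists, otherwise log a warning and return null.
function getVectorIndex(env: Env): VectorizeBinding | null {
  if (!env.VECTORIZE) {
    console.warn("[WARN] Vectorize not available in local dev, skipping embedding storage");
    return null;
  }
  return env.VECTORIZE;
}

// Store one embedding. Resolves to false when storage was skipped.
async function storeEmbedding(env: Env, record: VectorRecord): Promise<boolean> {
  const index = getVectorIndex(env);
  if (index === null) {
    return false;
  }
  await index.upsert([record]);
  return true;
}
```

Returning a boolean instead of throwing lets the rest of the classification pipeline carry on when the index is missing.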
Disabling
To disable semantic search:
ENABLE_VECTORIZE='false'
AI classification will continue generating captions, but they won't be stored as embeddings. Existing embeddings remain in the index until you delete them manually via the Cloudflare dashboard or API.
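If you prefer to clean up from a Worker, the Vectorize binding exposes deleteByIds. The sketch below removes the embeddings for a known set of images. It assumes you already have the image IDs and that vector IDs follow the owner:imageId format. The batch size and the helper names are illustrative choices, not documented values or project code.

```typescript
// Simplified stand-in for the Vectorize binding (assumption).
interface VectorizeBinding {
  deleteByIds(ids: string[]): Promise<unknown>;
}

// Rebuild the vector IDs from the owner and image IDs (owner:imageId).
function toVectorIds(owner: string, imageIds: string[]): string[] {
  return imageIds.map((imageId) => `${owner}:${imageId}`);
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Delete embeddings in batches. The batch size of 100 is an arbitrary choice.
async function deleteEmbeddings(
  index: VectorizeBinding,
  owner: string,
  imageIds: string[],
  batchSize = 100,
): Promise<void> {
  for (const batch of chunk(toVectorIds(owner, imageIds), batchSize)) {
    await index.deleteByIds(batch);
  }
}
```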