Search Infrastructure
Digital Asset Managers often struggle with metadata. Expecting humans to painstakingly tag every uploaded image is an archaic approach.
The moment a media file lands in Picsha's S3 ingress bucket, an asynchronous event fires an ECS task that runs structural analysis on the image via AWS Rekognition, generates visual embeddings via Amazon Titan on AWS Bedrock, and streams the results into our underlying OpenSearch vector database.
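The end-to-end ingest flow can be pictured roughly as in the sketch below. This is a simplified illustration rather than Picsha's actual task code: the Titan model ID, OpenSearch endpoint, index name, and field layout are all assumptions.

# Sketch of the ingest task: label extraction + multimodal embedding + indexing.
# Assumes boto3 and opensearch-py; model ID, endpoint, and index are illustrative.
import base64
import json

import boto3
from opensearchpy import OpenSearch

s3 = boto3.client("s3")
rekognition = boto3.client("rekognition")
bedrock = boto3.client("bedrock-runtime")
search = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder endpoint

def ingest(bucket: str, key: str) -> None:
    # 1. Structural analysis: object and scene labels straight from the S3 object.
    labels = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}, MaxLabels=25
    )["Labels"]

    # 2. Visual embedding: Titan multimodal embedding of the raw image bytes.
    image_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",  # assumed model ID
        body=json.dumps({"inputImage": base64.b64encode(image_bytes).decode()}),
    )
    embedding = json.loads(response["body"].read())["embedding"]

    # 3. Stream the document into the organization's vector index.
    search.index(
        index="assets",  # assumed index name
        body={
            "original_name": key,
            "ai": {"labels": [label["Name"] for label in labels]},
            "embedding": embedding,
        },
    )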
Standard Search (Keyword)
By default, triggering a Standard Search executes an ultra-fast query seeking direct matches against the original_name field, extracted text (OCR), and any manually attached metadata tags.
This yields instantaneous results but relies heavily on precise keyword matching.
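For illustration, a Standard Search call might look like the sketch below. Only the query and mode fields are documented here; the endpoint URL, the "standard" mode value, the auth header, and the response shape are assumptions.

# Hypothetical Standard Search call; URL, auth header, "standard" mode value,
# and response shape are assumptions for illustration.
import requests

response = requests.post(
    "https://api.picsha.example/v1/search",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"query": "quarterly_report.pdf", "mode": "standard"},
)
for hit in response.json().get("hits", []):
    print(hit.get("original_name"))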
AI Search (Hybrid Mode)
Passing mode: "ai" to our Search endpoint unleashes our most powerful API abstraction.
When you query our servers with a conversational, natural-language request (e.g. "Show me the photos of dogs running on the beach from last Tuesday"), three things happen:
- Agentic Parsing: We route the string through an LLM layer that parses actionable metadata constraints (e.g., isolating "last Tuesday" into a literal timestamp boundary).
- Text-to-Image Embedding: The rest of the semantic request ("dogs running on the beach") is converted into a high-dimensional multimodal vector.
- k-NN Vector Execution: We perform a blistering k-NN (k-nearest neighbors) query against your Organization's dedicated vector space in OpenSearch (a sketch of the embedding and k-NN steps follows this list).
Strict Intent Filtering (Methodology)
To prevent multimodal embeddings from surfacing false positives (e.g., searching for "Bill laughing" and getting dozens of unrelated but highly scored laughing people), our Agentic Parsing layer enforces strict, absolute metadata constraints prior to executing the vector search:
- Visual Concepts & Colors: Handled purely by vector distances. If you search for "pink flowers", no explicit filters are applied. The embedding inherently calculates and surfaces visually matching images.
- Human Names (Entities): Treated as non-negotiable hard filters. If the parsing layer detects a human name (e.g., "Hello bill"), it interprets the goal as a definitive entity search and generates an explicit OpenSearch filter demanding that the asset possess a match inside the ai.faces or user.tags indices (see the filter sketch after this list).
- Methodology Impact: While this guarantees exceptionally high accuracy for photographic face retrieval, it means text-heavy documents with a person's name buried in a paragraph summary (like an ai.description) will be intentionally dropped from AI search pools unless the asset is explicitly tagged.
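A rough picture of how such a hard entity filter can sit alongside the vector query is sketched below. The ai.faces and user.tags names come from this section; the index name, the bool/knn query structure, and matching both fields via a should clause are assumptions about how the backend might express it.

# Sketch of an entity-constrained vector query: the knn clause ranks by visual
# similarity while the filter demands an explicit face or tag match.
# Index name and exact query structure are assumptions.
def entity_constrained_search(client, query_vector, entity: str, k: int = 20):
    return client.search(
        index="assets",  # assumed index name
        body={
            "size": k,
            "query": {
                "bool": {
                    "must": [{"knn": {"embedding": {"vector": query_vector, "k": k}}}],
                    "filter": [{
                        "bool": {
                            "should": [
                                {"match": {"ai.faces": entity}},
                                {"match": {"user.tags": entity}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }],
                }
            },
        },
    )["hits"]["hits"]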
Search Thresholds
AI Search returns hits ordered by relevance score (matchScore). To prevent hallucinated matches from muddying the results, our backend automatically filters out any assets falling below a relevance threshold.
When building applications, you can set this cutoff explicitly:
{
"query": "corporate headshots with blue backgrounds",
"mode": "ai",
"threshold": 0.65
}
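Client-side, the same request can be issued and the cutoff tightened further when consuming results. In the sketch below, the endpoint URL, auth header, and response shape are assumptions; query, mode, threshold, and matchScore come from this section.

# Hypothetical AI Search call with an explicit threshold; URL, auth header, and
# response shape are assumptions.
import requests

response = requests.post(
    "https://api.picsha.example/v1/search",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "query": "corporate headshots with blue backgrounds",
        "mode": "ai",
        "threshold": 0.65,
    },
)
for hit in response.json().get("hits", []):
    if hit.get("matchScore", 0) >= 0.8:  # optional stricter client-side cutoff
        print(hit.get("original_name"), hit.get("matchScore"))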