Life Sciences & Computational Biology
The Picsha AI platform provides a unique value proposition for the Life Sciences sector. Computational biologists, data scientists, and ML engineers routinely work with complex, unstructured "dark data"—massive microscopy sets, fluorescent cell assays, and high-resolution biological imaging.
Traditionally, organizing this data relies on extremely brittle file naming conventions and custom ingestion scripts before the data can even be utilized for machine learning. Picsha AI solves this by introducing zero-config semantic retrieval and on-the-fly ML pre-processing.
Core Value Proposition
- Eradicate Metadata Silos: Upload heavy scientific formats (RAW, HEIC, TIFF) natively. Picsha AI extracts the technical details and encodes the visual data using Amazon Bedrock, making the images instantly searchable via Natural Language out of the box.
- "On-the-Fly" ML Pre-Processing: Instead of downloading 10 GB TIFF files to a local node only to resize them with PyTorch or Pillow, Picsha's edge image delivery pipeline lets you request the image pre-processed. Append `?w=512&fmt=webp` directly to the request URL, saving compute, bandwidth, and wait time.
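As a sketch of how those transform parameters attach to a request, using only the Python standard library (the base asset URL below is hypothetical; the `w` and `fmt` parameter names come from the example above):

```python
from urllib.parse import urlencode

# Hypothetical asset URL; in practice this comes from the Picsha API/SDK
base_url = "https://cdn.example-picsha.com/assets/sample_01"

# Transform parameters: resize to 512px wide, encode as WebP
params = {"w": 512, "fmt": "webp"}
transform_url = f"{base_url}?{urlencode(params)}"

print(transform_url)
# → https://cdn.example-picsha.com/assets/sample_01?w=512&fmt=webp
```

Because the transform is expressed in the URL itself, the same asset can be fetched at several sizes or formats without re-uploading anything.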
Python SDK Integration
Our @picsha-ai/python-sdk is built to seamlessly integrate with a data scientist's Jupyter environment.
1. Uploading Complex Formats
Upload directly to Picsha from local lab instruments. You can pass explicit hardware or assay tags.
```python
import picsha

client = picsha.Client(api_key="sk_your_key_here")

# Upload heavy RAW/HEIC formats natively
result = client.upload(
    file_path="./data/microscopy/sample_01.heic",
    tags=["assay:123", "cancer_cells", "fluorescent"],
    metadata={"capture_date": "2024-04-12", "experiment": "alpha"}
)

print("Uploaded Asset ID:", result.asset.id)
```
2. Zero-Config Hybrid Semantic Search
Instead of remembering file names or strict tags, researchers can query their datasets semantically using AI. Picsha will find files based on what is visually represented.
```python
# mode="ai" converts natural language into hybrid vector/OpenSearch queries
search_results = client.search(
    query="fluorescent cells in alpha experiment",
    mode="ai",
    limit=5
)

for asset in search_results.assets:
    print(f"{asset.id} - {asset.original_filename}")
```
3. Load Arrays Directly into PyTorch or Pandas
By leveraging Picsha's pipeline, you can request standardized WebP or JPEG payloads on the fly, feeding them straight into PIL without having to manually process massive TIFFs.
```python
import requests
from PIL import Image
from io import BytesIO

# Retrieve the first search result
asset = search_results.assets[0]

# Ask the edge pipeline to convert to WebP and scale to 512px wide
img_url = asset.generate_url(width=512, format="webp")

# Stream straight into local analysis tools
response = requests.get(img_url)
img = Image.open(BytesIO(response.content))
img.show()
```
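From here, the decoded image converts directly into an array for downstream analysis. A minimal sketch, assuming NumPy is installed and using a synthetic image as a stand-in for the downloaded one:

```python
import numpy as np
from PIL import Image

# Stand-in for the image fetched above; a real run would use the decoded WebP
img = Image.new("RGB", (512, 512), color=(0, 128, 0))

# Convert to an H x W x C uint8 array
arr = np.asarray(img)
print(arr.shape, arr.dtype)  # (512, 512, 3) uint8

# Normalize to float32 in [0, 1], the usual input range for ML pipelines
tensor_ready = arr.astype(np.float32) / 255.0
```

Because the edge pipeline already resized the image to 512px, no local resampling step is needed before batching.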
Batch Operations
For high-throughput requirements, such as synchronizing multiple directories from new instrument output, developers can leverage our underlying httpx implementation for parallel ingestion.
```python
import picsha
import asyncio

# Illustrative instrument output; substitute your own directory listing
FILES = ["./data/sample_02_100x.raw", "./data/sample_03_100x.raw"]

async def ingest_batch():
    async with picsha.AsyncClient(api_key="sk_your_key_here") as client:
        # asyncio.gather fans the uploads out concurrently
        results = await asyncio.gather(*[
            client.upload(file_path=path, tags=["assay:456"])
            for path in FILES
        ])
        for result in results:
            print("Uploaded Asset ID:", result.asset.id)

asyncio.run(ingest_batch())
```
> [!NOTE]
> For more details on the Python SDK's capabilities, please see the Python SDK Reference page.