# Multimodal Search
Curiosity Workspace can extract searchable text from images, audio, and video files using Optical Character Recognition (OCR) and Speech-to-Text (STT).
## Optical Character Recognition (OCR)
OCR extracts text from scanned documents and images, making them searchable within the workspace.
### Supported Image Formats
Curiosity can process the following image types for OCR:
- Common Images: .png, .jpg, .jpeg, .gif, .bmp, .webp, .svg
- Professional Formats: .tif, .tiff, .dng, .raw, .heic, .heif, .psb
- Other: .odg, .otg, .odi
- PDF Scans: .pdf files containing image-only content.
### Multi-Language Support
The OCR engine supports multiple languages, including English, French, Spanish, German, and Portuguese.
## Speech-to-Text (STT)
STT converts spoken language from audio and video files into searchable text. Curiosity leverages Whisper models for high-accuracy transcription.
### Supported Video Formats
.mp4, .wmv, .mpeg, .avi, .mkv, .mov, .ogv, .3gp, .flv
### Supported Audio Formats
.mp3, .wav, .mka, .wma, .flac, .aac, .aiff, .m4a, .oga, .weba, .webm
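
To check ahead of time which extraction path a file would likely take, the extension lists on this page can be turned into a small lookup. The following is only a sketch: `extraction_method` and the constant names are hypothetical helpers written for illustration, not part of the Curiosity API, and the workspace itself may decide differently (for example, by inspecting file contents rather than names).

```python
from pathlib import PurePath
from typing import Optional

# Extension sets copied from the format lists on this page.
OCR_IMAGE_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg",
    ".tif", ".tiff", ".dng", ".raw", ".heic", ".heif", ".psb",
    ".odg", ".otg", ".odi",
}
STT_VIDEO_EXTENSIONS = {
    ".mp4", ".wmv", ".mpeg", ".avi", ".mkv", ".mov", ".ogv", ".3gp", ".flv",
}
STT_AUDIO_EXTENSIONS = {
    ".mp3", ".wav", ".mka", ".wma", ".flac", ".aac", ".aiff",
    ".m4a", ".oga", ".weba", ".webm",
}


def extraction_method(filename: str) -> Optional[str]:
    """Guess the extraction path for a file name (hypothetical helper).

    Returns "ocr", "ocr-if-scanned", "stt", or None when the extension
    is not in any of the documented lists.
    """
    suffix = PurePath(filename).suffix.lower()
    if suffix in OCR_IMAGE_EXTENSIONS:
        return "ocr"
    if suffix == ".pdf":
        # Only image-only PDFs are listed for OCR; telling them apart
        # requires opening the file, which this sketch does not do.
        return "ocr-if-scanned"
    if suffix in STT_VIDEO_EXTENSIONS or suffix in STT_AUDIO_EXTENSIONS:
        return "stt"
    return None


if __name__ == "__main__":
    for name in ["Scan.TIFF", "report.pdf", "standup.mp3", "notes.txt"]:
        print(name, "->", extraction_method(name))
```

The comparison lowercases the suffix so that `Scan.TIFF` and `scan.tiff` are treated alike. Whether Curiosity matches extensions case-insensitively is an assumption here.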
### Features
- Searchable Transcripts: Find specific words spoken within a video or audio file.
- Timestamped Navigation: Jump directly to the relevant part of the media file from search results.
- Language Detection: Automatically recognizes and transcribes a broad range of languages.
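
Conceptually, searchable transcripts and timestamped navigation both rest on storing each transcribed span together with its start time. The sketch below illustrates that idea in plain Python. The `Segment` shape, the helper names, and the sample transcript are all made up for illustration and do not describe how Curiosity stores or indexes transcripts internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed span of a media file (hypothetical shape)."""
    start: float  # seconds from the beginning of the file
    end: float    # seconds from the beginning of the file
    text: str


def find_spoken(segments: List[Segment], query: str) -> List[Segment]:
    """Return the segments whose text contains the query, ignoring case."""
    needle = query.casefold()
    return [s for s in segments if needle in s.text.casefold()]


def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as HH:MM:SS for a jump-to link."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up sample transcript, for illustration only.
transcript = [
    Segment(0.0, 4.2, "Welcome to the quarterly review."),
    Segment(4.2, 9.8, "Revenue grew in the second quarter."),
    Segment(75.5, 80.0, "Next, the hiring plan."),
]

if __name__ == "__main__":
    for hit in find_spoken(transcript, "quarter"):
        print(format_timestamp(hit.start), hit.text)
```

A substring match keeps the example short. A real search index would more likely tokenize the transcript and rank results, so treat this as a picture of the data flow, not of the matching logic.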