# Multimodal Search

Curiosity Workspace includes capabilities to extract searchable text from images, audio, and video files using Optical Character Recognition (OCR) and Speech-to-Text (STT).

# Optical Character Recognition (OCR)

OCR extracts text from scanned documents and images, making them searchable within the workspace.

# Supported Image Formats

Curiosity can process the following image types for OCR:

  • Common Images: .png, .jpg, .jpeg, .gif, .bmp, .webp, .svg
  • Professional Formats: .tif, .tiff, .dng, .raw, .heic, .heif, .psb
  • Other: .odg, .otg, .odi
  • PDF Scans: .pdf files whose pages contain only scanned images, with no embedded text layer.

# Multi-Language Support

The OCR engine supports multiple languages, including English, French, Spanish, German, and Portuguese.

# Speech-to-Text (STT)

STT converts speech in audio and video files into searchable text. Curiosity uses Whisper models for transcription.

# Supported Video Formats

  • .mp4, .wmv, .mpeg, .avi, .mkv, .mov, .ogv, .3gp, .flv

# Supported Audio Formats

  • .mp3, .wav, .mka, .wma, .flac, .aac, .aiff, .m4a, .oga, .weba, .webm
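
The format lists on this page amount to a routing rule: a file's extension determines whether it is a candidate for OCR or for STT. The following is a minimal sketch of that rule in Python, not Curiosity's actual implementation. The function name `extraction_kind` and the set names are hypothetical; only the extensions themselves come from this page.

```python
from pathlib import Path
from typing import Optional

# Extension sets copied from the lists on this page.
OCR_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp", ".svg",
    ".tif", ".tiff", ".dng", ".raw", ".heic", ".heif", ".psb",
    ".odg", ".otg", ".odi",
    ".pdf",  # only image-only PDFs need OCR; the extension alone cannot tell
}
VIDEO_EXTENSIONS = {
    ".mp4", ".wmv", ".mpeg", ".avi", ".mkv", ".mov", ".ogv", ".3gp", ".flv",
}
AUDIO_EXTENSIONS = {
    ".mp3", ".wav", ".mka", ".wma", ".flac", ".aac", ".aiff", ".m4a",
    ".oga", ".weba", ".webm",
}


def extraction_kind(filename: str) -> Optional[str]:
    """Return "ocr", "stt", or None, judging by the file extension alone."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix in OCR_EXTENSIONS:
        return "ocr"
    if suffix in VIDEO_EXTENSIONS or suffix in AUDIO_EXTENSIONS:
        return "stt"
    return None


print(extraction_kind("Scan.TIFF"))
print(extraction_kind("meeting.mp4"))
print(extraction_kind("notes.txt"))
```

Matching on the extension is only a first filter. A .pdf that already carries a text layer, for example, would still be reported as an OCR candidate by this sketch.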

# Features

  • Searchable Transcripts: Find specific words spoken within a video or audio file.
  • Timestamped Navigation: Jump directly to the relevant part of the media file from search results.
  • Language Detection: Identifies the spoken language automatically and transcribes it, across a broad range of languages.
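
Searchable transcripts and timestamped navigation can be pictured together: if a transcript is stored as segments with start and end times, a text match on a segment yields the position to jump to. The sketch below assumes that segment layout. The `Segment` class, `find_spoken`, `format_timestamp`, and the sample transcript are all hypothetical illustrations, not part of Curiosity's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the media file
    end: float
    text: str


def find_spoken(segments: List[Segment], query: str) -> List[Segment]:
    """Return the segments whose text contains the query, ignoring case."""
    needle = query.casefold()
    return [s for s in segments if needle in s.text.casefold()]


def format_timestamp(seconds: float) -> str:
    """Format a position in seconds as HH:MM:SS for display next to a hit."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up transcript used only to exercise the functions above.
transcript = [
    Segment(0.0, 4.2, "Welcome to the quarterly review."),
    Segment(4.2, 9.8, "First, the budget for next year."),
    Segment(3725.0, 3730.5, "To recap the budget discussion."),
]

hits = find_spoken(transcript, "budget")
for hit in hits:
    # Each hit carries the start time a player could seek to.
    print(format_timestamp(hit.start), hit.text)
```

A substring match keeps the example short. A real search index would tokenize and rank the text, but the idea of attaching a start time to each hit stays the same.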