Video Analysis API

Analyze and convert video content to text using PrimeThink's video analysis APIs. These tools enable you to extract structured descriptions, keyframes, and transcripts from video files.

Overview

PrimeThink provides several video processing capabilities:

  • Video Analysis - Analyze video content using Gemini with optional speaker count and instructions
  • Video-to-Text - Convert video to structured text descriptions using two strategies
  • Keyframe Extraction - Extract scene-change keyframes from video as individual images (AI assistant tool)

Authentication

All API requests require authentication. See API Authentication for details on obtaining and using your API key.

API Endpoints

For complete API specifications, see the Interactive API Documentation.

POST /video/analyze

Analyze a video using Gemini. Returns a text description/analysis of the video content.

Parameters (multipart/form-data):

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Video file to analyze |
| extra_instructions | string | No | Additional instructions for the analysis model |

POST /video/video-to-text

Convert a video to a structured text description. Supports two conversion methods and accepts either a file upload or an existing document ID.

Parameters (multipart/form-data):

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | No* | Video file to analyze |
| document_id | integer | No* | ID of an existing video document |
| method | string | No | Conversion method: "full_video" (default) or "frames_plus_audio" |
| model | string | No | Override the default LLM model |
| extra_instructions | string | No | Additional instructions for the analysis |

*Provide exactly one of file or document_id.
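The "exactly one of file or document_id" rule can be checked on the client before any request is sent. The following is a minimal sketch, not part of the PrimeThink API: the helper name build_video_to_text_fields and its argument names are hypothetical, and it only assembles the non-file form fields listed in the table above.

```python
from typing import Optional

# Method names taken from the parameter table above.
VALID_METHODS = ("full_video", "frames_plus_audio")


def build_video_to_text_fields(
    file_path: Optional[str] = None,
    document_id: Optional[int] = None,
    method: str = "full_video",
    model: Optional[str] = None,
    extra_instructions: Optional[str] = None,
) -> dict:
    """Validate arguments and return the non-file form fields as strings.

    Hypothetical client-side helper; the file itself (if any) would be
    attached separately as the multipart "file" part.
    """
    # Exactly one source must be given: a file upload or an existing document.
    if (file_path is None) == (document_id is None):
        raise ValueError("Provide exactly one of file_path or document_id")
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}")

    fields = {"method": method}
    if document_id is not None:
        fields["document_id"] = str(document_id)
    if model is not None:
        fields["model"] = model
    if extra_instructions is not None:
        fields["extra_instructions"] = extra_instructions
    return fields
```

Failing early with a ValueError avoids uploading a large video only to have the request rejected.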

Conversion Methods:

| Method | Description | Best For |
|---|---|---|
| full_video | Uploads the entire video to Gemini for analysis | Short to medium videos, when Gemini/Google API key is configured |
| frames_plus_audio | Extracts scene-change keyframes and combines them with audio transcript for multimodal analysis | Longer videos, when you want keyframe-level detail, or when Gemini is not available (falls back to OpenAI) |
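The guidance in the table can be turned into a small decision helper. This is only a sketch under stated assumptions: the function choose_method is hypothetical, and the 10-minute cutoff for a "longer" video is an arbitrary placeholder, since this page does not define a duration limit.

```python
def choose_method(
    duration_seconds: float,
    gemini_available: bool = True,
    need_keyframe_detail: bool = False,
    long_video_threshold_seconds: float = 600.0,  # assumed cutoff, not documented
) -> str:
    """Pick a conversion method following the 'Best For' column above."""
    # Without Gemini, or when keyframe-level detail matters, prefer frames + audio.
    if not gemini_available or need_keyframe_detail:
        return "frames_plus_audio"
    # Longer videos are also better served by frames + audio.
    if duration_seconds > long_video_threshold_seconds:
        return "frames_plus_audio"
    return "full_video"
```

Tune long_video_threshold_seconds to whatever "short to medium" means for your own videos.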

Response:

{
    "text": "## Summary\n\nThis video shows a product demo...\n\n## Key Points\n- ...\n\n## Timeline\n- [00:00] Opening scene...",
    "method": "full_video"
}

The text field contains a Markdown-formatted description with sections for Summary, Key Points, Timeline, Notable Details, and Conclusions.
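To work with individual sections, the text field can be split on its headings. The sketch below assumes the sections are always introduced by level-two Markdown headings ("## "), as in the example response. The helper name split_sections is hypothetical.

```python
import json
import re


def split_sections(response_body: str) -> dict:
    """Return {heading: body} for each '## ' section in the response text."""
    payload = json.loads(response_body)
    text = payload["text"]
    # With a capture group, re.split yields
    # [preamble, heading1, body1, heading2, body2, ...].
    parts = re.split(r"^## +(.+)$", text, flags=re.MULTILINE)
    sections = {}
    for heading, body in zip(parts[1::2], parts[2::2]):
        sections[heading.strip()] = body.strip()
    return sections


# Sample payload shaped like the response shown above (content is made up).
example = json.dumps({
    "text": "## Summary\n\nA product demo.\n\n## Key Points\n- Point one\n\n"
            "## Timeline\n- [00:00] Opening scene",
    "method": "full_video",
})
sections = split_sections(example)
print(list(sections))  # ['Summary', 'Key Points', 'Timeline']
```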

Quick Start Examples

Basic Video Analysis

curl -X POST "https://api.primethink.ai/video/analyze" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "file=@presentation.mp4"

Video-to-Text (Full Video)

curl -X POST "https://api.primethink.ai/video/video-to-text" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "file=@meeting-recording.mp4" \
  -F "method=full_video"

Video-to-Text (Frames + Audio)

curl -X POST "https://api.primethink.ai/video/video-to-text" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "file=@lecture.mp4" \
  -F "method=frames_plus_audio"

Using an Existing Document

curl -X POST "https://api.primethink.ai/video/video-to-text" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "document_id=123" \
  -F "method=frames_plus_audio" \
  -F "extra_instructions=Focus on the code examples shown on screen"

When using document_id with frames_plus_audio, if the document already has an extracted transcript, it is reused instead of re-transcribing the audio.

With Extra Instructions

curl -X POST "https://api.primethink.ai/video/video-to-text" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "file=@demo.mp4" \
  -F "method=full_video" \
  -F "extra_instructions=Pay special attention to the UI interactions and describe each screen transition"
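The curl calls above can also be issued from Python. The sketch below uses only the standard library and builds the multipart/form-data request without sending it. It reuses the base URL and the "Authorization: Token" header from the curl examples. The helper name build_request and the application/octet-stream content type for the file part are assumptions, and field values are not escaped, so treat it as an illustration rather than a complete client.

```python
import urllib.request
import uuid

API_BASE = "https://api.primethink.ai"


def build_request(path, api_key, fields, file_name=None, file_bytes=b""):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields such as method, document_id, extra_instructions.
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # Optional file part, sent under the "file" field name used by the API.
    if file_name is not None:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        API_BASE + path,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_request(
    "/video/video-to-text",
    "YOUR_API_KEY",
    {"document_id": 123, "method": "frames_plus_audio"},
)
# To send it: urllib.request.urlopen(req) and read the JSON body from the response.
```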

AI Assistant Tools

Video Keyframe Extraction

The AI assistant has access to the extract_video_keyframes tool, which extracts scene-change keyframes from a video document and saves them as individual images in the chat.

How to use it: Ask the AI assistant to extract keyframes from a video document in the chat. For example:

  • "Extract keyframes from the video I just uploaded"
  • "Get the key scenes from document #123"
  • "Extract keyframes with high sensitivity from the presentation video"

Parameters the assistant uses:

| Parameter | Type | Default | Description |
|---|---|---|---|
| document_id | integer | required | ID of the video document in the chat |
| sensitivity | string | "medium" | Detection sensitivity: "low" (few keyframes, major scenes only), "medium" (balanced), "high" (many keyframes, catches subtle changes) |
| folder | string | "keyframes" | Destination folder for the extracted images |

Sensitivity Levels:

| Level | Description | Use Case |
|---|---|---|
| low | Detects only major scene changes, produces few keyframes | Long videos where you want a high-level overview |
| medium | Balanced detection, moderate number of keyframes | General purpose, most videos |
| high | Catches subtle changes, produces many keyframes | Short videos, detailed analysis, presentations with incremental slides |
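To build intuition for how sensitivity affects the number of keyframes, here is a toy scene-change detector. It is not the tool's actual algorithm, which this page does not describe. It compares consecutive grayscale frames by mean absolute pixel difference, and the threshold values are invented for illustration.

```python
# Hypothetical thresholds: fraction of the 0-255 range that the mean
# pixel difference between consecutive frames must exceed.
THRESHOLDS = {"low": 0.30, "medium": 0.15, "high": 0.05}


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized grayscale frames."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def keyframe_indices(frames, sensitivity="medium"):
    """Return indices of frames that start a new scene (frame 0 always counts)."""
    if not frames:
        return []
    threshold = THRESHOLDS[sensitivity] * 255
    keep = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keep.append(i)
    return keep


def flat(value):
    """A tiny 2x2 frame filled with one brightness value."""
    return [[value, value], [value, value]]


frames = [flat(0), flat(20), flat(70), flat(250)]
print(keyframe_indices(frames, "low"))     # [0, 3]
print(keyframe_indices(frames, "medium"))  # [0, 2, 3]
print(keyframe_indices(frames, "high"))    # [0, 1, 2, 3]
```

A lower threshold (higher sensitivity) lets smaller changes count as a new scene, which is why high yields the most keyframes.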

The extracted keyframes are saved as JPEG images in the specified folder within the chat, and duplicate frames are automatically filtered out using perceptual hashing.
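Perceptual hashing gives visually similar images similar hashes, so near-duplicates can be found by comparing hashes instead of pixels. The sketch below uses a simple average hash on small grayscale grids. It illustrates the general technique only. The specific hash and distance threshold used by the tool are not documented here, and the function names are hypothetical.

```python
def average_hash(pixels):
    """Average hash of a small grayscale grid (2D list), returned as an int.

    Each bit is 1 where the pixel is brighter than the grid's mean.
    Real pipelines usually downscale the image (for example to 8x8) first.
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedupe(frames, max_distance=2):
    """Return indices of frames whose hash differs from every kept frame
    by more than max_distance bits (an assumed, tunable threshold)."""
    kept, hashes = [], []
    for i, frame in enumerate(frames):
        h = average_hash(frame)
        if all(hamming(h, k) > max_distance for k in hashes):
            kept.append(i)
            hashes.append(h)
    return kept


a = [[10, 200], [10, 200]]
b = [[12, 198], [11, 201]]   # slightly noisy copy of a
c = [[200, 10], [200, 10]]   # mirrored pattern, clearly different
print(dedupe([a, b, c]))  # [0, 2]
```

Because small brightness changes rarely flip a pixel across the mean, the noisy copy hashes identically to the original and is dropped.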

Video Transcription

The AI assistant can also transcribe audio from video documents using the transcribe_video tool. This extracts the audio track and produces a text transcript.

Best Practices

  1. Choosing a Method:
       • Use full_video for short to medium videos when you need a holistic analysis
       • Use frames_plus_audio for longer videos or when visual keyframe detail matters

  2. Extra Instructions: Be specific about what you want from the analysis. For example:
       • "Focus on the text shown on screen"
       • "Describe the speaker's body language"
       • "List all products shown in the video"

  3. Document Reuse: When analyzing a video that's already uploaded to a chat, use document_id instead of re-uploading. This saves bandwidth and, with frames_plus_audio, reuses any existing transcript.

  4. Keyframe Sensitivity: Start with medium and adjust based on results. Use low for overview thumbnails and high for detailed analysis of presentations with incremental slides.