Video Analysis API¶

Analyze and convert video content to text using PrimeThink's video analysis APIs. These tools enable you to extract structured descriptions, keyframes, and transcripts from video files.

Overview¶

PrimeThink provides several video processing capabilities:

Video Analysis - Analyze video content using Gemini with optional speaker count and instructions
Video-to-Text - Convert video to structured text descriptions using one of two methods: `full_video` or `frames_plus_audio`
Keyframe Extraction - Extract scene-change keyframes from video as individual images (AI assistant tool)

Authentication¶

All API requests require authentication. See API Authentication for details on obtaining and using your API key.

API Endpoints¶

For complete API specifications, see the Interactive API Documentation.

POST `/video/analyze`¶

Analyze a video using Gemini. Returns a text description/analysis of the video content.

Parameters (multipart/form-data):

Parameter	Type	Required	Description
`file`	file	Yes	Video file to analyze
`extra_instructions`	string	No	Additional instructions for the analysis model
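
As a rough Python counterpart to the parameter table above, the sketch below builds (without sending) a multipart request for this endpoint using only the standard library. The host and the `Authorization: Token` header are borrowed from the curl commands under Quick Start Examples; the helper names `encode_multipart` and `build_analyze_request` are illustrative and not part of any PrimeThink SDK.

```python
import mimetypes
import urllib.request
import uuid

BASE_URL = "https://api.primethink.ai"  # host taken from the curl examples


def encode_multipart(fields, files):
    """Return (body, content_type) for text fields and {name: (filename, bytes)} files."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    for name, (filename, data) in files.items():
        # Guess the MIME type from the extension; fall back to a generic type.
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        chunks.append(head.encode("utf-8") + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_analyze_request(api_key, filename, video_bytes, extra_instructions=None):
    """Build (but do not send) a POST /video/analyze request."""
    fields = {}
    if extra_instructions:
        fields["extra_instructions"] = extra_instructions
    body, content_type = encode_multipart(fields, {"file": (filename, video_bytes)})
    return urllib.request.Request(
        f"{BASE_URL}/video/analyze",
        data=body,
        method="POST",
        headers={"Authorization": f"Token {api_key}", "Content-Type": content_type},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response format for this endpoint is not specified here, so the sketch stops short of parsing it.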

POST `/video/video-to-text`¶

Convert a video to a structured text description. Supports two conversion methods and accepts either a file upload or an existing document ID.

Parameters (multipart/form-data):

Parameter	Type	Required	Description
`file`	file	No*	Video file to analyze
`document_id`	integer	No*	ID of an existing video document
`method`	string	No	Conversion method: `"full_video"` (default) or `"frames_plus_audio"`
`model`	string	No	Override the default LLM model
`extra_instructions`	string	No	Additional instructions for the analysis

*Provide exactly one of `file` or `document_id`.
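
The constraints in the table can be checked client-side before any upload. A minimal sketch, assuming a hypothetical helper `video_to_text_fields` that returns only the text form fields (the video itself would be attached separately as the `file` part):

```python
VALID_METHODS = ("full_video", "frames_plus_audio")


def video_to_text_fields(file_path=None, document_id=None, method="full_video",
                         model=None, extra_instructions=None):
    """Validate arguments and return the non-file form fields for /video/video-to-text."""
    # Exactly one source must be given: an upload or an existing document.
    if (file_path is None) == (document_id is None):
        raise ValueError("provide exactly one of file_path or document_id")
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {VALID_METHODS}, got {method!r}")
    fields = {"method": method}
    if document_id is not None:
        fields["document_id"] = str(int(document_id))
    # Optional fields are omitted entirely when not set.
    if model:
        fields["model"] = model
    if extra_instructions:
        fields["extra_instructions"] = extra_instructions
    return fields


print(video_to_text_fields(document_id=123, method="frames_plus_audio"))
```

Failing fast on a missing or doubled source avoids uploading a large video only to have the request rejected.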

Conversion Methods:

Method	Description	Best For
`full_video`	Uploads the entire video to Gemini for analysis	Short to medium videos, when Gemini/Google API key is configured
`frames_plus_audio`	Extracts scene-change keyframes and combines them with audio transcript for multimodal analysis	Longer videos, when you want keyframe-level detail, or when Gemini is not available (falls back to OpenAI)

Response:

{
    "text": "## Summary\n\nThis video shows a product demo...\n\n## Key Points\n- ...\n\n## Timeline\n- [00:00] Opening scene...",
    "method": "full_video"
}

The `text` field holds a Markdown document with sections for Summary, Key Points, Timeline, Notable Details, and Conclusions.
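
Because the sections arrive as one Markdown string, a client will usually want to split them apart. The sketch below assumes the layout seen in the sample response above: level-2 `## ` headings and timeline bullets of the form `- [MM:SS] description`. The payload is made-up sample data and the function names are illustrative.

```python
import json
import re


def split_sections(markdown_text):
    """Map each '## Heading' to the body text beneath it."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def timeline_entries(timeline_body):
    """Return (timestamp, description) pairs from lines such as '- [00:00] Opening scene'."""
    pattern = re.compile(r"^\s*-\s*\[(\d{1,2}:\d{2}(?::\d{2})?)\]\s*(.*)$")
    return [m.groups() for m in map(pattern.match, timeline_body.splitlines()) if m]


# Hypothetical response body, shaped like the sample above.
raw = (
    '{"text": "## Summary\\n\\nA product demo.\\n\\n## Timeline\\n'
    '- [00:00] Opening scene\\n- [01:30] Feature tour", "method": "full_video"}'
)
payload = json.loads(raw)
sections = split_sections(payload["text"])
print(sections["Summary"])
print(timeline_entries(sections["Timeline"]))
```

If the model varies heading levels or timestamp formats, the parser would need loosening; treating missing sections as optional is the safer default.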

Quick Start Examples¶

Basic Video Analysis¶

curl -X POST "https://api.primethink.ai/video/analyze" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "file=@presentation.mp4"

Video-to-Text (Full Video)¶

curl -X POST "https://api.primethink.ai/video/video-to-text" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "file=@meeting-recording.mp4" \
  -F "method=full_video"

Video-to-Text (Frames + Audio)¶

curl -X POST "https://api.primethink.ai/video/video-to-text" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "file=@lecture.mp4" \
  -F "method=frames_plus_audio"

Using an Existing Document¶

curl -X POST "https://api.primethink.ai/video/video-to-text" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "document_id=123" \
  -F "method=frames_plus_audio" \
  -F "extra_instructions=Focus on the code examples shown on screen"

When `document_id` is used with `frames_plus_audio` and the document already has an extracted transcript, that transcript is reused instead of re-transcribing the audio.

With Extra Instructions¶

curl -X POST "https://api.primethink.ai/video/video-to-text" \
  -H "Authorization: Token YOUR_API_KEY" \
  -F "file=@demo.mp4" \
  -F "method=full_video" \
  -F "extra_instructions=Pay special attention to the UI interactions and describe each screen transition"

AI Assistant Tools¶

Video Keyframe Extraction¶

The AI assistant has access to the `extract_video_keyframes` tool, which extracts scene-change keyframes from a video document and saves them as individual images in the chat.

How to use it: Ask the AI assistant to extract keyframes from a video document in the chat. For example:

"Extract keyframes from the video I just uploaded"
"Get the key scenes from document #123"
"Extract keyframes with high sensitivity from the presentation video"

Parameters the assistant uses:

Parameter	Type	Default	Description
`document_id`	integer	required	ID of the video document in the chat
`sensitivity`	string	`"medium"`	Detection sensitivity: `"low"` (few keyframes, major scenes only), `"medium"` (balanced), `"high"` (many keyframes, catches subtle changes)
`folder`	string	`"keyframes"`	Destination folder for the extracted images

Sensitivity Levels:

Level	Description	Use Case
`low`	Detects only major scene changes, produces few keyframes	Long videos where you want a high-level overview
`medium`	Balanced detection, moderate number of keyframes	General purpose, most videos
`high`	Catches subtle changes, produces many keyframes	Short videos, detailed analysis, presentations with incremental slides
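
To build intuition for how sensitivity trades off keyframe count, here is a toy scene-change detector. It is not the tool's actual algorithm: the cut-off values, the frame-difference metric, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical cut-offs: the fraction of change between consecutive frames
# that counts as a new scene. The real tool's values are not documented.
THRESHOLDS = {"low": 0.50, "medium": 0.30, "high": 0.10}


def frame_difference(a, b):
    """Mean absolute difference of two equal-length grayscale frames, scaled to 0..1."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))


def scene_changes(frames, sensitivity="medium"):
    """Return indices of frames that start a new scene (frame 0 always does)."""
    threshold = THRESHOLDS[sensitivity]
    keyframes = [0] if frames else []
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


# Four tiny 4-pixel "frames": a subtle change, a moderate one, then a hard cut.
frames = [[0] * 4, [40] * 4, [140] * 4, [0] * 4]
for level in ("low", "medium", "high"):
    print(level, scene_changes(frames, level))
```

With these sample frames, `low` keeps only the hard cut, `medium` adds the moderate change, and `high` keeps every change, mirroring the table above.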

The extracted keyframes are saved as JPEG images in the specified folder within the chat, and duplicate frames are automatically filtered out using perceptual hashing.
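
Perceptual hashing compares images by a compact fingerprint rather than byte-for-byte, so near-identical frames can be recognized even when pixels differ slightly. The documentation does not say which hash or distance threshold the tool uses; the sketch below shows one common variant (average hash with Hamming distance) on tiny grayscale grids that are assumed to be already downscaled. Names and the `max_distance` default are illustrative.

```python
def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(frames, max_distance=1):
    """Keep a frame only if its hash differs enough from every frame kept so far."""
    kept, hashes = [], []
    for index, pixels in enumerate(frames):
        h = average_hash(pixels)
        if all(hamming(h, seen) > max_distance for seen in hashes):
            kept.append(index)
            hashes.append(h)
    return kept


slide = [[10, 200], [10, 200]]
slide_noisy = [[12, 198], [11, 201]]   # same picture with slight noise
other = [[200, 10], [200, 10]]         # mirrored layout, a different scene
print(drop_near_duplicates([slide, slide_noisy, other]))
```

The noisy copy hashes to the same bits as the original and is dropped, while the mirrored frame differs in every bit and is kept.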

Video Transcription¶

The AI assistant can also transcribe audio from video documents using the `transcribe_video` tool. This extracts the audio track and produces a text transcript.

Best Practices¶

Choosing a Method:
    Use `full_video` for short to medium videos when you need a holistic analysis
    Use `frames_plus_audio` for longer videos or when visual keyframe detail matters
Extra Instructions: Be specific about what you want from the analysis. For example:
    "Focus on the text shown on screen"
    "Describe the speaker's body language"
    "List all products shown in the video"
Document Reuse: When analyzing a video that's already uploaded to a chat, pass `document_id` instead of re-uploading the file. This saves bandwidth and, with `frames_plus_audio`, lets an existing transcript be reused.
Keyframe Sensitivity: Start with `medium` and adjust based on results. Use `low` for overview thumbnails and `high` for detailed, slide-by-slide analysis of presentations.
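
The method guidance above can be folded into a small helper. This is only a sketch: the documentation does not define where "short to medium" ends, so `long_video_seconds` is a hypothetical cut-off to tune, and `choose_method` is an illustrative name.

```python
def choose_method(duration_seconds, need_keyframe_detail=False,
                  gemini_available=True, long_video_seconds=600):
    """Pick a `method` value for /video/video-to-text.

    long_video_seconds is a hypothetical cut-off; adjust it to your own videos.
    """
    # Without Gemini, only the frames-plus-audio path (OpenAI fallback) applies.
    if not gemini_available:
        return "frames_plus_audio"
    if need_keyframe_detail or duration_seconds > long_video_seconds:
        return "frames_plus_audio"
    return "full_video"


print(choose_method(120))    # a two-minute clip
print(choose_method(3600))   # an hour-long recording
```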

Related documentation:

Audio Generation API - Speech-to-text transcription
Audio Diarization API - Speaker-labelled transcription
Media Generation in Live Pages - Media generation from Live Pages