VideoLens
Open source · MIT licensed · ~1,500 video platforms supported

Turn any video into a timestamped, evidence-grounded report.

Drop a bug recording, a meeting, a demo, a tutorial. Ask anything. VideoLens returns a structured analysis with citations to specific moments in the video — for humans or AI agents.

Local files · YouTube · Loom · Vimeo · TikTok · Twitch · Twitter/X · + 1,490 more
What it does

Three steps from video to insight.

Most video tools wrap Whisper and stop. VideoLens combines transcription, frame-level vision, and prompt-directed analysis into one cached pipeline.

Drop any video

Local files, YouTube, Loom, Vimeo, TikTok, Twitter/X, Twitch, Reddit, Google Drive, direct URLs. Roughly 1,500 platforms via yt-dlp.

Ask anything

Three modes built in — General, Bug, Meeting — plus your free-form prompt. Each mode tunes the analyst for what matters: repro steps, decisions, frictions, claims.

Evidence at every step

Every finding cites a specific timestamp. Click any citation in the UI to jump the player to that moment. Output as PDF, Markdown, or JSON.
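For the JSON output, a single finding might look like the sketch below — the field names are illustrative, not VideoLens's confirmed schema:

```python
import json

# Hypothetical shape of one finding in the JSON export -- field names
# (summary, severity, evidence, timestamp) are illustrative assumptions.
finding = {
    "summary": "Checkout button unresponsive after applying a coupon",
    "severity": "high",
    "evidence": [
        {"timestamp": "02:41", "kind": "visual",
         "note": "button clicked, no state change on screen"},
        {"timestamp": "03:05", "kind": "transcript",
         "note": "narrator: 'it's just not doing anything'"},
    ],
}
print(json.dumps(finding, indent=2))
```

Each evidence entry carries the timestamp the UI uses to jump the player.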

How it works

A cached pipeline you can trust.

Resolve the source. Extract everything. Build a timeline. Synthesize a report. Each step is cached at .videolens/cache/<hash>/ so re-runs are cheap and follow-up questions cost cents.
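A minimal sketch of how such a per-run cache key could be derived, assuming the hash covers the source plus the extraction settings (the function and field names below are hypothetical, not the project's actual code):

```python
import hashlib
import json

def cache_key(source: str, settings: dict) -> str:
    """Hypothetical sketch: derive the <hash> in .videolens/cache/<hash>/
    from the source plus the extraction settings, so identical re-runs
    resolve to the same cache directory."""
    payload = json.dumps({"source": source, "settings": settings}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

# Same inputs -> same directory, so re-runs hit the cache;
# changing any setting produces a fresh hash.
key = cache_key("./bug.mov", {"mode": "bug", "frame_interval_s": 2})
```

Sorting the keys before hashing keeps the digest stable regardless of dict ordering, which is what makes re-runs cheap.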

1 · RESOLVE

Source classification

Detects local files, YouTube, Loom, Vimeo, direct URLs, and generic pages. Reports limitations clearly when a site isn't fully supported.

2 · EXTRACT

Audio + frames + OCR

yt-dlp fetches remote video, ffmpeg samples frames and chunks audio, OpenAI transcribes each chunk, GPT-5.4-mini describes every frame and reads any visible text.
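The ffmpeg side of this step amounts to two commands, sketched below as argument lists. The sampling rate and chunk length are illustrative defaults, not the project's actual settings:

```python
def frame_sample_cmd(src: str, out_dir: str, fps: float = 0.5) -> list[str]:
    # One frame every 1/fps seconds; ffmpeg numbers the output files itself.
    return ["ffmpeg", "-i", src, "-vf", f"fps={fps}", f"{out_dir}/frame_%04d.jpg"]

def audio_chunk_cmd(src: str, out_dir: str, chunk_s: int = 600) -> list[str]:
    # Strip video (-vn) and split the audio into chunk_s-second segments,
    # each small enough to send to the transcription API on its own.
    return ["ffmpeg", "-i", src, "-vn", "-f", "segment",
            "-segment_time", str(chunk_s), f"{out_dir}/chunk_%03d.mp3"]
```

The resulting frame images and audio chunks are what get sent to the vision and transcription models.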

3 · TIMELINE

Time-windowed merge

Frame summaries and transcript segments are merged into time-windowed segments — each with visual, OCR, transcript, scene type, and confidence.
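The merge can be sketched as bucketing both streams into fixed windows; real VideoLens segments also carry OCR text, a scene type, and a confidence score, omitted here for brevity (the function name and window size are assumptions):

```python
from collections import defaultdict

def build_timeline(frames, transcript, window_s=10):
    """Hypothetical sketch of the merge step: bucket frame descriptions
    and transcript segments into fixed time windows."""
    windows = defaultdict(lambda: {"visual": [], "transcript": []})
    for t, description in frames:           # frames: [(seconds, text)]
        windows[int(t // window_s)]["visual"].append(description)
    for start, _end, text in transcript:    # transcript: [(start, end, text)]
        windows[int(start // window_s)]["transcript"].append(text)
    return [{"start": w * window_s, "end": (w + 1) * window_s, **seg}
            for w, seg in sorted(windows.items())]

timeline = build_timeline(
    frames=[(1.0, "login form visible"), (12.0, "error banner shown")],
    transcript=[(0.5, 9.0, "trying to log in"), (11.0, 14.0, "there's the error")],
)
```

Each resulting window pairs what was on screen with what was being said at that moment, which is what makes per-timestamp citations possible downstream.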

4 · ANALYZE

Mode-driven synthesis

One GPT-5.5 call against the timeline, your prompt, and the active mode → structured findings, evidence citations, recommendations, and ticket-ready tasks.

Built-in modes

A different analyst for each kind of video.

🎬

General

--mode general

Broad review: what's happening, what stands out, what's worth knowing.

Use for: tutorials, demos, content reviews, any video you just want explained.

🐛

Bug

--mode bug

Bug recordings → reproduction steps, severity hint, ticket-ready summary, possible root-cause areas.

Use for: screen recordings of broken UIs, crashes, session replays of failed flows.

🗣️

Meeting

--mode meeting

Decisions, objections, commitments, follow-ups. Uses diarized transcription when available.

Use for: Zoom / Teams / Meet recordings, standups, briefings, sales calls.

🛠️
Coming next: UX (session-replay analysis), Tutorial (step extraction), Product Demo (feature inventory), Content Critique, Privacy (sensitive-info redaction). Each mode is a small prompt-fragment file — adding a new one is ~30 lines.
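As a sense of scale, a new mode fragment might look like the sketch below — the UX mode content is speculative and the structure is assumed, not VideoLens's actual file format:

```python
# Speculative sketch of a mode prompt-fragment, sized to show why a
# new mode is only ~30 lines. Structure and field names are assumptions.
UX_MODE = {
    "name": "ux",
    "flag": "--mode ux",
    "prompt": (
        "You are reviewing a session replay. Watch for user friction: "
        "hesitation, backtracking, repeated clicks, dead ends. "
        "Cite the timestamp of each friction point you report."
    ),
}
```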
Quickstart

Run it locally in two minutes.

Python 3.12+, ffmpeg, and an OpenAI API key. That's the whole list.

1 · Install
# macOS
brew install ffmpeg
git clone https://github.com/shadoprizm/videolens.git
cd videolens
uv sync --extra ui
2 · Run
export OPENAI_API_KEY=sk-...

# Web UI
uv run videolens ui

# Or CLI
uv run videolens analyze ./bug.mov \
  --mode bug --prompt "What broke?"
💡
Rough cost: a 5-minute video with 20 frames ≈ $0.20 via OpenAI. A 30-minute meeting ≈ $0.50–$1.50. Per-prompt analysis cache means follow-up questions on the same video cost only cents.
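Those numbers decompose roughly as per-frame vision calls plus per-minute transcription plus one synthesis call. The per-unit rates below are back-of-envelope assumptions, not published OpenAI pricing:

```python
def estimate_cost(audio_minutes: float, frames: int,
                  per_frame_usd: float = 0.008,
                  per_audio_min_usd: float = 0.006,
                  synthesis_usd: float = 0.03) -> float:
    # Illustrative rates only -- real OpenAI pricing varies by model and date.
    return (frames * per_frame_usd
            + audio_minutes * per_audio_min_usd
            + synthesis_usd)

short_bug = estimate_cost(audio_minutes=5, frames=20)  # lands near $0.20
```

At these assumed rates a 30-minute meeting sampled at ~90 frames lands around $0.90, inside the quoted $0.50–$1.50 band; follow-up prompts skip everything but the synthesis call, which is why they cost cents.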
Hosted version

Don't want to self-host?

VideoLens Cloud is on the way — drop a video in the browser, get the same analysis without managing infra. Get notified when it's live.

No spam. One email when it launches.

Roadmap

Where this is going.

Q&A loop

"Analyze once, ask many times" — follow-up questions reuse the cached timeline for cents instead of dollars.

MCP server

Native Model Context Protocol server so Claude Code, Cursor, and other agents can analyze video as a first-class tool.

Semantic search

Embeddings over processed timelines: "find where they talked about pricing", "find every error message", across your whole library.

Session-replay parsers

PostHog, Clarity, Hotjar, FullStory, LogRocket, OpenReplay — read event exports, not just rendered video.