Drop a bug recording, a meeting, a demo, a tutorial. Ask anything. VideoLens returns a structured analysis with citations to specific moments in the video — for humans or AI agents.
Most video tools wrap Whisper and stop. VideoLens combines transcription, frame-level vision, and prompt-directed analysis into one cached pipeline.
Local files, YouTube, Loom, Vimeo, TikTok, Twitter/X, Twitch, Reddit, Google Drive, direct URLs. Roughly 1,500 platforms via yt-dlp.
Three modes built in — General, Bug, Meeting — plus your free-form prompt. Each mode tunes the analyst for what matters: repro steps, decisions, frictions, claims.
Every finding cites a specific timestamp. Click any citation in the UI to jump the player to that moment. Output as PDF, Markdown, or JSON.
Resolve the source. Extract everything. Build a timeline. Synthesize a report. Each step is cached at .videolens/cache/<hash>/ so re-runs are cheap and follow-up questions cost cents.
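The cache layout above could be keyed like this — a minimal sketch, not VideoLens's actual scheme; `cache_key`, its inputs, and the 16-character truncation are all assumptions made for illustration:

```python
import hashlib
from pathlib import Path

def cache_key(source: str, mode: str, sample_fps: float = 1.0) -> str:
    """Hypothetical cache key: hash the source identifier plus the
    extraction settings, so a re-run with identical inputs hits the cache."""
    payload = f"{source}|{mode}|{sample_fps}".encode()
    return hashlib.sha256(payload).hexdigest()[:16]

def cache_dir(source: str, mode: str) -> Path:
    # Mirrors the .videolens/cache/<hash>/ layout described above.
    return Path(".videolens/cache") / cache_key(source, mode)
```

The point of hashing the settings alongside the source is that changing the mode or sample rate produces a fresh cache entry instead of silently reusing stale extraction output.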
Detects local file, YouTube, Loom, Vimeo, direct URL, generic page. Reports limitations clearly when a site isn't fully supported.
yt-dlp fetches remote video, ffmpeg samples frames and chunks audio, OpenAI transcribes each chunk, GPT-5.4-mini describes every frame and reads any visible text.
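The frame-sampling and audio-chunking steps boil down to two ffmpeg invocations. A plausible sketch of the commands (the exact flags, rates, and filenames VideoLens uses are assumptions here):

```python
def frame_sample_cmd(video: str, out_dir: str, fps: float = 0.5) -> list[str]:
    # Sample one frame every 1/fps seconds as JPEGs for the vision model.
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
            f"{out_dir}/frame_%05d.jpg"]

def audio_chunk_cmd(video: str, out_dir: str, chunk_secs: int = 600) -> list[str]:
    # Split the audio track into chunks small enough for the transcription API.
    return ["ffmpeg", "-i", video, "-vn", "-f", "segment",
            "-segment_time", str(chunk_secs), f"{out_dir}/chunk_%03d.mp3"]
```

Both would be run via `subprocess.run(cmd, check=True)`; sampling below 1 fps keeps the vision cost proportional to video length rather than frame count.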
Frame summaries and transcript segments are merged into time-windowed segments — each with visual, OCR, transcript, scene type, and confidence.
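The merge into time-windowed segments can be sketched as simple bucketing — a simplified illustration, assuming fixed-width windows and omitting the OCR, scene-type, and confidence fields:

```python
def build_timeline(frames, transcript, window=30.0):
    """Bucket (timestamp, visual description) pairs and
    (start, end, text) transcript segments into fixed time windows."""
    segments = {}
    for t, desc in frames:
        seg = segments.setdefault(int(t // window), {"visual": [], "speech": []})
        seg["visual"].append(desc)
    for start, end, text in transcript:
        seg = segments.setdefault(int(start // window), {"visual": [], "speech": []})
        seg["speech"].append(text)
    return [
        {"start": k * window, "end": (k + 1) * window, **v}
        for k, v in sorted(segments.items())
    ]
```

Each resulting segment pairs what was on screen with what was said in the same window, which is what lets the analysis step cite a single timestamp for a finding.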
One GPT-5.5 call against the timeline, your prompt, and the active mode → structured findings, evidence citations, recommendations, and ticket-ready tasks.
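A finding in the JSON output might look roughly like this — illustrative only; the real schema's field names are assumptions here:

```python
import json

# Hypothetical shape of one finding with a timestamped evidence citation.
finding = {
    "finding": "Save button returns a 500 on submit",
    "severity": "high",
    "evidence": [
        {"timestamp": "02:14", "note": "error banner appears after clicking Save"},
    ],
    "recommendation": "Inspect the save endpoint's server-side validation",
}
print(json.dumps(finding, indent=2))
```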
--mode general
Broad review: what's happening, what stands out, what's worth knowing.
Use for: tutorials, demos, content reviews, any video you just want explained.
--mode bug
Bug recordings → reproduction steps, severity hint, ticket-ready summary, possible root-cause areas.
Use for: screen recordings of broken UIs, crashes, session replays of failed flows.
--mode meeting
Decisions, objections, commitments, follow-ups. Uses diarized transcription when available.
Use for: Zoom / Teams / Meet recordings, standups, briefings, sales calls.
Python 3.12+, ffmpeg, and an OpenAI API key. That's the whole list.
# macOS
brew install ffmpeg
git clone https://github.com/shadoprizm/videolens.git
cd videolens
uv sync --extra ui
export OPENAI_API_KEY=sk-...

# Web UI
uv run videolens ui

# Or CLI
uv run videolens analyze ./bug.mov \
  --mode bug --prompt "What broke?"
VideoLens Cloud is on the way — drop a video in the browser, get the same analysis without managing infra. Get notified when it's live.
No spam. One email when it launches.
"Analyze once, ask many times" — follow-up questions reuse the cached timeline for cents instead of dollars.
Native Model Context Protocol server so Claude Code, Cursor, and other agents can analyze video as a first-class tool.
Embeddings over processed timelines: "find where they talked about pricing", "find every error message", across your whole library.
PostHog, Clarity, Hotjar, FullStory, LogRocket, OpenReplay — read event exports, not just rendered video.