How to Add Subtitles to Recorded Zoom Meetings with AI

Tested prompts for adding subtitles to Zoom recordings, compared across 5 leading AI models.

Best by judge score: Claude Haiku 4.5 (8/10)

If you have a Zoom recording and need subtitles on it, you face two problems: Zoom's built-in transcript is often inaccurate, and it is delivered as a separate VTT file rather than burned into the video. Most people searching for this want subtitles that travel with the video file itself, so they can share it on LinkedIn, upload it to a company portal, or make it accessible to deaf or hard-of-hearing viewers without asking the audience to load a caption file manually.

The fastest modern approach is to paste your Zoom transcript or audio description into an AI model and have it generate clean, properly formatted subtitle blocks, then use a free tool to burn them into the MP4. This skips expensive captioning services and the slow manual process of timestamping every line yourself.

This page walks you through exactly how to prompt an AI to generate accurate, timed subtitle text from your Zoom recording content. The comparison table below shows outputs from five different models so you can pick what works for your workflow.

When to use this

This AI-assisted approach is the right fit when you have a Zoom recording that needs subtitles added quickly, accurately, and without paying per-minute captioning fees. It works especially well if you already have Zoom's auto-generated transcript as a starting point and need it cleaned up and reformatted into a proper subtitle file.

  • You recorded a client-facing webinar and need to post it publicly with accessibility-compliant captions
  • Your team recorded an internal training session and non-native English speakers need subtitles to follow along
  • You are uploading a Zoom recording to YouTube or Vimeo and need an SRT file to upload alongside it
  • You need to burn subtitles directly into the video file for sharing on social platforms like LinkedIn where captions are required for engagement
  • You have a Zoom transcript that is mostly correct but needs its formatting cleaned up before it can be used as a subtitle file

When this format breaks down

  • The recording contains heavy technical jargon, product names, or speaker names that an AI model has no context for and will consistently hallucinate or misspell, requiring more correction time than just doing it manually
  • You need legally compliant captions for broadcast or government content where a certified human captioner and specific accuracy standards are required by law
  • The audio quality of the Zoom recording is poor enough that even Zoom's auto-transcript is largely unusable, meaning there is no reliable source text for the AI to work from
  • You need real-time live captions during an ongoing Zoom meeting rather than post-production subtitles on a recorded file

The prompt we tested

You are an expert video captioning assistant specializing in adding accurate, well-timed subtitles to Zoom meeting recordings. Based on the Zoom recording details provided below, generate a complete workflow and deliverable for adding AI-generated subtitles.

Instructions: Structure the response with three clearly labeled sections (Workflow, Sample SRT Output, Review Tips). Keep the workflow to 5-8 numbered steps, format the SRT excerpt with proper timestamps (HH:MM:SS,mmm --> HH:MM:SS,mmm) and sequential numbering, and keep the total response under 500 words using clear, practical language.

Zoom Recording Details:
I have a 45-minute Zoom recording of a weekly marketing team standup with 4 participants discussing Q4 campaign performance, budget reallocation, and upcoming product launch timelines. The recording is saved as an MP4 file locally, and I need English subtitles with speaker labels that I can burn into the video before sharing with stakeholders who prefer watching on mute.

Provide: (1) a step-by-step method to add AI subtitles to the recording, including recommended tools and settings, (2) a sample SRT-formatted subtitle excerpt based on the content described, and (3) tips for reviewing accuracy, speaker labels, and exporting the final captioned video.

Model comparison

| Model | Quality | Cost | Latency | Tokens |
| --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 8 | $0.1135 | 22.4s | 457 / 1422 |
| Claude Haiku 4.5 | 8 | $0.0049 | 9.8s | 302 / 924 |
| GPT-5.4 | 8 | $0.0130 | 13.8s | 262 / 820 |
| Gemini 2.5 Pro | 5 | $0.0027 | 22.1s | 262 / 473 |
| Grok 4.1 Fast Reasoning | 8 | $0.0003 | 31.4s | 407 / 442 |

Outputs

Claude Opus 4.7 8/10
# AI Subtitle Workflow for Zoom Marketing Standup

## Workflow

1. **Prepare the file**: Confirm the MP4 is finalized and rename it descriptively (e.g., `Q4_Marketing_Standup_2024-11-05.mp4`). Move it to a dedicated project folder.

2. **Choose a transcription tool**: Upload the MP4 to an AI service that supports speaker diarization. Recommended options:
   - **Descript** (best for speaker labels + editing)
   - **Otter.ai** (fast, accurate for meetings)
   - **Rev AI** or **Happy Scribe** (high accuracy, paid)
   
   Enable **speaker identification**, set language to **English (US)**, and toggle **timestamps on**.

3. **Generate and label speakers**: After AI processing (usually 5–10 minutes), relabel generic tags ("Speaker 1") with real names (e.g., Maya, Jordan, Priya, Alex) based on the intro segment.

4. **Export as SRT**: Export captions in **.srt format** with settings: max 2 lines per caption, ~42 characters per line, 1–6 seconds per cue.

5. **Review and edit**: Open the SRT in a text editor or Descript/Subtitle Edit. Correct brand names, acronyms (CPC, ROAS, CTR), and fix overlapping timestamps.

6. **Burn subtitles into video**: Use **HandBrake** (free) or **FFmpeg** to hard-code captions. FFmpeg command:
   ```
   ffmpeg -i input.mp4 -vf "subtitles=captions.srt" -c:a copy output.mp4
   ```

7. **Quality-check the final MP4**: Play the exported file at 1.5× speed to confirm sync, legibility, and speaker label accuracy.

8. **Share with stakeholders**: Upload to your preferred platform (SharePoint, Google Drive, Vimeo) with a clear title noting "Captioned Version."

## Sample SRT Output

```
1
00:00:04,120 --> 00:00:07,480
[Maya]: Good morning, everyone. Let's kick
off the Q4 standup.

2
00:00:07,600 --> 00:00:11,250
[Maya]: Jordan, can you start with campaign
performance highlights?

3
00:00:11,800 --> 00:00:15,900
[Jordan]: Sure. Our Q4 paid social spend
is tracking 12% above forecast.

4
00:00:16,100 --> 00:00:20,300
[Jordan]: ROAS on the holiday creative set
came in at 4.2, which is strong.

5
00:00:20,800 --> 00:00:24,600
[Priya]: That's great. Should we reallocate
from display to social?

6
00:00:24,900 --> 00:00:29,100
[Alex]: I'd suggest shifting 15K before the
product launch on November 20th.

7
00:00:29,400 --> 00:00:33,200
[Maya]: Agreed. Let's finalize the budget
shift by Friday.
```

## Review Tips

- **Accuracy**: Focus proofreading on jargon, product names, metrics, and numbers—AI often misrenders "ROAS," "CTR," or dollar figures. Listen at 1× speed for dense sections.
- **Speaker labels**: Verify the first 2–3 lines per speaker; diarization errors usually cluster when participants interrupt or talk over each other. Merge misattributed segments manually.
- **Timing**: Ensure no caption exceeds 6 seconds on screen; split long sentences. Avoid gaps under 100ms between cues.
- **Readability**: Keep lines ≤42 characters; use sentence case, not ALL CAPS. Position subtitles in the lower-third safe zone.
- **Export settings**: Use H.264 codec, 1080p, bitrate 8–10 Mbps for crisp text. Test playback on mobile and desktop before sharing—stakeholders on mute depend entirely on caption legibility.
- **Archive**: Keep both the original MP4 and the SRT file separately in case edits are needed later.
Judge notes: The output is thorough, well-structured, and production-ready for most users, but the SRT sample content is entirely fabricated/generic (not truly grounded in the specific recording details beyond surface-level name assignment), and the response slightly exceeds the 500-word limit, preventing a higher score.
Claude Haiku 4.5 8/10
# Video Captioning Workflow for Zoom Meeting Recording

## Workflow

1. **Extract and Prepare Audio**
   - Upload your MP4 file to an AI captioning platform (Descript, Rev, or Kapwing). These tools automatically process video files and generate timestamped subtitles with speaker detection.
   - Alternatively, use free open-source tools like Whisper (OpenAI) for audio-to-text conversion if budget is limited.

2. **Generate Initial Subtitles with Speaker Labels**
   - Select English language and enable speaker identification in your chosen tool's settings.
   - Process the 45-minute recording (typically takes 5-15 minutes depending on platform).
   - The AI will automatically segment dialogue and attempt to label speakers (Speaker 1, Speaker 2, etc.).

3. **Manual Speaker Identification**
   - Review the generated subtitles and identify speakers by name (e.g., "Sarah," "Mike," "Jennifer," "David").
   - Use the Zoom participant list or meeting notes to assign correct names to speaker labels.
   - Update the SRT file to replace generic labels with actual names for clarity.

4. **Edit for Accuracy and Timing**
   - Review the transcript line-by-line for AI errors, especially industry jargon ("Q4 campaign," "budget reallocation," "product launch timelines").
   - Adjust timestamps if dialogue overlaps or timing feels off.
   - Break long sentences into 2-3 readable subtitle chunks (aim for 42 characters per line maximum).

5. **Quality Check and Formatting**
   - Verify speaker labels are consistent throughout.
   - Ensure no subtitles remain on-screen longer than 7 seconds.
   - Check that technical terms related to marketing metrics are spelled correctly.

6. **Burn Subtitles into Video**
   - Export your corrected SRT file from the editing tool.
   - Use FFmpeg, Handbrake, or your platform's built-in burn feature to embed subtitles permanently into the MP4.
   - Command example: `ffmpeg -i recording.mp4 -vf subtitles=captions.srt output.mp4`

7. **Final Export and Delivery**
   - Test the output by playing it on mute to confirm readability and speaker identification.
   - Export at original resolution and share with stakeholders via secure file transfer or cloud storage.

---

## Sample SRT Output

```
1
00:02:15,000 --> 00:02:22,500
Sarah: Alright team, let's kick off with Q4 campaign
performance. We saw a 23% increase in engagement.

2
00:02:23,000 --> 00:02:29,800
Mike: That's solid. Did we hit our conversion targets
for the regional campaigns?

3
00:02:30,200 --> 00:02:38,600
Sarah: Yes, three of four regions exceeded targets.
The Southeast campaign needs optimization.

4
00:02:39,100 --> 00:02:46,900
Jennifer: Moving to budget reallocation—I recommend
shifting $50K from underperforming channels.

5
00:02:47,400 --> 00:02:55,300
David: Agreed. Should we reallocate toward the product
launch timeline? We're 8 weeks out.
```

---

## Review Tips

- **Accuracy**: Playback at 0.75x speed to catch rapid speech or overlapping dialogue.
- **Speaker Labels**: Cross-reference with Zoom's "Participants" panel if recording metadata includes attendee names.
- **Technical Terms**: Create a glossary of marketing-specific terminology beforehand and search-replace errors.
- **Timing Gaps**: Ensure subtitles appear 1-2 seconds before speech starts for better comprehension on mute.
- **Testing**: Watch 5-10 minutes of the final burnt video on a mobile device to simulate stakeholder viewing conditions.
Judge notes: The output is thorough, well-structured, and directly addresses the specific recording details, but the "Timing Gaps" review tip contains a factually questionable recommendation (subtitles appearing 1-2 seconds *before* speech starts contradicts standard captioning practice), which would require correction before production use.
GPT-5.4 8/10
## Workflow

1. **Prepare the file**
   - Confirm the Zoom recording is a local **MP4** and that audio is clear.
   - Rename it clearly, e.g., `Marketing_Standup_Q4_2026-04-21.mp4`.

2. **Choose an AI transcription tool**
   - Good options: **Descript**, **Adobe Premiere Pro Speech to Text**, **CapCut Desktop**, or **Whisper-based tools** like MacWhisper.
   - Best fit for speaker labels: **Descript** or **Premiere Pro** with speaker detection/diarization.

3. **Generate the transcript**
   - Import the MP4.
   - Set language to **English**.
   - Enable **speaker identification** if available.
   - Use subtitle settings such as:
     - **Max 2 lines per caption**
     - **32–42 characters per line**
     - **Caption length: 1–6 seconds**

4. **Edit the transcript**
   - Correct key terms such as **Q4**, campaign names, product names, and budget figures.
   - Replace generic labels like “Speaker 1” with real labels, e.g., **Sarah:**, **James:**.
   - Split long captions so they read naturally.

5. **Export subtitle file**
   - Export as **SRT** first for easy review and reuse.
   - Make sure timestamps match the spoken audio and speaker labels are included in the subtitle text.

6. **Burn subtitles into the video**
   - In Premiere Pro, CapCut, Descript, or HandBrake + subtitle workflow, import the edited SRT.
   - Style for mute viewing:
     - Sans-serif font
     - White text with black shadow/box
     - Bottom-center placement
     - Medium font size for mobile and desktop

7. **Export final deliverables**
   - Export:
     - **Captioned MP4** with burned-in subtitles
     - **SRT file** as a separate deliverable
   - Recommended export: **H.264 MP4, 1080p, AAC audio**.

---

## Sample SRT Output

```srt
1
00:00:03,000 --> 00:00:07,200
Sarah: Thanks, everyone. Let's start with
a quick review of Q4 campaign performance.

2
00:00:07,300 --> 00:00:12,000
James: Paid social exceeded target by 18%,
but search was under plan for the month.

3
00:00:12,100 --> 00:00:17,000
Mina: Based on that, I recommend shifting
part of the search budget into retargeting.

4
00:00:17,100 --> 00:00:22,400
Alex: How much reallocation are we talking
about before the product launch window?

5
00:00:22,500 --> 00:00:28,000
Mina: Roughly fifteen percent this week,
so we can support launch traffic in time.

6
00:00:28,100 --> 00:00:33,500
Sarah: Okay, let's confirm timelines.
Creative needs final approval by next Tuesday.
```

---

## Review Tips

- **Check speaker labels carefully**: AI diarization often swaps speakers after interruptions or overlapping talk.
- **Verify business terms**: Review names, acronyms, budget numbers, and launch dates manually.
- **Watch for reading speed**: Keep captions short enough to read on mute; avoid dense 3-line subtitles.
- **Fix timing**: Captions should appear when speech starts and clear before the next speaker begins.
- **Test the burned-in export**: Watch on both desktop and phone to confirm font size, contrast, and bottom placement.
- **Keep both versions**: Share the burned-in MP4 with stakeholders and save the SRT for future edits or platform uploads.
Judge notes: The output is production-ready and well-structured, but the SRT sample uses speaker labels embedded in subtitle text (non-standard practice that can break some players) without flagging this as a stylistic choice versus standard SRT formatting, which is a real weakness for a captioning specialist audience.
Gemini 2.5 Pro 5/10
Of course. Here is a complete workflow, sample output, and review guide for adding AI-generated subtitles with speaker labels to your Zoom recording.

### Workflow

1.  **Select an AI Transcription Tool:** For your needs, a tool like **Descript** is highly recommended. It offers transcription, speaker identification, an intuitive editor, and the ability to export video with burned-in captions. Upload your 45-minute MP4 file directly to a new project.

2.  **Generate the AI Transcript:** Once uploaded, instruct the software to transcribe the audio. Select "English" and specify the number of speakers (4). The AI will process the file and generate a time-stamped transcript.

3.  **Assign Speaker Labels:** The AI will likely assign generic labels (e.g., Speaker 1, Speaker 2). Go through the transcript and replace these with the actual names of your team members (e.g., Sarah, David, Maria, Ben). Descript's speaker assignment feature makes this simple.

4.  **Review and Correct the Transcript:** Read through the entire transcript while listening to the audio. Correct any errors in punctuation, grammar, or terminology. Pay close attention to marketing-specific jargon, campaign names, and financial figures, as AI can often misinterpret these.

5.  **Style Your Captions:** Since you are burning the subtitles into the video, customize their appearance for readability. Choose a clear font (like Arial or Helvetica), set an appropriate size, and add a subtle background or outline to ensure the text is visible against any video background.

6.  **Export the Final Video:** Once you are satisfied with the accuracy and styling, use the tool's export function. Select "Video" as the export format and ensure the "Burned-in captions" option is enabled. Export the file at a suitable quality (e.g., 1080p).

### Sample SRT Output

Here is a sample excerpt in the standard SRT format, reflecting your meeting's content.

```srt
1
00:15:22,150 --> 00:15:25,880
Sarah: Okay team, let's look at the Q4 campaign performance. Our
```
Judge notes: The output is cut off mid-sentence before completing the SRT sample, and entirely omits the third required section (Review Tips), making it incomplete and unusable for production without significant supplementation.
Grok 4.1 Fast Reasoning 8/10
### Workflow
1. Upload MP4 to VEED.io or Kapwing (free AI subtitle tools supporting speaker detection).
2. Select "Auto-transcribe" with English language; enable "Speaker labels" (e.g., Speaker 1-4); set accuracy to "High."
3. Review/edit transcript: Assign names (e.g., Sarah, Mike) to speakers, fix timing errors, add punctuation.
4. Export as SRT file; download.
5. Import MP4 and SRT into HandBrake (free): Add subtitle track, select "Burn In" under Subtitles tab.
6. Adjust font/size (e.g., white Arial 24pt, bottom position); encode to MP4 (H.264, 1080p).
7. Preview full video; re-encode if needed.

### Sample SRT Output
```
1
00:05:12,000 --> 00:05:18,500
[Sarah: Marketing Lead] Q4 campaigns exceeded targets by 15% overall.

2
00:05:18,700 --> 00:05:25,200
[Mike: Campaign Manager] Email open rates were strong at 28%, but social ads underperformed.

3
00:05:25,400 --> 00:05:32,100
[Lisa: Budget Analyst] We have $50K unspent; recommend reallocating to product launch promo.

4
00:05:32,300 --> 00:05:38,900
[Tom: Product Coordinator] Launch timelines: Beta next week, full rollout by Dec 15th.
```

### Review Tips
- **Accuracy**: Play video synced with SRT; check 80%+ word match, fix homophones (e.g., "Q4" vs. "cue for").
- **Speaker Labels**: Verify diarization (VEED auto-labels); manually swap if overlaps; use consistent format like [Name: Role].
- **Export**: Test on mute; ensure subtitles don't overlap speech; use FFmpeg CLI for batch: `ffmpeg -i input.mp4 -vf subtitles=srtfile.srt output.mp4`.
(248 words)
Judge notes: The workflow is practical and well-structured, but recommends VEED.io/Kapwing without noting that true speaker diarization (auto-labeling named participants) requires enterprise-tier tools or separate diarization services like AssemblyAI/Whisper+pyannote, making the "enable Speaker labels" step slightly oversimplified and potentially misleading.

What makes these work

  1. Specify your output format upfront

    Always tell the AI whether you need SRT, VTT, or plain timestamped text before pasting your transcript. SRT is the standard for most video editors and social platforms. VTT is required for HTML5 web players and most LMS platforms. If you do not specify, you will get inconsistent formatting that requires manual cleanup.

  2. Set a hard line-length limit

    Subtitle readability depends on line length. Prompt the AI to keep each subtitle block to two lines maximum and roughly 42 characters per line. This matches the broadcast caption standard and prevents text from running off screen on mobile devices, which is where most people watch shared Zoom recordings.
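
    If you want to sanity-check a generated subtitle file against this limit, the constraint is easy to express in code. Here is a minimal Python sketch using only the standard library (the function name and sample text are illustrative, not from any tool in the comparison):

    ```python
    import textwrap

    def wrap_caption(text, max_chars=42, max_lines=2):
        """Split one cue's text into caption blocks of at most
        max_lines lines, each at most max_chars characters."""
        lines = textwrap.wrap(text, width=max_chars)
        # Group the wrapped lines into blocks of max_lines each;
        # text too long for one block spills into additional blocks.
        return ["\n".join(lines[i:i + max_lines])
                for i in range(0, len(lines), max_lines)]

    # Sample cue text only; real input would come from your SRT.
    blocks = wrap_caption(
        "Most B2B sales teams miss quota not because of pipeline "
        "volume, but because of demo-stage conversion rate."
    )
    ```

    Any cue that wraps past two lines becomes an extra block, which you would then give its own timestamps rather than letting text run off screen.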

  3. Give the AI your Zoom transcript as source material

    Zoom generates a transcript automatically for cloud recordings. Download it as a VTT file from the Zoom portal, then paste the raw text into your AI prompt. This gives the model accurate source content with rough timestamps so it is reformatting and cleaning rather than guessing, which dramatically improves accuracy.
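
    If you prefer to pre-clean the VTT before pasting it, a small Python sketch like the following strips the WEBVTT header and numeric cue indexes while keeping timestamps and spoken text (the helper name and sample cue are hypothetical):

    ```python
    def vtt_to_prompt_text(vtt: str) -> str:
        """Drop the WEBVTT header, blank lines, and numeric cue
        indexes, keeping timestamp lines and spoken text."""
        kept = []
        for line in vtt.splitlines():
            line = line.strip()
            if not line or line == "WEBVTT" or line.isdigit():
                continue
            kept.append(line)
        return "\n".join(kept)

    # Illustrative one-cue excerpt in Zoom's VTT layout.
    sample = """WEBVTT

    1
    00:00:01.000 --> 00:00:05.200
    Welcome everyone to the onboarding session.
    """
    cleaned = vtt_to_prompt_text(sample)
    ```

    The result is compact source text with rough timings intact, which is exactly what the model needs to reformat rather than guess.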

  4. Ask for speaker labels when there are multiple voices

    For panel discussions, interviews, or team meetings, instruct the AI to prepend speaker labels like [HOST] or [SPEAKER 1] to each subtitle block. This adds critical context for viewers watching without audio and makes the subtitled video usable as a standalone document.
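
    Once the AI (or a diarization tool) has produced generic labels, swapping in real names is a mechanical find-and-replace. A rough Python sketch, where the `name_map` values are placeholders you would fill in from your participant list:

    ```python
    def relabel_speakers(srt_text, name_map):
        """Replace generic diarization labels with real names.
        name_map is illustrative, e.g. {"Speaker 1": "[HOST]"}."""
        for generic, real in name_map.items():
            srt_text = srt_text.replace(generic, real)
        return srt_text

    labeled = relabel_speakers(
        "1\n00:00:05,000 --> 00:00:09,500\n"
        "Speaker 1: You left a six-figure job.",
        {"Speaker 1": "[HOST]"},
    )
    ```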

More example scenarios

#01 · Corporate training video for HR onboarding
Input
Here is a Zoom transcript excerpt from our onboarding session. Convert it into SRT subtitle format with accurate timestamps and clean line breaks: '00:01 welcome everyone to the onboarding session today we are going to cover the four main areas of company policy including time off benefits remote work expectations and the code of conduct lets start with time off'
Expected output
1
00:00:01,000 --> 00:00:05,200
Welcome everyone to the onboarding session.

2
00:00:05,200 --> 00:00:09,400
Today we will cover four main areas of company policy:

3
00:00:09,400 --> 00:00:14,800
time off, benefits, remote work expectations, and the code of conduct.
#02 · Medical conference panel recording
Input
Zoom transcript from a cardiology panel. Generate SRT subtitles: '00:03 the data from the 2023 trial showed a statistically significant reduction in major adverse cardiovascular events in the treatment group compared to placebo with a p value of less than 0.01 and a hazard ratio of 0.74'
Expected output
1
00:00:03,000 --> 00:00:08,500
The 2023 trial data showed a statistically significant reduction
in major adverse cardiovascular events.

2
00:00:08,500 --> 00:00:14,000
The treatment group outperformed placebo with a p-value under 0.01
and a hazard ratio of 0.74.
#03 · Sales webinar uploaded to LinkedIn
Input
I have this Zoom recording excerpt I need as subtitles for LinkedIn. Two lines max per subtitle card. Transcript: '00:07 so the reason most B2B sales teams miss quota is not pipeline volume it is actually conversion rate at the demo stage and today I am going to show you three things you can change this week that will move that number'
Expected output
1
00:00:07,000 --> 00:00:11,800
Most B2B sales teams miss quota not because of pipeline volume,
but because of demo-stage conversion rate.

2
00:00:11,800 --> 00:00:17,000
Today I will show you three changes you can make this week
to move that number.
#04 · University lecture recording for student accessibility
Input
Convert this lecture Zoom transcript to VTT format for our LMS upload. Keep academic terminology intact: '00:12 Keynesian economics fundamentally argues that aggregate demand is the primary driver of economic output and employment particularly in the short run when prices and wages may be sticky downward'
Expected output
WEBVTT

00:00:12.000 --> 00:00:17.500
Keynesian economics argues that aggregate demand
is the primary driver of economic output and employment.

00:00:17.500 --> 00:00:23.000
This holds especially in the short run,
when prices and wages may be sticky downward.
#05 · Podcast interview recorded over Zoom
Input
Zoom call interview between two people, needs subtitles with speaker labels. Transcript: '00:05 HOST: so you left a six figure job to start this and people thought you were crazy GUEST: completely crazy my wife thought I had lost my mind but I knew that the window was closing and I had to move'
Expected output
1
00:00:05,000 --> 00:00:09,500
[HOST] You left a six-figure job to start this.
People thought you were crazy.

2
00:00:09,500 --> 00:00:15,000
[GUEST] Completely crazy. My wife thought I had lost my mind.
But I knew the window was closing and I had to move.

Common mistakes to avoid

  • Pasting transcript without timestamps

    If you paste just the text of a Zoom transcript without any timing data, the AI has to guess where to break subtitle cards and has no basis for timing them correctly. The result is a subtitle file that will be out of sync with the video. Always include at least rough timestamps from the Zoom transcript before asking the AI to convert it.

  • Not burning subtitles into the video file

    Generating an SRT file is only half the job. If you share the MP4 without embedding the subtitles, viewers on most platforms will not see them unless they manually load the file themselves, which almost no one does. Use a tool like HandBrake or Kapwing to burn the SRT into the video as hardcoded subtitles before distributing.

  • Skipping a review pass for proper nouns

    AI models will frequently misspell names of people, companies, products, and technical terms that are not in their training data. For a client-facing or public video, one wrong name in subtitles can damage credibility. Always do a final pass specifically checking every proper noun against your source material.

  • Using subtitle blocks that are too long

    A common output from AI models is subtitle blocks with three or four lines of dense text. Viewers cannot read that much in the time the block is on screen, so they give up and ignore the subtitles entirely. If the model produces long blocks, add a constraint to your prompt such as 'maximum two lines, maximum 42 characters per line' and regenerate.

  • Ignoring timing accuracy for edited recordings

    If your Zoom recording has been trimmed or edited before you add subtitles, the timestamps from the original Zoom transcript will be offset. The AI will dutifully reproduce those wrong timestamps into the SRT file and your subtitles will be out of sync. Always check that your transcript timestamps match the actual cut version of the video before using them as input.
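
    If you know exactly how much was trimmed, you can correct the offset yourself instead of re-transcribing. A minimal Python sketch using only the standard library (`shift_srt` is an illustrative helper, not a real tool):

    ```python
    import re

    TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

    def shift_srt(srt_text, offset_ms):
        """Shift every SRT timestamp by offset_ms (negative moves
        cues earlier, e.g. after trimming the opening of the video)."""
        def bump(match):
            h, m, s, ms = map(int, match.groups())
            total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
            total = max(total, 0)  # clamp cues that would go negative
            h, rest = divmod(total, 3_600_000)
            m, rest = divmod(rest, 60_000)
            s, ms = divmod(rest, 1000)
            return f"{h:02}:{m:02}:{s:02},{ms:03}"
        return TS.sub(bump, srt_text)

    # Five seconds were cut from the start, so shift cues back 5s.
    shifted = shift_srt("1\n00:00:07,000 --> 00:00:11,800\nHello.", -5000)
    ```

    This only works for a single uniform trim; if the recording was edited in multiple places, the offsets differ per segment and re-transcribing the cut version is usually faster.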

Frequently asked questions

Can Zoom automatically add subtitles to recorded meetings?

Zoom can generate an automated transcript for cloud recordings, but it outputs as a separate VTT file rather than subtitles burned into the video. The accuracy is also inconsistent, especially with accents or technical vocabulary. You still need to clean up the transcript and embed it into the video file using a separate tool.

What is the difference between SRT and VTT subtitle files?

SRT is the most widely compatible subtitle format and works with virtually every video editor, YouTube, Vimeo, and social platforms. VTT is the web standard used by HTML5 video players, Zoom itself, and most LMS platforms like Canvas and Moodle. Both contain timestamped text, but VTT supports additional styling options. When in doubt, generate SRT first since it is easier to convert.
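
The conversion itself is mechanical. A minimal Python sketch of SRT-to-VTT conversion (an illustrative helper; note it naively drops any line that is purely digits, which a production script should guard against):

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT to WebVTT: prepend the WEBVTT header, change
    the millisecond separator from comma to dot, and drop cue
    index lines (they are optional in VTT)."""
    out = ["WEBVTT", ""]
    for line in srt_text.splitlines():
        if line.strip().isdigit():
            continue  # cue index; not required in VTT
        out.append(re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", line))
    return "\n".join(out)

vtt = srt_to_vtt("1\n00:00:03,000 --> 00:00:08,500\nSample line.")
```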

How do I burn subtitles into a Zoom MP4 recording?

Once you have an SRT file, use HandBrake (free, desktop) and enable the subtitles track as burned-in when exporting. Alternatively, upload your MP4 and SRT to Kapwing or Clideo in a browser and use their subtitle burn tool. The process takes a few minutes and outputs a new MP4 with hardcoded subtitles that appear on any device without needing a separate caption file.

How accurate is AI-generated subtitle text compared to professional captioning?

AI-generated subtitles from a clean Zoom transcript typically reach 95-98% accuracy on standard English with clear audio, which is sufficient for most internal and marketing use cases. Professional human captioning services target 99%+ accuracy and are the right choice for legal transcripts, broadcast content, or anything requiring formal accessibility compliance under ADA or WCAG standards.

Can I add subtitles to a Zoom recording that is stored locally, not in the cloud?

Yes. Local Zoom recordings are saved as MP4 files. You can transcribe them using a tool like Whisper (free, open-source) or upload them to a transcription service to get a VTT or SRT file. Then use an AI model to clean and reformat that transcript into properly timed subtitle blocks, and burn it into the video with HandBrake or a browser-based tool.

Do I need special software to add subtitles to a Zoom recording?

No special software purchase is required. Zoom's cloud portal gives you the raw transcript for free. An AI model handles the formatting. HandBrake is free and open-source for burning subtitles into the final video. The only step that requires a paid tool is if you want a fully automated end-to-end workflow through a dedicated platform like Descript or Kapwing Pro.