How to Transcribe a Zoom Meeting Recording Automatically

Tested prompts for transcribing a Zoom meeting recording, compared across 5 leading AI models.

Best by judge score: Claude Opus 4.7 (9/10)

You recorded a Zoom meeting and now you need the words in text form. Maybe you missed part of the conversation, need to send action items to someone who was not there, or have to create a written record for compliance. Whatever the reason, you need a transcript, and you need it without spending hours typing manually.

Zoom has a built-in transcription feature, but it only works for cloud recordings on paid plans (Pro, Business, or Enterprise), and its accuracy is inconsistent, especially when people talk over each other or speakers have strong accents. The practical alternative is to take the audio or video file from your Zoom recording and run it through an AI transcription tool: either a dedicated service or a general-purpose large language model with audio capabilities.

This page shows you how to do that with an AI prompt-based approach. You will see a tested prompt, how five different models handled it, and a side-by-side comparison so you can pick the right tool for your situation. Whether you are a project manager, a journalist, an HR professional, or a researcher, the workflow is the same: export the file, run the prompt, clean the output.

When to use this

This approach works best when you have a finished Zoom recording saved as an MP4 or M4A file and need a readable, searchable transcript quickly. It also fits situations where you need better accuracy than Zoom's built-in transcript provides, want speaker labels, or need to customize the output format for a specific downstream use like meeting minutes or a report.

  • You need to share meeting notes with stakeholders who did not attend the call
  • You are extracting action items or decisions from a recorded client or team meeting
  • You have a recorded interview or user research session that needs a verbatim text record
  • Your company requires written documentation of meetings for legal or compliance purposes
  • You want to create a searchable archive of recorded internal briefings or training sessions

When this format breaks down

  • Live or real-time transcription during an active Zoom call: this workflow requires a completed recording file, not a live audio stream
  • Recordings longer than 2 hours with 6 or more speakers: speaker diarization accuracy drops significantly and manual cleanup time often exceeds the time saved
  • Highly confidential recordings such as attorney-client privileged conversations or medical consultations where uploading audio to a third-party AI service creates a compliance or legal risk
  • Recordings with extremely poor audio quality, heavy background noise, or multiple overlapping speakers throughout: the transcript will require so many corrections that it may be faster to transcribe manually

The prompt we tested

You are an expert meeting transcriptionist and productivity assistant. A user wants to transcribe a Zoom meeting recording and needs clear, accurate guidance. Using the user's context below, produce a helpful response that explains how to transcribe their Zoom recording.

Follow these output rules:
Structure the response with: (1) a short intro confirming the best approach for their situation, (2) a numbered step-by-step guide covering both Zoom's built-in audio transcript feature (for Pro/Business/Enterprise cloud recordings) and a fallback method using a third-party AI tool (e.g., Otter.ai, Whisper, Rev, Descript) for local MP4 recordings, (3) a brief tips section covering speaker labels, accuracy, editing, and exporting (SRT/TXT/DOCX), and (4) a one-line recommendation tailored to their context. Keep it under 400 words, use plain language, and include exact menu paths like Settings > Recording > Audio transcript.

User context:
I recorded a 90-minute Zoom call with 5 participants last week and saved it locally to my Mac as an MP4 file. I have a free Zoom account, so I don't think I have the cloud transcription feature. I need a transcript with speaker names to share with my team.

Produce the transcription guide now.

Model comparison

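| Model | Quality (judge score / 10) | Cost (USD) | Latency | Tokens (in / out) |
|---|---|---|---|---|
| Claude Opus 4.7 | 9 | $0.0809 | 16.5s | 440 / 991 |
| Claude Haiku 4.5 | 8 | $0.0031 | 6.1s | 306 / 567 |
| GPT-5.4 | 8 | $0.0098 | 10.7s | 267 / 612 |
| Gemini 2.5 Pro | 7 | $0.0034 | 17.8s | 279 / 613 |
| Grok 4.1 Fast Reasoning | 8 | $0.0003 | 7.5s | 416 / 386 |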

Outputs

Claude Opus 4.7 9/10
## Best Approach for Your Situation

Since you have a free Zoom account and a local MP4 file, Zoom's built-in audio transcript won't work for you (it requires a Pro/Business/Enterprise plan with cloud recording). Your best bet is a third-party AI transcription tool that supports speaker identification. I've included the Zoom method below as a reference in case you upgrade later.

## Step-by-Step Guide

**Option A — Third-Party AI Tool (Recommended for You)**

1. **Pick a tool with speaker labels.** Good options: Otter.ai (free tier: 300 min/month), Descript, Rev (paid, high accuracy), or OpenAI Whisper (free, local).
2. **Convert if needed.** Most tools accept MP4 directly. If not, extract audio to MP3 using VLC (File > Convert/Stream) or HandBrake.
3. **Upload your file.** In Otter.ai: Home > Import > select your MP4. For Whisper on Mac: install via Terminal (`pip install openai-whisper`) and run `whisper yourfile.mp4 --model medium`.
4. **Enable speaker detection.** In Otter, this is on by default ("Speaker 1, Speaker 2…"). In Descript, go to Edit > Transcribe > Identify Speakers.
5. **Rename speakers.** Click each "Speaker 1" label and type the participant's actual name — it updates throughout the transcript.
6. **Review and export** (see Tips below).

**Option B — Zoom's Built-In Transcript (Pro+ Plans Only)**

1. Sign in at zoom.us > **Settings > Recording > Audio transcript** > toggle ON.
2. Record future meetings to the cloud (not locally).
3. After the meeting, go to **Recordings > Cloud Recordings** and wait for processing.
4. Download the VTT transcript file from the recording entry.

## Tips

- **Speaker labels:** Accuracy improves when participants speak for 30+ seconds uninterrupted. Expect to manually fix a few mislabels in a 5-person call.
- **Accuracy:** For a 90-minute call, budget 15–20 minutes for editing. Poor audio, crosstalk, and accents are the main issues. Whisper's "medium" or "large" model gives the best free accuracy.
- **Editing:** Use the tool's built-in editor — play audio while correcting text. Descript and Otter let you click a word to jump to that moment.
- **Exporting:** Most tools support TXT (plain), DOCX (formatted with speakers/timestamps), and SRT (subtitles). For team sharing, DOCX is cleanest.

## My Recommendation

For a one-time 90-minute file with 5 speakers, upload it to **Otter.ai's free tier** — you'll get speaker labels, easy renaming, and a clean DOCX export in under 10 minutes of hands-on time.
Judge notes: Nearly flawless for the user's specific context with accurate menu paths, concrete tool recommendations, and a tailored workflow — the only minor nitpick is that Whisper's terminal install command (`pip install openai-whisper`) may require additional setup steps on Mac (Python, ffmpeg) that a non-technical user could stumble on.
Claude Haiku 4.5 8/10
# How to Transcribe Your 90-Minute Zoom Meeting

Since you saved your recording locally as an MP4 file on your Mac and have a free Zoom account, you won't have access to Zoom's built-in cloud transcription. The best approach is to use a third-party AI transcription tool that can identify speakers and export with labels.

## Step-by-Step Guide

**Option 1: Using Otter.ai (Recommended for speaker labels)**
1. Go to otter.ai and create a free account (free tier includes 600 minutes/month)
2. Click "Upload" and select your MP4 file from your Mac
3. Wait for processing (typically 10–15 minutes for a 90-minute file)
4. Review the transcript and manually correct speaker names in the transcript editor
5. Export as TXT, DOCX, or PDF to share with your team

**Option 2: Using Descript (Best for editing + visuals)**
1. Download Descript (free tier available) on your Mac
2. Create a new project and import your MP4 file
3. Descript automatically transcribes and attempts speaker detection
4. Manually label speakers by clicking on each section and assigning names
5. Export as formatted document or SRT/VTT for subtitles

**Option 3: Using OpenAI Whisper (Free, technical)**
1. Install Whisper via Terminal: `pip install openai-whisper`
2. Run: `whisper yourfile.mp4 --language en`
3. Generates a plain-text transcript (no speaker labels by default; you'll need to manually add them)

## Key Tips

- **Speaker labels:** No tool perfectly identifies speakers automatically. Plan to spend 10–15 minutes manually assigning names to sections in the transcript.
- **Accuracy:** Review the first 5 minutes carefully; AI tools typically reach 85–95% accuracy for clear audio. Zoom meetings with poor audio quality may need more editing.
- **Editing:** Use your tool's built-in editor to fix names, technical terms, or timestamps before exporting.
- **Exporting:** All three tools export to DOCX or TXT—perfect for sharing via email or Slack.

## Recommendation

**Use Otter.ai**—it's the fastest option for your situation, offers the best free-tier limits for a single 90-minute file, and provides straightforward speaker labeling that you can quickly customize before sharing.
Judge notes: The output is production-ready and well-tailored to the user's exact context, but the prompt required covering Zoom's built-in audio transcript feature (Settings > Recording > Audio transcript) even as a contrast/reference point, which was omitted entirely, and the response slightly exceeds the 400-word limit.
GPT-5.4 8/10
Since you have a **free Zoom account** and the meeting was saved **locally as an MP4 on your Mac**, the best path is to use a **third-party AI transcription tool**. I’ll also show the Zoom built-in method in case you upgrade or someone else on your team has the cloud recording.

1. **Check if Zoom’s built-in transcript is available**
   - Zoom audio transcript only works for **cloud recordings** on **Pro/Business/Enterprise** plans.
   - If available, turn it on here:
     - **Zoom web portal > Settings > Recording > Cloud Recording > Audio transcript**
   - Then record to the cloud:
     - **Zoom app > Record > Record to the Cloud**
   - After the meeting processes, get the transcript here:
     - **Zoom web portal > Recordings > Cloud Recordings > [meeting]**
   - You can view, edit, and download transcript files.

2. **For your situation: transcribe the local MP4 with a third-party tool**
   - Pick a tool that supports speaker detection:
     - **Otter.ai**, **Descript**, **Rev**, or **Whisper**-based apps.
   - Typical steps:
     1. Open the transcription tool.
     2. Upload your **MP4** file from your Mac.
     3. Enable **speaker identification / speaker labels** if offered.
     4. Start transcription and wait for processing.
     5. Review the transcript and rename speakers (for example, “Speaker 1” to actual team names).
   - Tool examples:
     - **Otter.ai**: Import audio/video, then edit speaker names.
     - **Descript**: Drag in the MP4, auto-transcribe, then assign speakers.
     - **Rev**: Upload file and choose AI or human transcription.
     - **Whisper**: Great accuracy, but speaker labels may require another app or manual editing.

3. **Edit and export the transcript**
   - Clean up names, technical terms, and action items.
   - Export in the format your team needs:
     - **TXT** for plain text
     - **DOCX** for sharing/editing
     - **SRT** for captions/subtitles
   - Most tools have:
     - **Export > TXT / DOCX / SRT**

**Tips**
- **Speaker names:** AI usually guesses speakers first; you’ll need to assign real names manually.
- **Accuracy:** Good audio matters. Crosstalk in a 5-person call may need light cleanup.
- **Editing:** Scan timestamps, decisions, and action items before sharing.
- **Exporting:** Use **DOCX** for team review and **SRT** if you also want captions.

**Recommendation:** For your 90-minute local MP4 with 5 participants, use **Otter.ai or Descript** because they’re the easiest options for speaker-labeled transcripts you can quickly edit and share.
Judge notes: Solid, well-structured, and context-specific, but the "Tips" section partially duplicates content already covered in the steps, and the output slightly exceeds 400 words, violating the explicit constraint.
Gemini 2.5 Pro 7/10
Of course. Since you have a local MP4 file and a free Zoom account, a third-party AI transcription tool is the perfect solution for your current recording. Here is a guide to get it done and prepare for the future.

### How to Transcribe Your Zoom Recording

**Method 1: For Your Current MP4 File (Using a Third-Party AI Tool)**

This is the method you need for the file saved on your Mac.

1.  **Choose a Tool:** Select an AI transcription service. Popular choices with free tiers or trials include Otter.ai, Descript, and Rev.
2.  **Create an Account:** Sign up for the service you chose.
3.  **Upload Your File:** Find the "Upload" or "Import" button in the tool's interface and select the 90-minute MP4 file from your Mac.
4.  **Process and Edit:** The service will automatically transcribe the audio. This may take 10-20 minutes. Once complete, you can use its built-in editor to correct any errors and assign speaker names.
5.  **Export:** Download the finished transcript in your desired format (e.g., DOCX, TXT) to share with your team.

**Method 2: For Future Recordings (Using Zoom's Built-in Feature)**

If you upgrade to a paid Zoom plan (Pro, Business, or Enterprise), you can automate this process.

1.  **Enable the Setting:** As the account admin, go to **Settings > Recording** and ensure the **Audio transcript** checkbox is enabled under the "Cloud recording" section.
2.  **Record to the Cloud:** When you start your next meeting, select the "Record to the Cloud" option.
3.  **Access the Transcript:** After the meeting, you will receive an email with a link to your recording. The transcript will be available there as a .VTT file.

### Tips for a Great Transcript

*   **Speaker Labels:** AI tools will label speakers generically (e.g., "Speaker 1," "Speaker 2"). You will need to listen to the first few lines from each person and manually change the labels to their actual names.
*   **Accuracy Check:** AI is about 90-95% accurate. Always proofread the transcript to fix any misheard words, especially industry jargon or proper names.
*   **Easy Editing:** Most transcription tools have an interactive editor that links the text to the audio. Click on a word to hear that part of the recording, making corrections fast and simple.
*   **Export Options:** You can typically export the final transcript as a TXT, DOCX, or SRT file, depending on your needs.

**Recommendation:** For your current MP4 file, using an AI service like Otter.ai or Descript is your most direct path to an accurate, shareable transcript with speaker labels.
Judge notes: The output is solid and well-structured but slightly exceeds 400 words, the speaker label tip undersells available AI diarization capabilities (implying full manual work is always required), and the Zoom built-in method step 3 mentions a .VTT file without noting SRT/TXT/DOCX export options as the prompt required.
Grok 4.1 Fast Reasoning 8/10
### Best Approach for Your Zoom MP4
Since you have a free Zoom account and a local MP4 recording, use a third-party AI tool like Otter.ai—it's free for basics, handles speaker labels well for 5 participants, and works great on Mac.

### Step-by-Step Guide
1. **Check Zoom's Built-in Audio Transcript (Paid Cloud Only)**: Log into Zoom web portal > Settings > Recording > Cloud recording tab. Toggle "Audio transcript" on (requires Pro/Business/Enterprise). Record future meetings to cloud, then download transcript post-processing. (Not available for your local/free MP4.)

2. **Upload to Otter.ai (Recommended Free Fallback)**: Go to otter.ai, sign up (free tier handles 90 mins). Click "Import" > Upload your MP4. It auto-transcribes with speaker ID (train it with names via "Speakers" menu). Edit and export.

3. **Alternatives for Local MP4**:
   - **Whisper (Free, Local on Mac)**: Download MacWhisper app (via GitHub or App Store). Drag MP4 in, select model (base/large), enable speaker diarization. Exports TXT/SRT.
   - **Rev/Descript (Paid, High Accuracy)**: Upload to rev.com ($1.50/min) or descript.com (subscription). Both label speakers and offer editable timelines.

### Quick Tips
- **Speaker Labels**: Manually assign/edit names post-transcription for accuracy (AI is ~85-95% on clear audio).
- **Boost Accuracy**: Ensure good audio; review/edit for 90-min calls.
- **Export**: Choose SRT (subtitles/timed), TXT (plain), or DOCX (editable) to share with team.

**Tailored Rec**: Otter.ai is perfect for your free Mac setup—upload now for speaker-labeled transcript in minutes! (248 words)
Judge notes: Solid, well-structured response that directly addresses the user's free-account/local-MP4 context, but the Otter.ai free tier actually caps imports at 3 hours/month (not "handles 90 mins" as a standalone claim), and the word count annotation "(248 words)" at the end is an artifact that would need removal before production use.

What makes these work

  1. Specify the output format upfront

    Tell the model exactly what you want before it processes the audio: timestamps, speaker labels, a summary, bullet points, or verbatim text. If you ask for formatting after the fact, the model often has to re-process the audio or approximate the result from its first pass. One clear instruction at the start saves a revision cycle.

  2. Provide speaker context when you have it

    If you know the names and roles of people on the call, include them in your prompt. For example, tell the model that the first speaker is the sales rep and the second is the client. This produces labeled transcripts that are immediately usable instead of generic Speaker 1, Speaker 2 labels you still have to decode.

  3. Ask for a confidence flag on unclear audio

    Instruct the model to mark low-confidence sections with a tag like [inaudible] or [unclear] rather than guessing. A plausible-sounding wrong guess is worse than an honest marker you can find and fix. This is especially important for formal records like HR meetings or legal depositions.

  4. Separate transcription from summarization tasks

    Do not ask for both a verbatim transcript and a summary in the same output without defining which comes first. Verbatim first, then summary produces cleaner results because the model summarizes from completed text rather than toggling between modes mid-task. Structure your prompt in that order explicitly.
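
The four tips above can be folded into one reusable prompt. The sketch below is a minimal Python helper, not a feature of any provider; the function name, the exact wording, and the example speakers "Dana" and "Lee" are hypothetical placeholders you would replace with your own.

```python
def build_transcription_prompt(speakers, output_format, want_summary=False):
    """Assemble a transcription prompt that applies the four tips (hypothetical helper).

    speakers: list of (name, role) tuples you know before processing.
    output_format: plain-language description of the layout you want.
    """
    # Tip 1: state the output format before any processing happens.
    lines = [f"Output format: {output_format}."]
    if speakers:
        # Tip 2: give names and roles so labels are usable immediately.
        roster = "; ".join(f"{name} ({role})" for name, role in speakers)
        lines.append(f"Speakers on this call: {roster}. Label each turn with the speaker's name.")
    # Tip 3: ask for explicit markers instead of guesses on unclear audio.
    lines.append("Mark any unclear audio as [inaudible] or [unclear] instead of guessing.")
    if want_summary:
        # Tip 4: fix the order so the summary is written from the finished transcript.
        lines.append("First produce the full verbatim transcript. Then, after it, write a summary.")
    else:
        lines.append("Produce a verbatim transcript only. Do not summarize.")
    return "\n".join(lines)


# Hypothetical example: a sales call with two known participants.
prompt = build_transcription_prompt(
    speakers=[("Dana", "sales rep"), ("Lee", "client")],
    output_format="timestamped lines, one speaker turn per line",
    want_summary=True,
)
print(prompt)
```

Building the prompt from parameters keeps the ordering rule (transcript first, summary second) fixed, so it cannot drift between meetings.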

More example scenarios

#01 · Weekly engineering team standup
Input
Here is the transcript of our 20-minute engineering standup recording. Please clean it up, fix obvious speech-to-text errors, label each speaker as Speaker 1, Speaker 2, etc., and produce a bullet-point summary of blockers and action items at the top. Recording text: [paste raw transcript or attach audio file]
Expected output
Summary: Blockers: API rate limit issue blocking Speaker 2's deployment. Action items: Speaker 1 to review PR #412 by Thursday. Speaker 3 to send updated staging credentials to the team by EOD. Full cleaned transcript follows with speaker labels and timestamps.
#02 · Client discovery call for a marketing agency
Input
This is a 45-minute discovery call recording with a new client in the e-commerce space. Please transcribe it, assign speaker labels, and then produce a structured summary with sections for: client goals, current pain points, budget signals mentioned, and agreed next steps.
Expected output
Client Goals: Increase repeat purchase rate and reduce cart abandonment. Pain Points: Current email platform lacks segmentation. Budget Signals: Client mentioned 'mid five figures' for Q3. Next Steps: Agency to send proposal by Friday; client to share GA4 access by Wednesday.
#03 · HR disciplinary meeting documentation
Input
Please produce a verbatim transcript of the attached meeting recording. This is a formal HR meeting. Do not summarize or paraphrase. Label speakers as HR Representative and Employee. Flag any sections where audio was unclear with [inaudible].
Expected output
HR Representative: 'This meeting is being recorded with your consent, as confirmed at the start. Can you walk me through what happened on the 14th?' Employee: 'I was not aware that the policy had changed.' HR Representative: 'The updated policy was distributed on March 1st via email.' [inaudible - approx. 4 seconds] Employee: 'I may have missed it.'
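
For a formal record like this one, every flag has to be resolved or explicitly accepted before the transcript is filed. A short script can list the flags with line numbers so none slip through. This is a minimal sketch that assumes the model used the [inaudible] / [unclear] markers requested in the prompt; `find_review_markers` is a hypothetical helper name, and the sample lines are adapted from the expected output above.

```python
import re

# Matches "[inaudible]", "[unclear]", and variants like "[inaudible - approx. 4 seconds]".
MARKER = re.compile(r"\[(inaudible|unclear)[^\]]*\]", re.IGNORECASE)


def find_review_markers(transcript):
    """Return (line_number, marker_text) for every flagged span, numbering lines from 1."""
    hits = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        for match in MARKER.finditer(line):
            hits.append((number, match.group(0)))
    return hits


sample = (
    "HR Representative: The updated policy was distributed on March 1st via email.\n"
    "[inaudible - approx. 4 seconds]\n"
    "Employee: I may have missed it.\n"
)

for line_no, marker in find_review_markers(sample):
    print(f"line {line_no}: {marker}")
```

Each hit points a reviewer at the exact spot to replay in the recording.
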
#04 · Qualitative user research interview
Input
Transcribe this 30-minute user research interview. The interviewer is from our product team, the participant is a small business owner testing our invoicing feature. After the transcript, pull out every quote where the participant expresses confusion, frustration, or a feature request. Label these quotes by theme.
Expected output
Confusion: 'I kept looking for a save button and I could not find it anywhere.' Frustration: 'Why does it log me out every time I switch tabs? That happened three times.' Feature Request: 'I really wish I could duplicate an invoice instead of starting from scratch every month.' Full transcript follows.
#05 · Executive all-hands meeting for internal archive
Input
This is a recording of our quarterly all-hands meeting, approximately 60 minutes. Please transcribe it and produce two outputs: 1) a full cleaned transcript with speaker names where identifiable, and 2) a 200-word executive summary suitable for employees who could not attend.
Expected output
Executive Summary: The Q2 all-hands covered three main topics. CEO Jane Smith announced the company exceeded revenue targets by 12% and outlined the plan to expand into two new markets by Q4. COO David Reyes confirmed two new hires in the logistics team and addressed questions about the hybrid work policy update effective August 1st. The session closed with a product demo from the engineering lead. Full transcript available below.

Common mistakes to avoid

  • Uploading the full MP4 when only audio is needed

    Many AI tools that accept audio will also accept MP4 video files, but the file size is much larger and upload times increase substantially. If you recorded locally, check the recording folder first: Zoom usually saves an M4A audio-only file next to the MP4. Otherwise, extract the audio as M4A or MP3 using a free tool like VLC or Audacity before uploading. Smaller files process faster and hit fewer size limits.
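
If you would rather use the command line than a GUI tool, ffmpeg (a free tool not otherwise covered on this page) can pull the audio track out of an MP4. The sketch below builds the command and runs it only if ffmpeg is installed (for example via Homebrew on a Mac). It assumes the recording's audio track is AAC, which is typical for Zoom MP4s and lets the audio be copied into an .m4a file without re-encoding. `extract_audio_cmd` and the file name are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def extract_audio_cmd(video_path, audio_path=None):
    """Build an ffmpeg command that copies the audio track of an MP4 into an .m4a file.

    -vn drops the video stream; -c:a copy keeps the audio as-is (no re-encoding),
    which assumes the track is AAC. If ffmpeg rejects the copy, use "aac" instead of "copy".
    """
    video = Path(video_path)
    # Default output: same folder and name as the video, with an .m4a extension.
    audio = Path(audio_path) if audio_path else video.with_suffix(".m4a")
    return ["ffmpeg", "-i", str(video), "-vn", "-c:a", "copy", str(audio)]


if __name__ == "__main__":
    source = "zoom_meeting.mp4"  # hypothetical file name; point this at your recording
    cmd = extract_audio_cmd(source)
    print(" ".join(cmd))
    # Run only when ffmpeg is installed and the input file actually exists.
    if shutil.which("ffmpeg") and Path(source).exists():
        subprocess.run(cmd, check=True)
```

Stream-copying is fast because nothing is decoded. Re-encoding with "aac" is slower but works for any source codec.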

  • Not checking Zoom's own transcript first

    If you recorded to the cloud on a paid Zoom plan with Audio transcript turned on, Zoom may have already generated a transcript automatically. Check the Recordings page in your Zoom web portal before doing any manual work. The built-in transcript is often good enough for casual use and can serve as a rough draft you clean up with an AI prompt.
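
Zoom delivers its audio transcript as a WebVTT (.vtt) file: numbered cues, each with a timing line and a line of text. The sketch below assumes Zoom prefixes each cue's text with the speaker's name and a colon. Under that assumption, it collapses the cues into readable "Name: text" paragraphs that make a cleaner rough draft for an AI cleanup prompt. The function name and the sample cues are illustrative, not taken from a real meeting.

```python
def vtt_to_draft(vtt_text):
    """Collapse a Zoom-style WebVTT transcript into "Speaker: text" paragraphs.

    Assumes each cue's text line starts with "Name: ". A text line without that
    prefix is attributed to the previous speaker. Consecutive cues from the same
    speaker are merged into one paragraph.
    """
    paragraphs = []  # each entry is [speaker, text]
    for raw_line in vtt_text.splitlines():
        line = raw_line.strip()
        # Skip the header, blank lines, cue numbers, and "start --> end" timing lines.
        if not line or line.startswith("WEBVTT") or line.isdigit() or "-->" in line:
            continue
        speaker, sep, text = line.partition(": ")
        if not sep:
            # No "Name: " prefix: treat the whole line as a continuation.
            speaker = paragraphs[-1][0] if paragraphs else "Unknown"
            text = line
        if paragraphs and paragraphs[-1][0] == speaker:
            paragraphs[-1][1] += " " + text
        else:
            paragraphs.append([speaker, text])
    return "\n\n".join(f"{speaker}: {text}" for speaker, text in paragraphs)


sample = """WEBVTT

1
00:00:02.520 --> 00:00:05.330
Jane Smith: Okay, let's get started.

2
00:00:05.870 --> 00:00:08.010
Jane Smith: First item is the budget.

3
00:00:08.400 --> 00:00:10.100
David Reyes: Sounds good.
"""
print(vtt_to_draft(sample))
```

Dropping the timing lines makes the text easier to read and cheaper to paste into a prompt. Keep the original .vtt file if you later need timestamps.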

  • Assuming speaker diarization is automatic

    Many transcription tools and general-purpose models do not separate speakers unless you turn on speaker detection or explicitly ask for it in your prompt, and even then they need enough audio contrast between voices. Without that, the output can be one undifferentiated block of text. Always enable speaker labels, or request them in your prompt, if you need them.
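
When a tool does separate speakers, it usually labels them generically ("Speaker 1", "Speaker 2"), and you still have to map those labels to real names. A plain find-and-replace on "Speaker 1" would also rewrite "Speaker 10". The minimal sketch below therefore matches only whole labels at the start of a line. It assumes the "Speaker N:" line format; `rename_speakers` and the names in the example are hypothetical.

```python
import re


def rename_speakers(transcript, names):
    """Replace generic labels like "Speaker 1:" at the start of a line with real names.

    names maps label numbers to names, e.g. {1: "Priya", 2: "Tom"}.
    Unmapped labels are left unchanged so they stand out during review.
    """
    # \d+ captures the whole number, so "Speaker 10" is never mistaken for "Speaker 1".
    pattern = re.compile(r"^Speaker (\d+):", re.MULTILINE)

    def swap(match):
        number = int(match.group(1))
        # match.group(0)[:-1] is the original label without its trailing colon.
        return f"{names.get(number, match.group(0)[:-1])}:"

    return pattern.sub(swap, transcript)


raw = "Speaker 1: Let's start.\nSpeaker 2: Agreed.\nSpeaker 10: Late, sorry.\nSpeaker 1: No problem."
print(rename_speakers(raw, {1: "Priya", 2: "Tom"}))
```
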

  • Skipping the cleanup step on formal documents

    AI transcripts almost always contain errors: filler words, misheard technical terms, or wrong punctuation. Sending a raw AI transcript to a client, an HR file, or a legal record without review is a real risk. Build in 10-15 minutes of proofreading for every 30 minutes of recording as a baseline expectation.

  • Ignoring privacy and data handling policies

    When you upload a meeting recording to any external AI service, the audio or text may be stored, logged, or used for model training depending on the provider's terms. For recordings that include personal data, financial information, or confidential business strategy, verify the provider's data retention policy or use an on-premises or self-hosted transcription option.


Frequently asked questions

Does Zoom automatically transcribe recordings?

Zoom can auto-transcribe cloud recordings if you have a Pro, Business, or Enterprise plan and enable the feature in your account settings. In the Zoom web portal, go to Settings > Recording, turn on Audio transcript under Cloud recording, and record your meetings to the cloud. The transcript file (VTT format) will appear alongside your recording in the Zoom web portal after processing. Local recordings saved to your computer do not get auto-transcribed by Zoom.

What is the best free tool to transcribe a Zoom recording?

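Whisper (OpenAI's open-source model) is free to run locally, and Otter.ai and Google Cloud Speech-to-Text offer free tiers with reasonable accuracy. Because Whisper runs entirely on your machine, no audio leaves it, which makes it a strong choice for confidential recordings; note that it does not label speakers on its own. Otter.ai offers 300 free minutes per month and produces cleaner formatting out of the box, including speaker labels. Free-tier limits change often, however, and can include caps on file length or the number of imports, so check the current terms before uploading a long recording.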
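
Run locally, Whisper returns its result as a list of timed segments. The openai-whisper command-line tool can already write SRT directly with `--output_format srt`. The sketch below shows what that conversion involves, which is useful if you post-process the segments yourself, for example after adding speaker names by hand. It assumes the segment shape returned by openai-whisper's `transcribe()`: one dict per segment, with "start" and "end" in seconds plus "text". The helper names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments (dicts with "start", "end", "text") into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        # Each SRT block: sequence number, timing line, text, then a blank line.
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Made-up segments in the shape Whisper returns (Whisper text often has a leading space).
segments = [
    {"start": 0.0, "end": 3.2, "text": " Welcome, everyone."},
    {"start": 3.2, "end": 7.85, "text": " Let's review the action items."},
]
print(segments_to_srt(segments))
```

With openai-whisper installed, real segments would come from something like `whisper.load_model("medium").transcribe("meeting.m4a")["segments"]`.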

How do I get the audio file out of a Zoom recording?

If you recorded locally, Zoom saves both an MP4 video file and an M4A audio file, by default in a folder named Zoom inside your Documents folder. If you used cloud recording, log into the Zoom web portal, go to Recordings, and download either the video or audio-only file. The M4A audio file is smaller and usually sufficient for transcription purposes.

How accurate is AI transcription for Zoom meetings?

Accuracy typically ranges from 85% to 95% for clear audio with a single speaker or well-separated speakers in English. It drops with heavy accents, crosstalk, technical jargon, or poor microphone quality. Plan to review and correct the transcript before using it in any formal document. Providing a glossary of proper nouns or technical terms in your prompt can improve accuracy for specialized fields.
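
Accuracy figures like these are commonly reported as roughly 1 minus the word error rate (WER). WER is the number of word substitutions, deletions, and insertions needed to turn the AI transcript into a correct one, divided by the number of words in the correct version. A practical spot check is to transcribe a minute or two by hand and compare. The sketch below is a minimal word-level edit-distance implementation. It lowercases text but does not strip punctuation, so normalize both versions the same way before comparing. The sample sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dist[i][j] = edits to turn the first i reference words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


reference = "send the updated staging credentials to the team by end of day"
hypothesis = "send the updated staging credential to the team by end of the day"
wer = word_error_rate(reference, hypothesis)
print(f"WER {wer:.1%}, roughly {1 - wer:.0%} accurate")
```

Here one substitution and one insertion against 12 reference words give a WER of 2/12. Even small slips add up quickly on short passages.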

Can I transcribe a Zoom recording without downloading any software?

Yes. Several browser-based tools transcribe recordings without requiring a software install. You can upload your Zoom MP4 or M4A file directly to services like Otter.ai, Sonix, Descript, or a multimodal AI model with file upload support. You do not need to install anything locally as long as you are comfortable with the provider's data handling terms.

How long does it take to transcribe a one-hour Zoom recording?

Most AI transcription tools process one hour of audio in 3 to 10 minutes, depending on file size and server load. Cloud services like Otter.ai or Descript are usually closer to 3 to 5 minutes for the same length. Whisper running locally on consumer hardware can take roughly 5 to 15 minutes for a one-hour file with smaller models, but larger models on a machine without GPU acceleration can take longer than the recording itself. Add your own review time on top of that.
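
Combining the rules of thumb on this page gives a rough end-to-end budget: 3 to 10 minutes of processing per hour of audio, plus 10 to 15 minutes of proofreading per 30 minutes of recording for formal documents. The sketch below just does that arithmetic and rounds up. The defaults are this page's estimates, not measurements, so adjust them for your own tool and hardware. `estimate_minutes` is a hypothetical helper.

```python
import math


def estimate_minutes(recording_min, processing_per_hour=(3, 10), review_per_30_min=(10, 15)):
    """Return a rough (low, high) estimate of total minutes, rounded up.

    Total = machine processing time + human proofreading time.
    Defaults are the rules of thumb quoted on this page.
    """
    hours = recording_min / 60
    half_hours = recording_min / 30
    low = processing_per_hour[0] * hours + review_per_30_min[0] * half_hours
    high = processing_per_hour[1] * hours + review_per_30_min[1] * half_hours
    return math.ceil(low), math.ceil(high)


low, high = estimate_minutes(90)
print(f"90-minute recording: about {low} to {high} minutes end to end")
```

For a 90-minute call, this comes to roughly 35 to 60 minutes, and 30 to 45 of those minutes are human review rather than machine processing.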