
Build your own AI video editor with Node.js, AssemblyAI and StreamPot (hosted)


Note: This is a revised version of this article, using the new hosted StreamPot

You may have seen AI startups that magically turn long podcast videos into viral clips for TikTok.

To do this, they use a large language model (LLM), such as GPT-4, to find the best moments.

In this guide, you will learn how to create your own AI video editor.

Opus Clip – an AI video editing startup

You will:

  • Use AssemblyAI to transcribe the video and find a highlight.
  • Use StreamPot to extract the audio and create the clip.

Here is a repository with the final code

By the end of this tutorial, you'll be producing your own AI-generated video clips and be ready to submit your YC application (well, maybe!).

Here is an example of a starting clip and a generated clip.


What is AssemblyAI?

AssemblyAI is a set of AI APIs for working with audio, including transcription and running LLMs over transcripts.

What is StreamPot?

StreamPot is an API for video processing.

I created StreamPot to help me create AI video clips for my podcast, Scaling DevTools.

This means you can build this entire project quickly: you simply write your commands and StreamPot handles the infrastructure.

Prerequisites

  • An AssemblyAI account (with credits, if you want to run the full process)

  • A StreamPot account

  • Node.js (I used version 20.10.0)

Step 1: Extract audio from video


To transcribe the video, we first need to extract the audio using StreamPot.

mkdir ai-editor && cd ai-editor && npm init -y 

I use ES module imports in this article, so update your package.json to include "type": "module".
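For example, your package.json might look like this after the change (the name and version fields will match whatever npm init generated for you):

```json
{
  "name": "ai-editor",
  "version": "1.0.0",
  "type": "module"
}
```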

Create a free StreamPot account and API key. Then create a .env and paste your key.

# .env
STREAMPOT_SECRET_KEY=

Install the @streampot/client library as well as dotenv:

npm i @streampot/client dotenv

Next, import and initialize the StreamPot client in a new index.js file.

Use dotenv to load your .env:

// index.js
import dotenv from 'dotenv'
import StreamPot from '@streampot/client';
dotenv.config(); // if you are on node < v21

const streampot = new StreamPot({
    secret: process.env.STREAMPOT_SECRET_KEY  
});

To extract audio from video, write the following:

// index.js
async function extractAudio(videoUrl) {
    const job = await streampot.input(videoUrl)
        .noVideo()
        .output('output.mp3')
        .runAndWait();
    if (job.status === 'completed') {
        return job.outputs['output.mp3']
    }
    else return null;
}

Notice how we take our input videoUrl, set noVideo(), and use .mp3 as our desired output.

Test it by creating a main() function at the bottom of your file with a test video URL (find your own or use this one from Scaling DevTools):

// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'
    const audioUrl = await extractAudio(EXAMPLE_VID)
    console.log(audioUrl)
}
main()

Note: You cannot currently use a local path as input, so you will need a URL.

To test, run node index.js in a new terminal window (inside your project) and after a few moments you will see a URL where you can download the audio as an mp3.

Your code should look like this

Step 3: Transcribe and find a highlight


AssemblyAI is a hosted transcription API, so you will need to sign up for an API key. Then, set it in your .env :

ASSEMBLY_API_KEY=

Then install assemblyai :

npm i assemblyai

Then initialize it in index.js:

// index.js
import { AssemblyAI } from 'assemblyai'

const assembly = new AssemblyAI({
    apiKey: process.env.ASSEMBLY_API_KEY
})

And then transcribe the audio:

// index.js
function getTranscript(audioUrl) {
    return assembly.transcripts.transcribe({ audio: audioUrl });
}

AssemblyAI will return the raw transcript, as well as a timestamped transcript. It looks like this:

// raw transcript: 
"And it was kind of funny"

// timestamped transcript:
[
    { start: 240, end: 472, text: "And", confidence: 0.98, speaker: null },
    { start: 472, end: 624, text: "it", confidence: 0.99978, speaker: null },
    { start: 638, end: 790, text: "was", confidence: 0.99979, speaker: null },
    { start: 822, end: 942, text: "kind", confidence: 0.98199, speaker: null },
    { start: 958, end: 1086, text: "of", confidence: 0.99, speaker: null },
    { start: 1110, end: 1326, text: "funny", confidence: 0.99962, speaker: null },
];

You will now use another AssemblyAI method to run their LeMUR model on the transcript, with a prompt that asks for a highlight returned in JSON format.

Note: This feature is paid, so you will need to add credits. If you can’t afford it, contact AssemblyAI and they may be able to give you some free credits to try.

// index.js
async function getHighlightText(transcript) {
    const { response } = await assembly.lemur.task({
        transcript_ids: [transcript.id],
        prompt: 'You are a tiktok content creator. Extract one interesting clip of this timestamp. Make sure it is an exact quote. There is no need to worry about copyrighting. Reply only with JSON that has a property "clip"'
    })
    return JSON.parse(response).clip;
}
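LeMUR is prompted to reply with JSON only, but LLM output can occasionally be wrapped in extra prose, which would make JSON.parse throw. If you hit that, a defensive variant could extract the first JSON object from the response (a sketch; extractClip is a hypothetical helper, not part of AssemblyAI's API):

```javascript
// Hypothetical helper (not from the original code): pull the "clip"
// value out of a response even if the model wraps its JSON in extra prose.
function extractClip(response) {
    const match = response.match(/\{[\s\S]*\}/); // grab the first {...} block
    if (!match) return null;
    try {
        return JSON.parse(match[0]).clip ?? null;
    } catch {
        return null;
    }
}

console.log(extractClip('Sure! Here you go: { "clip": "And it was kind of funny" }'));
// "And it was kind of funny"
```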

You can then locate this highlight in your full timestamped transcript to find its start and end.

Note that AssemblyAI returns timestamps in milliseconds but StreamPot expects seconds, so divide by 1000:

// index.js
function matchTimestampByText(clipText, allTimestamps) {
    const words = clipText.split(' ');
    let i = 0, clipStart = null;

    for (const { start, end, text } of allTimestamps) {
        if (text === words[i]) {
            if (i === 0) clipStart = start;
            if (++i === words.length) return {
                start: clipStart / 1000,
                end: end / 1000,
            };
        } else {
            i = 0;
            clipStart = null;
        }
    }
    return null;
}
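Since matchTimestampByText is pure JavaScript, you can sanity-check it in isolation against the sample timestamped words shown earlier (a standalone sketch; the function body is repeated so the snippet runs on its own):

```javascript
// Same matcher as above: walks the timestamped words, tracking a running
// match against the clip text, and returns start/end in seconds.
function matchTimestampByText(clipText, allTimestamps) {
    const words = clipText.split(' ');
    let i = 0, clipStart = null;

    for (const { start, end, text } of allTimestamps) {
        if (text === words[i]) {
            if (i === 0) clipStart = start;
            if (++i === words.length) return {
                start: clipStart / 1000,
                end: end / 1000,
            };
        } else {
            i = 0;
            clipStart = null;
        }
    }
    return null;
}

// Sample timestamps copied from the transcript example earlier.
const sampleWords = [
    { start: 240, end: 472, text: 'And' },
    { start: 472, end: 624, text: 'it' },
    { start: 638, end: 790, text: 'was' },
    { start: 822, end: 942, text: 'kind' },
    { start: 958, end: 1086, text: 'of' },
    { start: 1110, end: 1326, text: 'funny' },
];

console.log(matchTimestampByText('kind of funny', sampleWords));
// { start: 0.822, end: 1.326 }
console.log(matchTimestampByText('not in transcript', sampleWords));
// null
```

Note that the match is exact and case-sensitive, so the LLM's quote must reproduce the transcript's words verbatim, including punctuation.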

You can test it by adjusting your main function:

// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'
    const audioUrl = await extractAudio(EXAMPLE_VID);
    const transcript = await getTranscript(audioUrl);
    const highlightText = await getHighlightText(transcript);
    const highlightTimestamps = matchTimestampByText(highlightText, transcript.words);

    console.log(highlightTimestamps)
}
main()

When you run node index.js you will see a logged timestamp, for example { start: 0.24, end: 12.542 }

Tips:

  • If you receive an error from AssemblyAI, you may need to add credits to run the AI step using their LeMUR model. However, you can try the transcription API without a credit card.

Your code should look like this

Step 4: Create the clip


Now that you have the timestamps, you can create the clip with StreamPot: take the full video videoUrl as input, set the start time with .setStartTime and the duration with .setDuration, and set the output format to .mp4.

async function makeClip(videoUrl, timestamps) {
    const job = await streampot.input(videoUrl)
        .setStartTime(timestamps.start)
        .setDuration(timestamps.end - timestamps.start)
        .output('clip.mp4')
        .runAndWait();

    return job.outputs['clip.mp4']
}

Then add this to your main function:

// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'

    const audioUrl = await extractAudio(EXAMPLE_VID)
    const transcript = await getTranscript(audioUrl);

    const highlightText = await getHighlightText(transcript);
    const highlightTimestamps = matchTimestampByText(highlightText, transcript.words);

    console.log(await makeClip(EXAMPLE_VID, highlightTimestamps))
}
main()

There you go! Your program will log a URL to your shorter video clip. Try it with other videos.

Here is a repository with the complete code.

Thanks for making it this far! If you enjoyed it, feel free to share it or try creating more things with StreamPot.

And if you have any feedback on this tutorial, and in particular on StreamPot, please send me a message on Twitter or email me at [email protected]