How I Made a Talking Pet Podcast Video with Veo 3.1

Updated: Dec 23, 2025

I’ve been generating images and videos on PicLumen for a long time.

Lately, I noticed something interesting in our traffic: people are clearly paying more attention to AI video generation.

So I decided to try something fun.

You’ve probably seen these videos before — cats or dogs sitting there, talking like they’re hosting a podcast, casually roasting humans. They look simple, but if you’ve tried to make one, you know the truth: most models completely mess it up.

I tested the models currently available on PicLumen.

And one model stood out:

Veo 3.1.

It was the only one that didn’t feel like I was fighting the model the whole time.

Why Veo 3.1 Works for This Kind of Video

This type of video lives or dies by timing.

It’s not about big movements or flashy camera work.

It’s about who speaks, who stays quiet, and when the reaction hits.

Veo 3.1 handles this surprisingly well.

  • It supports native audio generation.

  • Mouth movement actually matches the voice.

  • It understands turn-taking. One character talks, the other waits.

Most importantly, I didn’t have to regenerate the same prompt ten times just to get something usable. That alone is a win.

The Full Prompt I Used

Before breaking anything down, here’s the full prompt.

You can copy it and try it as-is.

A single static shot in a cozy late-night radio studio. Two real, round, adorable cats sit side by side behind a wooden desk, each wearing large, comfortable studio headphones. A small microphone is placed between them. Warm amber lighting, soft shadows, realistic fur texture, visible whiskers, and natural eye reflections. The camera remains fixed with no cuts or movement.

At the top of the frame, artistic display typography reads ‘PicLumen Pet Radio’. The text is centered, stylish, and remains visible for the entire video.

The two cats host a radio-style conversation. The cat on the left speaks first, moving its mouth clearly while the other cat stays silent and listens. It speaks in a fast, teasing tone: ‘Humans keep complaining that we shed too much.’

After the first cat finishes speaking, the second cat responds calmly, moving only its own mouth: ‘Yeah, they say our fur gets everywhere.’

The first cat immediately follows up, sounding amused and sarcastic: ‘But how can they complain about shedding…’

Only after the sentence fully ends, the second cat delivers the punchline with a dry, confident tone: ‘…when they barely have any hair left to shed?’

During the dialogue, the cats maintain natural eye contact, with subtle head turns, ear flicks, blinking, and expressive facial reactions. There is no laughter while any line is being spoken.

After the final punchline is fully delivered, both cats pause briefly, look at each other, and then burst into open, joyful laughter together. Their mouths open wide, eyes squint slightly, and their heads and bodies shake naturally as they laugh. The video ends on this shared laughter, clearly humorous and warm.

The cats must look fully real, not cartoon, not animated, not stylized. The overall mood is playful, witty, and cozy, like a humorous pet radio talk show.

Now let’s talk about why this works.

Step 1: Lock the Camera or Everything Falls Apart

The first rule for podcast-style videos is simple:

Do not let the camera move.

That’s why the prompt clearly says things like:

  • single static shot

  • camera remains fixed

  • no cuts or movement

This isn’t about being lazy.

A fixed camera gives the model fewer things to mess up, which means better lip sync and more stable expressions.

If the camera starts drifting, the mouth sync usually goes with it.
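
If you reuse this prompt a lot, a tiny pre-flight check can catch a missing camera lock before you spend a generation on it. Here's a rough Python sketch. The helper name and the phrase list are my own assumptions, taken from the wording in the prompt above, not from anything in the Veo 3.1 docs.

```python
# Hypothetical helper: the phrases come from the prompt in this article,
# not from any official Veo 3.1 documentation.
CAMERA_LOCK_PHRASES = (
    "single static shot",
    "camera remains fixed",
    "no cuts or movement",
)


def missing_camera_locks(prompt: str) -> list[str]:
    """Return the camera-lock phrases that do not appear in the prompt."""
    lowered = prompt.lower()
    return [phrase for phrase in CAMERA_LOCK_PHRASES if phrase not in lowered]


draft = "A single static shot in a cozy late-night radio studio. Two cats sit behind a desk."
print(missing_camera_locks(draft))
# prints ['camera remains fixed', 'no cuts or movement']
```

An empty list means all three phrases are in the prompt. It's only a text check, so it can't tell you whether the model will actually obey them.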

Step 2: Be Extremely Clear About Who Is Talking

This part matters more than people think.

I explicitly tell the model:

  • Which cat speaks first

  • Which one stays silent

  • When the second cat is allowed to respond

I also use phrases like “after the first cat finishes speaking” and “only after the sentence fully ends.”

Why?

Because, in my testing, the model follows the order of events more reliably than exact timings.

Seconds are vague.

Finished sentences are not.

If you skip this, both characters will talk at the same time. Every. Single. Time.
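
If you write a lot of these scripts, you can generate the turn-taking sentences instead of typing them by hand. This is a minimal Python sketch of that idea. The `Turn` class and `render_turns` function are names I made up for illustration, and the sentence template is just one way to phrase the "only after X finishes" pattern.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str   # who talks in this turn
    listener: str  # who must stay silent
    tone: str      # delivery, e.g. "calm"
    line: str      # the spoken sentence


def render_turns(turns: list[Turn]) -> str:
    """Render each turn as a paragraph gated on the previous speaker finishing."""
    paragraphs = []
    for index, turn in enumerate(turns):
        if index == 0:
            # Capitalize only the first letter so the rest of the name is untouched.
            opener = f"{turn.speaker[0].upper()}{turn.speaker[1:]} speaks first"
        else:
            previous = turns[index - 1].speaker
            opener = f"Only after {previous} finishes speaking, {turn.speaker} responds"
        paragraphs.append(
            f"{opener} in a {turn.tone} tone, moving only its own mouth while "
            f"{turn.listener} stays silent: '{turn.line}'"
        )
    return "\n\n".join(paragraphs)


script = [
    Turn("the cat on the left", "the cat on the right", "fast, teasing",
         "Humans keep complaining that we shed too much."),
    Turn("the cat on the right", "the cat on the left", "calm",
         "Yeah, they say our fur gets everywhere."),
]
print(render_turns(script))
```

Every turn after the first is tied to the previous speaker finishing, so the order lives in the sentences and not in timestamps.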

Step 3: Small Movements, Not Big Ones

This video works because nothing dramatic happens.

The cats blink.

Their ears flick a little.

They turn their heads slightly.

That’s it.

These micro-movements sell the realism.

Big gestures usually break it.

Veo 3.1 is good at this, but you still have to guide it. I always describe subtle actions instead of large ones.

Step 4: Treat the Punchline Like an Event

Here’s a mistake I made early on:

I tried to mix the punchline and the laughter together.

Bad idea.

In the working version, the structure is very clear:

  1. Punchline finishes.

  2. Short pause.

  3. Eye contact.

  4. Laughter starts.

The laughter is its own moment.

Not background noise.

This is where the humor actually lands.
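
One rough way to catch the mistake I made is to check that the prompt never mentions laughter before the punchline text. The Python sketch below is only a text heuristic, and the function name is hypothetical. It assumes you paste the punchline in exactly as it appears in the prompt.

```python
def laughter_is_after_punchline(prompt: str, punchline: str) -> bool:
    """True only if the punchline is present and the first mention of
    laughter comes after the punchline text ends."""
    lowered = prompt.lower()
    start = lowered.find(punchline.lower())
    if start == -1:
        return False  # punchline not found in the prompt
    punchline_end = start + len(punchline)
    first_laugh = lowered.find("laugh")  # -1 if laughter is never mentioned
    return first_laugh >= punchline_end


punchline = "when they barely have any hair left to shed?"

good = (
    "The second cat delivers the punchline: 'when they barely have any hair "
    "left to shed?' After a short pause, both cats look at each other and laugh."
)
bad = (
    "The cats laugh while the second cat says: 'when they barely have any "
    "hair left to shed?'"
)

print(laughter_is_after_punchline(good, punchline))  # prints True
print(laughter_is_after_punchline(bad, punchline))   # prints False
```

It also returns False when laughter is missing entirely, because this structure needs the laugh as its own beat.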

Step 5: Force Realism (Or It Turns Into a Cartoon)

If you don’t explicitly say “not cartoon, not animated, not stylized,” the model will absolutely try to be cute in the wrong way.

I learned this the hard way.

For pet podcast videos, realism makes the joke funnier. The more serious the setup looks, the better the punchline hits.

Common Ways This Can Go Wrong

If your result looks weird, it’s usually one of these:

  • Both characters talk at the same time

  • Audio plays but mouths don’t move

  • The camera suddenly zooms in

  • Laughter starts before the punchline ends

When that happens, don’t panic.

Go back to the prompt and make the sequence clearer. Veo 3.1 usually fixes it once the instructions are explicit enough.
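
To keep track of which part of the prompt to tighten, I find a simple lookup helps. Here's a small Python sketch. The symptom keys and the function name are made up for illustration, and the suggested fixes are my reading of the steps in this article, not official guidance for Veo 3.1.

```python
# Hypothetical symptom keys mapped to the prompt change suggested in this article.
PROMPT_FIXES = {
    "both_talk_at_once": (
        "Say which character speaks first, which one stays silent, and use "
        "'after the first cat finishes speaking' before each reply."
    ),
    "mouths_do_not_move": (
        "Say that the speaker is 'moving its mouth clearly' and keep the shot static."
    ),
    "camera_zooms": (
        "Add 'single static shot', 'camera remains fixed', and 'no cuts or movement'."
    ),
    "laughter_too_early": (
        "Separate the ending into punchline, short pause, eye contact, then laughter."
    ),
}


def suggest_fix(symptom: str) -> str:
    """Return the prompt change for a known symptom, or a generic fallback."""
    return PROMPT_FIXES.get(symptom, "Make the sequence of events more explicit.")


print(suggest_fix("camera_zooms"))
```

Anything that isn't in the table falls back to the general advice: make the sequence more explicit.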

Final Thoughts

This kind of video looks effortless when it works, but getting there takes the right model and a very specific way of writing prompts.

For me, Veo 3.1 is currently the best option on PicLumen for this style. It understands dialogue, timing, and restraint — which sounds boring, but actually makes all the difference.

If you want to experiment, try swapping the animals, changing the personalities, or turning it into a three-host setup. The structure stays the same.

Once it clicks, it’s honestly a lot of fun.

And yeah — humans probably are shedding more than they admit. 😄

Jessie