Zapim

Labs studio · v2
idle

An instrument for designing, cloning and directing speech — thirty languages, every nuance from a whisper to a stadium speech.

Begin a session

choose your method

Composition

describe the voice, then write what it says
0 characters

From the library

click any preset to load

Apply a style

styles fill the personality field · effects insert tags into your text
Emotion
Scenario
Effects · click to insert into your text
All non-verbal tags
You can type any of these directly in your text; the chips below are just shortcuts.
Laughs & sighs:
• [laugh]: quick laugh (alias: [chuckle])
• [sigh]: audible sigh
Pauses & fillers:
• [think]: thinking pause / "uhmm" (alias: [hesitate])
• [hush]: "shh" sound (alias: [quiet])
Question intonations (try each; they shape the rising lift differently):
• [ask]: ah-shape
• [wonder]: ei-shape
• [inquire]: en-shape
• [puzzle]: oh-shape
Surprise / amazement:
• [surprise]: "wa!" burst
• [amaze]: "yo!" reaction
Dissatisfaction:
• [displease]: disapproving grunt (alias: [frown])
Tip: spread tags across sentences rather than stacking them in one phrase; this gives the model room to set up each effect.
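For anyone preparing scripts outside the studio, the tag list and the tip can be checked mechanically. The following is a minimal sketch, not part of the product: the tag names and aliases are copied from the list above, while the function names (normalize_tags, tags_per_sentence) and the sentence-splitting rule are assumptions made for illustration.

```python
import re

# Tag names and aliases as listed in the tooltip; the helpers are hypothetical.
ALIASES = {"chuckle": "laugh", "hesitate": "think", "quiet": "hush", "frown": "displease"}
CANONICAL = {
    "laugh", "sigh", "think", "hush", "ask", "wonder",
    "inquire", "puzzle", "surprise", "amaze", "displease",
}
TAG_RE = re.compile(r"\[([a-z]+)\]")


def normalize_tags(text):
    """Rewrite aliases to canonical names; return (new_text, unknown_tag_names)."""
    unknown = []

    def replace(match):
        name = ALIASES.get(match.group(1), match.group(1))
        if name not in CANONICAL:
            unknown.append(match.group(1))  # leave unrecognized tags untouched
            return match.group(0)
        return "[" + name + "]"

    return TAG_RE.sub(replace, text), unknown


def tags_per_sentence(text):
    """Count bracketed tags in each sentence (split after ., ! or ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(TAG_RE.findall(s)) for s in sentences if s]


script = "Well [chuckle] that went fine. [sigh] Next time [frown] [hesitate] we plan."
clean, unknown = normalize_tags(script)
print(clean)
print(tags_per_sentence(clean))
```

A sentence with a count above one is a candidate for spreading its tags out, as the tip suggests.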

Fine adjustments

advanced · safe to leave alone
Adherence
How strictly the model follows your personality description.
• Lower (1.4–1.7): more variety and emotional range. Best for dramatic reads.
• Default (2.0): balanced.
• Higher (2.2–3.0): rigid, very stable, less surprise. Best for repeatable brand voice.
how strictly the model follows your description
2.5
Quality
Quality vs Speed
Refinement passes per audio clip.
• Low (4–8): faster, slightly rougher prosody.
• Default (10): balanced.
• High (15–25): smoother, more natural. Best for production.
refinement passes per audio clip
24
Output
Sample rate
Final audio frequency.
• Studio 48 kHz: full quality, default.
• Voice 16 kHz: speech grade, ~3× smaller files.
• Phone 8 kHz: telephony quality (G.711). How it sounds on a real phone call.
Lower rates do NOT speed up generation; they only shrink the file.
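The "~3× smaller" figure follows directly from the ratio of sample rates. A rough sketch of the arithmetic, assuming uncompressed 16-bit mono PCM output (the studio's actual file format is not stated here, and pcm_bytes is a hypothetical helper):

```python
# Sample rates from the tooltip; the 16-bit mono PCM format is an assumption.
RATES_HZ = {"studio": 48_000, "voice": 16_000, "phone": 8_000}


def pcm_bytes(rate_hz, seconds, bytes_per_sample=2, channels=1):
    """Size of raw PCM audio: samples per second x duration x sample width."""
    return int(rate_hz * seconds * bytes_per_sample * channels)


clip_seconds = 10
for name, rate in RATES_HZ.items():
    size = pcm_bytes(rate, clip_seconds)
    ratio = RATES_HZ["studio"] / rate
    print(f"{name}: {size} bytes, {ratio:.0f}x smaller than studio")
```

Under these assumptions a 10-second clip is 960,000 bytes at 48 kHz, 320,000 at 16 kHz and 160,000 at 8 kHz. G.711 itself stores 8 bits per sample, so a real telephony file may be smaller again than the 16-bit estimate.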
studio, voice, or telephony rate
Background
Ambient background
Layer a low-volume ambient track under the speech to make it feel like a real environment: call center for support flows, rally for political speeches, street for door-to-door, etc. Mixed server-side.
layered ambient under the speech
Background level
How loud the ambient bed sits under the speech.
• 100%: full per-category default (most prominent, can compete with speech).
• 70%: recommended default. Audible but not overpowering.
• 30–50%: barely-there, "this is happening in a place" feel.
• 0%: mute (effectively turns the background off without changing the dropdown).
Step: 10%.
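One way to picture the level control is as a percentage that scales a per-category gain before the ambient bed is summed with the speech. The sketch below is illustrative only: the real mixing happens server-side and is not documented here, and mix_background, category_gain and the float-sample representation are all assumptions.

```python
def mix_background(speech, bed, level_percent, category_gain=0.25):
    """Sum speech with a looped ambient bed scaled by level_percent.

    speech, bed: lists of float samples in [-1.0, 1.0] (assumed format).
    category_gain: hypothetical stand-in for the "per-category default".
    """
    level = max(0.0, min(100.0, level_percent)) / 100.0  # clamp to 0..100%
    gain = category_gain * level
    if not bed or gain == 0.0:
        return list(speech)  # 0% behaves like no background at all
    mixed = []
    for i, sample in enumerate(speech):
        value = sample + gain * bed[i % len(bed)]  # loop the bed under the speech
        mixed.append(max(-1.0, min(1.0, value)))   # hard-clip to the valid range
    return mixed


speech = [0.5, -0.5, 0.0, 0.0]
bed = [1.0, -1.0]
print(mix_background(speech, bed, 0))
print(mix_background(speech, bed, 100))
```

At 0% the speech comes back unchanged, which matches the tooltip's description of muting without changing the dropdown.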
how loud the ambient sits under the voice (0 mutes, 100 = full)
65%
Text cleanup
Pre-process numbers ("123" → "one hundred twenty three"), expand abbreviations, normalize punctuation. Leave on for almost everything.
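To make the number-expansion step concrete, here is a small sketch that reproduces the "123" → "one hundred twenty three" example. It is not the studio's implementation: number_to_words and expand_numbers are hypothetical names, and the sketch covers only non-negative integers up to six digits (no decimals, dates, currency or abbreviations).

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in 0..999999, without hyphens or 'and'."""
    if not 0 <= n <= 999_999:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")


def expand_numbers(text):
    """Replace digit runs of up to six digits with words; leave longer runs alone."""
    def replace(match):
        digits = match.group()
        return number_to_words(int(digits)) if len(digits) <= 6 else digits
    return re.sub(r"\d+", replace, text)


print(expand_numbers("Room 123 opens in 40 minutes."))
```

Spelling numbers out before synthesis means the model never has to guess how a digit string should be read, which is presumably why the setting is recommended for almost everything.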
expand numbers and abbreviations
on