Promptr Studio

Prompt engineering with version control and evaluation

Promptr Studio is the prompt management surface of the EVC Platform. Author prompts with a rich editor, manage versions with tagging and promotion, run evaluations against test suites, and compare performance across providers with A/B experiments.

Prompt authoring and the editor

The prompt editor provides a structured environment for writing, formatting, and annotating prompts. Prompts are stored as versioned documents within a project context.

Each prompt has a name, description, and body. The body supports multi-section layouts so you can separate system instructions, context blocks, and task descriptions. Syntax highlighting helps distinguish variable placeholders from static content.

[Screenshot: Prompt editor showing a multi-section prompt with variable highlighting]

Prompts belong to a project and can be referenced by builds. When a build runs with a saved prompt, the exact prompt version used is recorded in the build's evidence bundle so the run can be reproduced.

[Screenshot: Prompt library listing all prompts in a project with version badges]

Version management

Every saved edit creates a new version. Versions can be tagged, compared, and promoted through a lifecycle that moves from draft to production.

Creating versions

When you save changes to a prompt, a new version is created automatically. Each version is immutable once saved, giving you a complete history of every change.
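As an illustration only, the append-only behavior described above can be sketched in Python. The class and field names here (PromptVersion, PromptHistory, number, author) are assumptions made for the example and are not the platform's actual data model or API.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass(frozen=True)
class PromptVersion:
    """One saved version; frozen so it cannot be modified after creation."""
    number: int
    body: str
    author: str
    created_at: datetime


class PromptHistory:
    """Append-only version list for a single prompt (hypothetical model)."""

    def __init__(self) -> None:
        self._versions: list[PromptVersion] = []

    def save(self, body: str, author: str) -> PromptVersion:
        # Each save appends a new version instead of editing an old one.
        version = PromptVersion(
            number=len(self._versions) + 1,
            body=body,
            author=author,
            created_at=datetime.now(timezone.utc),
        )
        self._versions.append(version)
        return version

    def versions(self) -> tuple[PromptVersion, ...]:
        # Return a tuple so callers cannot reorder or drop history entries.
        return tuple(self._versions)


history = PromptHistory()
v1 = history.save("Summarize {{document}}.", author="alice")
v2 = history.save("Summarize {{document}} in {{max_words}} words.", author="bob")

try:
    v1.body = "tampered"  # attempting to change a saved version fails
except FrozenInstanceError:
    print("versions are immutable")
```

In this sketch a change to a prompt is always a new entry in the history, which is what makes a complete change log possible.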

[Screenshot: Version history panel showing list of versions with timestamps and authors]

Tagging and promotion

Tag versions with semantic labels (e.g., v1.0, release-candidate) for easy reference. Promote a version to production status to make it the default used by builds in the project.

[Screenshot: Version promotion dialog with tag input and production toggle — pending capture]

Only one version can be in production status at a time. Promoting a new version automatically demotes the previous one.
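The single-production rule can be modeled with one pointer per prompt. The following Python sketch assumes a hypothetical PromotionState class; it only illustrates the rule and does not reflect how the platform stores tags or promotion status.

```python
from typing import Optional


class PromotionState:
    """Tags and the single production version of one prompt (hypothetical)."""

    def __init__(self) -> None:
        self.tags: dict[str, int] = {}          # tag label -> version number
        self.production: Optional[int] = None   # at most one production version

    def tag(self, label: str, version: int) -> None:
        self.tags[label] = version

    def promote(self, version: int) -> Optional[int]:
        """Make `version` the production version.

        Returns the version that was demoted, or None if there was none.
        """
        previous = self.production
        self.production = version
        return previous if previous != version else None


state = PromotionState()
state.tag("v1.0", 1)
state.tag("release-candidate", 2)

first_demoted = state.promote(1)    # nothing was in production yet
second_demoted = state.promote(2)   # version 1 is demoted automatically
```

Because production status is a single value rather than a flag on each version, two versions can never be in production at the same time.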

Prompt variables and parameters

Variables let you create reusable prompt templates with dynamic values that are resolved at execution time. Parameters define the schema and defaults for those variables.

Define variables using double-brace syntax: {{variable_name}}. Each variable can have a type (string, number, boolean, enum), a default value, and a description that appears in the execution UI.

[Screenshot: Variable configuration panel showing type, default, and description fields — pending capture]

When a build or playground execution references a prompt with variables, the UI displays a form for filling in values. If defaults are configured, they pre-populate the form.
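A minimal Python sketch of how double-brace placeholders and parameter defaults might be resolved is shown below. The Parameter class and the resolve and render functions are hypothetical names chosen for the example; the platform's own resolution logic may differ, for instance in how it validates types.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

# Matches {{variable_name}} exactly as described in the text.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


@dataclass
class Parameter:
    name: str
    kind: type                       # str, int, float, or bool
    default: Optional[Any] = None
    description: str = ""
    choices: Optional[list] = None   # set for enum-style parameters


def resolve(params: list[Parameter], supplied: dict[str, Any]) -> dict[str, Any]:
    """Merge supplied values with defaults and validate them."""
    values: dict[str, Any] = {}
    for p in params:
        value = supplied.get(p.name, p.default)
        if value is None:
            raise ValueError(f"missing value for variable '{p.name}'")
        # Note: Python treats bool as a subclass of int, so this check is loose.
        if not isinstance(value, p.kind):
            raise TypeError(f"variable '{p.name}' expects {p.kind.__name__}")
        if p.choices is not None and value not in p.choices:
            raise ValueError(f"variable '{p.name}' must be one of {p.choices}")
        values[p.name] = value
    return values


def render(body: str, values: dict[str, Any]) -> str:
    """Replace each {{name}} placeholder with its resolved value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder '{name}'")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, body)


params = [
    Parameter("document", str, description="Text to summarize"),
    Parameter("max_words", int, default=100),
    Parameter("tone", str, default="neutral", choices=["neutral", "formal"]),
]
values = resolve(params, {"document": "Q3 report"})
prompt = render("Summarize {{document}} in {{max_words}} words, {{tone}} tone.", values)
print(prompt)  # Summarize Q3 report in 100 words, neutral tone.
```

Here max_words and tone fall back to their defaults, which mirrors the way configured defaults pre-populate the execution form.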

[Screenshot: Execution form with pre-populated variable inputs from prompt parameters — pending capture]

Execution and the playground

The playground lets you test prompts interactively without triggering a full build. Run a prompt against any supported provider and model, review the response, and iterate quickly.

1. Open a prompt and click Playground.
2. Fill in variable values if the prompt uses parameters.
3. Select a provider and model from the dropdown.
4. Click Run to execute the prompt. The response streams in real time.
5. Review token usage, latency, and the full response in the output panel.

[Screenshot: Playground with prompt on the left and streaming response on the right — pending capture]

Playground executions do not create builds or evidence bundles. They are recorded in the prompt execution history for analytics but are separate from the governed build pipeline.

Evaluation suites and test cases

Evaluation suites let you define test cases that automatically verify prompt quality across versions and providers. Each test case specifies inputs, expected behaviors, and scoring criteria.

Creating a suite

Navigate to the Evaluations tab of a prompt and click New Suite. Give it a name and description, then add test cases with input variables and expected output criteria.

[Screenshot: Evaluation suite editor with test case list and scoring criteria — pending capture]

Running evaluations

Run a suite against any prompt version and provider combination. The platform executes each test case, applies scoring criteria, and produces an aggregate quality report with per-case pass/fail results.
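To make the run-and-score flow concrete, here is a small Python sketch. EvalCase, run_suite, and fake_model are hypothetical names; fake_model stands in for a real provider call, and the scoring criteria are plain pass/fail predicates chosen for illustration rather than the platform's actual scoring system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    inputs: dict[str, str]
    check: Callable[[str], bool]   # scoring criterion: True means pass


def run_suite(
    cases: list[EvalCase],
    execute: Callable[[dict[str, str]], str],
) -> dict:
    """Execute every case, apply its criterion, and aggregate the results."""
    results = {case.name: case.check(execute(case.inputs)) for case in cases}
    passed = sum(results.values())
    return {
        "results": results,   # per-case pass/fail
        "passed": passed,
        "total": len(cases),
        "pass_rate": passed / len(cases) if cases else 0.0,
    }


def fake_model(inputs: dict[str, str]) -> str:
    # Stand-in for a provider call; returns a deterministic response.
    return f"Summary of {inputs['document']}"


cases = [
    EvalCase("mentions-document", {"document": "Q3 report"},
             lambda out: "Q3 report" in out),
    EvalCase("at-most-three-words", {"document": "Q3 report"},
             lambda out: len(out.split()) <= 3),
]
report = run_suite(cases, fake_model)
print(report["results"])
# {'mentions-document': True, 'at-most-three-words': False}
```

The second case fails because the stand-in response has four words, so the aggregate pass rate in this example is 0.5.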

[Screenshot: Evaluation results showing pass/fail per test case with quality scores — pending capture]

Evaluations help catch quality regressions when editing prompts. Run the suite before promoting a version to production.

A/B experiments

A/B experiments compare two prompt versions or provider configurations head to head. The platform runs both variants against the same inputs and produces a comparative analysis.

1. Navigate to the Experiments tab and click New Experiment.
2. Select the two variants to compare: different prompt versions, different models, or different providers.
3. Choose the evaluation suite to use as the comparison basis.
4. Click Run Experiment. Both variants execute against every test case.
5. Review the results table showing quality, cost, and latency for each variant.
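The comparison produced by the steps above can be sketched in Python as follows. RunResult, summarize, and compare are hypothetical names, and the numbers are made-up sample data used only to show the shape of a results table; they are not measurements from any provider.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    passed: bool        # outcome of the test case's scoring criteria
    cost_usd: float
    latency_ms: float


def summarize(results: list[RunResult]) -> dict[str, float]:
    """Aggregate one variant's per-case results (requires at least one result)."""
    return {
        "pass_rate": mean(1.0 if r.passed else 0.0 for r in results),
        "avg_cost_usd": mean(r.cost_usd for r in results),
        "avg_latency_ms": mean(r.latency_ms for r in results),
    }


def compare(a: list[RunResult], b: list[RunResult]) -> dict[str, dict[str, float]]:
    """Build a table of metric -> value for A, value for B, and B minus A."""
    sa, sb = summarize(a), summarize(b)
    return {
        metric: {"A": sa[metric], "B": sb[metric], "delta": sb[metric] - sa[metric]}
        for metric in sa
    }


# Illustrative data: both variants ran against the same two test cases.
variant_a = [RunResult(True, 0.002, 400.0), RunResult(False, 0.002, 600.0)]
variant_b = [RunResult(True, 0.004, 300.0), RunResult(True, 0.004, 500.0)]

table = compare(variant_a, variant_b)
for metric, row in table.items():
    print(metric, row)
```

In this made-up data, variant B passes more cases and responds faster but costs more per run, which is the kind of trade-off the results table is meant to expose.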

[Screenshot: A/B experiment results comparing two variants with quality and cost metrics — pending capture]

Execution tracing and analytics

Every prompt execution, whether from the playground, an evaluation, or a build, is recorded with full tracing data. Analytics surfaces help you understand usage patterns, cost trends, and quality over time.

The execution history shows every run with provider, model, token counts, latency, and cost. Filter by date range, version, or provider to identify trends.

[Screenshot: Execution history table with filters for version, provider, and date range — pending capture]

The analytics dashboard aggregates execution data into charts showing cost per execution over time, average latency by provider, and quality score trends across evaluation runs.
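As a rough sketch of the filtering and aggregation described in this section, the Python below filters execution records and computes average latency by provider. The Execution record, its field names, and the provider and model labels are assumptions made for the example, and the sample values are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class Execution:
    day: date
    version: int
    provider: str
    model: str
    tokens: int
    latency_ms: float
    cost_usd: float


def filter_runs(
    runs: list[Execution],
    provider: Optional[str] = None,
    version: Optional[int] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> list[Execution]:
    """Keep runs matching every filter that was supplied (inclusive dates)."""
    return [
        r for r in runs
        if (provider is None or r.provider == provider)
        and (version is None or r.version == version)
        and (start is None or r.day >= start)
        and (end is None or r.day <= end)
    ]


def avg_latency_by_provider(runs: list[Execution]) -> dict[str, float]:
    grouped: dict[str, list[float]] = defaultdict(list)
    for r in runs:
        grouped[r.provider].append(r.latency_ms)
    return {provider: mean(values) for provider, values in grouped.items()}


# Invented sample records.
runs = [
    Execution(date(2024, 1, 1), 1, "provider-a", "model-x", 120, 400.0, 0.002),
    Execution(date(2024, 1, 2), 2, "provider-a", "model-x", 150, 600.0, 0.003),
    Execution(date(2024, 1, 2), 2, "provider-b", "model-y", 140, 300.0, 0.004),
]

recent_v2 = filter_runs(runs, version=2, start=date(2024, 1, 2))
latency = avg_latency_by_provider(runs)
print(latency)  # {'provider-a': 500.0, 'provider-b': 300.0}
```

The same grouping pattern extends to cost per execution over time by keying on the day field instead of the provider.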

[Screenshot: Promptr analytics dashboard with cost, latency, and quality trend charts — pending capture]