How Trovald Saves You up to 70% of Your AI Costs


    Javier Loureiro
    April 3, 2026 at 20:36 CET
    AI · tokens · saver · router

    My partner Pedro Ansio and I realized that many companies and developers (including ourselves) were wasting money by sending every AI request to the same LLM, regardless of the task.

    That’s why we built trovald.

    Trovald is essentially an AI router, but there’s much more happening under the hood.

    How It Works

    Trovald Architecture

    For starters, trovald injects a system prompt defined by your organization into every request. These are instructions that the LLM must follow regardless of the user prompt.
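    As a minimal sketch, system prompt injection amounts to prepending an organization-level message to an OpenAI-style message list. The function name and prompt text below are illustrative, not Trovald's actual API:

    ```python
    # Sketch of org-level system prompt injection (names are illustrative).
    ORG_SYSTEM_PROMPT = "Answer concisely. Never reveal internal tooling."

    def inject_system_prompt(messages: list[dict]) -> list[dict]:
        """Prepend the organization's system prompt to an OpenAI-style message list."""
        return [{"role": "system", "content": ORG_SYSTEM_PROMPT}, *messages]

    request = [{"role": "user", "content": "What's the weather like tomorrow?"}]
    routed = inject_system_prompt(request)
    print(routed[0]["role"])  # system
    ```

    Because the system message is injected at the gateway, individual users cannot omit or override it from the client side.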

    Layer 1: Exact Cache (Redis)

    The request first goes through a Redis cache layer.

    If there’s an exact match, the response is retrieved directly from our database. This means the request never reaches the LLM, saving both time and cost.
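    The exact-match layer can be pictured as a lookup keyed by a hash of the prompt. The sketch below uses a plain dict standing in for Redis so it runs without a server; with redis-py the calls would be `r.get(key)` / `r.setex(key, ttl, value)`:

    ```python
    # Illustrative exact-match cache keyed by a SHA-256 hash of the prompt.
    # A dict stands in for Redis here so the sketch is self-contained.
    import hashlib

    cache: dict[str, str] = {}  # stand-in for Redis

    def cache_key(prompt: str) -> str:
        return "llm:" + hashlib.sha256(prompt.encode()).hexdigest()

    def get_cached(prompt: str) -> str | None:
        return cache.get(cache_key(prompt))

    def put_cached(prompt: str, response: str) -> None:
        cache[cache_key(prompt)] = response

    put_cached("2+2?", "4")
    print(get_cached("2+2?"))    # exact match -> "4"
    print(get_cached("2 + 2?"))  # different bytes -> None, falls through to layer 2
    ```

    Note that a single extra space produces a miss at this layer, which is exactly why the semantic layer below exists.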

    Layer 2: Semantic Cache (pgvector)

    If there’s no exact match, the request moves to a semantic cache powered by pgvector.

    pgvector is a PostgreSQL extension that allows storing embeddings (dense vector representations of text) directly in the database.

    Each embedding represents the semantic meaning of a prompt using a 1536-dimensional vector.

    To compare similarity between prompts, we use cosine similarity:

    • 1 → identical meaning
    • 0 → completely unrelated

    We use a threshold of 0.95 to avoid false positives.

    If a prompt is semantically similar enough, trovald treats it as a cache hit.
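    The similarity check itself is a few lines. The vectors below are tiny toys for readability; in the setup described above they would be 1536-dimensional:

    ```python
    # Cosine similarity between two embeddings, with the 0.95 cache-hit threshold.
    import math

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    THRESHOLD = 0.95

    def is_semantic_hit(a: list[float], b: list[float]) -> bool:
        return cosine_similarity(a, b) >= THRESHOLD

    print(is_semantic_hit([1.0, 0.0], [1.0, 0.0]))  # True: identical direction
    print(is_semantic_hit([1.0, 0.0], [0.0, 1.0]))  # False: orthogonal, unrelated
    ```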

    Example

    • "What's the weather like tomorrow?"
    • "Can you tell me the weather prediction for tomorrow?"

    Even though the wording is different, both prompts mean the same thing. This results in a semantic cache hit, avoiding an unnecessary LLM call.

    Depiction of cosine similarity by Sindhu Seelam

    Storing New Prompts

    If there is neither an exact nor a semantic cache hit, the prompt is stored as an embedding. This allows future requests to benefit from semantic matching.
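    The store-then-match flow can be sketched in memory. With pgvector the lookup would be a SQL query along the lines of `SELECT response FROM prompt_cache ORDER BY embedding <=> $1 LIMIT 1` (`<=>` is pgvector's cosine-distance operator); the table name and everything below are illustrative:

    ```python
    # In-memory sketch of "store new prompts, match future ones semantically".
    import math

    store: list[tuple[list[float], str]] = []  # (embedding, cached response)

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def lookup(embedding: list[float], threshold: float = 0.95) -> str | None:
        """Return the cached response of the nearest stored prompt, if close enough."""
        best = max(store, key=lambda row: cosine(row[0], embedding), default=None)
        if best and cosine(best[0], embedding) >= threshold:
            return best[1]
        return None  # miss: caller stores the new embedding and calls the LLM

    def remember(embedding: list[float], response: str) -> None:
        store.append((embedding, response))

    remember([0.9, 0.1], "Sunny, 21°C tomorrow.")
    print(lookup([0.89, 0.12]))  # near-identical direction -> cache hit
    print(lookup([0.0, 1.0]))    # unrelated -> None
    ```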

    Prompt Classification

    Next, trovald classifies the prompt into one of the following categories:

    • code
    • chat
    • summarization
    • translation
    • complex_reasoning
    • creative_writing
    • data_analysis

    This classification is based on the semantic meaning of the prompt using embeddings and a set of predefined rules.
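    A toy version of that hybrid approach: rule overrides first, then nearest category prototype by cosine similarity. The rules, prototype vectors, and stand-in `embed` function are made up for illustration; real prototypes would be 1536-dimensional embeddings:

    ```python
    # Toy classifier: predefined rules first, then nearest-prototype matching.
    import math

    RULES = {"translate": "translation", "summarize": "summarization"}

    PROTOTYPES = {  # pretend 2-D embeddings for two of the categories
        "code": [1.0, 0.0],
        "chat": [0.0, 1.0],
    }

    def embed(text: str) -> list[float]:
        # Stand-in embedding: leans "code-like" if programming terms appear.
        codey = sum(w in text.lower() for w in ("def", "function", "bug"))
        return [float(codey), 1.0]

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def classify(prompt: str) -> str:
        for marker, category in RULES.items():
            if marker in prompt.lower():
                return category
        v = embed(prompt)
        return max(PROTOTYPES, key=lambda c: cosine(PROTOTYPES[c], v))

    print(classify("Please translate this to French"))  # translation (rule match)
    print(classify("Fix the bug in this function"))     # code (nearest prototype)
    ```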

    Model Routing

    Finally, the prompt is routed to the most cost-effective LLM for that specific task.

    Trovald dynamically selects the optimal provider based on performance and cost.
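    One way to picture the routing step is a per-category table of candidate models with their prices, filtered by a cost budget. The model names and prices below are invented, not Trovald's real catalog:

    ```python
    # Hypothetical routing table: category -> [(model, USD per 1M tokens), ...].
    ROUTES = {
        "code": [("provider-a/coder-large", 3.00), ("provider-b/coder-small", 0.60)],
        "chat": [("provider-b/chat-mini", 0.15), ("provider-a/chat-pro", 2.50)],
        "complex_reasoning": [("provider-a/reasoner", 15.00)],
    }

    def pick_model(category: str, max_cost_per_mtok: float) -> str | None:
        """Cheapest candidate for the category that fits the cost budget."""
        candidates = [(m, c) for m, c in ROUTES.get(category, []) if c <= max_cost_per_mtok]
        return min(candidates, key=lambda mc: mc[1])[0] if candidates else None

    print(pick_model("chat", 1.00))  # cheapest chat model within budget
    ```

    In practice the selection would also weigh measured latency and quality per category, not price alone.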

    Some of Trovald's models, image by Rolf Mistelbacher

    Fallback System

    If the selected provider fails (due to downtime or any issue), trovald uses a 4-level fallback chain to ensure the request is always completed.

    This guarantees reliability without sacrificing efficiency.
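    The fallback logic is essentially "try each provider in order, return the first success". The provider names and the failing stub below are made up to simulate two outages:

    ```python
    # Sketch of a 4-level fallback chain (provider names are illustrative).
    FALLBACK_CHAIN = ["primary", "secondary", "tertiary", "last-resort"]

    def call_provider(name: str, prompt: str) -> str:
        if name in ("primary", "secondary"):  # simulate two providers being down
            raise ConnectionError(f"{name} is down")
        return f"{name}: answer to {prompt!r}"

    def complete_with_fallback(prompt: str) -> str:
        errors = []
        for provider in FALLBACK_CHAIN:
            try:
                return call_provider(provider, prompt)
            except ConnectionError as exc:
                errors.append(str(exc))
        raise RuntimeError("all providers failed: " + "; ".join(errors))

    print(complete_with_fallback("hello"))  # served by the third provider
    ```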

    Why It Matters

    By combining caching, semantic understanding, and intelligent routing, trovald significantly reduces:

    • LLM usage costs (30-70% savings)
    • Latency
    • Redundant processing

    All while maintaining high-quality responses.