My partner Pedro Ansio and I concluded that many companies and developers (ourselves included) were wasting money by sending every AI request to the same LLM, regardless of the task.
That’s why we built trovald.
Trovald is essentially an AI router, but there’s much more happening under the hood.
How It Works
For starters, trovald injects a system prompt defined by your organization into every request. These are instructions that the LLM must follow regardless of the user prompt.
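This injection step can be sketched roughly as follows. The function name and message format are assumptions for illustration, not trovald's actual API:

```python
# Hypothetical sketch of the system-prompt injection step.
# The message shape follows the common {"role", "content"} chat format.

def inject_system_prompt(messages: list[dict], org_prompt: str) -> list[dict]:
    """Prepend the organization's system prompt, replacing any caller-supplied one."""
    # Drop any system message the caller included, so the org policy always wins.
    user_messages = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": org_prompt}] + user_messages

request = [{"role": "user", "content": "Summarize this document."}]
routed = inject_system_prompt(request, "Never reveal internal customer data.")
```

Replacing (rather than appending to) any existing system message is one plausible policy; it guarantees the organization's instructions cannot be overridden by the caller.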
Layer 1: Exact Cache (Redis)
The request first goes through a Redis cache layer.
If there’s an exact match, the stored response is returned immediately. The request never reaches the LLM, saving both time and cost.
Layer 2: Semantic Cache (pgvector)
If there’s no exact match, the request moves to a semantic cache powered by pgvector.
pgvector is a PostgreSQL extension that allows storing embeddings (dense vector representations of text) directly in the database.
Each embedding represents the semantic meaning of a prompt using a 1536-dimensional vector.
To compare similarity between prompts, we use cosine similarity:
- 1 → identical meaning
- 0 → completely unrelated
We use a threshold of 0.95 to avoid false positives.
If a prompt is semantically similar enough, trovald treats it as a cache hit.
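The similarity check itself is simple. The sketch below uses tiny 3-dimensional vectors for readability (real embeddings have 1536 dimensions, as noted above), and the 0.95 threshold matches the one described in the text:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

THRESHOLD = 0.95

def is_semantic_hit(similarity: float) -> bool:
    return similarity >= THRESHOLD

# Two nearly parallel vectors: high similarity, treated as a cache hit.
sim = cosine_similarity([1.0, 0.0, 0.2], [1.0, 0.1, 0.2])
```

In production the comparison would be pushed into the database (pgvector exposes a cosine-distance operator for exactly this), but the math is the same.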
Example
- "What's the weather like tomorrow?"
- "Can you tell me the weather prediction for tomorrow?"
Even though the wording is different, both prompts mean the same thing. This results in a semantic cache hit, avoiding an unnecessary LLM call.
Storing New Prompts
If there is neither an exact nor a semantic cache hit, the prompt is stored as an embedding. This allows future requests to benefit from semantic matching.
Prompt Classification
Next, trovald classifies the prompt into one of the following categories:
- code
- chat
- summarization
- translation
- complex_reasoning
- creative_writing
- data_analysis
The classification is based on the prompt’s semantic meaning, using embeddings combined with a set of predefined rules.
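One simple way to implement embedding-based classification is nearest-centroid: compare the prompt's embedding against a reference embedding per category and pick the closest. This is an illustrative sketch, not trovald's actual classifier, and the tiny 3-dimensional vectors are placeholders for real embeddings:

```python
import math

# Hypothetical reference embeddings, one per category (subset shown).
CATEGORY_EMBEDDINGS = {
    "code":        [0.9, 0.1, 0.0],
    "chat":        [0.1, 0.9, 0.1],
    "translation": [0.0, 0.2, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(prompt_embedding: list[float]) -> str:
    # Pick the category whose reference embedding is most similar.
    return max(CATEGORY_EMBEDDINGS, key=lambda c: cosine(prompt_embedding, CATEGORY_EMBEDDINGS[c]))

category = classify([0.85, 0.15, 0.05])  # closest to the "code" reference
```

The predefined rules mentioned above could then override or refine this result, e.g. forcing `code` whenever the prompt contains a fenced code block.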
Model Routing
Finally, the prompt is routed to the most cost-effective LLM for that specific task.
Trovald dynamically selects the optimal provider based on performance and cost.
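Conceptually, this routing step can be thought of as a per-category preference list, ordered by cost-effectiveness. The provider and model names below are placeholders, not trovald's actual provider lineup:

```python
# Hypothetical routing table: first entry per category is the current
# best cost/performance pick; later entries are alternatives.
ROUTING_TABLE = {
    "code":              ["provider-a/coder-large", "provider-b/coder"],
    "chat":              ["provider-b/chat-small", "provider-a/chat"],
    "complex_reasoning": ["provider-c/reasoner", "provider-a/large"],
}
DEFAULT_CHAIN = ["provider-b/general"]

def route(category: str) -> str:
    return ROUTING_TABLE.get(category, DEFAULT_CHAIN)[0]
```

In a dynamic system, the ordering of each list would be updated continuously from observed latency, error rates, and per-token pricing rather than hard-coded.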
Fallback System
If the selected provider fails (due to downtime or any issue), trovald uses a 4-level fallback chain to ensure the request is always completed.
This guarantees reliability without sacrificing efficiency.
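A fallback chain like this amounts to trying providers in order until one succeeds. In the sketch below, `call_provider` is a stand-in for the real API call, and the first two providers are simulated as being down:

```python
class ProviderError(Exception):
    pass

# A 4-level chain, as described above; names are placeholders.
FALLBACK_CHAIN = ["primary", "secondary", "tertiary", "last-resort"]

def call_provider(name: str, prompt: str) -> str:
    # Stand-in for the real API call: pretend the first two are down.
    if name in ("primary", "secondary"):
        raise ProviderError(f"{name} unavailable")
    return f"response from {name}"

def complete_with_fallback(prompt: str) -> str:
    last_error = None
    for provider in FALLBACK_CHAIN:
        try:
            return call_provider(provider, prompt)
        except ProviderError as e:
            last_error = e  # remember the failure, try the next provider
    raise RuntimeError("all providers failed") from last_error

result = complete_with_fallback("hello")
```

Only if every level in the chain fails does the request itself fail, which is what makes the completion guarantee practical.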
Why It Matters
By combining caching, semantic understanding, and intelligent routing, trovald significantly reduces:
- LLM usage costs (30-70% savings)
- Latency
- Redundant processing
All while maintaining high-quality responses.
