How Trovald Saves You up to 70% of Your AI Costs


    Javier Loureiro
    April 3, 2026 at 20:36 CET
    AI · tokens · saver · router

    My partner Pedro Ansio and I realized that many companies and developers (including ourselves) were wasting money by sending every AI request to the same LLM, regardless of the task.

    That’s why we built trovald.

    Trovald is essentially an AI router, but there’s much more happening under the hood.

    How It Works

    Trovald Architecture

    For starters, trovald injects a system prompt defined by your organization into every request. These are instructions that the LLM must follow regardless of the user prompt.
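    As a minimal sketch, system prompt injection amounts to prepending an organization-level message to an OpenAI-style message list. The function name and prompt text below are illustrative, not Trovald's actual API:

    ```python
    # Sketch of org-level system prompt injection (names are illustrative).
    ORG_SYSTEM_PROMPT = "Answer concisely. Never reveal internal tooling."

    def inject_system_prompt(messages: list[dict]) -> list[dict]:
        """Prepend the organization's system prompt to an OpenAI-style message list."""
        return [{"role": "system", "content": ORG_SYSTEM_PROMPT}, *messages]

    request = [{"role": "user", "content": "What's the weather like tomorrow?"}]
    routed = inject_system_prompt(request)
    print(routed[0]["role"])  # system
    ```

    Because the system message is injected at the gateway, individual users cannot omit or override it from the client side.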

    Layer 1: Exact Cache (Redis)

    The request first goes through a Redis cache layer.

    If there’s an exact match, the response is retrieved directly from our database. This means the request never reaches the LLM, saving both time and cost.
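    The exact-match layer can be pictured as a lookup keyed by a hash of the prompt. The sketch below uses a plain dict standing in for Redis so it runs without a server; with redis-py the calls would be `r.get(key)` / `r.setex(key, ttl, value)`:

    ```python
    # Illustrative exact-match cache keyed by a SHA-256 hash of the prompt.
    # A dict stands in for Redis here so the sketch is self-contained.
    import hashlib

    cache: dict[str, str] = {}  # stand-in for Redis

    def cache_key(prompt: str) -> str:
        return "llm:" + hashlib.sha256(prompt.encode()).hexdigest()

    def get_cached(prompt: str) -> str | None:
        return cache.get(cache_key(prompt))

    def put_cached(prompt: str, response: str) -> None:
        cache[cache_key(prompt)] = response

    put_cached("2+2?", "4")
    print(get_cached("2+2?"))    # exact match -> "4"
    print(get_cached("2 + 2?"))  # different bytes -> None, falls through to layer 2
    ```

    Note that a single extra space produces a miss at this layer, which is exactly why the semantic layer below exists.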

    Layer 2: Semantic Cache (pgvector)

    If there’s no exact match, the request moves to a semantic cache powered by pgvector.

    pgvector is a PostgreSQL extension that allows storing embeddings (dense vector representations of text) directly in the database.

    Each embedding represents the semantic meaning of a prompt using a 1536-dimensional vector.

    To compare similarity between prompts, we use cosine similarity:

    • 1 → identical meaning
    • 0 → completely unrelated

    We use a threshold of 0.95 to avoid false positives.

    If a prompt is semantically similar enough, trovald treats it as a cache hit.
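    The similarity check itself is a few lines. The vectors below are tiny toys for readability; in the setup described above they would be 1536-dimensional:

    ```python
    # Cosine similarity between two embeddings, with the 0.95 cache-hit threshold.
    import math

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    THRESHOLD = 0.95

    def is_semantic_hit(a: list[float], b: list[float]) -> bool:
        return cosine_similarity(a, b) >= THRESHOLD

    print(is_semantic_hit([1.0, 0.0], [1.0, 0.0]))  # True: identical direction
    print(is_semantic_hit([1.0, 0.0], [0.0, 1.0]))  # False: orthogonal, unrelated
    ```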

    Example

    • "What's the weather like tomorrow?"
    • "Can you tell me the weather prediction for tomorrow?"

    Even though the wording is different, both prompts mean the same thing. This results in a semantic cache hit, avoiding an unnecessary LLM call.

    Depiction of cosine similarity by Sindhu Seelam

    Storing New Prompts

    If there is neither an exact nor a semantic cache hit, the prompt is stored as an embedding. This allows future requests to benefit from semantic matching.
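    The store-then-match flow can be sketched in memory. With pgvector the lookup would be a SQL query along the lines of `SELECT response FROM prompt_cache ORDER BY embedding <=> $1 LIMIT 1` (`<=>` is pgvector's cosine-distance operator); the table name and everything below are illustrative:

    ```python
    # In-memory sketch of "store new prompts, match future ones semantically".
    import math

    store: list[tuple[list[float], str]] = []  # (embedding, cached response)

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def lookup(embedding: list[float], threshold: float = 0.95) -> str | None:
        """Return the cached response of the nearest stored prompt, if close enough."""
        best = max(store, key=lambda row: cosine(row[0], embedding), default=None)
        if best and cosine(best[0], embedding) >= threshold:
            return best[1]
        return None  # miss: caller stores the new embedding and calls the LLM

    def remember(embedding: list[float], response: str) -> None:
        store.append((embedding, response))

    remember([0.9, 0.1], "Sunny, 21°C tomorrow.")
    print(lookup([0.89, 0.12]))  # near-identical direction -> cache hit
    print(lookup([0.0, 1.0]))    # unrelated -> None
    ```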

    Prompt Classification

    Next, trovald classifies the prompt into one of the following categories:

    • code
    • chat
    • summarization
    • translation
    • complex_reasoning
    • creative_writing
    • data_analysis

    This classification is based on the semantic meaning of the prompt using embeddings and a set of predefined rules.
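    A toy version of that hybrid approach: rule overrides first, then nearest category prototype by cosine similarity. The rules, prototype vectors, and stand-in `embed` function are made up for illustration; real prototypes would be 1536-dimensional embeddings:

    ```python
    # Toy classifier: predefined rules first, then nearest-prototype matching.
    import math

    RULES = {"translate": "translation", "summarize": "summarization"}

    PROTOTYPES = {  # pretend 2-D embeddings for two of the categories
        "code": [1.0, 0.0],
        "chat": [0.0, 1.0],
    }

    def embed(text: str) -> list[float]:
        # Stand-in embedding: leans "code-like" if programming terms appear.
        codey = sum(w in text.lower() for w in ("def", "function", "bug"))
        return [float(codey), 1.0]

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def classify(prompt: str) -> str:
        for marker, category in RULES.items():
            if marker in prompt.lower():
                return category
        v = embed(prompt)
        return max(PROTOTYPES, key=lambda c: cosine(PROTOTYPES[c], v))

    print(classify("Please translate this to French"))  # translation (rule match)
    print(classify("Fix the bug in this function"))     # code (nearest prototype)
    ```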

    Model Routing

    Finally, the prompt is routed to the most cost-effective LLM for that specific task.

    Trovald dynamically selects the optimal provider based on performance and cost.
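    One way to picture the routing step is a per-category table of candidate models with their prices, filtered by a cost budget. The model names and prices below are invented, not Trovald's real catalog:

    ```python
    # Hypothetical routing table: category -> [(model, USD per 1M tokens), ...].
    ROUTES = {
        "code": [("provider-a/coder-large", 3.00), ("provider-b/coder-small", 0.60)],
        "chat": [("provider-b/chat-mini", 0.15), ("provider-a/chat-pro", 2.50)],
        "complex_reasoning": [("provider-a/reasoner", 15.00)],
    }

    def pick_model(category: str, max_cost_per_mtok: float) -> str | None:
        """Cheapest candidate for the category that fits the cost budget."""
        candidates = [(m, c) for m, c in ROUTES.get(category, []) if c <= max_cost_per_mtok]
        return min(candidates, key=lambda mc: mc[1])[0] if candidates else None

    print(pick_model("chat", 1.00))  # cheapest chat model within budget
    ```

    In practice the selection would also weigh measured latency and quality per category, not price alone.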

    Some of Trovald's models, image by Rolf Mistelbacher

    Fallback System

    If the selected provider fails (due to downtime or any issue), trovald uses a 4-level fallback chain to ensure the request is always completed.

    This guarantees reliability without sacrificing efficiency.
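    The fallback logic is essentially "try each provider in order, return the first success". The provider names and the failing stub below are made up to simulate two outages:

    ```python
    # Sketch of a 4-level fallback chain (provider names are illustrative).
    FALLBACK_CHAIN = ["primary", "secondary", "tertiary", "last-resort"]

    def call_provider(name: str, prompt: str) -> str:
        if name in ("primary", "secondary"):  # simulate two providers being down
            raise ConnectionError(f"{name} is down")
        return f"{name}: answer to {prompt!r}"

    def complete_with_fallback(prompt: str) -> str:
        errors = []
        for provider in FALLBACK_CHAIN:
            try:
                return call_provider(provider, prompt)
            except ConnectionError as exc:
                errors.append(str(exc))
        raise RuntimeError("all providers failed: " + "; ".join(errors))

    print(complete_with_fallback("hello"))  # served by the third provider
    ```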

    Why It Matters

    By combining caching, semantic understanding, and intelligent routing, trovald significantly reduces:

    • LLM usage costs (30-70% savings)
    • Latency
    • Redundant processing

    All while maintaining high-quality responses.