<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Trieve’s Blog</title><description>Sell more and answer every question with Conversational Discovery. Trieve uses GenAI to show your users what they&apos;re looking for every time</description><link>https://trieve.ai/</link><language>en-us</language><item><title>Accurate Hallucination Detection With NER</title><link>https://trieve.ai/blog/accurate-hallucination-detection-with-ner/</link><guid isPermaLink="true">https://trieve.ai/blog/accurate-hallucination-detection-with-ner/</guid><pubDate>Tue, 07 Jan 2025 21:33:00 GMT</pubDate><content:encoded>&lt;h2&gt;How We Do It: Smart Use of NER&lt;/h2&gt;
&lt;p&gt;Our method zeroes in on the most common and critical hallucinations—those that could mislead or confuse users. Based on our research, a large percentage of hallucinations fall into three categories:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Proper nouns&lt;/strong&gt; (people, places, organizations)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Numerical values&lt;/strong&gt; (dates, amounts, statistics)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Made-up terminology&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Instead of throwing complex language models at the problem with an LLM-as-a-judge approach, we use Named Entity Recognition (NER) to spot proper nouns and compare them between the gen AI completion and the retrieved reference text. For numbers and unknown words, we use similarly straightforward techniques to flag potential issues.&lt;/p&gt;
&lt;p&gt;Our approach only works in use cases where RAG is present, which is fine given that Trieve is a search and RAG API. Further, because RAG is the most common approach to limiting hallucinations, this approach will also work for teams building on top of other search engines.&lt;/p&gt;
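&lt;p&gt;To make the comparison idea concrete, here is a minimal sketch, not our production code. It assumes two regular expressions can stand in for a real NER model and number parser, and the names &lt;code&gt;extractNumbers&lt;/code&gt;, &lt;code&gt;extractEntities&lt;/code&gt;, and &lt;code&gt;findUnsupported&lt;/code&gt; are hypothetical. It flags numbers and capitalized names that appear in the completion but never in the reference text.&lt;/p&gt;

```typescript
// Sketch only: two regexes stand in for a real NER model and number parser.
// All function names here are hypothetical, not part of the Trieve codebase.

// Collect numeric tokens such as "42", "3.5", or "1,200" (commas removed).
function extractNumbers(text: string): string[] {
  const matches: string[] = text.match(/\d+(?:,\d{3})*(?:\.\d+)?/g) ?? [];
  return matches.map((token) => token.replace(/,/g, ""));
}

// Collect runs of capitalized words as a crude proxy for proper nouns.
function extractEntities(text: string): string[] {
  const matches: string[] =
    text.match(/\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b/g) ?? [];
  return matches;
}

// Report numbers and entities in the completion that the reference never mentions.
function findUnsupported(completion: string, reference: string) {
  const referenceNumbers = new Set(extractNumbers(reference));
  const referenceText = reference.toLowerCase();
  return {
    numbers: extractNumbers(completion).filter((n) => !referenceNumbers.has(n)),
    entities: extractEntities(completion).filter(
      (entity) => !referenceText.includes(entity.toLowerCase()),
    ),
  };
}

const reference = "Acme Corp reported revenue of 1,200 dollars in 2023.";
const completion =
  "Acme Corp reported revenue of 1,500 dollars in 2023, according to Jane Doe.";
console.log(findUnsupported(completion, reference));
```

&lt;p&gt;In this toy example the sketch flags the number 1500 and the name Jane Doe, since neither appears in the reference. A capitalization regex will also pick up sentence-initial words, which is one reason a trained NER model is the better choice in practice.&lt;/p&gt;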
&lt;h3&gt;Why This Is Important:&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lightning fast&lt;/strong&gt;: Processes in 100-300 milliseconds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fully self-contained&lt;/strong&gt;: No need for external AI services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Customizable&lt;/strong&gt;: Works with domain-specific NER models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Minimal setup&lt;/strong&gt;: Can run on CPU nodes.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Benchmark Results&lt;/h2&gt;
&lt;h3&gt;RAGTruth Dataset Performance&lt;/h3&gt;
&lt;p&gt;We achieved a 67% accuracy rate on the &lt;a href=&quot;https://github.com/PpostMedia/RAGTruth&quot;&gt;RAGTruth dataset&lt;/a&gt;, which provides a comprehensive benchmark for hallucination detection in RAG systems. This result is particularly impressive considering our lightweight approach compared to more complex solutions.&lt;/p&gt;
&lt;h3&gt;Comparison with Vectara&lt;/h3&gt;
&lt;p&gt;When tested against &lt;a href=&quot;https://huggingface.co/datasets/vectara/hcm-examples-aug-2024&quot;&gt;Vectara’s examples&lt;/a&gt;, our system showed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;70% alignment with Vectara’s model predictions&lt;/li&gt;
&lt;li&gt;Comparable performance on obvious hallucinations&lt;/li&gt;
&lt;li&gt;Strong detection of numerical inconsistencies&lt;/li&gt;
&lt;li&gt;High accuracy on entity-based hallucinations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This level of alignment is significant because we achieve it without the computational overhead of a full language model.&lt;/p&gt;
&lt;h2&gt;Why This Works&lt;/h2&gt;
&lt;p&gt;Our method focuses on the types of hallucinations that matter most: made-up entities, wrong numbers, and gibberish words. By sticking to these basics, we’ve built a system that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Catches high-impact errors&lt;/strong&gt;: No more fake organizations or incorrect stats.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runs lightning fast&lt;/strong&gt;: Minimal delay in real-time systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fits anywhere&lt;/strong&gt;: Easily integrates into production pipelines with no fancy hardware needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Why It Matters in the Real World&lt;/h2&gt;
&lt;p&gt;Speed and simplicity are the stars of this show. Our system processes responses in &lt;strong&gt;100-300ms&lt;/strong&gt;, making it perfect for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Real-time applications (think chatbots and virtual assistants)&lt;/li&gt;
&lt;li&gt;High-volume systems where efficiency is key&lt;/li&gt;
&lt;li&gt;Low-resource setups, like edge devices or small servers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, this approach bridges the gap between effectiveness and practicality. You get solid hallucination detection without slowing everything down or breaking the bank.&lt;/p&gt;
&lt;h2&gt;What’s Next: Room to Grow&lt;/h2&gt;
&lt;p&gt;While we’re thrilled with these results, we’ve got a lot of ideas for the future:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Smarter Entity Recognition&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Train models for industry-specific jargon and custom entity types.&lt;/li&gt;
&lt;li&gt;Improve recognition for niche use cases.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better Number Handling&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Add context-aware analysis for ranges, approximations, and units.&lt;/li&gt;
&lt;li&gt;Normalize and convert units for consistent comparisons.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expanded Word Validation&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Incorporate specialized vocabularies for different fields.&lt;/li&gt;
&lt;li&gt;Make it multilingual and more context-aware.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hybrid Methods&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Optionally tap into language models for tricky edge cases.&lt;/li&gt;
&lt;li&gt;Combine with semantic similarity scores or structural analysis for tougher challenges.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;The Takeaway&lt;/h2&gt;
&lt;p&gt;Our system shows that &lt;strong&gt;you don’t need heavyweight tools&lt;/strong&gt; to handle hallucination detection. By focusing on the most common issues, we’ve built a fast, reliable solution that’s production-ready and easy to scale.&lt;/p&gt;
&lt;p&gt;It’s a practical tool for anyone looking to improve the trustworthiness of AI outputs, especially in environments where speed and resource efficiency are non-negotiable.&lt;/p&gt;
&lt;p&gt;Check out our work, give it a try, and let us know what you think!&lt;/p&gt;
&lt;p&gt;You can find all the code involved in our NER system, including benchmarks, at &lt;a href=&quot;https://github.com/devflowinc/trieve/tree/main/hallucination-detection&quot;&gt;github.com/devflowinc/trieve/tree/main/hallucination-detection&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>tutorials</category><category>news</category><author>Dens Sumesh</author></item><item><title>Uncle Flaviar: Conversational AI Drives 5-Figure Revenue Lift</title><link>https://trieve.ai/blog/ai-sales-agent-on-shopify-flaviar/</link><guid isPermaLink="true">https://trieve.ai/blog/ai-sales-agent-on-shopify-flaviar/</guid><pubDate>Mon, 19 May 2025 20:09:00 GMT</pubDate><content:encoded>&lt;h2&gt;About Flaviar&lt;/h2&gt;
&lt;p&gt;Flaviar isn&apos;t just an e-commerce platform; it&apos;s the nation&apos;s, and quite possibly the world&apos;s, premier destination for wine and spirits. Since its founding in 2011 by Jugoslav Petkovic, Matija Rijavec, and Grisa Soba, Flaviar has become a true e-commerce unicorn, known for its innovation and success.&lt;/p&gt;
&lt;p&gt;With a &lt;strong&gt;curated selection&lt;/strong&gt; of rare and exclusive bottles, Flaviar offers members a unique journey of discovery through the world of fine beverages. Its innovative &lt;strong&gt;tasting boxes&lt;/strong&gt; and personalized recommendations further elevate the experience, making it much more than just a place to buy drinks.&lt;/p&gt;
&lt;h2&gt;A Discerning AI for a Discerning Audience: How Trieve and Flaviar Teamed Up&lt;/h2&gt;
&lt;p&gt;Flaviar built its business on a deep understanding of its customers and products. Long before the current creator economy, they made an early bet on serving knowledgeable consumers who are constantly in the loop. This strategy has paid off, compounding over the years into a brand synonymous with reliability, quality, and a fiercely loyal community. When we connected with their team about integrating AI features, their forward-thinking culture was a perfect match for ours.&lt;/p&gt;
&lt;p&gt;The challenge for Flaviar was clear: how to embrace the AI wave while maintaining their consistently on-brand, positive customer experience.&lt;/p&gt;
&lt;h2&gt;In Their Own Words&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Vasja Fužir, Lead Product Manager at Flaviar, shares his perspective:&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Trieve has genuinely transformed how our shoppers navigate the store&lt;/strong&gt;. Their AI chat guides product discovery in a way that feels natural, intuitive, and genuinely helpful. From day one, their team has been incredibly responsive and proactive. &lt;strong&gt;They’ve acted like true partners, not just vendors—bringing ideas to the table, iterating fast&lt;/strong&gt;, and helping us get real value quickly.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The tech lives up to its promise: personalized product suggestions, intelligent answers to shopper questions, and a smooth path from search to checkout. It adapts seamlessly to our catalog, syncs with our brand voice, and reduces the load on our support team—&lt;strong&gt;all without requiring heavy setup on our end&lt;/strong&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This partnership ensures Flaviar remains at the forefront of AI innovation, with Trieve continuously delivering cutting-edge features that adapt to evolving market demands and customer expectations.&lt;/p&gt;
&lt;h2&gt;The Work &amp;amp; Results: Introducing Uncle Flaviar&lt;/h2&gt;
&lt;p&gt;We collaborated intensely with Flaviar to launch &lt;strong&gt;Uncle Flaviar&lt;/strong&gt;, a sitewide, branded, and savvy AI concierge. Its sole purpose? To help customers discover the products they truly want—or better yet, the ones they didn&apos;t even know they needed.&lt;/p&gt;
&lt;p&gt;Uncle Flaviar has consistently delivered a &lt;strong&gt;5-figure monthly increase in revenue&lt;/strong&gt;. We&apos;ve observed that simple, effective &quot;Quick Queries&quot; like &quot;Bottles to treat myself&quot; or &quot;Surprise me!&quot; do wonders. They:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Create momentum&lt;/strong&gt; in the purchasing journey.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Turn curiosity into purchase intent.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Convert purchase intent into add-to-carts&lt;/strong&gt; with brief, compelling descriptions tailored to each product.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Beyond product discovery, &lt;strong&gt;Trieve&apos;s advanced capabilities mean it can ingest your specific policies, return rules, and other critical business logic.&lt;/strong&gt; This allows the AI to accurately answer common &quot;Where Is My Order?&quot; (WISMO) and other customer service questions, further reducing support load and ensuring consistent, on-brand information delivery.&lt;/p&gt;
&lt;p&gt;We&apos;re incredibly grateful to Flaviar and their customers for entrusting us with their queries. We remain committed to delivering the latest in enterprise-grade AI, as we&apos;ve done for millions of searchers.&lt;/p&gt;
&lt;p&gt;If you&apos;re looking to fill an AI-shaped gap in your business, we&apos;d love to connect.&lt;/p&gt;
</content:encoded><category>reviews</category><author>Federico Chavez Torres</author></item><item><title>How to Build Agentic RAG for any PDF in 10 minutes</title><link>https://trieve.ai/blog/build-agentic-rag-for-any-pdf-in-10-minutes-with-chunkr-and-trieve/</link><guid isPermaLink="true">https://trieve.ai/blog/build-agentic-rag-for-any-pdf-in-10-minutes-with-chunkr-and-trieve/</guid><pubDate>Sun, 15 Jun 2025 21:01:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Retrieval Augmented Generation (RAG) has revolutionized how we build AI applications, allowing Large Language Models (LLMs) to answer questions based on custom data. But what if the LLM could &lt;em&gt;decide&lt;/em&gt; when and how to search that data, like a smart assistant? That&apos;s where &lt;strong&gt;Agentic RAG&lt;/strong&gt; comes in.&lt;/p&gt;
&lt;p&gt;With Trieve, you can easily set up an agentic RAG pipeline that leverages advanced OCR for PDFs (via &lt;a href=&quot;https://chunkr.ai&quot;&gt;Chunkr&lt;/a&gt;) and gives your LLM the autonomy to intelligently query your knowledge base.&lt;/p&gt;
&lt;p&gt;If you&apos;re not interested in the guide and just want the code to hand to your agent as a starting point, a complete CLI demonstrating this functionality lives in a single file on GitHub at &lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/clients/cli/index.ts&quot;&gt;github.com/devflowinc/trieve/blob/main/clients/cli/index.ts&lt;/a&gt;, or you can install it via &lt;a href=&quot;https://www.npmjs.com/package/trieve-cli&quot;&gt;&lt;code&gt;npm i -g trieve-cli&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s a short video where I compare agentic RAG against two baselines on the 2025 CrossFit Games Rulebook: Gemini with the entire file in its context window, and naive RAG (no agentic search). Credit to &lt;a href=&quot;https://canonical.chat/blog/model_assisted_generation&quot;&gt;canonical ai&apos;s original post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/embed/SAV-esDsRUk?si=IvuvXfsZxSL9Y3jL&quot;&gt;Analysis of Agentic RAG Performance vs. Gemini for the 2025 CrossFit Games Rulebook&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Step 1: Sign Up for Trieve and Set Up Your Dataset&lt;/h2&gt;
&lt;p&gt;If you haven&apos;t already, sign up for a Trieve account at &lt;a href=&quot;https://dashboard.trieve.ai/&quot;&gt;dashboard.trieve.ai&lt;/a&gt;. Once logged in, create a new dataset and upload your PDFs. Trieve will automatically process them using Chunkr, extracting text and metadata for efficient searching.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A Trieve account (sign up at &lt;a href=&quot;https://dashboard.trieve.ai/&quot;&gt;dashboard.trieve.ai&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Your Trieve &lt;code&gt;API_KEY&lt;/code&gt;, &lt;code&gt;DATASET_ID&lt;/code&gt;, and &lt;code&gt;ORGANIZATION_ID&lt;/code&gt;, copied from the dashboard.&lt;/li&gt;
&lt;li&gt;Node.js and npm/yarn installed.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Step 2: Initialize Your Node.js Project and Trieve Client&lt;/h2&gt;
&lt;p&gt;Create a new TypeScript file for Node.js (e.g., &lt;code&gt;agentic-rag.ts&lt;/code&gt;, since the snippets below use type annotations) and set up your Trieve client:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import fs from &quot;fs&quot;;
import { TrieveSDK, UpdateDatasetReqPayload } from &quot;trieve-ts-sdk&quot;;

// ---- Configuration ----
// Replace with your actual credentials
const TRIEVE_API_KEY = process.env.TRIEVE_API_KEY || &quot;YOUR_TRIEVE_API_KEY&quot;;
const TRIEVE_DATASET_ID =
  process.env.TRIEVE_DATASET_ID || &quot;YOUR_TRIEVE_DATASET_ID&quot;;
const TRIEVE_ORGANIZATION_ID =
  process.env.TRIEVE_ORGANIZATION_ID || &quot;YOUR_TRIEVE_ORGANIZATION_ID&quot;;

if (
  TRIEVE_API_KEY === &quot;YOUR_TRIEVE_API_KEY&quot; ||
  TRIEVE_DATASET_ID === &quot;YOUR_TRIEVE_DATASET_ID&quot; ||
  TRIEVE_ORGANIZATION_ID === &quot;YOUR_TRIEVE_ORGANIZATION_ID&quot;
) {
  console.error(
    &quot;Please set your TRIEVE_API_KEY, TRIEVE_DATASET_ID, and TRIEVE_ORGANIZATION_ID in the script or as environment variables.&quot;,
  );
  process.exit(1);
}

const trieveClient = new TrieveSDK({
  apiKey: TRIEVE_API_KEY,
  datasetId: TRIEVE_DATASET_ID,
  organizationId: TRIEVE_ORGANIZATION_ID, // Required for dataset updates
});

console.log(&quot;Trieve SDK initialized.&quot;);
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Step 3: Configure Your Agent&apos;s Search Tool&lt;/h2&gt;
&lt;p&gt;For an LLM to act as an agent, it needs clear instructions on &lt;em&gt;when&lt;/em&gt; and &lt;em&gt;how&lt;/em&gt; to use its tools (in this case, searching your Trieve dataset). We configure this at the dataset level.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;System Prompt (&lt;code&gt;SYSTEM_PROMPT&lt;/code&gt;):&lt;/strong&gt; This is the overarching instruction for the LLM. It should emphasize relying on the search tool.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool Description (&lt;code&gt;tool_description&lt;/code&gt;):&lt;/strong&gt; This tells the LLM &lt;em&gt;when&lt;/em&gt; it should use the search tool. For robust RAG, you often want it to &lt;em&gt;always&lt;/em&gt; use the tool.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query Parameter Description (&lt;code&gt;query_parameter_description&lt;/code&gt;):&lt;/strong&gt; This guides the LLM on &lt;em&gt;how&lt;/em&gt; to formulate its search queries effectively.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tips and Tricks for Descriptions:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Be Explicit:&lt;/strong&gt; Don&apos;t assume the LLM knows. Tell it directly, e.g., &quot;ALWAYS call this search tool for EVERY user question.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Emphasize Data Freshness:&lt;/strong&gt; Remind the LLM that its internal knowledge might be outdated and the search tool provides current information from your documents.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Encourage Specificity in Queries:&lt;/strong&gt; Guide the LLM to extract keywords and form precise queries. Suggest trying multiple queries if the first attempt isn&apos;t fruitful.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Iterate:&lt;/strong&gt; These descriptions are powerful. Experiment with different phrasings to see what works best for your use case.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here&apos;s how to update your dataset configuration using the SDK:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;async function configureSearchTool() {
  console.log(&quot;🔧 Configuring dataset for agentic search...&quot;);
  try {
    const updatePayload: UpdateDatasetReqPayload = {
      dataset_id: TRIEVE_DATASET_ID, // Ensure this is the correct dataset ID
      server_configuration: {
        SYSTEM_PROMPT: &quot;You are an AI assistant that helps people find information in a set of documents. You have access to a search tool that can retrieve relevant information from the documents based on a query. YOU MUST ALWAYS CALL AND USE THE SEARCH TOOL FOR EVERY USER QUESTION WITHOUT EXCEPTION. Do not rely on your own knowledge - it may be outdated or incorrect. For each user question: 1) Use the search tool with a well-crafted query 2) If the first search doesn&apos;t yield satisfactory results, try additional searches with different terms 3) Only after searching should you formulate your response, citing the information found. Always inform the user that your answer is based on search results from their documents. If you don&apos;t find relevant information after multiple searches, be honest about this limitation.&quot;,
        TOOL_CONFIGURATION: {
          query_tool_options: {
            tool_description: &quot;ALWAYS use the search tool for EVERY user question, even if you think you already know the answer. Your knowledge is limited and potentially outdated - you must rely on the provided search tool to get the most accurate and up-to-date information.&quot;,
            query_parameter_description: &quot;Write a specific query with critical keywords from the user question. Use multiple search queries with different terms if needed to get comprehensive results.&quot;,
            // You can also define descriptions for other filters if you plan to use them agentically
            // price_filter_description: &quot;The page range filter to use for the search&quot;,
            // max_price_option_description: &quot;The maximum page to filter by&quot;,
            // min_price_option_description: &quot;The minimum page to filter by&quot;,
          },
        },
      },
    };

    await trieveClient.updateDataset(updatePayload);
    console.log(&quot;✅ Dataset configuration updated successfully!&quot;);
  } catch (error) {
    console.error(&quot;❌ Failed to update dataset configuration:&quot;, error instanceof Error ? error.message : error);
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Step 4: Upload and Chunk Your PDF with Chunkr&lt;/h2&gt;
&lt;p&gt;Chunkr is Trieve&apos;s advanced file processing service. When you upload a file (like a PDF) with the &lt;code&gt;chunkr_create_task_req_payload&lt;/code&gt; field, Chunkr uses sophisticated OCR technology that excels at understanding document layouts, tables, and images. This results in higher-quality chunks for your RAG pipeline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Request Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;base64_file&lt;/code&gt;: The file content, base64 encoded.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;file_name&lt;/code&gt;: The original name of the file.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;group_tracking_id&lt;/code&gt; (Optional but Recommended): A unique ID you can use to later check the status of the file processing and group related chunks. If not provided, Trieve might generate one.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;chunkr_create_task_req_payload: {}&lt;/code&gt;: This empty object is the key! It signals Trieve to process this file using Chunkr for advanced chunking. You can pass specific Chunkr options here if needed, but an empty object uses sensible defaults.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;async function uploadPdfWithChunkr(filePath: string, trackingId?: string) {
  console.log(`📤 Uploading PDF: ${filePath} with Chunkr...`);
  try {
    const fileBuffer = fs.readFileSync(filePath);
    const base64File = fileBuffer.toString(&apos;base64&apos;);
    const fileName = filePath.split(&apos;/&apos;).pop() || filePath;

    const generatedTrackingId = trackingId || `chunkr-doc-${fileName}-${Date.now()}`;

    const response = await trieveClient.uploadFile({
      base64_file: base64File,
      file_name: fileName,
      group_tracking_id: generatedTrackingId,
      chunkr_create_task_req_payload: {}, // This enables Chunkr processing!
    });

    console.log(&quot;📄 File upload initiated. Response:&quot;, response);
    console.log(`✨ File sent to Chunkr for processing. Tracking ID: ${generatedTrackingId}`);
    console.log(&quot;⏳ You can check processing status using this tracking ID with other Trieve endpoints.&quot;);
    return generatedTrackingId; // Return for potential status checking
  } catch (error) {
    console.error(&quot;❌ PDF upload failed:&quot;, error instanceof Error ? error.message : error);
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Note: File processing with Chunkr is asynchronous. The &lt;code&gt;uploadFile&lt;/code&gt; endpoint returns quickly, but the actual chunking happens in the background. You&apos;d typically use the &lt;code&gt;group_tracking_id&lt;/code&gt; to poll for completion status if needed, though for this example, we&apos;ll assume it completes.&lt;/em&gt;&lt;/p&gt;
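&lt;p&gt;If you would rather poll than assume completion, a generic helper along these lines may be a reasonable starting point. This is a sketch under assumptions: &lt;code&gt;pollUntilReady&lt;/code&gt; and its &lt;code&gt;isReady&lt;/code&gt; callback are hypothetical names, and the callback is where you would place whichever Trieve status lookup you use for the &lt;code&gt;group_tracking_id&lt;/code&gt;.&lt;/p&gt;

```typescript
// Sketch only: `isReady` is a hypothetical callback that you would implement
// with whichever Trieve status lookup you use for the group_tracking_id.
// It may return a boolean or a promise that resolves to one.
async function pollUntilReady(
  isReady: () => unknown,
  intervalMs: number = 5000,
  maxAttempts: number = 12,
) {
  let remaining = maxAttempts;
  while (remaining > 0) {
    // Awaiting a plain boolean is harmless, so sync and async checks both work.
    if ((await isReady()) === true) {
      return true;
    }
    remaining -= 1;
    if (remaining > 0) {
      // Pause before the next check so we do not hammer the API.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return false; // Gave up: processing did not finish within the attempt budget.
}
```

&lt;p&gt;The helper resolves to &lt;code&gt;true&lt;/code&gt; as soon as the check passes and to &lt;code&gt;false&lt;/code&gt; once the attempt budget is spent, so the caller can decide whether to proceed or report a timeout.&lt;/p&gt;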
&lt;h2&gt;Step 5: Asking Agentic Questions&lt;/h2&gt;
&lt;p&gt;Now for the magic! To ask an agentic question:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Create a Topic:&lt;/strong&gt; Topics are like conversation threads. Each agentic interaction typically happens within a topic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create a Message Reader:&lt;/strong&gt; When creating a message within a topic, set:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;use_agentic_search: true&lt;/code&gt;: This tells Trieve to use the agentic flow defined by your dataset&apos;s tool configuration.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;model&lt;/code&gt;: Specify an LLM capable of agentic behavior. Trieve offers models like &lt;code&gt;o3&lt;/code&gt; (Claude 3 Opus), &lt;code&gt;c3.5s&lt;/code&gt; (Claude 3.5 Sonnet), &lt;code&gt;gpro&lt;/code&gt; (GPT-4o), &lt;code&gt;gpt4t&lt;/code&gt; (GPT-4 Turbo) which are well-suited for this. Check Trieve documentation for the latest list of supported models. &lt;code&gt;o3&lt;/code&gt; is a powerful option.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The response will be streamed back, often including &quot;thinking&quot; steps from the agent before the final answer.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { TrieveSDK, ChunkMetadata, Topic } from &apos;trieve-ts-sdk&apos;;
import chalk from &apos;chalk&apos;; // Optional: for styled console output

// Assume trieveClient is already initialized and configured globally for this script
// const trieveClient = new TrieveSDK({ ... });

async function askAgenticQuestion(
  question: string,
  existingTopicId?: string,
  userId: string = &apos;default-blog-user-&apos; + Date.now() // Example user ID
): Promise&amp;lt;{ actualAnswer: string; parsedChunks: ChunkMetadata[]; topicId: string } | undefined&amp;gt; {
  console.log(chalk.blue(`🤔 Asking agentic question: &quot;${question}&quot;`));
  try {
    let topicIdToUse = existingTopicId;

    if (!topicIdToUse) {
      const topicName = question.substring(0, 50) + &quot;...&quot;; // Simple topic name
      console.log(chalk.magenta(`📝 Creating new topic: &quot;${topicName}&quot; for user &quot;${userId}&quot;`));
      // Ensure Topic type is correctly imported if using createTopic&apos;s return type explicitly
      const topicData: Topic = await trieveClient.createTopic({
        name: topicName,
        owner_id: userId,
      });
      topicIdToUse = topicData.id;
      console.log(chalk.magenta(`🏷️ Topic created with ID: ${topicIdToUse}`));
    }

    // This check is good practice, though topicIdToUse should be set if creation was successful
    if (!topicIdToUse) {
        console.error(chalk.red(&quot;❌ Critical error: Topic ID could not be established.&quot;));
        return undefined;
    }

    const { reader } = await trieveClient.createMessageReaderWithQueryId({
      topic_id: topicIdToUse,
      new_message_content: question,
      use_agentic_search: true, // Crucial for agentic behavior
      model: &quot;o3&quot;, // Or &quot;c3.5s&quot;, &quot;gpro&quot;, &quot;gpt4t&quot;. Using a powerful model is recommended.
                    // &quot;o3&quot; refers to Claude 3 Opus in Trieve&apos;s context.
    });

    console.log(chalk.cyan(&quot;💬 Streaming response from agent:&quot;));
    process.stdout.write(chalk.bold(&quot;🤖 Agent: &quot;)); // Start the agent&apos;s response line

    const decoder = new TextDecoder();
    let actualAnswer: string = &apos;&apos;;
    let parsedChunks: ChunkMetadata[] = [];
    let chunkDataAccumulator: string = &apos;&apos;; // Accumulates parts of the stream that might form chunk JSON
    let isParsingChunks: boolean = false;   // Flag: actively accumulating/expecting chunk JSON
    let isThinkingSection: boolean = false; // Flag: agent is sending &quot;thinking&quot; status messages

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

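      // stream: true keeps multi-byte characters (such as emoji) intact when they span two reads
      const streamChunkText = decoder.decode(value, { stream: true });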

      // 1. Handle &quot;thinking&quot; or status messages from the agent stream
      if (streamChunkText.includes(&apos;🤔&apos;) || streamChunkText.includes(&apos;📝&apos;) || streamChunkText.includes(&apos;✅&apos;) || streamChunkText.includes(&apos;🔍&apos;)) {
        isThinkingSection = true; // We are in a special status update from the agent
        process.stdout.write(chalk.yellow(streamChunkText)); // Print thinking messages
        continue; // Process next part of the stream
      } else {
        // If we were in a thinking section and now receive a non-thinking chunk, reset the flag.
        // This ensures subsequent text is treated as part of the answer or chunk data.
        isThinkingSection = false;
      }

      // 2. Detect the start of the chunk JSON data section (inspired by CLI&apos;s logic)
      // Assumes chunk data starts with &apos;[{&apos;
      if (streamChunkText.includes(&apos;[{&apos;) &amp;amp;&amp;amp; !isParsingChunks) {
        isParsingChunks = true;
        chunkDataAccumulator = &apos;&apos;; // Reset accumulator for this new potential JSON block
      }

      // 3. Accumulate and parse chunk JSON data
      if (isParsingChunks) {
        chunkDataAccumulator += streamChunkText;

        // Check for the delimiter &quot;||&quot; which, by CLI convention, separates chunk JSON from the main answer
        if (chunkDataAccumulator.includes(&apos;||&apos;)) {
          isParsingChunks = false; // We&apos;ve likely received the full chunk JSON block
          const parts = chunkDataAccumulator.split(&apos;||&apos;);
          const jsonDataPart = parts[0].trim();

          if (jsonDataPart) {
            try {
              // The CLI expects the stream to format chunks as: [{ &quot;chunk&quot;: ChunkMetadata }, ...]
              const rawChunkObjects: { chunk: ChunkMetadata }[] = JSON.parse(jsonDataPart);
              if (Array.isArray(rawChunkObjects)) {
                parsedChunks = rawChunkObjects.map(item =&amp;gt; item.chunk);
                // Optional: Log that chunks were found during streaming
                process.stdout.write(chalk.dim(`\n[ℹ️ System: Extracted ${parsedChunks.length} reference chunks.]\n`));
                process.stdout.write(chalk.bold(&quot;🤖 Agent: &quot;)); // Re-prompt for agent answer part
              }
            } catch (e) {
              process.stdout.write(chalk.yellow(&apos;\n⚠️ Warning: Could not parse chunk JSON. Content (partial): &apos; + jsonDataPart.substring(0, 100) + &quot;...\n&quot;));
              // If parsing fails, it might be an error or unexpected format.
              // The CLI logs an error. Consider if this data should be part of `actualAnswer`.
              // For now, mirroring CLI&apos;s behavior of just warning.
            }
          }

          // The part after &quot;||&quot; is considered the start of the textual answer from the LLM
          if (parts[1]) {
            const answerInitialPart = parts[1];
            actualAnswer += answerInitialPart;
            process.stdout.write(answerInitialPart);
          }
          chunkDataAccumulator = &apos;&apos;; // Clear the accumulator
        }
        // If &quot;||&quot; is not yet found, `chunkDataAccumulator` continues to build in next iteration.
      } else if (!isThinkingSection) {
        // 4. Accumulate the actual textual answer from the LLM
        // This runs if a) not parsing chunks, and b) not in a &quot;thinking&quot; message from the agent
        actualAnswer += streamChunkText;
        process.stdout.write(streamChunkText);
      }
    }

    reader.releaseLock();
    process.stdout.write(&apos;\n&apos;); // Ensure a new line after the full streamed response

    console.log(chalk.green(&apos;\n✅ Agentic response complete.&apos;));

    // Optional: Summarize found chunks after the stream
    if (parsedChunks.length &amp;gt; 0) {
        console.log(chalk.blueBright(`\n📚 Summary of ${parsedChunks.length} references found:`));
        parsedChunks.forEach((chunk, index) =&amp;gt; {
            console.log(
                chalk.grey(`  Ref ${index + 1}: ID (${chunk.tracking_id || chunk.id.substring(0,8)}) `) +
                (chunk.link ? chalk.cyan(`Link: ${chunk.link} `) : &apos;&apos;) +
                (chunk.metadata ? chalk.magenta(`File: ${chunk.metadata.file_name || &apos;N/A&apos;}`) : &apos;&apos;)
            );
        });
    } else {
        console.log(chalk.yellow(&apos;⚠️ No reference chunks were explicitly parsed from this stream via &quot;||&quot; delimiter.&apos;));
    }

    return { actualAnswer, parsedChunks, topicId: topicIdToUse };

  } catch (error) {
    console.error(chalk.red(&apos;❌ Failed to process agentic question:&apos;), error instanceof Error ? error.message : error);
    return undefined;
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Putting It All Together&lt;/h2&gt;
&lt;p&gt;Let&apos;s create a simple &lt;code&gt;main&lt;/code&gt; function to run these steps. Make sure you have a PDF file (e.g., &lt;code&gt;sample.pdf&lt;/code&gt;) in the same directory or provide a full path.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;async function main() {
  // Step 1: Configure the dataset&apos;s search tool (run once or when you want to update)
  await configureSearchTool();

  // Step 2: Upload a PDF using Chunkr
  // Replace &apos;sample.pdf&apos; with the path to your PDF file
  const pdfFilePath = &quot;sample.pdf&quot;;
  if (!fs.existsSync(pdfFilePath)) {
    console.error(
      `Error: PDF file not found at ${pdfFilePath}. Please create a sample.pdf or update the path.`,
    );
    return;
  }
  const trackingId = await uploadPdfWithChunkr(pdfFilePath);

  if (!trackingId) {
    console.error(&quot;File upload failed, cannot proceed to ask question.&quot;);
    return;
  }

  // Give some time for Chunkr to process (in a real app, you&apos;d poll status)
  console.log(
    &quot;\n⏳ Waiting 30 seconds for Chunkr to process the PDF (adjust as needed for larger files)...&quot;,
  );
  await new Promise((resolve) =&amp;gt; setTimeout(resolve, 30000));

  // Step 3: Ask an agentic question related to the content of your PDF
  const question = &quot;What are the main topics discussed in this document?&quot;; // Change this to fit your PDF
  const result = await askAgenticQuestion(question);

  if (result) {
    console.log(`\n🗣️ You asked: &quot;${question}&quot;`);
    // The answer text was already printed to stdout while it streamed.
    // `result.actualAnswer` holds the full answer and `result.parsedChunks` the parsed references.
    // console.log(`\n📝 Agent&apos;s full answer:\n${result.actualAnswer}`);
  }
}

main().catch(console.error);
&lt;/code&gt;&lt;/pre&gt;
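&lt;p&gt;The fixed 30-second wait in &lt;code&gt;main&lt;/code&gt; is a placeholder for real status polling. As a minimal sketch of that alternative, the helper below calls a status check repeatedly until it reports completion or the attempts run out. The names &lt;code&gt;waitUntilReady&lt;/code&gt; and &lt;code&gt;checkStatus&lt;/code&gt; are hypothetical and not part of any SDK; this post does not show the actual status call, so you would supply your own.&lt;/p&gt;

```javascript
// Poll a caller-supplied async check until it reports completion or attempts run out.
// `checkStatus` is a hypothetical callback: wire it to whatever status call you use.
async function waitUntilReady(checkStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  let remaining = maxAttempts;
  while (remaining > 0) {
    if (await checkStatus()) {
      return true; // processing finished
    }
    remaining -= 1;
    if (remaining > 0) {
      // Sleep between checks, but not after the final failed check.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return false; // gave up after maxAttempts checks
}
```

&lt;p&gt;In &lt;code&gt;main&lt;/code&gt;, a call such as &lt;code&gt;await waitUntilReady(() =&amp;gt; myStatusCheck(trackingId))&lt;/code&gt; could stand in for the &lt;code&gt;setTimeout&lt;/code&gt; wait, where &lt;code&gt;myStatusCheck&lt;/code&gt; is a function you write yourself.&lt;/p&gt;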
&lt;p&gt;&lt;strong&gt;To run this:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Save the combined code as &lt;code&gt;agentic-rag.js&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Place a &lt;code&gt;sample.pdf&lt;/code&gt; in the same directory or update &lt;code&gt;pdfFilePath&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Set your environment variables or update the placeholders in the script.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;node agentic-rag.js&lt;/code&gt;. If you saved the script as TypeScript instead, run &lt;code&gt;ts-node agentic-rag.ts&lt;/code&gt; or compile it first.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;You&apos;ve just built a powerful Agentic RAG pipeline! By configuring your Trieve dataset&apos;s tool descriptions, leveraging Chunkr for superior PDF processing, and enabling agentic search, you&apos;ve empowered an LLM to intelligently query your custom documents.&lt;/p&gt;
&lt;p&gt;This is just the beginning. You can expand on this by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implementing robust status checking for file uploads.&lt;/li&gt;
&lt;li&gt;Building a more sophisticated UI to handle streamed responses and citations.&lt;/li&gt;
&lt;li&gt;Experimenting with different agentic models and prompt engineering for your tool descriptions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Trieve makes complex AI tasks like Agentic RAG surprisingly accessible. Happy building!&lt;/p&gt;
</content:encoded><category>explainers</category><category>tutorials</category><author>Nick Khami</author></item><item><title>Build a Hotel Voice Agent with Trieve + Vapi</title><link>https://trieve.ai/blog/build-hotel-voice-assistant-with-trieve-and-vapi/</link><guid isPermaLink="true">https://trieve.ai/blog/build-hotel-voice-assistant-with-trieve-and-vapi/</guid><pubDate>Tue, 04 Mar 2025 18:12:00 GMT</pubDate><content:encoded>&lt;p&gt;If you are a more API focused user and just interested in the HTTP requests you would use to build this system, then please reference the included cURL requests throughout the blog.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://player.vimeo.com/video/1067180490?badge=0&amp;amp;autopause=0&amp;amp;player_id=0&amp;amp;app_id=58479&quot;&gt;Video: Build a Hotel Voice Assistant Using Trieve and Vapi&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Hotel Voice Agent Requirements&lt;/h1&gt;
&lt;p&gt;We are going to be building a voice assistant for &lt;a href=&quot;https://www.ihg.com/vignettecollection/hotels/us/en/san-francisco/sfosh/hoteldetail&quot;&gt;Hotel Spero in San Francisco&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;[x] Answer questions about check-in/check-out time&lt;/li&gt;
&lt;li&gt;[x] Provide directions to the hotel and parking information&lt;/li&gt;
&lt;li&gt;[x] Provide information about hotel amenities (e.g., gym, pool, restaurant, spa)&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Trieve Portion&lt;/h1&gt;
&lt;p&gt;Creating a Trieve account is a prerequisite for this tutorial. Navigate to &lt;a href=&quot;https://trieve.ai&quot;&gt;trieve.ai&lt;/a&gt; and click &quot;Sign Up&quot; to create your account and access the dashboard. Alternatively, navigate directly to &lt;a href=&quot;https://dashboard.trieve.ai&quot;&gt;dashboard.trieve.ai&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Step 1: Create a Trieve Dataset&lt;/h2&gt;
&lt;p&gt;Datasets in Trieve correlate to knowledge bases. Typically you will want to have one &lt;code&gt;dataset&lt;/code&gt; per voice agent in your system.&lt;/p&gt;
&lt;p&gt;You can create your first dataset by clicking on the &quot;create dataset&quot; button when you sign up for Trieve:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/hotel-voice-agent-create-dataset.png&quot; alt=&quot;create trieve dataset&quot; /&gt;&lt;/p&gt;
&lt;p&gt;After pressing that button you will see the following modal. Enter your dataset name and press Enter; you can ignore the advanced settings.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/create-dataset-modal.png&quot; alt=&quot;create dataset modal&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can also do this via an HTTP request. If creating the dataset via HTTP, make sure to copy the ID of the dataset from the response and save it for later requests.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl &apos;https://api.trieve.ai/api/dataset&apos; \
  -H &apos;Authorization: &amp;lt;replace-with-your-trieve-api-key&amp;gt;&apos; \
  -H &apos;cache-control: no-cache&apos; \
  -H &apos;content-type: application/json&apos; \
  -H &apos;tr-organization: &amp;lt;replace-with-your-organization-id&amp;gt;&apos; \
  --data-raw &apos;{&quot;dataset_name&quot;:&quot;vignette-san-francisco&quot;}&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Step 2: Trigger a crawl&lt;/h2&gt;
&lt;p&gt;Hotel Spero&apos;s website contains information to answer every common question about the hotel. Parking, directions, gym, pool, and other details are all available. We will be performing a crawl of their website to make that information accessible to our voice agent.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Navigate to the Crawling Options in the left sidenav on the dashboard page for your dataset&lt;/li&gt;
&lt;li&gt;Paste the site&apos;s URL into the &lt;code&gt;Site URL&lt;/code&gt; input&lt;/li&gt;
&lt;li&gt;Click start new crawl&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/hotel-crawl-guide.png&quot; alt=&quot;trigger a crawl&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can see the progress in the results page after your crawl is triggered:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/crawl-progress-screenshot.png&quot; alt=&quot;crawl progress screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can also trigger the crawl through an HTTP request:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl &apos;https://api.trieve.ai/api/crawl&apos; \
  -H &apos;Authorization: &amp;lt;replace-with-your-trieve-api-key&amp;gt;&apos; \
  -H &apos;content-type: application/json&apos; \
  -H &apos;tr-dataset: &amp;lt;replace-with-your-dataset-id&amp;gt;&apos; \
  --data-raw &apos;{&quot;crawl_options&quot;:{&quot;allow_external_links&quot;:false,&quot;boost_titles&quot;:true,&quot;exclude_paths&quot;:[],&quot;exclude_tags&quot;:[&quot;navbar&quot;,&quot;footer&quot;,&quot;aside&quot;,&quot;nav&quot;,&quot;form&quot;, &quot;header&quot;],&quot;include_paths&quot;:[],&quot;include_tags&quot;:[],&quot;interval&quot;:&quot;daily&quot;,&quot;limit&quot;:1000,&quot;site_url&quot;:&quot;https://www.ihg.com/vignettecollection/hotels/us/en/san-francisco/sfosh/hoteldetail&quot;,&quot;scrape_options&quot;:null}}&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Vapi Portion&lt;/h1&gt;
&lt;p&gt;Now that we have all of the hotel&apos;s context pulled into a &lt;code&gt;Dataset&lt;/code&gt;, we need to create a voice assistant with Vapi and connect it to the Trieve &lt;code&gt;Dataset&lt;/code&gt; for Hotel Spero as a Knowledge Base.&lt;/p&gt;
&lt;p&gt;Creating a Vapi account is a prerequisite for this tutorial. Navigate to &lt;a href=&quot;https://vapi.ai&quot;&gt;vapi.ai&lt;/a&gt; and click &quot;Sign Up&quot; to create your account and access the dashboard. Alternatively, navigate directly to &lt;a href=&quot;https://dashboard.vapi.ai&quot;&gt;dashboard.vapi.ai&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Add your Trieve API Key as a Vapi Provider Credential&lt;/h2&gt;
&lt;h3&gt;Create a Trieve API key in your &lt;a href=&quot;https://dashboard.trieve.ai&quot;&gt;Trieve dashboard at dashboard.trieve.ai&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Click on the create API key button in the &quot;API Key&quot; tab of the Trieve dashboard.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/create-new-key-screenshot.png&quot; alt=&quot;create a trieve api key&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Copy your API key from Trieve&lt;/h3&gt;
&lt;p&gt;You will need to copy the key to add it to Vapi after generating it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/copy-api-key.png&quot; alt=&quot;trieve copy api key&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Navigate to your Provider Credentials page in Vapi&lt;/h3&gt;
&lt;p&gt;Now that you have the Trieve API key copied, you need to navigate to the provider credentials settings in Vapi.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/navigate-to-vapi-provider-credentials.png&quot; alt=&quot;vapi provider credential&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Add your Trieve API key to Vapi&lt;/h3&gt;
&lt;p&gt;Vapi needs to have your Trieve key so it can make API calls to your &lt;code&gt;Dataset&lt;/code&gt; within Trieve.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/vapi-add-trieve-key.png&quot; alt=&quot;add trieve api key to vapi&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Create a Vapi Knowledge Base that connects to Trieve&lt;/h2&gt;
&lt;p&gt;Vapi has an HTTP API route documented at &lt;a href=&quot;https://docs.vapi.ai/api-reference/knowledge-bases/create&quot;&gt;docs.vapi.ai/api-reference/knowledge-bases/create&lt;/a&gt; which allows you to create a knowledge base.&lt;/p&gt;
&lt;h3&gt;Save your Dataset ID from the Trieve dashboard&lt;/h3&gt;
&lt;p&gt;Open your dataset in the Trieve dashboard and save its ID so you can connect it to Vapi via an HTTP API call later.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/trieve-copy-dataset-id.png&quot; alt=&quot;trieve copy dataset id&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Copy your Vapi secret key from the Vapi dashboard&lt;/h3&gt;
&lt;p&gt;You will need your secret API key from Vapi in order to complete the final required step and connect your Trieve knowledge-base.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/vapi-copy-api-key.png&quot; alt=&quot;copy Vapi API key&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Make the HTTP API call to create the Knowledge Base in Vapi that connects to Trieve&lt;/h3&gt;
&lt;p&gt;Navigate to Vapi&apos;s docs for the request at &lt;a href=&quot;https://docs.vapi.ai/api-reference/knowledge-bases/create&quot;&gt;docs.vapi.ai/api-reference/knowledge-bases/create&lt;/a&gt; and click on the &quot;try it&quot; button. Then follow these steps.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/create-knowledge-base-try-it.png&quot; alt=&quot;vapi create KB try it button&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Once the popover opens, you will need to fill in the fields to make the request in order to create your knowledge base.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Paste your Vapi API key into the token text input field&lt;/li&gt;
&lt;li&gt;Select Trieve for the request body&lt;/li&gt;
&lt;li&gt;Select Import for &lt;code&gt;createPlan&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Paste your Trieve &lt;code&gt;Dataset&lt;/code&gt; ID for the &lt;code&gt;providerId&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Add the &lt;code&gt;name&lt;/code&gt; parameter and set it to the name of your KB&lt;/li&gt;
&lt;li&gt;Press send request&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/send-request-to-create-knowledge-base.png&quot; alt=&quot;send request to create knowledge base&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If your request succeeds, you will see the following message:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/successful-kb-response.png&quot; alt=&quot;successful create KB response&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This step can also be performed via an HTTP request through cURL or another method of your choice:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST https://api.vapi.ai/knowledge-base \
     -H &quot;Authorization: Bearer &amp;lt;replace-with-your-vapi-key&amp;gt;&quot; \
     -H &quot;Content-Type: application/json&quot; \
     -d &apos;{
      &quot;provider&quot;: &quot;trieve&quot;,
      &quot;searchPlan&quot;: {
        &quot;searchType&quot;: &quot;hybrid&quot;,
        &quot;scoreThreshold&quot;: 0
      },
      &quot;createPlan&quot;: {
        &quot;type&quot;: &quot;import&quot;,
        &quot;providerId&quot;: &quot;&amp;lt;replace-with-your-trieve-dataset-id&amp;gt;&quot;
      },
      &quot;name&quot;: &quot;IHG-SF-Hotel&quot;
    }&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Set up your voice agent in the Vapi dashboard&lt;/h2&gt;
&lt;p&gt;The final step required to create your hotel voice agent is creating the Assistant inside of Vapi.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Click Create Assistant&lt;/li&gt;
&lt;li&gt;Set up your prompts&lt;/li&gt;
&lt;li&gt;Select the knowledge base you named and created in the previous step.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/hotel-voice-agent/create-vapi-assistant.png&quot; alt=&quot;create the Vapi assistant&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Here is the raw text of our first message and prompts:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;First message: Hi! I&apos;m James, here to assist you with any questions about the Spero hotel in San Francisco

System prompt: You are James, the friendly and witty voice assistant for Hotel Spero located in the heart of downtown San Francisco. Our doors are open 24/7 to welcome guests from near and far. You handle inquiries about the hotel, from room availability and amenities to event bookings.  Here’s your conversational blueprint:  1. Start with a lighthearted greeting. 2. Gather the caller&apos;s full name and purpose for calling. 3. Use the retrieved context to help yourself respond to users and assist them with their queries about the hotel.  Remember to keep things casual and quick, like a friendly chat. Be concise, using friendly fillers like &quot;Well...&quot;, &quot;You know...&quot;, and &quot;Let&apos;s see...&quot;. Your goal is to make guests feel welcome while efficiently helping with their requests.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you would prefer, you can also do this via HTTP request as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST https://api.vapi.ai/assistant \
     -H &quot;Authorization: Bearer &amp;lt;replace-with-your-vapi-key&amp;gt;&quot; \
     -H &quot;Content-Type: application/json&quot; \
     -d &apos;{
      &quot;model&quot;: {
        &quot;provider&quot;: &quot;openai&quot;,
        &quot;model&quot;: &quot;gpt-4o-mini&quot;,
        &quot;knowledgeBaseId&quot;: &quot;&amp;lt;replace-with-your-knowledge-base-id&amp;gt;&quot;,
        &quot;messages&quot;: [
          {
            &quot;content&quot;: &quot;You are James, the friendly and witty voice assistant for Hotel Spero located in the heart of downtown San Francisco. Our doors are open 24/7 to welcome guests from near and far. You handle inquiries about the hotel, from room availability and amenities to event bookings.  Here’s your conversational blueprint:  1. Start with a lighthearted greeting. 2. Gather the caller\u0027s full name and purpose for calling. 3. Use the retrieved context to help yourself respond to users and assist them with their queries about the hotel.  Remember to keep things casual and quick, like a friendly chat. Be concise, using friendly fillers like \&quot;Well...\&quot;, \&quot;You know...\&quot;, and \&quot;Let\u0027s see...\&quot;. Your goal is to make guests feel welcome while efficiently helping with their requests.&quot;,
            &quot;role&quot;: &quot;system&quot;
          }
        ]
      },
      &quot;firstMessage&quot;: &quot;Hi! I\u0027m James, here to assist you with any questions about the Spero hotel in San Francisco&quot;,
      &quot;name&quot;: &quot;James&quot;
    }&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;And with that, you&apos;ve successfully built a hotel voice assistant using Trieve and Vapi! By leveraging Trieve&apos;s powerful knowledge base capabilities and Vapi&apos;s voice agent platform, you&apos;ve created a system that can answer common customer questions, provide directions, and offer information about hotel amenities. This setup not only enhances the guest experience but also streamlines operations, freeing up staff to focus on more complex tasks.&lt;/p&gt;
&lt;p&gt;This tutorial provides a solid foundation for building similar voice agents for other hotels or even different types of businesses. Remember to tailor your datasets and prompts to the specific needs of your target audience. As you continue to explore the possibilities of AI-powered voice assistants, consider experimenting with different AI models, prompt engineering techniques, and knowledge base strategies to further optimize your results. The future of customer service is conversational, and with tools like Trieve and Vapi, you&apos;re well-equipped to be at the forefront of this exciting revolution. Happy building!&lt;/p&gt;
</content:encoded><category>tutorials</category><category>explainers</category></item><item><title>How we Built 300μs Typo Detection for 1.3M Words in Rust</title><link>https://trieve.ai/blog/building-blazingly-fast-typo-correction-in-rust/</link><guid isPermaLink="true">https://trieve.ai/blog/building-blazingly-fast-typo-correction-in-rust/</guid><pubDate>Mon, 09 Sep 2024 13:42:00 GMT</pubDate><content:encoded>&lt;p&gt;We launched our &lt;a href=&quot;https://hn.trieve.ai&quot;&gt;Hacker News search and RAG engine&lt;/a&gt; with a half-baked typo correction system. Our first draft took 30+ms for correctly spelled queries which was slow enough that we defaulted it to off. Our latest version is 100 times faster, 300μs for correctly spelled queries and ~5ms/word for misspellings. We explain how it was accomplished in this post!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/building-30%CE%BCs-typo-tolerance-for-1.3M-words%20using%20Rust/typo-tolerance-demo.gif&quot; alt=&quot;video demo of spellcheck&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Sample Queries You Can Try&lt;/h1&gt;
&lt;p&gt;Click the links to try the typo correction system out yourself on &lt;a href=&quot;https://hn.trieve.ai&quot;&gt;hn.trieve.ai&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://hn.trieve.ai/?score_threshold=5&amp;amp;page_size=30&amp;amp;prefetch_amount=30&amp;amp;rerank_type=none&amp;amp;highlight_delimiters=+%2C-%2C_%2C.%2C%2C&amp;amp;highlight_threshold=0.85&amp;amp;highlight_max_length=50&amp;amp;highlight_max_num=50&amp;amp;highlight_window=0&amp;amp;recency_bias=0&amp;amp;highlight_results=true&amp;amp;use_quote_negated_terms=true&amp;amp;q=OpnAi&amp;amp;storyType=story&amp;amp;matchAnyAuthorNames=&amp;amp;matchNoneAuthorNames=&amp;amp;popularityFilters=%7B%7D&amp;amp;sortby=relevance&amp;amp;dateRange=all&amp;amp;searchType=fulltext&amp;amp;page=1&amp;amp;getAISummary=false&quot;&gt;OpnAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://hn.trieve.ai/?score_threshold=5&amp;amp;page_size=30&amp;amp;prefetch_amount=30&amp;amp;rerank_type=none&amp;amp;highlight_delimiters=+%2C-%2C_%2C.%2C%2C&amp;amp;highlight_threshold=0.85&amp;amp;highlight_max_length=50&amp;amp;highlight_max_num=50&amp;amp;highlight_window=0&amp;amp;recency_bias=0&amp;amp;highlight_results=true&amp;amp;use_quote_negated_terms=true&amp;amp;q=Cnva+devloper+platfirm&amp;amp;storyType=story&amp;amp;matchAnyAuthorNames=&amp;amp;matchNoneAuthorNames=&amp;amp;popularityFilters=%7B%7D&amp;amp;sortby=relevance&amp;amp;dateRange=all&amp;amp;searchType=fulltext&amp;amp;page=1&amp;amp;getAISummary=false&quot;&gt;Cnva devloper platfirm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://hn.trieve.ai/?score_threshold=5&amp;amp;page_size=30&amp;amp;prefetch_amount=30&amp;amp;rerank_type=none&amp;amp;highlight_delimiters=+%2C-%2C_%2C.%2C%2C&amp;amp;highlight_threshold=0.85&amp;amp;highlight_max_length=50&amp;amp;highlight_max_num=50&amp;amp;highlight_window=0&amp;amp;recency_bias=0&amp;amp;highlight_results=true&amp;amp;use_quote_negated_terms=true&amp;amp;q=prviacy+focsed+email&amp;amp;storyType=story&amp;amp;matchAnyAuthorNames=&amp;amp;matchNoneAuthorNames=&amp;amp;popularityFilters=%7B%7D&amp;amp;sortby=relevance&amp;amp;dateRange=all&amp;amp;searchType=fulltext&amp;amp;page=1&amp;amp;getAISummary=false&quot;&gt;prviacy focsed email&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Creating a dictionary of Words and Frequencies&lt;/h1&gt;
&lt;p&gt;For small datasets, this is an easy task. You can scroll through ~1,000 text blobs the size of an HN post in 10 seconds with one worker and basic word splitting. However, as you scale to the size of our &lt;a href=&quot;https://hn.trieve.ai&quot;&gt;Hacker News Demo (38M+ posts)&lt;/a&gt;, work needs to be distributed.&lt;/p&gt;
&lt;p&gt;Eventually, we decided on 2 distinct workers for dictionary building:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/server/src/bin/word-id-cronjob.rs&quot;&gt;Cronjob&lt;/a&gt; to scroll all of the documents present in each of our users&apos; search indices and add chunk ids from our database into a Redis queue 500 at a time.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/server/src/bin/word-worker.rs&quot;&gt;Word worker&lt;/a&gt; that pops off the queue and processes 500 chunks at a time. Text for each chunk is pulled, split into words, and each word is then loaded into ClickHouse.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We chose &lt;a href=&quot;https://clickhouse.com/&quot;&gt;ClickHouse&lt;/a&gt; to store the dictionary because we ran into deadlock and performance issues with Postgres writes as we scaled the number of workers. ClickHouse&apos;s async inserts are fantastic for this task and allowed us to ingest the entire 38M+ document dataset in under an hour.&lt;/p&gt;
&lt;h1&gt;Using a BKTree data structure to identify and correct typos&lt;/h1&gt;
&lt;p&gt;We take the &lt;a href=&quot;https://nullwords.wordpress.com/2013/03/13/the-bk-tree-a-data-structure-for-spell-checking/&quot;&gt;standard approach to typo correction&lt;/a&gt; and build per-dataset Burkhard-Keller Trees (BKTrees) for efficient comparison of words in the search query against the dataset&apos;s dictionary in O(log N) time complexity. Explaining this data structure in depth is outside the scope of this blog, but you can read our &lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/server/src/operators/typo_operator.rs#L35-112&quot;&gt;Rust implementation here&lt;/a&gt; or its &lt;a href=&quot;https://en.wikipedia.org/wiki/BK-tree&quot;&gt;Wikipedia article&lt;/a&gt; for more information.&lt;/p&gt;
&lt;p&gt;We utilized a third worker, the &lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/server/src/bin/bktree-worker.rs&quot;&gt;bktree-worker&lt;/a&gt;, to build the BKTrees. It takes datasets with completed dictionaries stored in ClickHouse, then uses their words and frequencies to construct a tree.&lt;/p&gt;
&lt;p&gt;Once the BKTree is constructed, the worker then stores it in Redis such that it can be efficiently loaded into the API server&apos;s memory when needed at first query time for a given dataset.&lt;/p&gt;
&lt;p&gt;This was challenging for larger datasets, where the tree was hundreds of megabytes and timed out Redis on both write and read. We developed a &lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/server/src/operators/typo_operator.rs#L40-112&quot;&gt;serialization method&lt;/a&gt; which flattens and gzips the tree to reduce both its size in Redis and the latency when pushing to and pulling from it.&lt;/p&gt;
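&lt;p&gt;For readers who want a feel for the data structure, here is a minimal JavaScript sketch of a BK-tree keyed on Levenshtein distance. It is not the Rust implementation linked above: the names &lt;code&gt;editDistance&lt;/code&gt; and &lt;code&gt;BKTree&lt;/code&gt; are illustrative, and it stores only words, whereas the trees described in this post also carry word frequencies.&lt;/p&gt;

```javascript
// Levenshtein edit distance using a single rolling row of the DP table.
function editDistance(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] = distance between the processed prefix of s and the first j chars of t.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  s.forEach((sc, i) => {
    const curr = [i + 1];
    t.forEach((tc, j) => {
      const substitution = prev[j] + (sc === tc ? 0 : 1);
      curr.push(Math.min(prev[j + 1] + 1, curr[j] + 1, substitution));
    });
    prev = curr;
  });
  return prev[t.length];
}

class BKTree {
  constructor() {
    this.root = null;
  }

  // Each child edge is labelled with its distance to the parent word.
  insert(word) {
    if (this.root === null) {
      this.root = { word, children: new Map() };
      return;
    }
    let node = this.root;
    for (;;) {
      const d = editDistance(word, node.word);
      if (d === 0) return; // word already stored
      const child = node.children.get(d);
      if (child === undefined) {
        node.children.set(d, { word, children: new Map() });
        return;
      }
      node = child;
    }
  }

  // Returns [word, distance] pairs within maxDistance of the query.
  find(query, maxDistance) {
    const results = [];
    const stack = this.root === null ? [] : [this.root];
    while (stack.length > 0) {
      const node = stack.pop();
      const d = editDistance(query, node.word);
      if (maxDistance >= d) results.push([node.word, d]);
      // Triangle inequality: matches can only sit under edges labelled
      // between d - maxDistance and d + maxDistance, so skip the rest.
      for (const [edge, child] of node.children) {
        if (Math.abs(edge - d) > maxDistance) continue;
        stack.push(child);
      }
    }
    return results;
  }
}
```

&lt;p&gt;The pruning step in &lt;code&gt;find&lt;/code&gt; is what keeps lookups cheap: most subtrees are skipped without computing any distances inside them.&lt;/p&gt;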
&lt;h1&gt;Writing the Business Logic to Perform Typo Corrections&lt;/h1&gt;
&lt;p&gt;On the API server side, in our &lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/server/src/operators/typo_operator.rs&quot;&gt;typo_operator&lt;/a&gt;, we optimized to reduce time required for corrections down to ~300μs for correctly spelled queries and ~10ms/word for misspelled queries.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/building-30%CE%BCs-typo-tolerance-for-1.3M-words%20using%20Rust/typo-operator-graph.webp&quot; alt=&quot;typo-operator-graph&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Pulling from Redis&lt;/h2&gt;
&lt;p&gt;Pulling a massive data structure, like the BKTree for HN, from Redis takes 300+ms. This is &lt;strong&gt;nonviable&lt;/strong&gt; to do on each search, so we developed a server-side cache layer using &lt;code&gt;lazy_static!&lt;/code&gt; to store BKTrees after they have been pulled once.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;lazy_static! {
    static ref BKTREE_CACHE: BKTreeCache = BKTreeCache::new();
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On the first search with typo-tolerance enabled, we initiate a ~200-400ms cold start to pull the BKTree for the dataset being queried from Redis into server memory. Searches following this operation then use the BKTree to check for typos which only takes 100-300μs.&lt;/p&gt;
&lt;h2&gt;Identifying English Words&lt;/h2&gt;
&lt;h3&gt;1. Preliminary English Word Identification&lt;/h3&gt;
&lt;p&gt;Since our BKTrees are constructed solely from dataset-specific dictionaries, they may not have all valid English words. To prevent inaccurate corrections of legitimate words absent from our trees, we use a preliminary English word identification step:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We maintain an in-memory hashset of approximately 400,000 English words, stored using &lt;code&gt;lazy_static!&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;lazy_static! {
    // Word list is embedded at compile time and lowercased once on first access.
    static ref ENGLISH_WORDS: HashSet&amp;lt;String&amp;gt; = {
        include_str!(&quot;../words.txt&quot;)
            .lines()
            .map(|s| s.to_lowercase())
            .collect()
    };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;2. Affix Analysis&lt;/h3&gt;
&lt;p&gt;We then check whether the word is just an English word with a prefix or suffix:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We construct separate Tries for common prefixes and suffixes.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;lazy_static! {
    static ref PREFIX_TRIE: Trie = {
        let prefixes = vec![
            &quot;anti&quot;, &quot;auto&quot;, &quot;de&quot;, &quot;dis&quot;, &quot;down&quot;, &quot;extra&quot;, &quot;hyper&quot;, &quot;il&quot;, &quot;im&quot;, &quot;in&quot;, &quot;ir&quot;, &quot;inter&quot;,
            &quot;mega&quot;, &quot;mid&quot;, &quot;mis&quot;, &quot;non&quot;, &quot;over&quot;, &quot;out&quot;, &quot;post&quot;, &quot;pre&quot;, &quot;pro&quot;, &quot;re&quot;, &quot;semi&quot;, &quot;sub&quot;,
            &quot;super&quot;, &quot;tele&quot;, &quot;trans&quot;, &quot;ultra&quot;, &quot;un&quot;, &quot;under&quot;, &quot;up&quot;,
        ];
        Trie::new(&amp;amp;prefixes)
    };
    static ref SUFFIX_TRIE: Trie = {
        let suffixes = vec![
            &quot;able&quot;, &quot;al&quot;, &quot;ance&quot;, &quot;ation&quot;, &quot;ative&quot;, &quot;ed&quot;, &quot;en&quot;, &quot;ence&quot;, &quot;ent&quot;, &quot;er&quot;, &quot;es&quot;, &quot;est&quot;,
            &quot;ful&quot;, &quot;ian&quot;, &quot;ible&quot;, &quot;ic&quot;, &quot;ing&quot;, &quot;ion&quot;, &quot;ious&quot;, &quot;ise&quot;, &quot;ish&quot;, &quot;ism&quot;, &quot;ist&quot;, &quot;ity&quot;,
            &quot;ive&quot;, &quot;ize&quot;, &quot;less&quot;, &quot;ly&quot;, &quot;ment&quot;, &quot;ness&quot;, &quot;or&quot;, &quot;ous&quot;, &quot;s&quot;, &quot;sion&quot;, &quot;tion&quot;, &quot;ty&quot;,
            &quot;y&quot;,
        ];
        Trie::new(&amp;amp;suffixes)
    };
}
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;For each word in the query, we search these Tries to identify the longest matching prefix and suffix.&lt;/li&gt;
&lt;li&gt;We then strip these affixes from the word, leaving us with the root.&lt;/li&gt;
&lt;li&gt;After stripping, we perform a final dictionary check: the stripped root is searched against the English word corpus.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;fn is_likely_english_word(word: &amp;amp;str) -&amp;gt; bool {
    if ENGLISH_WORDS.contains(&amp;amp;word.to_lowercase()) {
        return true;
    }

    // Check for prefix
    if let Some(prefix_len) = PREFIX_TRIE.longest_prefix(word) {
        if ENGLISH_WORDS.contains(&amp;amp;word[prefix_len..].to_lowercase()) {
            return true;
        }
    }

    // Check for suffix
    if let Some(suffix_len) = SUFFIX_TRIE.longest_suffix(word) {
        if ENGLISH_WORDS.contains(&amp;amp;word[..word.len() - suffix_len].to_lowercase()) {
            return true;
        }
    }

    // Check for compound words
    if word.contains(&apos;-&apos;) {
        let parts: Vec&amp;lt;&amp;amp;str&amp;gt; = word.split(&apos;-&apos;).collect();
        if parts
            .iter()
            .all(|part| ENGLISH_WORDS.contains(&amp;amp;part.to_lowercase()))
        {
            return true;
        }
    }

    false
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;3. BKTree Search for Non-Dictionary Words&lt;/h3&gt;
&lt;p&gt;For words that don&apos;t pass our English word checks, we initiate a BKTree search:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query the BKTree to find the closest matching words.&lt;/li&gt;
&lt;li&gt;Generate a set of candidate corrections for each non-dictionary word.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;let mut best_correction = None;
let mut best_score = 0;

for ((correction, freq), distance) in tree.find(word.to_string(), max_distance) {
    // An exact match means the word exists in the dataset, so nothing needs correcting.
    if distance == 0 {
        best_correction = None;
        break;
    }
    // Skip candidates that fail the length, prefix, and character-set filters.
    if !is_best_correction(word, correction) {
        continue;
    }

    // Closer candidates score higher; dataset frequency is added on top.
    let score = (max_distance - distance) * 1000 + *freq as isize;

    if score &amp;gt; best_score || best_correction.is_none() {
        best_correction = Some(correction);
        best_score = score;
    }
}

if let Some(correction) = best_correction {
    corrections.insert(word, correction.to_string());
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;4. Correction Selection&lt;/h3&gt;
&lt;p&gt;From the set of correction candidates, we use a scoring algorithm to select the best correction:&lt;/p&gt;
&lt;p&gt;Our algorithm prioritizes prefix matches and factors in the frequency of each candidate word within the dataset.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;use std::collections::HashSet;

fn is_best_correction(word: &amp;amp;str, correction: &amp;amp;str) -&amp;gt; bool {
    // Length-based filter
    let len_diff = (word.len() as i32 - correction.len() as i32).abs();
    if len_diff &amp;gt; 2 {
        return false;
    }

    // Prefix matching (adjust the length as needed)
    let prefix_len = std::cmp::min(1, std::cmp::min(word.len(), correction.len()));
    if word[..prefix_len] != correction[..prefix_len] {
        return false;
    }

    // Character set comparison
    let word_chars: HashSet&amp;lt;char&amp;gt; = word.chars().collect();
    let correction_chars: HashSet&amp;lt;char&amp;gt; = correction.chars().collect();
    let common_chars = word_chars.intersection(&amp;amp;correction_chars).count();
    let similarity_ratio =
        common_chars as f32 / word_chars.len().max(correction_chars.len()) as f32;

    similarity_ratio &amp;gt;= 0.8
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Unexpected Benefits&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=41396655&quot;&gt;Levitating commented on HN&lt;/a&gt; that a query for &lt;code&gt;FreeBSD&lt;/code&gt; sorted by points returned irrelevant results. Our tokenizer splits on camel case so &lt;code&gt;FreeBSD&lt;/code&gt; got turned into &lt;code&gt;Free BSD FreeBSD&lt;/code&gt; and stories only containing the word &quot;Free&quot; had more points than anything containing &quot;FreeBSD&quot; and thus ranked higher. Being able to effectively check for full words on a dataset-level allowed us to automatically require non-English words such that a query for &lt;code&gt;FreeBSD&lt;/code&gt; turned into &lt;code&gt;&quot;FreeBSD&quot;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/building-30%CE%BCs-typo-tolerance-for-1.3M-words%20using%20Rust/auto-required-word-demo.gif&quot; alt=&quot;auto required word demo&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Future Ideas&lt;/h1&gt;
&lt;p&gt;We plan to leverage this same system to implement query splitting and concatenation as those features share the same requirement of quickly looking up words in a dictionary.&lt;/p&gt;
&lt;p&gt;Trieve will always pursue the best possible relevance out of the box! Try it on our &lt;a href=&quot;https://hn.trieve.ai&quot;&gt;HN search engine&lt;/a&gt;, &lt;a href=&quot;https://dashboard.trieve.ai&quot;&gt;sign up for a free cloud account&lt;/a&gt;, or &lt;a href=&quot;https://docs.trieve.ai/self-hosting/aws&quot;&gt;see our self-hosting guides&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>explainers</category><category>tutorials</category><author>Dens Sumesh</author></item><item><title>Building Search For the YC Company Directory With Trieve, Bun, and SolidJS</title><link>https://trieve.ai/blog/building-search-for-yc-company-directory/</link><guid isPermaLink="true">https://trieve.ai/blog/building-search-for-yc-company-directory/</guid><pubDate>Mon, 19 Feb 2024 18:52:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Want to sign on Trieve as a vendor or just talk to a human? Book a meeting with a founder at Trieve &lt;a href=&quot;https://cal.com/nick.k/meet&quot;&gt;here on cal.com&lt;/a&gt;. Also &lt;a href=&quot;https://github.com/devflowinc/trieve&quot;&gt;star us on GitHub here&lt;/a&gt; if you have not already!&lt;/p&gt;
&lt;p&gt;This blog post is going to act as a tutorial showing you how to use Trieve in order to build updated search for the YC company directory using &lt;a href=&quot;https://bun.sh&quot;&gt;bun&lt;/a&gt; and &lt;a href=&quot;https://www.solidjs.com/&quot;&gt;SolidJS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See the complete code for this tutorial at &lt;a href=&quot;https://github.com/devflowinc/yc-companies&quot;&gt;github.com/devflowinc/yc-companies&lt;/a&gt; and try the finished project at &lt;a href=&quot;https://yc.trieve.ai&quot;&gt;yc.trieve.ai&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Step 1. Exploratory Data Analysis&lt;/h2&gt;
&lt;p&gt;Before doing anything else, we need to figure out how we are going to scrape the data in order to build this demo. The first step of that is looking at the &lt;a href=&quot;https://www.ycombinator.com/companies&quot;&gt;public YC company directory&lt;/a&gt; and seeing what public data is available.&lt;/p&gt;
&lt;h3&gt;1. Each listed company is represented via its own page linked in the search result list&lt;/h3&gt;
&lt;p&gt;Turns out that there is a lot you can learn from inspect element. In this case, by looking at the DOM and knowing that a lot of YC software development is in Rails, I am willing to gamble that there is a &quot;view&quot; for each company. This is because Rails traditionally follows an MVC (Model View Controller) pattern.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/yc-directory-tutorial/yc-dir-company-links.png&quot; alt=&quot;Company page link available on each listing&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;2. The company view offers the data in a neat JSON object as an HTML attribute!&lt;/h3&gt;
&lt;p&gt;Aha! I feel a sense of vindication looking at the DOM for Airbnb, as it does seem to be a traditional view. Even luckier, all of the data about a given company is present in an attribute within the first element of the page.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/yc-directory-tutorial/yc-dom-attribute-with-data.png&quot; alt=&quot;JSON data representing company available in DOM attribute&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;3. Plan out an approach for pulling this JSON object from the DOM for every company&lt;/h3&gt;
&lt;p&gt;We have discovered that each YC company in the directory has a page with a JSON object containing all of its data. Now what?&lt;/p&gt;
&lt;h4&gt;1. Write a script to get the link to every company available in the directory&lt;/h4&gt;
&lt;p&gt;I don&apos;t think it&apos;s too relevant to this tutorial to explain how I did this, but here is a link to the full &lt;a href=&quot;https://gist.github.com/skeptrunedev/0e389b6532020f8512180b4f131ceb2b&quot;&gt;GitHub gist&lt;/a&gt; containing the script I pasted into the console to get all of the company links if you are interested in reading it. My co-founder Denzell always gets a laugh over how much work I&apos;m willing to do in the console.&lt;/p&gt;
&lt;h4&gt;2. Settle on a language for pulling each company&apos;s view from the link and getting the JSON object&lt;/h4&gt;
&lt;p&gt;Typically in these spots I pick &lt;code&gt;python&lt;/code&gt; because &lt;code&gt;beautifulsoup4&lt;/code&gt; is such an amazing package for these kinds of things done headlessly on a server. However, I am more comfortable in JS and vaguely knew that &lt;a href=&quot;https://bun.sh&quot;&gt;bun&lt;/a&gt;, a new JavaScript server runtime written in Zig, was advertised as a good solution as well.&lt;/p&gt;
&lt;p&gt;Because I wanted to try Bun out and this should be a relatively small task either way, I opted for &lt;code&gt;typescript&lt;/code&gt; run via bun.&lt;/p&gt;
&lt;h2&gt;Step 2. Sign up For Trieve&apos;s Free Cloud To Get an API_KEY and DATASET_ID&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Navigate to &lt;a href=&quot;https://dashboard.trieve.ai&quot;&gt;dashboard.trieve.ai&lt;/a&gt; and sign in or make an account&lt;/li&gt;
&lt;li&gt;On the first page you see, click &lt;strong&gt;create dataset&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;On the dataset creation page, save your &lt;code&gt;dataset_id&lt;/code&gt; to use later&lt;/li&gt;
&lt;li&gt;Click the button to create an API key&lt;/li&gt;
&lt;li&gt;Create a Read+Write type API key, save the value to use later on&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Step 3. Use bun to pull the DOM containing the view for each company and extract the data from it&lt;/h2&gt;
&lt;h3&gt;1. Install bun and scaffold an empty project using &lt;code&gt;bun init&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;I am going to cede to the &lt;a href=&quot;https://bun.sh/docs/cli/init&quot;&gt;bun docs for bun init&lt;/a&gt; here instead of detailing how this works. On the bun docs page, you can also find an &lt;a href=&quot;https://bun.sh/docs/installation&quot;&gt;install guide&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;2. Use the &lt;code&gt;happy-dom&lt;/code&gt; package to headlessly pull data from the View for each company&lt;/h3&gt;
&lt;p&gt;To start, run &lt;code&gt;bun add -d @happy-dom/global-registrator&lt;/code&gt; to add &lt;code&gt;happy-dom&lt;/code&gt; as a dependency to your bun project.&lt;/p&gt;
&lt;p&gt;Now, at the top of your &lt;code&gt;index.ts&lt;/code&gt; file we are going to import &lt;code&gt;Window&lt;/code&gt; from &lt;code&gt;happy-dom&lt;/code&gt; and instantiate a document to use &lt;code&gt;querySelector&lt;/code&gt; with later on.&lt;/p&gt;
&lt;p&gt;Note that we add the triple-slash reference directive for the dom lib so that we get the benefit of TypeScript types for the DOM objects.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/// &amp;lt;reference lib=&quot;dom&quot; /&amp;gt;

import { Window } from &quot;happy-dom&quot;;

const window = new Window();
const document = window.document;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Continuing, we are going to write a function to make a GET request for the HTML page at each company&apos;s URL and extract its data.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const processLink = async (companyUrl: string, groupIds: string[]) =&amp;gt; {
  try {
    const pageRespHtml = await fetch(companyUrl);
    const pageRespText = await pageRespHtml.text();
    document.body.innerHTML = pageRespText;

    // get the first div that has a data-page attribute
    const div = document.body.querySelector(&quot;div[data-page]&quot;);
    const dataPage = div?.getAttribute(&quot;data-page&quot;);
    if (!dataPage) {
      return;
    }

    const bulkData = JSON.parse(dataPage).props;
    const companyData = bulkData.company;
    // awaited here so that failures reach the surrounding try/catch
    await processCompanyChunk(companyData, groupIds);
  } catch (e) {
    console.error(&quot;error processing link&quot;, companyUrl, e);
    return;
  }
};
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;3. Stitch each company&apos;s structured data into an unstructured text blob and send it to Trieve for indexing and storage&lt;/h3&gt;
&lt;p&gt;First, because Trieve does not yet have a typescript SDK, we will specify the request data for creating a chunk as an &lt;code&gt;interface&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export interface CreateChunkData {
  chunk_html: string;
  group_ids: string[];
  link: string;
  tag_set: string[];
  tracking_id: string;
  metadata: Object;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, we will write a function that accepts a &lt;code&gt;CreateChunkData&lt;/code&gt; type object and then uses it to call Trieve&apos;s &lt;code&gt;create_chunk&lt;/code&gt; route.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const createChunk = async (chunkData: CreateChunkData) =&amp;gt; {
  const response = await fetch(`${API_URL}/chunk`, {
    method: &quot;POST&quot;,
    headers: {
      &quot;Content-Type&quot;: &quot;application/json&quot;,
      Authorization: Bun.env.API_KEY ?? &quot;&quot;,
      &quot;TR-Dataset&quot;: Bun.env.DATASET_ID ?? &quot;&quot;,
    },
    body: JSON.stringify(chunkData),
  });
  if (!response.ok) {
    console.error(&quot;error creating chunk&quot;, response.status, response.statusText);
    const respText = await response.text();
    console.error(&quot;error creating chunk&quot;, respText);
    return &quot;&quot;;
  }

  const responseJson = await response.json();
  console.log(&quot;success creating chunk&quot;, responseJson.chunk_metadata.id);
  const chunkId = responseJson.chunk_metadata.id;

  return chunkId as string;
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To fully process the data we still need to stitch each company&apos;s structured data into an unstructured blob which we can use for the &lt;code&gt;chunk_html&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const processCompanyChunk = async (
  bulkDataCompany: any,
  groupIds: string[],
) =&amp;gt; {
  const group_id = await createChunkGroup(
    bulkDataCompany.name as string,
    bulkDataCompany.one_liner as string,
  );

  const companyName = &quot;&amp;lt;h1&amp;gt;&quot; + bulkDataCompany.name + &quot;&amp;lt;/h1&amp;gt;&quot;;
  const companyOneLiner = &quot;&amp;lt;h3&amp;gt;&quot; + bulkDataCompany.one_liner + &quot;&amp;lt;/h3&amp;gt;&quot;;
  const companyLongDescription =
    &quot;&amp;lt;p&amp;gt;&quot; + bulkDataCompany.long_description + &quot;&amp;lt;/p&amp;gt;&quot;;
  const companyLocation =
    &quot;&amp;lt;p&amp;gt;&quot; +
    &quot;Located in &quot; +
    bulkDataCompany.location +
    &quot;, &quot; +
    bulkDataCompany.country +
    &quot; and founded in &quot; +
    bulkDataCompany.year_founded +
    &quot;&amp;lt;/p&amp;gt;&quot;;
  const chunk_html =
    &quot;&amp;lt;div&amp;gt;&quot; +
    companyName +
    companyOneLiner +
    companyLongDescription +
    companyLocation +
    &quot;&amp;lt;/div&amp;gt;&quot;;

  const link = bulkDataCompany.ycdc_company_url;

  const tag_set =
    bulkDataCompany.batch_name +
    &quot;,&quot; +
    bulkDataCompany.tags.join(&quot;,&quot;) +
    &quot;,&quot; +
    bulkDataCompany.city_tag;

  const tracking_id = bulkDataCompany.id.toString();

  const company_name = bulkDataCompany.name;
  const company_one_liner = bulkDataCompany.one_liner;
  const company_long_description = bulkDataCompany.long_description;
  const batch = bulkDataCompany.batch_name;
  const company_location = bulkDataCompany.location;
  const company_city = bulkDataCompany.city;
  const company_city_tag = bulkDataCompany.city_tag;
  const company_country = bulkDataCompany.country;
  const company_year_founded = bulkDataCompany.year_founded;
  const company_website = bulkDataCompany.website;
  const company_linkedin = bulkDataCompany.linkedin_url;
  const company_twitter = bulkDataCompany.twitter_url;
  const company_facebook = bulkDataCompany.facebook_url;
  const company_crunchbase = bulkDataCompany.cb_url;
  const company_logo_url = bulkDataCompany.small_logo_url;
  const metadata = {
    company_name,
    company_one_liner,
    company_long_description,
    batch,
    company_location,
    company_city,
    company_city_tag,
    company_country,
    company_year_founded,
    company_website,
    company_linkedin,
    company_twitter,
    company_facebook,
    company_crunchbase,
    company_logo_url,
  };

  const chunkData: CreateChunkData = {
    chunk_html,
    group_ids: [group_id, ...groupIds],
    link,
    tag_set: tag_set.split(&quot;,&quot;),
    tracking_id,
    metadata,
  };

  await createChunk(chunkData);

  return group_id;
};

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Notice that above, we keep all of the company&apos;s native data as &lt;code&gt;metadata&lt;/code&gt; on the &lt;code&gt;chunk&lt;/code&gt; resource we are creating. We do this to make our lives easier when building the final frontend which users are going to interact with.&lt;/p&gt;
&lt;p&gt;See the details of running these functions for each company&apos;s URL in the full implementation on GitHub at &lt;a href=&quot;https://github.com/devflowinc/yc-companies/bun-scraper/index.ts&quot;&gt;github.com/devflowinc/yc-companies/bun-scraper&lt;/a&gt;.&lt;/p&gt;
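&lt;p&gt;As a rough idea of what such a driver loop could look like (this is a sketch, not the code from the repo), the helper below walks a list of URLs in small batches so that the directory is not hit with every request at once. The names &lt;code&gt;runInBatches&lt;/code&gt; and &lt;code&gt;batchSize&lt;/code&gt; are hypothetical, and the worker here is a stand-in for a call to &lt;code&gt;processLink&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical helper (not from the tutorial repo): run an async worker over
// a list of items in fixed-size batches. Items inside a batch run
// concurrently; batches run one after another.
const runInBatches = async (
  items: string[],
  batchSize: number,
  worker: (item: string) => unknown,
) => {
  const size = Math.max(1, Math.floor(batchSize));
  let succeeded = 0;
  let start = 0;

  while (items.length > start) {
    const batch = items.slice(start, start + size);
    const outcomes = await Promise.all(
      batch.map(async (item) => {
        try {
          await worker(item);
          return true;
        } catch (e) {
          // One failing page should not abort the rest of the batch.
          console.error("worker failed for", item, e);
          return false;
        }
      }),
    );
    succeeded += outcomes.filter((ok) => ok).length;
    start += size;
  }

  return succeeded;
};

// Usage sketch: in the tutorial the worker would be
// (url) => processLink(url, groupIds).
const demoSeen: string[] = [];
const demoDone = runInBatches(["a", "b", "c"], 2, async (url) => {
  demoSeen.push(url);
});
```

&lt;p&gt;A small batch size keeps the scraper polite at the cost of a longer total run time, so it is worth tuning for the site you are pulling from.&lt;/p&gt;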
&lt;h2&gt;Step 4. Build a SolidJS SPA for interacting with the dataset via search&lt;/h2&gt;
&lt;p&gt;The entire frontend application for this tutorial is available at &lt;a href=&quot;https://yc.trieve.ai&quot;&gt;yc.trieve.ai&lt;/a&gt; and written in a single typescript file which you can view at &lt;a href=&quot;https://github.com/devflowinc/yc-companies/blob/main/src/App.tsx&quot;&gt;github.com/devflowinc/yc-companies/blob/main/src/App.tsx&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I strongly recommend reading the &lt;a href=&quot;https://www.solidjs.com/docs/latest/api&quot;&gt;SolidJS documentation&lt;/a&gt; to get a more in-depth understanding of how to use SolidJS in your own projects. In my opinion, it is beyond the scope of this tutorial to go into depth about how we used SolidJS to fully build out the frontend. However, if you are interested, let me know and I may write a part 2!&lt;/p&gt;
&lt;p&gt;In this post, I primarily want to focus on how we search the dataset we indexed and stored in the previous step using Trieve.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;type SearchType = &quot;semantic&quot; | &quot;hybrid&quot; | &quot;fulltext&quot;;

const searchCompanies = async (
  curSortBy: string,
  curPage: number,
  curBatchTag: string,
) =&amp;gt; {
  setFetching(true);
  const response = await fetch(`${apiUrl}/chunk/search`, {
    method: &quot;POST&quot;,
    headers: {
      &quot;Content-Type&quot;: &quot;application/json&quot;,
      &quot;TR-Dataset&quot;: datasetId,
      Authorization: apiKey,
    },
    body: JSON.stringify({
      page: curPage,
      query: searchQuery(),
      search_type: searchType(),
      tag_set:
        curBatchTag === &quot;all batches&quot; ? [] : [curBatchTag.toUpperCase()],
      highlight_results: false,
      get_collisions: false,
    }),
  });

  const data = await response.json();
  const scoreChunks = data.score_chunks;
  if (curSortBy === &quot;recency&quot;) {
    scoreChunks.sort(
      (a: any, b: any) =&amp;gt;
        parseInt(b.metadata[0].metadata.batch.slice(-2)) -
        parseInt(a.metadata[0].metadata.batch.slice(-2)),
    );
  }

  if (curPage &amp;gt; 1) {
    setResultChunks((prevChunks) =&amp;gt; prevChunks.concat(scoreChunks));
  } else {
    setResultChunks(scoreChunks);
  }
  setFetching(false);
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are 3 available search modes: &lt;code&gt;semantic&lt;/code&gt;, &lt;code&gt;hybrid&lt;/code&gt;, and &lt;code&gt;fulltext&lt;/code&gt;. Semantic search uses dense vector embeddings created with a Jina model and fulltext uses sparse vectors created via SPLADEv2. Hybrid is a mix of both where the final result set is ordered with &lt;code&gt;bge-reranker&lt;/code&gt;.&lt;/p&gt;
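&lt;p&gt;One caveat about the recency sort in the code above: it compares only the two-digit year, so batches from the same year tie. A possible refinement is sketched below. It assumes batch tags look like &lt;code&gt;W21&lt;/code&gt; or &lt;code&gt;S21&lt;/code&gt; (a season letter followed by a two-digit year) and that the winter batch precedes the summer batch within a year. &lt;code&gt;batchSortKey&lt;/code&gt; and &lt;code&gt;sortByRecency&lt;/code&gt; are hypothetical helpers, not part of the app.&lt;/p&gt;

```typescript
// Assumption: batch tags look like "W21" or "S21".
// Turns a batch tag into a number that grows with recency.
const batchSortKey = (batch: string): number => {
  const year = parseInt(batch.slice(-2), 10);
  if (isNaN(year)) {
    return -1; // unknown formats end up last when sorting newest-first
  }
  // Within the same year, summer ("S") is treated as newer than winter ("W").
  const seasonOffset = batch.charAt(0).toUpperCase() === "S" ? 1 : 0;
  return year * 2 + seasonOffset;
};

// Returns a new array ordered newest batch first.
const sortByRecency = (batches: string[]): string[] =>
  batches.slice().sort((a, b) => batchSortKey(b) - batchSortKey(a));

const demoSorted = sortByRecency(["W21", "S23", "S21", "W23"]);
console.log(demoSorted);
```

&lt;p&gt;In the app, the same key could be applied to &lt;code&gt;metadata.batch&lt;/code&gt; inside the existing comparator.&lt;/p&gt;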
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Build with Trieve! Join us on &lt;a href=&quot;https://matrix.to/#/#trieve-general:trieve.ai&quot;&gt;Matrix&lt;/a&gt; or &lt;a href=&quot;https://discord.gg/E9sPRZqpDT&quot;&gt;Discord&lt;/a&gt; and tell us about what you want to build. Our community would love to hear about your ideas and plans to help you build however we can.&lt;/p&gt;
&lt;p&gt;Finally, if you liked this tutorial, please star us at &lt;a href=&quot;https://github.com/devflowinc/trieve&quot;&gt;github.com/devflowinc/trieve&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Cheers! Hopefully I get to hear from you soon. Feel free to book a meeting with me &lt;a href=&quot;https://cal.com/nick.k/meet&quot;&gt;here on cal.com&lt;/a&gt;, follow on &lt;a href=&quot;https://x.com/skeptrune&quot;&gt;X&lt;/a&gt;, or &lt;a href=&quot;https://www.linkedin.com/in/nicholas-khami-5a0a7a135/&quot;&gt;connect on LinkedIn&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>tutorials</category><author>Nick Khami</author></item><item><title>Cheating at Search with LLMs</title><link>https://trieve.ai/blog/cheating-at-search-with-llms/</link><guid isPermaLink="true">https://trieve.ai/blog/cheating-at-search-with-llms/</guid><pubDate>Wed, 21 May 2025 12:55:00 GMT</pubDate><content:encoded>&lt;h2&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;We&apos;ve started doing this search technique that we&apos;ve been calling &quot;cheating at Search with LLMs&quot; and I thought it&apos;d be cool to talk about it. If you just want to see it live, go to &lt;a href=&quot;https://demos.trieve.ai/demos/lifestraw&quot;&gt;demos.trieve.ai/demos/lifestraw&lt;/a&gt;, open devtools, navigate to the network tab, and click through the &quot;group_oriented_search&quot; request and subsequent tool calls.&lt;/p&gt;
&lt;p&gt;&amp;lt;VimeoEmbed src=&quot;https://player.vimeo.com/video/1086333069?h=a79cc8567c&amp;amp;badge=0&amp;amp;autopause=0&amp;amp;player_id=0&amp;amp;app_id=58479&quot; title=&quot;Build a Hotel Voice Assistant Using Trieve and Vapi&quot; /&amp;gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;The Problem: Search Can&apos;t Understand Intents Like Comparisons&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;For our Gen AI sales associate Shopify app, we wanted to make it possible to do cool things like generate a comparison table for any two products. Take this example from the brand LifeStraw, which sells water-filtering straws. If a customer asks to &quot;compare the Sip against the Life Straw&quot; (two different products in their portfolio), we need to quickly look inside their catalog to determine which two products to fetch.&lt;/p&gt;
&lt;p&gt;The challenge? No traditional keyword, semantic, or hybrid search would ever be intelligent enough without an LLM to understand the exact two products being discussed.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Our Solution: Let the LLM Do the Hard Work&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;So we cheat. Here&apos;s how it works:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;First, we do a standard search with the user&apos;s query and get the top 20 results. Each group represents a product, and each chunk within that group is a variant of that product (like different colors or pack sizes).&lt;/li&gt;
&lt;li&gt;Then we use a tool called &quot;determine relevance&quot; that asks the LLM to rank each product as high, medium, or low relevance to the query. We pass each product&apos;s JSON, HTML, description text, and title to the LLM.&lt;/li&gt;
&lt;li&gt;The LLM examines each product and makes the call. For example, it might mark the Life Straw Sip Cotton Candy variant as &quot;high&quot; relevance, and the regular Life Straw as &quot;high&quot; relevance, while everything else gets &quot;medium&quot; or &quot;low.&quot;&lt;/li&gt;
&lt;li&gt;We then use these relevance rankings to display only the most relevant products to the user.&lt;/li&gt;
&lt;/ol&gt;
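&lt;p&gt;The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not our production code: &lt;code&gt;Product&lt;/code&gt;, &lt;code&gt;keywordJudge&lt;/code&gt;, &lt;code&gt;rankProducts&lt;/code&gt;, and &lt;code&gt;keepHighRelevance&lt;/code&gt; are hypothetical names, and the judge is a simple keyword stub standing in for the LLM tool call.&lt;/p&gt;

```typescript
type Relevance = "high" | "medium" | "low";

interface Product {
  id: string;
  title: string;
  description: string;
}

// Stand-in for the LLM tool call. A real judge would send the query and the
// product fields to a model and parse its high/medium/low answer.
const keywordJudge = async (query: string, product: Product) => {
  const mentioned =
    query.toLowerCase().indexOf(product.title.toLowerCase()) !== -1;
  const label: Relevance = mentioned ? "high" : "low";
  return label;
};

// One judge call per fetched product, issued concurrently.
const rankProducts = async (
  query: string,
  products: Product[],
  judge: typeof keywordJudge,
) =>
  Promise.all(
    products.map(async (product) => ({
      product,
      relevance: await judge(query, product),
    })),
  );

// Keep only the products the judge labelled "high".
const keepHighRelevance = (
  ranked: { product: Product; relevance: Relevance }[],
) => ranked.filter((r) => r.relevance === "high").map((r) => r.product);

const demoProducts: Product[] = [
  { id: "1", title: "Sip", description: "Reusable filter straw" },
  { id: "2", title: "Life Straw", description: "Personal water filter" },
  { id: "3", title: "Water Bottle", description: "Bottle with filter" },
];

const demoHigh = rankProducts(
  "compare the Sip against the Life Straw",
  demoProducts,
  keywordJudge,
).then(keepHighRelevance);
```

&lt;p&gt;Because each product is judged independently, the calls can run in parallel and each result can be cached on its own.&lt;/p&gt;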
&lt;h2&gt;&lt;strong&gt;Making It Fast&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Despite making 20+ LLM calls in the background, the experience feels instantaneous to the user thanks to semantic caching on all the tool calls. If I run the same comparison again, it&apos;s blazing fast.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Going Even Further&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;We extend this approach to other aspects of search:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Price Filters&lt;/strong&gt;: We have a tool call that extracts min and max price parameters&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Category Determination&lt;/strong&gt;: For stores with predefined categories, we use LLMs to determine the right category&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Format Selection&lt;/strong&gt;: We use tool calls to decide whether to generate text or images&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context Retention&lt;/strong&gt;: If a user follows up with &quot;tell me more about the Life Straw&apos;s filtration,&quot; we don&apos;t need to search again - we just use the same products from before&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;Why This Matters&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;It literally feels like cheating, which is incredible. In the early days, we spent a ton of time building super high-relevance search pipelines. But with modern LLMs, that&apos;s unnecessary. You can just fetch 20 things, give the LLM the query and each fetched item, and ask it which ones are relevant.&lt;/p&gt;
&lt;p&gt;Absolute madness. Intelligence as a commodity.&lt;/p&gt;
</content:encoded><category>explainers</category><author>Nicholas Khami</author></item><item><title>Build Search and RAG for Any Website with Firecrawl and Trieve</title><link>https://trieve.ai/blog/firecrawl-and-trieve/</link><guid isPermaLink="true">https://trieve.ai/blog/firecrawl-and-trieve/</guid><pubDate>Thu, 22 Aug 2024 19:19:00 GMT</pubDate><content:encoded>&lt;p&gt;In this guide, we will show how to use &lt;a href=&quot;https://www.firecrawl.dev/&quot;&gt;Firecrawl&lt;/a&gt; and Trieve to build search and RAG for &lt;a href=&quot;https://signoz.io/docs/&quot;&gt;SigNoz&apos;s documentation&lt;/a&gt; in both Python and JS.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.firecrawl.dev/&quot;&gt;Firecrawl&lt;/a&gt;&apos;s REST API can be queried to convert every page available at a URL into vector search and RAG-ready markdown.&lt;/p&gt;
&lt;p&gt;Trieve&apos;s API can then receive chunks of the markdown docs, embed and place them into a search index, and finally be called to perform AI Search and RAG on all of the site&apos;s content.&lt;/p&gt;
&lt;p&gt;All the code used (both node.js and Python) is also in this GitHub repo (MIT license): &lt;a href=&quot;https://github.com/devflowinc/firecrawl-to-trieve&quot;&gt;firecrawl-to-trieve&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Requirements - free API keys:&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Firecrawl API key: Sign up at &lt;a href=&quot;https://www.firecrawl.dev/signin/signup&quot;&gt;firecrawl.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Trieve API key and dataset id: Register and set up at &lt;a href=&quot;https://dashboard.trieve.ai/&quot;&gt;dashboard.trieve.ai&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Set up a .env file with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;FIRECRAWL_API_KEY=
TRIEVE_DATASET_ID=
TRIEVE_API_KEY=
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;1. Converting SigNoz docs to Markdown Chunks with Firecrawl&lt;/h2&gt;
&lt;p&gt;&amp;lt;TextBox&amp;gt;
&lt;a href=&quot;https://docs.firecrawl.dev/features/crawl&quot;&gt;Firecrawl Docs&lt;/a&gt; &amp;amp; &lt;a href=&quot;https://github.com/mendableai/firecrawl/&quot;&gt;Github&lt;/a&gt; (see also their &lt;a href=&quot;https://github.com/mendableai/firecrawl/tree/main/apps/js-sdk&quot;&gt;js-sdk&lt;/a&gt; and &lt;a href=&quot;https://github.com/mendableai/firecrawl/tree/main/apps/python-sdk&quot;&gt;python-sdk&lt;/a&gt;)
&amp;lt;/TextBox&amp;gt;&lt;/p&gt;
&lt;p&gt;We use Firecrawl to convert the pages on &lt;code&gt;https://signoz.io/docs/&lt;/code&gt; into markdown then save them to a json file (with a timestamp). You can adjust the crawler limit (there are only 303 pages so it falls within the free credits plan).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import FirecrawlApp from &apos;@mendable/firecrawl-js&apos;;
import dotenv from &apos;dotenv&apos;;
import { fileURLToPath } from &apos;url&apos;;
import path from &apos;path&apos;;
import fs from &apos;fs&apos;;

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

dotenv.config({ path: path.resolve(__dirname, &apos;../.env&apos;) });

// Initialize the FirecrawlApp with your API key
const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// Define crawl parameters
const crawlUrl = &apos;https://signoz.io/docs/&apos;;
const params = {
    crawlerOptions: {
        limit: 500,
        maxDepth: 10,
        includes: [&apos;docs/*&apos;],
    },
    pageOptions: {
        onlyMainContent: true
    }
};

// Crawl the website
const crawlResult = await app.crawlUrl(
    crawlUrl,
    params,
    true, // wait_until_done
    2 // poll_interval
);

// Save the crawl result to a file (with a timestamp)
const timestamp = new Date().toISOString().replace(/:/g, &apos;-&apos;).slice(0, -5);
fs.writeFileSync(`crawl_results_${timestamp}.json`, 
    JSON.stringify(crawlResult, null, 2));
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Cleaning the Markdown&lt;/h3&gt;
&lt;p&gt;Each crawl result item has a &lt;code&gt;markdown&lt;/code&gt; field that contains the full markdown text of the page. We did some initial data cleaning as we took our first look at the data and worked out the chunking. You can always add more cleaning later. We cut boilerplate from the end and transformed how links were presented in the metadata. This was done through tested functions that we can easily add to as we start searching.&lt;/p&gt;
&lt;p&gt;&amp;lt;TextBox&amp;gt;
Initial cleaning functions are in &lt;code&gt;cleaners.py&lt;/code&gt; / &lt;code&gt;cleaners.js&lt;/code&gt;, imported into the transform scripts.
&amp;lt;/TextBox&amp;gt;&lt;/p&gt;
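&lt;p&gt;To give a flavor of what such a cleaner can look like, here is a minimal sketch of cutting trailing boilerplate. It is not the code from &lt;code&gt;cleaners.js&lt;/code&gt;: the function name &lt;code&gt;cutTrailingBoilerplate&lt;/code&gt; and the marker strings are hypothetical, and the real markers depend on what the crawled site appends to each page.&lt;/p&gt;

```typescript
// Hypothetical markers, used here only for illustration.
const FOOTER_MARKERS = ["\n## Get Help", "\nWas this page helpful?"];

// Cuts everything from the earliest footer marker onward, then trims
// trailing whitespace. Pages without a marker are returned unchanged apart
// from that trim.
const cutTrailingBoilerplate = (
  markdown: string,
  markers: string[] = FOOTER_MARKERS,
): string => {
  let cutAt = markdown.length;
  for (const marker of markers) {
    const idx = markdown.indexOf(marker);
    if (idx !== -1) {
      cutAt = Math.min(cutAt, idx);
    }
  }
  return markdown.slice(0, cutAt).replace(/\s+$/, "");
};

const demoCleaned = cutTrailingBoilerplate(
  "Intro text\n\nBody.\n## Get Help\nContact us",
);
console.log(JSON.stringify(demoCleaned));
```

&lt;p&gt;Keeping each cleaner as a small pure function like this makes it easy to test and to add more rules as new boilerplate shows up in search results.&lt;/p&gt;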
&lt;h3&gt;Chunking the Markdown&lt;/h3&gt;
&lt;p&gt;For a baseline chunking approach we just want to find ways to split pages into smaller chunks, while maintaining some semantic cohesion within the chunks. Just as with cleaning, we can always refine our chunking strategy as we gain more familiarity with the data and our use cases.&lt;/p&gt;
&lt;p&gt;&amp;lt;TextBoxLearnMore&amp;gt;
Learn more about chunking techniques in &lt;a href=&quot;https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/a4570f3c4883eb9b835b0ee18990e62298f518ef/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb&quot;&gt;Greg Kamradt&apos;s &quot;5 Levels Of Text Splitting&quot; Notebook&lt;/a&gt; and &lt;a href=&quot;https://unstructured.io/blog/chunking-for-rag-best-practices&quot;&gt;Maria Khalusova&apos;s &quot;Chunking for RAG: best practices&quot;&lt;/a&gt;
&amp;lt;/TextBoxLearnMore&amp;gt;&lt;/p&gt;
&lt;p&gt;Our transform script first turns any page shorter than 500 words into a single chunk (this could easily be switched to a token-based limit based on the LLM of your RAG system). It then splits longer content into chunks by top-level anchor links (e.g. &lt;code&gt;[](#overview)\nOverview&lt;/code&gt;) and then by h3, h4, etc. (e.g. &lt;code&gt;### [](#step-1-setup-otel-collector)\n&lt;/code&gt;). Explicit h1 and h2 headings (i.e. &lt;code&gt;# this-is-h1&lt;/code&gt; and &lt;code&gt;## this-is-h2&lt;/code&gt;) are not present in the markdown for this dataset. The HTML itself has h2 headings for both the page title and for what is rendered here as the top-level anchor link headings, but they are not represented with &lt;code&gt;#&lt;/code&gt; in the markdown.&lt;/p&gt;
&lt;p&gt;This transform script uses regex to recursively split longer markdown content whenever a section is longer than &lt;code&gt;max_words&lt;/code&gt; (we used 500, counted via a crude split on whitespace) or until the split reaches &lt;code&gt;max_depth&lt;/code&gt; (we used &lt;code&gt;h4&lt;/code&gt;). As a result, some chunks are still longer than 500 words (25 of 972). We can later investigate lower heading-level splits, alternative split points, or manual edits for those chunks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function processContent(pageMarkdown, pageTitle, pageLink, pageTagsSet,
                        pageDescription, maxWords = CONFIGS.maxWords, maxDepth = CONFIGS.maxDepth) {

  // Splits content into sections based on headings
  function splitContent(content, pattern) {
    const matches = content.match(new RegExp(pattern, &apos;g&apos;)) || [];
    return matches.map((match, i) =&amp;gt; {
      const [, headingLink, headingText] = match.match(pattern);
      const start = content.indexOf(match);
      const end = i === matches.length - 1 ? content.length : content.indexOf(matches[i + 1]);
      return [headingLink, headingText, content.slice(start, end)];
    });
  }

  // Creates chunks from sections
  function createChunks(sections, currentTitle = &apos;&apos;, depth = 0) {
    const localChunks = [];
    let lastChunkHeadingOnly = false;

    sections.forEach(([headingLink, headingText, sectionContent]) =&amp;gt; {
      if (lastChunkHeadingOnly) {
        headingText = `${lastChunkHeadingOnly} - ${headingText}`;
        lastChunkHeadingOnly = false;
      }

      const chunkHtml = getChunkHtml(sectionContent, pageTitle, headingText, 0, null);

      if (chunkHtml === &quot;HEADING_ONLY&quot;) {
        lastChunkHeadingOnly = headingText;
        return;
      }

      const fullTitle = `${currentTitle}: ${headingText}`.replace(/^:\s+/, &apos;&apos;);
      const chunkWordCount = chunkHtml.split(/\s+/).length;
      const isWithinChunkingConstrains = chunkWordCount &amp;lt;= maxWords || depth &amp;gt;= maxDepth;
      // true if the chunk is within the word limit or we&apos;ve reached max depth.
      // we don&apos;t split further at max depth, even if over word limit

      if (isWithinChunkingConstrains) {
        localChunks.push(createChunkObject(chunkHtml, pageLink, headingLink, headingText,
                                           pageTagsSet, pageTitle, pageDescription, fullTitle));
      } else {
        const subsections = splitContent(sectionContent,
          /(\n###+ \[\]\((#.*?)\))\n(.*?)\n/);
        // regex to find subsections of the current section
        if (subsections.length &amp;gt; 0) {
          localChunks.push(...createChunks(subsections, fullTitle, depth + 1));
        } else {
          // if no subsections, create a chunk for the current section
          localChunks.push(createChunkObject(chunkHtml, pageLink, headingLink, headingText,
                                             pageTagsSet, pageTitle, pageDescription, fullTitle));
        }
      }
    });

    return localChunks;
  }

  const topSections = splitContent(pageMarkdown, /(\n\[\]\((#.*?)\))\n(.*?)\n/);
  return createChunks(topSections);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We also take the heading itself and add it at the top of the chunk along with the original page title.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Example: The chunk for the &quot;Enable a Prometheus Receiver&quot; heading on the &quot;Send Metrics to SigNoz Cloud&quot; page is titled: &quot;Send Metrics to SigNoz Cloud: Enable a Prometheus Receiver&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If a split produces a chunk that consists of only a heading (&lt;code&gt;HEADING_ONLY&lt;/code&gt;), that chunk is skipped and its heading is prepended to the heading of the next chunk.&lt;/p&gt;
&lt;h3&gt;Preparing the Chunks for Trieve&lt;/h3&gt;
&lt;p&gt;Each chunk also includes several other fields, metadata, and labels specific to Trieve:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tracking_id&lt;/code&gt;: parsed from the URL + the id of the heading for splits&lt;/li&gt;
&lt;li&gt;&lt;code&gt;group_tracking_ids&lt;/code&gt;: parsed from the URL (this will let us link the page chunks together)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tag_set&lt;/code&gt;: parsed from the URL segments (ex. &lt;code&gt;https://signoz.io/docs/install/kubernetes/&lt;/code&gt; -&amp;gt; &lt;code&gt;[&apos;install&apos;, &apos;kubernetes&apos;]&lt;/code&gt;; optimized for filtering)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;image_urls&lt;/code&gt;: parsed from the markdown (the docs have both &lt;code&gt;png&lt;/code&gt; and &lt;code&gt;webp&lt;/code&gt; images; this will show within the Trieve search playground)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;metadata&lt;/code&gt;: parsed from the crawl result metadata (we save the page title and description (if it exists), in case useful later; can be filtered)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;function createChunk(chunkHtml, pageLink, headingLink, headingText, pageTagsSet, 
                     pageTitle, pageDescription) {
  const chunk = {
    chunk_html: chunkHtml,
    link: pageLink + headingLink,
    tag_set: pageTagsSet,
    image_urls: getImages(chunkHtml),
    tracking_id: getTrackingId(pageLink + headingLink),
    group_tracking_ids: [getTrackingId(pageLink)],
    timestamp: TIMESTAMP,
    metadata: {
      title: pageTitle + &apos;: &apos; + headingText,
      page_title: pageTitle,
      page_description: pageDescription,
    }
  };

  return chunk;
}
&lt;/code&gt;&lt;/pre&gt;
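&lt;p&gt;The helpers called in the snippet above are not shown in this post. As a rough illustration only, here is one way the URL-derived fields could be computed. The &lt;code&gt;getTagSet&lt;/code&gt; name, the choice to drop a leading &lt;code&gt;docs&lt;/code&gt; segment, and the body of &lt;code&gt;getTrackingId&lt;/code&gt; are all assumptions made for this sketch, not the code from the repository.&lt;/p&gt;

```javascript
// Hypothetical helpers: the bodies below are assumptions for illustration,
// not the implementation used in the firecrawl-to-trieve repository.

// Turn URL path segments into tags, dropping an assumed leading "docs" segment.
function getTagSet(pageUrl, prefixToDrop = 'docs') {
  const segments = new URL(pageUrl).pathname.split('/').filter(Boolean);
  return segments[0] === prefixToDrop ? segments.slice(1) : segments;
}

// Build a tracking ID from the URL path plus the heading anchor, if any.
function getTrackingId(link) {
  const { pathname, hash } = new URL(link);
  const slug = pathname.split('/').filter(Boolean).join('-');
  return hash ? `${slug}-${hash.slice(1)}` : slug;
}

console.log(getTagSet('https://signoz.io/docs/install/kubernetes/'));
console.log(getTrackingId('https://signoz.io/docs/install/kubernetes/#overview'));
```

&lt;p&gt;Because the page-level ID is a prefix of every heading-level ID in this sketch, the same function can serve both &lt;code&gt;tracking_id&lt;/code&gt; and &lt;code&gt;group_tracking_ids&lt;/code&gt;.&lt;/p&gt;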
&lt;p&gt;&amp;lt;TextBox&amp;gt;
The groups are automatically created in Trieve for each tracking ID in the &lt;code&gt;group_tracking_ids&lt;/code&gt; array if a group with that tracking ID does not yet exist. Groups can be created (and edited) in the &lt;code&gt;chunk_group&lt;/code&gt; route. This lets you provide a name, description, tag_set and metadata for the group. See the docs: &lt;a href=&quot;https://docs.trieve.ai/api-reference/chunk-group/create-or-upsert-group-or-groups&quot;&gt;Create or Upsert Group or Groups&lt;/a&gt;
&amp;lt;/TextBox&amp;gt;&lt;/p&gt;
&lt;h2&gt;2. Storing the Chunks in Trieve for Search and RAG&lt;/h2&gt;
&lt;p&gt;We use the Trieve &lt;code&gt;api/chunk&lt;/code&gt; route to create the chunks in Trieve.&lt;/p&gt;
&lt;p&gt;&amp;lt;TextBox&amp;gt;
See the docs: &lt;a href=&quot;https://docs.trieve.ai/api-reference/chunk/create-or-upsert-chunk-or-chunks&quot;&gt;Create or Upsert Chunk or Chunks&lt;/a&gt;
&amp;lt;/TextBox&amp;gt;&lt;/p&gt;
&lt;p&gt;The request body is simply a JSON array of chunks, which we send in batches of 120.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;async function loadChunks(chunks, config, upsert = false) {
  const url = `${config.basePath}/api/chunk`;
  const headers = {
    &quot;TR-Dataset&quot;: config.datasetId,
    &quot;Authorization&quot;: `Bearer ${config.apiKey}`,
    &quot;Content-Type&quot;: &quot;application/json&quot;
  };

  chunks.forEach(chunk =&amp;gt; chunk.upsert_by_tracking_id = upsert);

  try {
    const { data } = await axios.post(url, chunks, { headers });
    console.log(`Successfully ${upsert ? &apos;upserted&apos; : &apos;created&apos;} batch of ` +
      `${chunks.length} chunks to ${config.datasetId}`);
    return data;
  } catch (error) {
    console.error(`Failed to ${upsert ? &apos;upsert&apos; : &apos;create&apos;} batch. ` +
      `Status: ${error.response?.status}, ` +
      `Data: ${JSON.stringify(error.response?.data)}, ` +
      `Message: ${error.message}`);
    throw error;
  }
}
&lt;/code&gt;&lt;/pre&gt;
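&lt;p&gt;The batching step itself is not shown above. A minimal sketch of how the chunk list might be split into batches of 120 and loaded one batch at a time follows. The names &lt;code&gt;toBatches&lt;/code&gt; and &lt;code&gt;loadAllChunks&lt;/code&gt; are hypothetical, and the loader is passed in as a parameter so the sketch stays independent of the HTTP client.&lt;/p&gt;

```javascript
// Hypothetical helper: split an array into consecutive batches of batchSize items.
function toBatches(items, batchSize = 120) {
  const batchCount = Math.ceil(items.length / batchSize);
  return Array.from({ length: batchCount }, (_, i) =>
    items.slice(i * batchSize, (i + 1) * batchSize));
}

// Hypothetical driver: loadBatch would be a function like loadChunks above.
// Batches are sent sequentially so a failure stops the run at a known point.
async function loadAllChunks(chunks, config, loadBatch) {
  const results = [];
  for (const batch of toBatches(chunks)) {
    results.push(await loadBatch(batch, config));
  }
  return results;
}
```

&lt;p&gt;With the 972 chunks in this dataset, that would mean nine requests: eight full batches and one final batch of 12.&lt;/p&gt;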
&lt;h3&gt;Testing Search and RAG quality using Trieve Playgrounds&lt;/h3&gt;
&lt;p&gt;Trying out some searches and chatting with the chunks is one way to get a look at your data and the quality of your initial cleaning and chunking steps.&lt;/p&gt;
&lt;p&gt;After loading the chunks you can head to &lt;a href=&quot;https://dashboard.trieve.ai/&quot;&gt;dashboard.trieve.ai&lt;/a&gt;. Here you&apos;ll find a datasets table with your ingested dataset, listing the number of chunks along with quicklinks to settings and the search, RAG, and analytics playgrounds. Both the Search and RAG Playgrounds show the chunks with their &lt;code&gt;chunk_html&lt;/code&gt;, the links to their source pages, tracking IDs, and metadata.&lt;/p&gt;
&lt;h4&gt;Search Playground: &lt;a href=&quot;https://search.trieve.ai/&quot;&gt;search.trieve.ai&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;&amp;lt;TextBox&amp;gt;
This is a search interface optimized for fully exploring the search options in Trieve. You&apos;ll see configurations for adjusting filters, choosing between search types (ex. hybrid, semantic, fulltext (SPLADE), BM25, etc.), sorting and reranking. You can also do multi-query searches and group search (returning the corresponding group for the retrieved chunks). There are additional options to set the score threshold for retrieved results, page size, stop word removal, and various settings for managing Trieve&apos;s sub-sentence highlighting. Once a search has been conducted you will also see a button to &quot;Rate This Search&quot; which opens a modal with a 0-10 slider and notes field.
&amp;lt;/TextBox&amp;gt;&lt;/p&gt;
&lt;p&gt;Example searches to showcase the search types:&lt;/p&gt;
&lt;p&gt;Query: &lt;em&gt;visualizations&lt;/em&gt; (Semantic)
This is a dense vector search using cosine distance on &lt;a href=&quot;https://jina.ai/news/jina-ai-launches-worlds-first-open-source-8k-text-embedding-rivaling-openai&quot;&gt;Jina English embeddings&lt;/a&gt;.
&lt;img src=&quot;https://cdn.trieve.ai/blog/firecrawl-and-trieve/visualizations_query_signoz_semantic.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Query: &lt;em&gt;visualizations&lt;/em&gt; (Fulltext)
Notice how the retrieval model for our fulltext search type, &lt;a href=&quot;https://github.com/naver/splade&quot;&gt;SPLADE&lt;/a&gt;, is returning chunks containing words closer to the query rather than the broader semantics.
&lt;img src=&quot;https://cdn.trieve.ai/blog/firecrawl-and-trieve/visualizations_query_signoz_fulltext.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Query: &lt;em&gt;visualizations&lt;/em&gt; (Hybrid)
This takes in results from both fulltext and semantic and re-ranks with a cross-encoder model (here the &lt;a href=&quot;https://huggingface.co/BAAI/bge-reranker-large&quot;&gt;BAAI/bge-reranker-large&lt;/a&gt;).
&lt;img src=&quot;https://cdn.trieve.ai/blog/firecrawl-and-trieve/visualizations_query_signoz_hybrid.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
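&lt;p&gt;To make the hybrid idea concrete, here is a small sketch of merging two result lists and re-ranking them. It is not how Trieve implements hybrid search internally. The &lt;code&gt;hybridMerge&lt;/code&gt; name and the &lt;code&gt;rerankScore&lt;/code&gt; callback (a stand-in for a cross-encoder scoring a query and chunk pair) are assumptions for illustration.&lt;/p&gt;

```javascript
// Hypothetical sketch: union the fulltext and semantic hits by id,
// then order the union by a reranker score (higher is better).
function hybridMerge(fulltextHits, semanticHits, rerankScore) {
  const byId = new Map();
  for (const hit of [...fulltextHits, ...semanticHits]) {
    byId.set(hit.id, hit); // a chunk found by both searches is kept once
  }
  return [...byId.values()]
    .map((hit) => ({ ...hit, score: rerankScore(hit) }))
    .sort((a, b) => b.score - a.score);
}
```

&lt;p&gt;The point of the sketch is that the final order comes from the reranker alone, so a chunk found by only one of the two searches can still end up first.&lt;/p&gt;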
&lt;p&gt;&lt;strong&gt;Group Search:&lt;/strong&gt;
Internally this uses the &lt;a href=&quot;https://docs.trieve.ai/api-reference/chunk-group/search-over-groups&quot;&gt;chunk_group/group_oriented_search&lt;/a&gt; route.&lt;/p&gt;
&lt;p&gt;Query: &lt;em&gt;OpenTelemetry FastAPI example&lt;/em&gt; (Hybrid)
&lt;img src=&quot;https://cdn.trieve.ai/blog/firecrawl-and-trieve/fastapi_example_query_signoz_semantic_group.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
&lt;h4&gt;RAG Playground: &lt;a href=&quot;https://chat.trieve.ai/&quot;&gt;chat.trieve.ai&lt;/a&gt;&lt;/h4&gt;
&lt;p&gt;This is a question-and-answer chat interface to showcase and explore RAG in your dataset with Trieve. It has everything you are looking for: input at bottom, streaming response from the LLM, and documents (the retrieved chunks) used by the LLM in its response on the right-hand side. You can regenerate a response, ask follow-on questions, start new conversations, and open any of the retrieved chunks. The various RAG-relevant parameters can be adjusted in the dataset settings in the dashboard: LLM, HyDE prompt (optional), System Prompt, RAG prompt, and more. We support models available on OpenRouter: &lt;a href=&quot;https://openrouter.ai/docs/models&quot;&gt;openrouter.ai/docs/models&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is HyDE?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Hypothetical Document Embeddings (HyDE) is a retrieval method where an LLM first generates a hypothetical document that might address a user&apos;s query, and then that document is used in a semantic search (instead of the original query) to retrieve documents from the dataset. Read more in &lt;a href=&quot;https://arxiv.org/abs/2212.10496&quot;&gt;Luyu Gao et al.&apos;s &quot;Precise Zero-Shot Dense Retrieval without Relevance Labels&quot;&lt;/a&gt;.&lt;/p&gt;
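&lt;p&gt;The HyDE flow described above can be sketched in a few lines. Both &lt;code&gt;generate&lt;/code&gt; (an LLM call) and &lt;code&gt;semanticSearch&lt;/code&gt; (a dense vector search) are hypothetical functions injected by the caller, and the prompt wording is only a placeholder. In Trieve the HyDE prompt is configured in the dataset settings rather than in code like this.&lt;/p&gt;

```javascript
// Hypothetical sketch of HyDE: search with a generated passage instead of the raw query.
// generate and semanticSearch are caller-supplied stand-ins, not Trieve APIs.
async function hydeSearch(query, generate, semanticSearch) {
  const prompt = `Write a short passage that answers the question: ${query}`;
  const hypotheticalDocument = await generate(prompt);
  // The hypothetical document, not the original query, is embedded and searched.
  return semanticSearch(hypotheticalDocument);
}
```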
&lt;p&gt;Example interactions on the RAG interface:&lt;/p&gt;
&lt;p&gt;Question: &lt;em&gt;What is SigNoz?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/firecrawl-and-trieve/what_signoz_chat_signoz.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Question: &lt;em&gt;What are the benefits of OpenTelemetry?&lt;/em&gt;
&lt;img src=&quot;https://cdn.trieve.ai/blog/firecrawl-and-trieve/open_telemetry_benefits_chat_signoz.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Firecrawl and Trieve are a powerful combination for quickly building search and RAG systems. &lt;a href=&quot;https://dashboard.trieve.ai/&quot;&gt;Get started today!&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Firecrawl: &lt;a href=&quot;https://firecrawl.dev/&quot;&gt;firecrawl.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Trieve: &lt;a href=&quot;https://trieve.ai/&quot;&gt;trieve.ai&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;GitHub repo: &lt;a href=&quot;https://github.com/devflowinc/firecrawl-to-trieve&quot;&gt;firecrawl-to-trieve&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>explainers</category><author>Daniel S. Griffin</author></item><item><title>History of HackerNews Search: From 2007 to 2024</title><link>https://trieve.ai/blog/history-of-hnsearch/</link><guid isPermaLink="true">https://trieve.ai/blog/history-of-hnsearch/</guid><pubDate>Mon, 12 Aug 2024 19:17:00 GMT</pubDate><content:encoded>&lt;p&gt;We at Trieve are going to be launching a search engine for HackerNews with some additional features soon and thought it would be worth studying the history of HN search before finalizing things. Here&apos;s what we found!&lt;/p&gt;
&lt;h4&gt;Update!&lt;/h4&gt;
&lt;p&gt;We launched at the end of August! Check it out at &lt;a href=&quot;https://hn.trieve.ai&quot;&gt;hn.trieve.ai&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s &lt;a href=&quot;https://trieve.ai/launching-trieve-hn-discovery/&quot;&gt;a writeup on the launch&lt;/a&gt; (&lt;a href=&quot;https://news.ycombinator.com/item?id=41393005&quot;&gt;HN post&lt;/a&gt;).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: Did the research using our own HN search engine! :)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: I was happy looking at these old HN posts and seeing that so many of the comment/post&apos;ers from the early days of HN
were, or eventually became, founders of YC companies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;First Gen | 2007-2011&lt;/h2&gt;
&lt;h3&gt;1. Search News YC (bigheadlabs) on March 17, 2007&lt;/h3&gt;
&lt;p&gt;Written and shared by Jason Yan, Founder/CTO of Disqus (S07) (aka &lt;a href=&quot;https://news.ycombinator.com/user?id=jasonyan&quot;&gt;jasonyan&lt;/a&gt;), on &lt;a href=&quot;https://news.ycombinator.com/item?id=4780&quot;&gt;March 17, 2007&lt;/a&gt;. Can still be viewed &lt;a href=&quot;https://web.archive.org/web/20070707020143/http://nycs.bigheadlabs.com/&quot;&gt;here on the internet archive&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/history-of-hnsearch/nycs.bigheadlabs.com.webp&quot; alt=&quot;search-news-yc-screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I assume there was some indexing logic being done on the Django server that Jason used.&lt;/p&gt;
&lt;h3&gt;2. ycsearch.com on June 27, 2007&lt;/h3&gt;
&lt;p&gt;Quickly hacked together by Keven Lin (YC S07) (aka &lt;a href=&quot;https://news.ycombinator.com/user?id=keven&quot;&gt;keven&lt;/a&gt;) on &lt;a href=&quot;https://news.ycombinator.com/item?id=31012&quot;&gt;June 27, 2007&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I could not find a screenshot on Internet Archive, but Keven explained he built it with &lt;a href=&quot;https://programmablesearchengine.google.com/about/&quot;&gt;cse.google.com&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;3. Search &apos;Hacker News&apos; (trk7.com) on September 18, 2007&lt;/h3&gt;
&lt;p&gt;Created and shared by Kesevan (aka &lt;a href=&quot;https://news.ycombinator.com/user?id=cosmok&quot;&gt;cosmok&lt;/a&gt;) on &lt;a href=&quot;https://news.ycombinator.com/item?id=56327&quot;&gt;September 18, 2007&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Can see the UI at this &lt;a href=&quot;https://web.archive.org/web/20080306153636/http://trk7.com/yc&quot;&gt;link on the internet archive&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/history-of-hnsearch/trk7-hnsearch.webp&quot; alt=&quot;trk7-search-hn-screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In the text of the original post (see &lt;a href=&quot;https://news.ycombinator.com/item?id=1309589&quot;&gt;here&lt;/a&gt;), cosmok explains that he built it using Yahoo&apos;s search API.&lt;/p&gt;
&lt;h3&gt;4. searchyc.com (independent) on Dec 31, 2007 | First with Some Staying Power&lt;/h3&gt;
&lt;p&gt;Independently created and shared by Mike Cheng (aka &lt;a href=&quot;https://news.ycombinator.com/user?id=chengmi&quot;&gt;chengmi&lt;/a&gt;) and Alaska Miller (aka &lt;a href=&quot;https://news.ycombinator.com/user?id=alaskamiller&quot;&gt;alaskamiller&lt;/a&gt;) on &lt;a href=&quot;https://news.ycombinator.com/item?id=93864&quot;&gt;Dec 31, 2007&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I surmise from the comment on the post that the motivation here was ycsearch being limited in terms of HN-specific filters and the bigheadlabs one being un-maintained.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/history-of-hnsearch/searchyc.com.webp&quot; alt=&quot;search-yc-screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Judging from the last &lt;a href=&quot;https://news.ycombinator.com/item?id=35959&quot;&gt;paulg comment on this thread&lt;/a&gt; it seems like, similar to ycsearch, it was built using &lt;a href=&quot;https://programmablesearchengine.google.com/about/&quot;&gt;cse.google.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The HN community seemed to get a lot of value out of it. In an &lt;a href=&quot;https://news.ycombinator.com/item?id=2605959&quot;&gt;HN thread posted when it went down on June 1, 2011&lt;/a&gt;, multiple users explained how important it was to them:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;iheartmemcache&lt;/strong&gt;: This service is a major component of this community; as such, I&apos;ll host this on whatever metal you need. My contact information is in my profile. Ping me on G-talk and we can have this sorted out by the morning (if you&apos;re in PST).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;bkrausz&lt;/strong&gt;: What kind of traffic does SearchYC get? Is a $40/mo Linode not sufficient? I would gladly pay that (or be content with some Google ads in the right bar). Hell, I&apos;d even maintain the site...it&apos;s a great service.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;g123g&lt;/strong&gt;: Hopfully you will be able to bring it back soon. SearchYC.com is the best way to search the treasure trove that HN has become.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Worth mentioning that HNSearch (mentioned further below) was up by this point in time. Judging by comments on the shutdown post it seems like traffic was somewhat split:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;swombat&lt;/strong&gt;: What&apos;s wrong with &lt;a href=&quot;http://www.hnsearch.com&quot;&gt;http://www.hnsearch.com&lt;/a&gt; ?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;evangineer&lt;/strong&gt;: Just got zero hits on a search that I know there is at least one result for. Same search worked fine on searchyc.com a few days ago.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Second Gen - Octopart/ThriftDB-powered HNSearch | 2011-2014&lt;/h2&gt;
&lt;p&gt;The official launch of HNSearch was posted by Paul Graham, founder of Y Combinator itself (aka &lt;a href=&quot;https://news.ycombinator.com/user?id=pg&quot;&gt;pg&lt;/a&gt;), and Andres Morey, founder of Octopart (W07) (aka &lt;a href=&quot;https://news.ycombinator.com/user?id=andres&quot;&gt;andres&lt;/a&gt;), separately on June 4, 2011. Andres posted it as an &lt;a href=&quot;https://news.ycombinator.com/item?id=2619892&quot;&gt;API contest here&lt;/a&gt; to build the best thing on top of the HNSearch API, where the winner would get a 27-inch Dell monitor. PG posted it as an official announcement on &lt;a href=&quot;https://news.ycombinator.com/item?id=2619736&quot;&gt;HN here&lt;/a&gt;, which linked to &lt;a href=&quot;https://web.archive.org/web/20110618105517/http://ycombinator.com/newsnews.html&quot;&gt;a ycombinator.com page that is now only available in the archive&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/history-of-hnsearch/hnsearch-api-contest.webp&quot; alt=&quot;octopart/thriftdb-search-screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Building search for HN has certainly been a trial for us, and we felt validated seeing that &lt;a href=&quot;https://news.ycombinator.com/item?id=35959&quot;&gt;PG first mentioned the Octopart guys using ThriftDB to make this in 2007, four years before it was released&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I think the best part of HNSearch was that third-party applications were built on top of it. It seems, judging by the &lt;a href=&quot;https://news.ycombinator.com/item?id=7404972&quot;&gt;HNSearch shutdown post&lt;/a&gt;, that it was well-loved by HN users and also well-replaced by Algolia.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;clamprecht&lt;/strong&gt;: Can someone outline the benefits of the new one over the old one? When I first tried the new one, the UI was severely lacking. I saw the fixed a few things, but I haven&apos;t evaluated it again.
I don&apos;t always use the HN search engine, but when I do, it&apos;s usually very helpful. I&apos;d hate to lose that.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;swah&lt;/strong&gt;: I noticed the new one is much faster..&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Third Gen - Algolia powered search | 2014-current&lt;/h2&gt;
&lt;p&gt;The first HN post I was able to find mentioning Algolia HN search was &lt;a href=&quot;https://news.ycombinator.com/item?id=7126301&quot;&gt;Ask HN: What do you think about our last HN Search update? on Jan 26, 2014&lt;/a&gt;, posted by Julien Lemoine, founder/CTO of Algolia (W14), aka &lt;a href=&quot;https://news.ycombinator.com/user?id=jlemoine&quot;&gt;jlemoine&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/history-of-hnsearch/algolia-hn-search.webp&quot; alt=&quot;algolia-hn-search-screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Algolia&apos;s ability to get HN search up so quickly is really impressive. If you look at the &lt;a href=&quot;https://github.com/algolia/hn-search/commits/master/?after=e27760e09840a6fa3efc592649fceb89237e4c2f+1119&quot;&gt;GitHub repo&lt;/a&gt; it seems like they started in Sep 2013 and released in Jan 2014.&lt;/p&gt;
&lt;p&gt;We also took about 6 months to get everything up, having &lt;a href=&quot;https://github.com/devflowinc/trieve-hn-discovery/commits/main/?after=0163ad22215a28a3492fc86f0d50e4a9bd338f3b+139&quot;&gt;started in Feb 2024 and released in Aug 2024&lt;/a&gt;. I can say now, with firsthand experience, that this timeline is not easy to hit, especially given that certain dev tooling was less mature in 2013.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://news.ycombinator.com/item?id=7126301&quot;&gt;Algolia asked the community for feedback post-launch in 2014&lt;/a&gt; and implemented several improvements including &lt;a href=&quot;https://news.ycombinator.com/item?id=37881130&quot;&gt;additional filter types and improved indexing speed in late 2023&lt;/a&gt;. We think it&apos;s incredibly accurate for keyword search and has all the filters and options that we would want.&lt;/p&gt;
&lt;h2&gt;Honorable Mentions during the Algolia era&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://hackersearch.net/&quot;&gt;hackersearch.net&lt;/a&gt; by &lt;a href=&quot;https://news.ycombinator.com/user?id=jnnnthnn&quot;&gt;jnnnthnn&lt;/a&gt; posted May 2024 | semantic search engine using OpenAI embeddings&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://deephn.org/&quot;&gt;deephn.org&lt;/a&gt; by &lt;a href=&quot;https://news.ycombinator.com/user?id=wolfgarbe&quot;&gt;wolfgarbe&lt;/a&gt; posted April 13, 2021 | full-text search of both HN posts and linked webpages&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://hackernews.demo.vectara.com/&quot;&gt;hackernews.demo.vectara.com&lt;/a&gt; by &lt;a href=&quot;https://news.ycombinator.com/user?id=ofermend&quot;&gt;ofermend&lt;/a&gt; posted July 2024 | semantic search over the past 6 months of data&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://searchhacker.news/&quot;&gt;searchhacker.news&lt;/a&gt; by &lt;a href=&quot;https://news.ycombinator.com/user?id=isoprophlex&quot;&gt;isoprophlex&lt;/a&gt; posted April 2024 | keyword search over discussions re-ranked by dense semantic vectors&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://hn.lixiasearch.com/&quot;&gt;hn.lixiasearch.com&lt;/a&gt; by &lt;a href=&quot;https://news.ycombinator.com/user?id=larose&quot;&gt;larose&lt;/a&gt; posted February 2024 | unknown data level and indexing strategy&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://orangewords.com/search&quot;&gt;orangewords.com&lt;/a&gt; by &lt;a href=&quot;https://news.ycombinator.com/submitted?id=cmcollier&quot;&gt;cmcollier&lt;/a&gt; (not posted to HN yet) | all of HN indexed in Vespa with RAG&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><category>explainers</category><author>Nick Khami</author></item><item><title>How to Build and Use Agentic RAG</title><link>https://trieve.ai/blog/how-to-build-agentic-rag/</link><guid isPermaLink="true">https://trieve.ai/blog/how-to-build-agentic-rag/</guid><pubDate>Fri, 30 May 2025 16:39:00 GMT</pubDate><content:encoded>&lt;h2&gt;The Problem: RAG That Searches for Everything&lt;/h2&gt;
&lt;p&gt;Traditional RAG systems have a glaring issue: they search for &lt;em&gt;everything&lt;/em&gt;. User asks about pizza recipes? Search the knowledge base. Wants to know the weather? Search again. Having a casual conversation? Yep, another search.&lt;/p&gt;
&lt;p&gt;This shotgun approach leads to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Irrelevant context&lt;/strong&gt; cluttering the LLM&apos;s input&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Higher costs&lt;/strong&gt; from unnecessary searches and longer prompts&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slower responses&lt;/strong&gt; due to constant database hits&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Confused AI&lt;/strong&gt; trying to make sense of unrelated documents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What if the AI could decide for itself when it actually needs to search?&lt;/p&gt;
&lt;h2&gt;Enter Agentic RAG: The LLM Calls the Shots&lt;/h2&gt;
&lt;p&gt;Agentic RAG flips the script. Instead of automatically searching on every query, we give the language model &lt;em&gt;tools&lt;/em&gt; it can choose to use. Think of it like handing someone a toolbox—they&apos;ll grab a hammer when they need to drive a nail, not when they&apos;re stirring soup.&lt;/p&gt;
&lt;p&gt;In our implementation, the LLM gets two main tools:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Search tool&lt;/strong&gt;: &quot;I need information about X&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chunk selection tool&lt;/strong&gt;: &quot;I&apos;ll use these specific documents&quot;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The magic happens when the AI decides it needs more information. Only then does it reach for the search tool. If you are only interested in the source code (Rust btw 😉), check it out on &lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/server/src/operators/message_operator.rs#L1812-L2720&quot;&gt;GitHub here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Try It Yourself: Two API Calls to Agentic RAG&lt;/h2&gt;
&lt;p&gt;Want to add intelligent RAG to your application right now? You can use our agentic search system with just two simple API calls. No need to build anything from scratch.&lt;/p&gt;
&lt;h3&gt;Step 1: Create a Chat Topic&lt;/h3&gt;
&lt;p&gt;First, create a topic to hold your conversation. See the full documentation at &lt;a href=&quot;https://docs.trieve.ai/api-reference/topic/create-topic#create-topic&quot;&gt;docs.trieve.ai/api-reference/topic/create-topic&lt;/a&gt; for more details.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST &quot;https://api.trieve.ai/api/topic&quot; \
  -H &quot;Authorization: Bearer YOUR_API_KEY&quot; \
  -H &quot;TR-Dataset: YOUR_DATASET_ID&quot; \
  -H &quot;Content-Type: application/json&quot; \
  -d &apos;{
    &quot;owner_id&quot;: &quot;user_123&quot;,
    &quot;name&quot;: &quot;My Chat Session&quot;
  }&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This returns a topic with an &lt;code&gt;id&lt;/code&gt; that you&apos;ll use for the conversation.&lt;/p&gt;
&lt;h3&gt;Step 2: Send Messages with Agentic Search&lt;/h3&gt;
&lt;p&gt;Now send your message and let the AI decide when to search. To see the full documentation, visit &lt;a href=&quot;https://docs.trieve.ai/api-reference/message/create-message#create-message&quot;&gt;docs.trieve.ai/api-reference/message/create-message&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST &quot;https://api.trieve.ai/api/message&quot; \
  -H &quot;Authorization: Bearer YOUR_API_KEY&quot; \
  -H &quot;TR-Dataset: YOUR_DATASET_ID&quot; \
  -H &quot;Content-Type: application/json&quot; \
  -d &apos;{
    &quot;topic_id&quot;: &quot;TOPIC_ID_FROM_STEP_1&quot;,
    &quot;new_message_content&quot;: &quot;How do I configure authentication?&quot;,
    &quot;use_agentic_search&quot;: true
  }&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That&apos;s it! The system will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Analyze if your question needs a knowledge base search&lt;/li&gt;
&lt;li&gt;Search intelligently only when needed&lt;/li&gt;
&lt;li&gt;Stream back responses in real-time&lt;/li&gt;
&lt;li&gt;Show you exactly which documents were used&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The magic is in that &lt;code&gt;use_agentic_search: true&lt;/code&gt; parameter—it activates the intelligent search behavior described below.&lt;/p&gt;
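&lt;p&gt;If you are calling the API from JavaScript rather than curl, the same request can be assembled as below. This is a sketch that only reuses the URL, headers, and body fields from the curl example. The &lt;code&gt;buildAgenticMessageRequest&lt;/code&gt; name is made up for illustration, and the request is built as plain data so you can pass it to whichever HTTP client you prefer.&lt;/p&gt;

```javascript
// Hypothetical helper: assembles the message request from the curl example above.
function buildAgenticMessageRequest({ apiKey, datasetId, topicId, content }) {
  return {
    url: 'https://api.trieve.ai/api/message',
    options: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'TR-Dataset': datasetId,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        topic_id: topicId,
        new_message_content: content,
        use_agentic_search: true, // let the LLM decide when to search
      }),
    },
  };
}

// Possible usage with the built-in fetch (Node 18+ or a browser):
//   const { url, options } = buildAgenticMessageRequest({ ... });
//   const response = await fetch(url, options);
```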
&lt;h2&gt;How We Built It: One Smart API Route&lt;/h2&gt;
&lt;p&gt;Our entire agentic RAG system lives in a single function: &lt;code&gt;stream_response_with_agentic_search&lt;/code&gt;. It&apos;s surprisingly straightforward once you break it down.&lt;/p&gt;
&lt;h3&gt;Step 1: Setting Up the Conversation&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;// Simplified: Prepare messages for the LLM
let mut openai_messages: Vec&amp;lt;ChatMessage&amp;gt; = messages
    .iter()
    .map(|message| ChatMessage::from(message.clone()))
    .collect();

// Add the current user query
openai_messages.push(ChatMessage::User {
    content: ChatMessageContent::Text(user_message_query.clone()),
    name: None,
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Nothing fancy here—we take the conversation history and current user message, then format them for the LLM. If the user included images, those get added too.&lt;/p&gt;
&lt;h3&gt;Step 2: Creating the Toolbox&lt;/h3&gt;
&lt;p&gt;This is where it gets interesting. We define tools that the LLM can call:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Simplified: Define tools the LLM can use
let tools = vec![
    ChatCompletionTool {
        function: ChatCompletionFunction {
            name: &quot;search&quot;.to_string(),
            description: &quot;Search for relevant information in the knowledge base&quot;,
            parameters: json!({
                &quot;type&quot;: &quot;object&quot;,
                &quot;properties&quot;: {
                    &quot;query&quot;: {
                        &quot;type&quot;: &quot;string&quot;,
                        &quot;description&quot;: &quot;The search query to find relevant information&quot;
                    }
                }
            }),
        },
    },
    ChatCompletionTool {
        function: ChatCompletionFunction {
            name: &quot;chunks_used&quot;.to_string(),
            description: &quot;Tell the user which chunks you plan to use&quot;,
            parameters: json!({
                &quot;type&quot;: &quot;object&quot;,
                &quot;properties&quot;: {
                    &quot;chunks&quot;: {
                        &quot;type&quot;: &quot;array&quot;,
                        &quot;items&quot;: {&quot;type&quot;: &quot;string&quot;}
                    }
                }
            }),
        },
    }
];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The search tool lets the AI query our knowledge base when it needs information. The chunks_used tool allows the AI to explicitly state which retrieved documents it&apos;s actually using—no more mystery about where answers come from.&lt;/p&gt;
&lt;h3&gt;Step 3: The Conversation Loop&lt;/h3&gt;
&lt;p&gt;Here&apos;s where the agent behavior emerges:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Simplified: Main conversation loop
loop {
    // Get response from LLM
    let response = client.chat().create(parameters.clone()).await?;

    // Add assistant message to conversation
    conversation_messages.push(response.message.clone());

    // Check if AI wants to use tools
    if let Some(tool_calls) = response.tool_calls {
        for tool_call in tool_calls {
            match tool_call.function.name.as_str() {
                &quot;search&quot; =&amp;gt; {
                    // AI decided it needs to search
                    // Copy the id first: tool_call is moved into the handler below
                    let tool_call_id = tool_call.id.clone();
                    let (results, formatted_results) = handle_search_tool_call(
                        tool_call,
                        dataset,
                        pool,
                        redis_pool,
                        dataset_config,
                        event_queue,
                    ).await?;

                    // Add search results back to conversation
                    conversation_messages.push(ChatMessage::Tool {
                        content: formatted_results,
                        tool_call_id,
                    });

                    searched_chunks.extend(results);
                }
                &quot;chunks_used&quot; =&amp;gt; {
                    // AI specified which chunks it&apos;s using
                    let chunks_to_use: Vec&amp;lt;String&amp;gt; = parse_chunks_used(&amp;amp;tool_call)?;

                    // Filter to only keep specified chunks
                    searched_chunks.retain(|chunk| {
                        chunks_to_use.contains(&amp;amp;chunk.id.to_string())
                    });
                }
                _ =&amp;gt; {}
            }
        }
        // Continue conversation with tool results
        continue;
    } else {
        // No tool calls - we have the final response
        break;
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This loop is the heart of the agentic behavior. The LLM generates a response, and if it includes tool calls, we execute them and add the results back to the conversation. The loop continues until the LLM provides a final answer without requesting any tools.&lt;/p&gt;
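&lt;p&gt;One thing the simplified loop leaves out is an upper bound on tool rounds. The following is a minimal, self-contained sketch of the same control flow with such a cap, using a mocked model. The names &lt;code&gt;Reply&lt;/code&gt;, &lt;code&gt;mock_llm&lt;/code&gt;, &lt;code&gt;run_agent&lt;/code&gt;, and &lt;code&gt;max_rounds&lt;/code&gt; are hypothetical and only for illustration; they are not taken from the Trieve codebase.&lt;/p&gt;

```rust
// Hypothetical stand-ins: a real handler talks to an LLM client and a search index.
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    // The model asks for a tool to be run with one string argument.
    ToolCall { name: String, argument: String },
    // The model answers without requesting any tool.
    Final(String),
}

// Mock model: requests one search, then answers once a tool result is in the history.
fn mock_llm(has_tool_result: bool) -> Reply {
    if has_tool_result {
        Reply::Final(String::from("answer grounded in search results"))
    } else {
        Reply::ToolCall {
            name: String::from("search"),
            argument: String::from("configure authentication"),
        }
    }
}

// Runs the conversation loop, allowing at most `max_rounds` tool rounds.
fn run_agent(question: String, max_rounds: usize) -> String {
    let mut history = vec![format!("user: {}", question)];
    let mut rounds = 0;
    loop {
        let has_tool_result = history.iter().any(|m| m.starts_with("tool: "));
        match mock_llm(has_tool_result) {
            // No tool calls: this is the final response.
            Reply::Final(answer) => return answer,
            Reply::ToolCall { name, argument } => {
                if rounds == max_rounds {
                    return String::from("stopped: tool budget exhausted");
                }
                rounds += 1;
                // A real handler would dispatch on `name` and run the search here.
                history.push(format!("tool: {} results for {}", name, argument));
            }
        }
    }
}
```

&lt;p&gt;With a budget of a few rounds the mock searches once and then answers; with a budget of zero the loop stops instead of calling tools forever. Whether to return an error or a best-effort answer when the budget runs out is a design choice this sketch does not settle.&lt;/p&gt;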
&lt;h3&gt;Step 4: Real-Time Streaming&lt;/h3&gt;
&lt;p&gt;For the best user experience, we stream responses in real-time. This gets trickier because tool calls can interrupt the stream:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Simplified: Streaming with tool support
while let Some(response_chunk) = stream.next().await {
    match response_chunk {
        Ok(chunk) =&amp;gt; {
            // Stream content to user
            if let Some(content) = chunk.content {
                tx.send(content).await;
            }

            // Handle tool calls mid-stream
            if chunk.finish_reason == Some(&quot;tool_calls&quot;) {
                // AI wants to use tools - pause streaming
                tx.send(&quot;[Searching...]&quot;).await;

                // Execute tool calls
                handle_tool_calls(&amp;amp;chunk.tool_calls).await;

                // Resume streaming with updated context
                continue;
            }
        }
        Err(e) =&amp;gt; break,
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The user sees responses appear in real-time, with helpful indicators like &quot;[Searching...]&quot; when the AI decides it needs more information.&lt;/p&gt;
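&lt;p&gt;To make the interruption behavior concrete, here is a small sketch of what the client ends up seeing when a tool call arrives mid-stream. It is an illustration under assumptions, not our streaming code: &lt;code&gt;StreamEvent&lt;/code&gt; and &lt;code&gt;ClientView&lt;/code&gt; are hypothetical types, and the network and channel plumbing is left out entirely.&lt;/p&gt;

```rust
// Hypothetical events: a simplified view of what a streamed completion yields.
enum StreamEvent {
    // A piece of assistant text to show immediately.
    Content(String),
    // The model paused its answer to request tools.
    ToolCalls,
}

// Accumulates what the user sees on screen.
struct ClientView {
    shown: String,
    searches: u32,
}

impl ClientView {
    fn new() -> Self {
        ClientView { shown: String::new(), searches: 0 }
    }

    // Consumes the view and returns the updated one after a single event.
    fn on_event(mut self, event: StreamEvent) -> Self {
        match event {
            StreamEvent::Content(text) => self.shown.push_str(text.as_str()),
            StreamEvent::ToolCalls => {
                // Show an indicator while the tools run, then keep streaming.
                self.shown.push_str("[Searching...]");
                self.searches += 1;
            }
        }
        self
    }
}
```

&lt;p&gt;Feeding it a content event, a tool-call event, and another content event yields the first text, the indicator, and then the rest of the answer, in that order.&lt;/p&gt;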
&lt;h2&gt;What Makes This Actually Work&lt;/h2&gt;
&lt;h3&gt;Tool Choice Intelligence&lt;/h3&gt;
&lt;p&gt;The LLM decides when to search based on the conversation context. If someone asks &quot;What&apos;s the weather like?&quot;, it typically won&apos;t search a knowledge base about product documentation. But if they ask &quot;How do I configure authentication in your API?&quot;, it will search for relevant docs.&lt;/p&gt;
&lt;h3&gt;Transparency Through Chunk Selection&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;chunks_used&lt;/code&gt; tool solves a major RAG problem: knowing where answers come from. Instead of the system dumping 20 documents into context and hoping for the best, the AI explicitly states which documents it&apos;s using.&lt;/p&gt;
&lt;h3&gt;Streaming with Interruptions&lt;/h3&gt;
&lt;p&gt;Most agentic systems batch everything—ask, search, think, respond. Our streaming approach provides immediate feedback while still allowing the AI to search mid-conversation.&lt;/p&gt;
&lt;h2&gt;Real-World Performance&lt;/h2&gt;
&lt;p&gt;Since implementing this approach, we&apos;ve seen:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;60% reduction&lt;/strong&gt; in unnecessary searches&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;40% faster responses&lt;/strong&gt; on queries that don&apos;t need retrieval&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Higher accuracy&lt;/strong&gt; because context is more relevant when searches do happen&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better user experience&lt;/strong&gt; with transparent document usage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The system handles everything from simple greetings (no search needed) to complex technical questions (multiple targeted searches) seamlessly.&lt;/p&gt;
&lt;h2&gt;The Trade-offs We Made&lt;/h2&gt;
&lt;p&gt;Like any system, this approach has trade-offs:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Intelligent search decisions&lt;/li&gt;
&lt;li&gt;Transparent source attribution&lt;/li&gt;
&lt;li&gt;Real-time streaming responses&lt;/li&gt;
&lt;li&gt;Lower costs and latency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Slightly more complex than traditional RAG&lt;/li&gt;
&lt;li&gt;Requires tool-capable LLMs&lt;/li&gt;
&lt;li&gt;Multiple round trips for complex queries&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What&apos;s Next&lt;/h2&gt;
&lt;p&gt;We&apos;re continuously improving the system:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Smarter tool descriptions&lt;/strong&gt; that help the LLM make better decisions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-step reasoning&lt;/strong&gt; for complex queries requiring multiple searches&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Domain-specific tools&lt;/strong&gt; beyond just search&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better chunk selection&lt;/strong&gt; with reasoning about why specific documents are relevant&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;The Bottom Line&lt;/h2&gt;
&lt;p&gt;Agentic RAG isn&apos;t about adding complexity—it&apos;s about adding intelligence. By letting the LLM decide when and what to search for, we&apos;ve built a system that&apos;s both more efficient and more effective than traditional approaches.&lt;/p&gt;
&lt;p&gt;The beauty is in the simplicity. One API route, a few well-defined tools, and suddenly your RAG system becomes an intelligent agent that knows when to look things up and when to just have a conversation.&lt;/p&gt;
&lt;p&gt;Want to see it in action? Check out our implementation at &lt;a href=&quot;https://github.com/devflowinc/trieve&quot;&gt;github.com/devflowinc/trieve&lt;/a&gt; or try it yourself at &lt;a href=&quot;https://docs.trieve.ai/api-reference/message/create-message#create-message&quot;&gt;docs.trieve.ai/api-reference/message/create-message&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Building something similar? We&apos;d love to hear about your approach and any challenges you&apos;ve faced. The future of RAG is agentic, and we&apos;re excited to see what the community builds.&lt;/em&gt;&lt;/p&gt;
</content:encoded><category>explainers</category><author>Nick Khami</author></item><item><title>The Future of Work: Commanding Armies of Parallel LLMs</title><link>https://trieve.ai/blog/massively-parallel-llm-function-calling-is-underrated/</link><guid isPermaLink="true">https://trieve.ai/blog/massively-parallel-llm-function-calling-is-underrated/</guid><pubDate>Thu, 22 May 2025 17:25:00 GMT</pubDate><content:encoded>
&lt;h1&gt;Massive parallel LLM inference is &lt;em&gt;underrated&lt;/em&gt;.&lt;/h1&gt;
&lt;p&gt;If you use AI today, you&apos;re likely:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Asking ChatGPT or Claude to do something for you&lt;/li&gt;
&lt;li&gt;Navigating away to thumb on another task for &lt;em&gt;just long enough&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Navigating back to judge its output and then either:&lt;br /&gt;
a. Accepting it, implementing the solution, and moving on&lt;br /&gt;
b. Rejecting it and trying to reason with the AI again, and again, and again...&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is a waste of time! &lt;a href=&quot;https://x.com/willdepue/status/1923413964240666876&quot;&gt;The future of work looks like StarCraft or Age of Empires&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Imagine you have an allocation of 10,000 requests—rather than a single prompt—at 100,000 tokens per second. (For context, the average GPT interaction uses 50-150 tokens per turn.) You&apos;re directing this computational firepower to solve problems and create solutions, acting as the commander with agentic systems as your units. If the point of battle is to win, you need asymmetric advantages. But this isn&apos;t just about sheer firepower or brute force—though that&apos;s part of it. The real power lies in your ability to direct and manage these forces strategically to maximize their effectiveness.&lt;/p&gt;
&lt;p&gt;Like any real working environment, you don&apos;t spread capacity evenly across 200 different tasks. You provision agents to match the problem&apos;s nature. Some tasks stand alone, while others form clusters—interconnected webs where complexity and complication (two distinct beasts) hide cloaked in an ether of ethers.&lt;/p&gt;
&lt;p&gt;Yes, each LLM has &lt;strong&gt;some&lt;/strong&gt; &lt;a href=&quot;https://trieve.b-cdn.net/nassim.jpeg&quot;&gt;&lt;strong&gt;probability&lt;/strong&gt;&lt;/a&gt; of finding a solution. But you don&apos;t just want one correct solution per problem. You want enough attempts to get several, then mix and match them into one extremely high-quality solve. Think of it like &lt;a href=&quot;https://en.wikipedia.org/wiki/Monte_Carlo_method&quot;&gt;Monte Carlo sampling&lt;/a&gt;: you&apos;re hedging against randomness as you explore the solution space.&lt;/p&gt;
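&lt;p&gt;As a back-of-the-envelope illustration of why more attempts help, the sketch below computes the chance that at least one of several attempts succeeds, plus the expected number of successes. It assumes every attempt succeeds independently with the same probability, which real LLM calls only approximate; the function names are made up for this example.&lt;/p&gt;

```rust
// Probability that at least one of `n` independent attempts succeeds,
// when each attempt succeeds with probability `p`.
fn p_at_least_one(p: f64, n: i32) -> f64 {
    1.0 - (1.0 - p).powi(n)
}

// Expected number of successful attempts out of `n`.
fn expected_successes(p: f64, n: i32) -> f64 {
    p * f64::from(n)
}
```

&lt;p&gt;For example, if a single attempt has a 30% chance of producing a correct solution, 10 parallel attempts give roughly a 97% chance of at least one correct solution, and about 3 correct solutions on average to mix and match.&lt;/p&gt;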
&lt;h2&gt;Data labeling is a fantastic base case example&lt;/h2&gt;
&lt;p&gt;As a familiar and persistent example, we built a demo app that categorizes images of clothing using parallel LLM calls. As a search company, being able to structure arbitrary datasets across different sources greatly improves our ability to index and manipulate information into awesome AI experiences for our customers&apos; customers.&lt;/p&gt;
&lt;p&gt;But here&apos;s what&apos;s actually cool: you could photograph all your belongings and have AI instantly sort which items are worth more than $50. With humans, you need dozens of people working in parallel to finish in reasonable time.&lt;/p&gt;
&lt;p&gt;Google spent years tricking &lt;a href=&quot;https://prosopo.io/blog/why-am-i-clicking-traffic-lights/#contribution-to-ai-training&quot;&gt;hundreds of millions of people into labeling data&lt;/a&gt; through CAPTCHAs. Now you can leverage the years of human effort that went into this and accomplish similar tasks for &lt;strong&gt;much less&lt;/strong&gt; in a few hours. As you might also assume, this will get faster and cheaper as LLMs improve.&lt;/p&gt;
&lt;h2&gt;Parallel compute should be used all of the time&lt;/h2&gt;
&lt;p&gt;We should be using parallel compute for everything AI. We should be supervisors, not spectators.&lt;/p&gt;
&lt;p&gt;Right now, as a software engineer, I open my IDE, prompt an agent to edit code, then watch it work. Strong WALL-E fitless human vibes; disengaged, passive. &lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://trieve.b-cdn.net/walle-fitless.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I don&apos;t want this. I want to be in flow state, hyper-engaged and &lt;em&gt;forward-leaning&lt;/em&gt;. I want to feel and produce like &lt;a href=&quot;https://youtu.be/Ht12otHMX_Q?si=M0FI0pHFxaPI1lLt&amp;amp;t=280&quot;&gt;Ender commanding an entire fleet&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/massively-parallel-llm-function-calling-is-underrated/image.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It should be trivial to provision multiple LLMs with high variance to tackle tasks in parallel. If AI can generate product photos, I want to pick 3 styles and deploy 6 agents, 2 per style, generating image sets simultaneously. If something&apos;s going wrong, I can zoom in and intervene while they work. The feedback I give one agent can be immediately applied to the others, allowing them to adjust their outputs in real-time.&lt;/p&gt;
&lt;p&gt;This is similar to a strategy some employ on Fiverr when they &lt;a href=&quot;https://www.reddit.com/r/Fiverr/comments/1du7597/discussion_is_it_wrong_to_hire_two_people_on/&quot;&gt;hire multiple freelancers for the same project&lt;/a&gt; to judge the quality of their work and pick the best candidate.&lt;/p&gt;
&lt;h2&gt;New tools are needed&lt;/h2&gt;
&lt;p&gt;You can sort of replicate this with &lt;a href=&quot;https://sufiyanyasa.com/blog/how-to-use-git-worktree/&quot;&gt;git worktrees&lt;/a&gt; for programming, but it&apos;s clunky. We need new software designed for managing parallelized general intelligence.&lt;/p&gt;
&lt;p&gt;What we have now feels like a &lt;a href=&quot;https://koomen.dev/essays/horseless-carriages/&quot;&gt;horseless carriage&lt;/a&gt;: we&apos;re mimicking old patterns instead of embracing new possibilities. The future UX for AI isn&apos;t watching a single agent work. It&apos;s commanding hundreds simultaneously.&lt;/p&gt;
&lt;p&gt;Maybe these tools will be vertical-specific. Maybe they&apos;ll be general-purpose. Either way, I&apos;m excited to see what we build and get it in your hands.&lt;/p&gt;
</content:encoded></item><item><title>Streaming LLM assistant completions with the OpenAI API and Rust Actix-Web</title><link>https://trieve.ai/blog/open_ai_streaming/</link><guid isPermaLink="true">https://trieve.ai/blog/open_ai_streaming/</guid><pubDate>Thu, 08 Aug 2024 18:54:00 GMT</pubDate><content:encoded>&lt;p&gt;We were tired of using JavaScript for our backend services when we started building Trieve. We wanted something better, something faster, something safer, something rusty. Our main motivation for choosing Rust was the learning experience.&lt;/p&gt;
&lt;h2&gt;Why Actix&lt;/h2&gt;
&lt;p&gt;When looking at which framework to use, we had two choices: actix_web or rocket. We chose actix and actix_web because we had heard from benchmarks that it was faster than rocket.&lt;/p&gt;
&lt;h2&gt;Our Naive Solution&lt;/h2&gt;
&lt;h3&gt;Streaming data with Actix&lt;/h3&gt;
&lt;p&gt;The first approach we found for streaming data in actix_web used tokio&apos;s built-in &lt;a href=&quot;https://docs.rs/tokio/latest/tokio/sync/mpsc/fn.channel.html#&quot;&gt;&lt;code&gt;mpsc&lt;/code&gt; channels&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pub type StreamItem = Result&amp;lt;Bytes, actix_web::Error&amp;gt;;

pub async fn stream_response(
    messages: Vec&amp;lt;Message&amp;gt;,
    pool: web::Data&amp;lt;Pool&amp;gt;,
) -&amp;gt; Result&amp;lt;HttpResponse, actix_web::Error&amp;gt; {
    let (tx, rx) = mpsc::channel::&amp;lt;StreamItem&amp;gt;(1000);
    let receiver_stream: ReceiverStream&amp;lt;StreamItem&amp;gt; = ReceiverStream::new(rx);
    ...
    // Send data at some point
    let _ = tx.send(Ok(&quot;data&quot;.into())).await;
    ...

    // Return Result&amp;lt;HttpResponse, actix_web::Error&amp;gt; to client
    Ok(HttpResponse::Ok().streaming(receiver_stream))
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Getting Completions from OpenAI&lt;/h3&gt;
&lt;p&gt;For streaming we used the &lt;a href=&quot;https://docs.rs/openai_dive/0.2.7/openai_dive/&quot;&gt;&lt;code&gt;openai_dive&lt;/code&gt;&lt;/a&gt; crate. Ensure you enable the &lt;code&gt;stream&lt;/code&gt; feature flag.&lt;/p&gt;
&lt;p&gt;According to their documentation, you can stream data like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;...
    let client = Client::new(&quot;sk-xxxxxxxxxxxxxxxx&quot;);

    let parameters = ChatCompletionParameters {
        messages: vec![
            ChatMessage {
                role: Role::User,
                content: &quot;Hello!&quot;.to_string(),
                name: None,
            },
            ...
        ],
        ...
    };
    let mut stream = client.chat().create_stream(parameters).await.unwrap();
    while let Some(response) = stream.next().await {
        match response {
            Ok(chat_response) =&amp;gt; chat_response.choices.iter().for_each(|choice| {
                if let Some(content) = choice.delta.content.as_ref() {
                    print!(&quot;{}&quot;, content);
                }
            }),
            Err(e) =&amp;gt; eprintln!(&quot;{}&quot;, e),
        }
    }
...

&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Bringing both together&lt;/h3&gt;
&lt;p&gt;OpenAI doesn&apos;t save streamed completions for us, so as we iterate through the stream we keep track of the full message and store it in the database when streaming finishes. That gave us this function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pub async fn stream_response(
    messages: Vec&amp;lt;Message&amp;gt;,
    pool: web::Data&amp;lt;Pool&amp;gt;
) -&amp;gt; Result&amp;lt;HttpResponse, actix_web::Error&amp;gt; {

    let (tx, rx) = mpsc::channel::&amp;lt;StreamItem&amp;gt;(1000);
    let receiver_stream: ReceiverStream&amp;lt;StreamItem&amp;gt; = ReceiverStream::new(rx);

    let open_ai_messages: Vec&amp;lt;ChatMessage&amp;gt; = messages
        .iter()
        .map(|message| ChatMessage::from(message.clone()))
        .collect();
    let client = Client::new(&quot;sk-xxxxxxxxxxxxxxxx&quot;);

    let parameters = ChatCompletionParameters {
        messages: open_ai_messages,
        ...
    };

    let mut response_content = String::new();
    let mut completion_tokens = 0;
    let mut stream = client.chat().create_stream(parameters).await.unwrap();

    while let Some(response) = stream.next().await {
        match response {
            Ok(chat_response) =&amp;gt; {
                completion_tokens += 1;

                log::info!(&quot;Got chat completion: {:?}&quot;, chat_response);

                let chat_content = chat_response.choices[0].delta.content.clone();
                if chat_content.is_none() {
                    log::error!(&quot;Chat content is none&quot;);
                    continue;
                }
                let chat_content = chat_content.unwrap();

                let multi_use_chat_content = chat_content.clone();
                let _ = tx.send(Ok(chat_content.into())).await;
                response_content.push_str(multi_use_chat_content.clone().as_str());
            }
            Err(e) =&amp;gt; log::error!(&quot;Error getting chat completion: {}&quot;, e),
        }
    }

    let completion_message = Message::from_details(
        response_content,
        ...
    );

    // Let&apos;s hope this finishes in the background
    let _ = web::block(move || {
        create_message_query(completion_message, &amp;amp;pool)
    }).await?;

    Ok(HttpResponse::Ok().streaming(receiver_stream))
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This has issues, though: when we ran it, the response only reached the client after the whole completion had finished, not as a stream.&lt;/p&gt;
&lt;p&gt;The big issue we missed is that the handler awaits the entire &lt;code&gt;while&lt;/code&gt; loop before it returns, so the &lt;code&gt;HttpResponse&lt;/code&gt; isn&apos;t even constructed until the OpenAI stream has fully resolved.&lt;/p&gt;
&lt;p&gt;This led us down a rabbit hole of trying to move the while loop onto a different thread. That proved difficult because we wanted to spawn an asynchronous function on a separate thread, while Actix is much better suited to spawning synchronous functions on separate threads.&lt;/p&gt;
&lt;p&gt;We then hit a larger issue: our &lt;code&gt;mpsc::channel&lt;/code&gt; didn&apos;t implement the &lt;code&gt;Send&lt;/code&gt; trait, which is required to move a value across threads, and frankly we didn&apos;t have the knowledge to implement &lt;code&gt;Send&lt;/code&gt; ourselves. (A likely cause: a tokio channel is only &lt;code&gt;Send&lt;/code&gt; when its item type is, and the &lt;code&gt;actix_web::Error&lt;/code&gt; inside our &lt;code&gt;StreamItem&lt;/code&gt; is not.)&lt;/p&gt;
&lt;h2&gt;What went wrong&lt;/h2&gt;
&lt;p&gt;Eventually we went back a few steps and looked at it a different way. Instead of first working out how to stream to the client and then how to put the OpenAI messages into that stream, we looked at how to push the OpenAI stream straight to the client, and then at how to get that data saved on the server. Worst-case scenario, we could have the client send the completed message back to the server.&lt;/p&gt;
&lt;p&gt;Looking at &lt;code&gt;tokio_stream&lt;/code&gt; I found the &lt;code&gt;StreamExt&lt;/code&gt; trait which allows you to map
a stream as if it was an iterator. This ended up giving us a much more concise
function that looked very aesthetic (if you ask me).&lt;/p&gt;
&lt;h3&gt;How to properly stream&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;pub async fn stream_response(
    messages: Vec&amp;lt;models::Message&amp;gt;,
    pool: web::Data&amp;lt;Pool&amp;gt;
) -&amp;gt; Result&amp;lt;HttpResponse, actix_web::Error&amp;gt; {

    let open_ai_messages: Vec&amp;lt;ChatMessage&amp;gt; = messages
        .iter()
        .map(|message| ChatMessage::from(message.clone()))
        .collect();

    let client = Client::new(&quot;sk-xxxxxxxxxxxxxxxx&quot;);

    let parameters = ChatCompletionParameters {
        messages: open_ai_messages,
	    ...
    };

    let stream = client.chat().create_stream(parameters).await.unwrap();

    Ok(HttpResponse::Ok().streaming(
        stream.map(|response| -&amp;gt; Result&amp;lt;Bytes, actix_web::Error&amp;gt; {
            if let Ok(response) = response {
                let chat_content = response.choices[0].delta.content.clone();
                return Ok(Bytes::from(chat_content.unwrap_or(&quot;&quot;.to_string())));
            }
            log::error!(&quot;Something bad happened&quot;);
            Err(ServiceError::InternalServerError.into())
        })
    ))
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Great, now it&apos;s streaming, but we removed the core functionality that pushed the completed message to the database. We need to process the data while also sending it to the client. So... we went back to channels.&lt;/p&gt;
&lt;h3&gt;Actix Arbiter&lt;/h3&gt;
&lt;p&gt;For this to truly work the way we want, we need to bring in another actix primitive, &lt;a href=&quot;https://actix.rs/docs/actix/arbiter/&quot;&gt;Arbiters&lt;/a&gt;. An Arbiter runs work on a different thread. On this separate thread we need to receive the completion messages via a channel and write to the database once the full response has arrived.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Arbiter::new().spawn(async move {
    // Do stuff on other thread
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In our setup the tokio mpsc channels couldn&apos;t be sent across threads, so we switched to a different crate, &lt;a href=&quot;https://docs.rs/crossbeam-channel/latest/crossbeam_channel/&quot;&gt;&lt;code&gt;crossbeam-channel&lt;/code&gt;&lt;/a&gt;, whose channels can be.&lt;/p&gt;
&lt;p&gt;This time when we looked at channels, we had the main thread transmit data over to the child thread. This was easier to do while mapping
the stream before data gets sent down to the client.&lt;/p&gt;
&lt;h3&gt;Proper solution&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;pub async fn stream_response(
    messages: Vec&amp;lt;models::Message&amp;gt;,
    pool: web::Data&amp;lt;Pool&amp;gt;,
) -&amp;gt; Result&amp;lt;HttpResponse, actix_web::Error&amp;gt; {

    let open_ai_messages: Vec&amp;lt;ChatMessage&amp;gt; = messages
        .iter()
        .map(|message| ChatMessage::from(message.clone()))
        .collect();

    let open_ai_api_key = std::env::var(&quot;OPENAI_API_KEY&quot;)
        .expect(&quot;OPENAI_API_KEY must be set&quot;);

    let client = Client::new(open_ai_api_key);

    let parameters = ChatCompletionParameters {
        messages: open_ai_messages,
	    ...
    };

    let (sending_chan, receiving_chan) = crossbeam_channel::unbounded::&amp;lt;String&amp;gt;();
    let stream = client.chat().create_stream(parameters).await.unwrap();

    Arbiter::new().spawn(async move {
        let chunk_v: Vec&amp;lt;String&amp;gt; = receiving_chan.iter().collect();
        let completion = chunk_v.join(&quot;&quot;);

        let new_message = models::Message::from_details(
            completion,
            ...
        );

        // Write to the database; we are already on the arbiter thread, so no web::block is needed
        let _ = create_message_query(new_message, &amp;amp;pool);
    });

    Ok(HttpResponse::Ok().streaming(stream.map(
        move |response| -&amp;gt; Result&amp;lt;Bytes, actix_web::Error&amp;gt; {
            if let Ok(response) = response {
                let chat_content = response.choices[0].delta.content.clone();
                if let Some(message) = chat_content.clone() {
                    // Send data to arbiter
                    sending_chan.send(message).unwrap();
                }
                return Ok(Bytes::from(chat_content.unwrap_or(&quot;&quot;.to_string())));
            }
            Err(ServiceError::InternalServerError.into())
        },
    )))
}
&lt;/code&gt;&lt;/pre&gt;
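&lt;p&gt;The core pattern here, forwarding each chunk to the client while a background thread collects a copy and joins it once the channel closes, can be tried without actix or OpenAI. The sketch below is a simplification under assumptions: it swaps in &lt;code&gt;std::sync::mpsc&lt;/code&gt; and &lt;code&gt;std::thread&lt;/code&gt; for &lt;code&gt;crossbeam_channel&lt;/code&gt; and the Arbiter, uses a fixed array of three strings in place of the OpenAI stream, and the function name &lt;code&gt;stream_and_store&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```rust
use std::sync::mpsc;
use std::thread;

// Forwards each delta to the "client" while a background thread collects a copy.
// Returns (text the client received, text the worker would persist).
fn stream_and_store(deltas: [String; 3]) -> (String, String) {
    let (sending_chan, receiving_chan) = mpsc::channel();

    // Background worker: its loop ends once every sender has been dropped.
    let worker = thread::spawn(move || {
        let mut completion = String::new();
        for part in receiving_chan {
            let part: String = part;
            completion.push_str(part.as_str());
        }
        // A real handler would write `completion` to the database here.
        completion
    });

    let mut sent_to_client = String::new();
    for delta in deltas {
        // Copy for the worker; stands in for the send inside stream.map(...).
        sending_chan.send(delta.clone()).expect("worker hung up");
        sent_to_client.push_str(delta.as_str());
    }

    // Dropping the sender closes the channel so the worker can finish.
    drop(sending_chan);
    let stored = worker.join().expect("worker panicked");
    (sent_to_client, stored)
}
```

&lt;p&gt;The detail that makes this work is the same as in the handler above: the receiving side only finishes collecting once the sending side has been dropped, which in the actix version happens when the mapped stream ends and its closure is destroyed.&lt;/p&gt;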
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;There are 2 major learnings we got from this experience.&lt;/p&gt;
&lt;p&gt;The first is that in Rust it&apos;s best to think of a solution that uses functional programming first, then move on to a more imperative solution.&lt;/p&gt;
&lt;p&gt;The second is that Rust still has very few resources on what is best to
use. We had the idea to use &lt;code&gt;crossbeam_channel&lt;/code&gt; from a random forum comment
&lt;a href=&quot;https://users.rust-lang.org/t/rusts-sender-receiver-is-forced-to-be-sync-send/61211/2&quot;&gt;here&lt;/a&gt;
that was 2 years old. We were very skeptical of even using it because it felt
like a random third party hack that might not have worked with tokio&apos;s runtime.
We only chose it out of desperation. That&apos;s the main motivator behind this blog
post.&lt;/p&gt;
&lt;p&gt;The current code has a few edits, but is essentially the same. You can navigate to it via the links below.&lt;/p&gt;
&lt;p&gt;Good Luck Rustaceans and Happy Hacking!&lt;/p&gt;
&lt;p&gt;GITHUB: &lt;a href=&quot;https://github.com/devflowinc/trieve&quot;&gt;https://github.com/devflowinc/trieve&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Specific File: &lt;a href=&quot;https://github.com/devflowinc/trieve/blob/main/server/src/handlers/message_handler.rs&quot;&gt;Here&lt;/a&gt;&lt;/p&gt;
</content:encoded><category>explainers</category><category>tutorials</category><author>Denzell Ford</author></item><item><title>PGVector&apos;s Missing Features</title><link>https://trieve.ai/blog/pgvector-missing-features/</link><guid isPermaLink="true">https://trieve.ai/blog/pgvector-missing-features/</guid><content:encoded>&lt;h2&gt;Introducing Trieve Vector Inference&lt;/h2&gt;
&lt;p&gt;Trieve Vector Inference (TVI), our solution for fast, unmetered embedding vector inference in your own cloud or on your own hardware, is now generally available as a standalone product!&lt;/p&gt;
&lt;p&gt;Building AI features at scale exposes two critical limitations of cloud embedding APIs: high latency and rate limits. Modern AI applications require better infrastructure.&lt;/p&gt;
&lt;p&gt;The platform supports any embedding model, whether it’s your own custom model, a private model, or popular open-source options. You get the flexibility to choose the right model for your use case while maintaining complete control over your infrastructure.&lt;/p&gt;
&lt;p&gt;We put together TVI to eliminate these bottlenecks for our own core product. It’s served billions of queries across billions of documents. After requests from others, we’ve sanded it down, written up some docs, and are now making it available for all. You can even get it on &lt;a href=&quot;https://aws.amazon.com/marketplace/pp/prodview-kxk2t4nafpmn4?sr=0-1&amp;amp;ref_=beagle&amp;amp;applicationId=AWSMPContessa&quot;&gt;AWS Marketplace&lt;/a&gt;!&lt;/p&gt;
&lt;h2&gt;So, just how good is TVI?&lt;/h2&gt;
&lt;p&gt;To start, here are our benchmarks:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/pgvector-missing-features/tvi-benchmarks-docs.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;At 1000 RPS, your P99 latency with cloud-based embeddings will range from 23.59 to 27.03 seconds. With Trieve, we’re still measuring in milliseconds. It’s literally an order of magnitude faster.&lt;/p&gt;
&lt;p&gt;Additionally, check out the failed requests. Notice that we ran 30,000 requests with no failures. Without TVI, you’re only making ~7,000. If you’re going through SageMaker, it’s ~3,000.&lt;/p&gt;
&lt;h2&gt;TVI is a simple solution that solves two problems&lt;/h2&gt;
&lt;h3&gt;Rate limits&lt;/h3&gt;
&lt;p&gt;Rate limits force you to implement complex batching and queueing. Now you’re allocating development time on workarounds instead of building core product. This crucial part of your pipeline should not feel like an exercise. We’ve seen teams build elaborate workarounds:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multiple API keys rotated on a schedule&lt;/li&gt;
&lt;li&gt;Distributed rate limit tracking across microservices&lt;/li&gt;
&lt;li&gt;Complex retry logic with exponential backoff&lt;/li&gt;
&lt;li&gt;Request queuing systems that rival Kafka in complexity&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One customer had an entire Kubernetes cluster dedicated to managing their embedding pipeline - not because of compute needs, but just to handle rate limit coordination across their services. Another built a “request budget” system that required teams to reserve embedding capacity days in advance.&lt;/p&gt;
&lt;p&gt;None of this complexity adds value to your product. It’s pure overhead, stealing engineering time from features that actually matter to your users.&lt;/p&gt;
&lt;h3&gt;High Latency&lt;/h3&gt;
&lt;p&gt;High latency robs your magic. When every embedding request takes 300ms+ round trip, your real-time features aren’t really real-time anymore.&lt;/p&gt;
&lt;p&gt;The cascading effects are brutal:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Search results that lag behind user typing&lt;/li&gt;
&lt;li&gt;Recommendations that feel disconnected from user actions&lt;/li&gt;
&lt;li&gt;Chat experiences with noticeable “thinking” delays&lt;/li&gt;
&lt;li&gt;Batch processes that take hours instead of minutes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Add in retries for rate limits and your “real-time” feature is now consistently 1+ second behind.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/pgvector-missing-features/sama-working-on-low-latency-embeddings.png&quot; alt=&quot;Sama working on low latency embeddings&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;We hope you like the TVI DX&lt;/h3&gt;
&lt;p&gt;Most AI dev tools feel like they are at home in Jupyter notebooks or made for quick prototyping. The platforms hyperscalers and others built to onboard the masses onto AI were not designed to handle large scale.&lt;/p&gt;
&lt;p&gt;Most production-grade software seems to fall into two buckets. It’s either 1) expensive and requires intense engagement with a vendor, and is limited in scope or 2) painful to self-host, let alone begin to productionize. We’ve talked to some folk who spent weeks building and tweaking their own servers. Others who went the enterprise route and now have to maintain annoying shadow infrastructure. Pain!&lt;/p&gt;
&lt;p&gt;Trieve is a dev-first company. This means all engineers. DevOps, frontend, backend, everyone. We really think TVI is one of those rare solutions that works for tinkerers and giants alike. General support is available over 12 hours per day over email, Discord, Slack, and our office line.&lt;/p&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;You have two options for getting started with TVI. You can 1) buy a license from us and self-host it or 2) deploy it via AWS through AWS Marketplace. Self-hosting gives you maximum control, while AWS Marketplace provides the fastest path to production and still a ton of control.&lt;/p&gt;
&lt;p&gt;Our deployment process is streamlined for both paths:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AWS: Deployment via Helm through Marketplace&lt;/li&gt;
&lt;li&gt;Self-hosted: Docker images and clear documentation for any environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We back TVI with a 15-day integration guarantee - if you can’t get it running in your environment, we’ll refund your fees. We’ve had teams go from zero to production in under an hour.&lt;/p&gt;
&lt;h2&gt;The Future of Vector Inference&lt;/h2&gt;
&lt;p&gt;We’re committed to making vector inference more accessible, faster, and more cost-effective for teams building AI features. Our roadmap includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced monitoring and observability (custom Grafana dashboards coming soon)&lt;/li&gt;
&lt;li&gt;Support for more cloud providers (GCP and Azure coming soon)&lt;/li&gt;
&lt;li&gt;Additional model optimizations (including quantization and pruning)&lt;/li&gt;
&lt;li&gt;Advanced scaling features (automatic horizontal scaling based on load)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Plus, we’re working on some exciting features we can’t talk about yet. If you think today’s latency numbers are good, stay tuned!&lt;/p&gt;
</content:encoded><category>tutorials</category><category>announcements</category></item><item><title>Success Story: BillTrack50</title><link>https://trieve.ai/blog/success-story-billtrack50/</link><guid isPermaLink="true">https://trieve.ai/blog/success-story-billtrack50/</guid><pubDate>Tue, 03 Sep 2024 19:22:00 GMT</pubDate><content:encoded>&lt;h2&gt;About BillTrack50&lt;/h2&gt;
&lt;p&gt;BillTrack50 is the search engine for democracy. The platform is the OS for the nation’s non-profits, advocacy groups, and corporate government affairs departments. In a competitive space, the BillTrack50 development team works hard to unlock advantages and leverage for their customers, at scale.&lt;/p&gt;
&lt;p&gt;BillTrack50’s values are evident in their free Citizen tier, where anyone can empower themselves with historical and current legislation to direct and inform their action in our democracy. After reaching out and connecting on a desire to use AI to make bill discovery and tracking 10x easier, they engaged with Trieve to make it happen.&lt;/p&gt;
&lt;h2&gt;AI Goals&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/BT50%20AI_720.gif&quot; alt=&quot;Similar Bills in Action&quot; /&gt;&lt;/p&gt;
&lt;p&gt;One of the core experiences of the BillTrack50 platform is discovering and tracking bills relevant to a user&apos;s interests. With tens of thousands of new bills to discover every legislative cycle, organizations were spending countless hours per year manually querying for bills related to theirs. Not to mention the anxiety of never being certain that every relevant bill has been found and that none has slipped past their filters and keywords.&lt;/p&gt;
&lt;p&gt;Simply put, there are ways to write legislation about &apos;gun control&apos; or &apos;abortion&apos; without the bill actually containing these words. Vectors can be very powerful here, when done correctly. The goal was to create an &apos;easy button&apos; for users to find bills similar to a given piece of legislation with a single click.&lt;/p&gt;
&lt;p&gt;With most of the support being done over email, it took the team one sprint to build, test, evaluate the Trieve-based solution against others, and deploy it to a smaller group of customers. After a successful trial period, it was immediately well-received by the greater userbase.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Trieve has been more than a vendor to us; they&apos;ve been a true partner in helping us realize our AI aspirations. From the outset, their team demonstrated exceptional patience and expertise, guiding us through every step of the process. Their timely support and advice, rooted in deep experience and knowledge, was crucial to our success. We were actually able to roll out our new AI-driven features well ahead of schedule. For any company looking to add AI features into their product, Trieve is the go-to expert you can trust.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;- Karen Suhaka, Chief Catalyst&lt;/p&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;p&gt;We are grateful to the BillTrack50 team and their users for informing critical development and hitting our service with frequency. Trieve is proud to work with such a cornerstone company and excited to see what they build next.&lt;/p&gt;
&lt;p&gt;Search, recommendations, and RAG share fundamental mechanics. Trieve is the API and the team to get your features built, shipped, and deployed at scale for intense usage. If this is you, book a call &lt;a href=&quot;https://cal.com/nick.k/meet?duration=30&amp;amp;date=2024-09-03&amp;amp;month=2024-09&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>reviews</category><author>Federico Chávez Torres</author></item><item><title>Success Story: Mintlify</title><link>https://trieve.ai/blog/success-story-mintlify/</link><guid isPermaLink="true">https://trieve.ai/blog/success-story-mintlify/</guid><pubDate>Thu, 05 Sep 2024 19:25:00 GMT</pubDate><content:encoded>&lt;h2&gt;About Mintlify&lt;/h2&gt;
&lt;p&gt;Mintlify is the modern standard for documentation. With its developer-forward product, they make it easy to build beautiful documentation whether you&apos;re a large enterprise or just getting started.&lt;/p&gt;
&lt;p&gt;Trieve itself is a proud Mintlify customer: check out the &lt;a href=&quot;https://docs.trieve.ai&quot;&gt;docs&lt;/a&gt; here.&lt;/p&gt;
&lt;p&gt;After reaching out and discussing their technical requirements and desired outcomes for search and RAG, we engaged to support the team&apos;s build.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/hahnbee-shoutout.png&quot; alt=&quot;Mintlify Co-Founder, Hahnbee Lee, shouts us out!&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Building, Evaluating, and Implementing&lt;/h2&gt;
&lt;p&gt;We first built a &lt;a href=&quot;https://mintlify.trieve.ai&quot;&gt;proof-of-concept&lt;/a&gt; for the Mintlify team to evaluate the out-of-the-box search quality of Trieve. After drawing out an implementation and proof schedule, it took the team one sprint to build, test, evaluate the Trieve-based solution against others, and deploy it to a smaller group of customers.&lt;/p&gt;
&lt;p&gt;After a successful testing period, large-scale search improvements &lt;a href=&quot;https://mintlify.com/blog/launch-week-3-day-2&quot;&gt;launched&lt;/a&gt; as part of Mintlify&apos;s Summer launch week.&lt;/p&gt;
&lt;p&gt;Because Trieve is an all-in-one solution, control over chunking, highlighting, and analytics gives the team extra levers and knobs to ensure customer satisfaction. Since then, Trieve-powered RAG has been rolled out as well.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/success-story-mintlify-suggested-quere-480.gif&quot; alt=&quot;Trieve RAG in action&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Acknowledgements&lt;/h2&gt;
&lt;p&gt;We are grateful to the Mintlify team for informing critical development and hitting our service with intense frequency.&lt;/p&gt;
&lt;p&gt;Search, recommendations, and RAG share fundamental mechanics. Trieve is the API and the team to get your features built, shipped, and deployed at scale. If this is you, book a call &lt;a href=&quot;https://cal.com/nick.k/meet&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>reviews</category><author>Federico Chávez Torres</author></item><item><title>Trieve Secures $3.5M to Power AI Search and RAG</title><link>https://trieve.ai/blog/trieve-fundraise-announcement/</link><guid isPermaLink="true">https://trieve.ai/blog/trieve-fundraise-announcement/</guid><pubDate>Wed, 11 Sep 2024 12:34:00 GMT</pubDate><content:encoded>&lt;p&gt;SAN FRANCISCO, CA (September 11, 2024) - We at Trieve are thrilled to announce our oversubscribed $3.5M round led by Root Ventures! Our backers include Y Combinator, Soma Capital, Kulveer Taggar, Transpose Platform, and a distinguished group of strategic angel investors such as JJ Fliegelman, Richard Aberman, Rajiv Ayyangar, Jenny Fleiss, and Rohan Das.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/cdxker&quot;&gt;Denzell Ford&lt;/a&gt; and &lt;a href=&quot;https://github.com/skeptrunedev&quot;&gt;I&lt;/a&gt; founded Trieve because we wanted infrastructure purpose-built for AI. Trieve makes it easy for developers building AI applications to determine which retrieval techniques best fit their problem and productionize with confidence.&lt;/p&gt;
&lt;p&gt;We are looking to power every industry building with AI. New generations of discovery technology are revolutionizing user experiences and we are on the frontier.&lt;/p&gt;
&lt;p&gt;This fresh injection of capital will be directed towards bolstering our sales strategy and strengthening customer success outcomes across e-commerce, ERP, social media platforms, and beyond. With current customers like Mintlify, BillTrack50, and AmLaw100 Firms, over 16,000 search bars are powered by Trieve.&lt;/p&gt;
&lt;p&gt;Lee Edwards, our lead partner from Root Ventures, and we share a frustration and a desire to enable quality retrieval across the internet: &quot;Everyone thinks in-app search sucks, and that&apos;s because we are used to the intelligent, AI-based search that has improved over time from search engines like Google. But inside apps, even the large incumbents aren&apos;t doing much more than text matching. RAG and other AI-based techniques deliver the kind of user experience that can actually help products see lift from their search features, and Trieve makes these advanced techniques easy for any engineering team to implement.&quot;&lt;/p&gt;
&lt;p&gt;I am grateful to our partners, team, community, and customers. We will continue to build the best-possible retrieval infrastructure with hacker spirit! Star us at &lt;a href=&quot;https://github.com/devflowinc/trieve&quot;&gt;github.com/devflowinc/trieve&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;About Trieve:&lt;/p&gt;
&lt;p&gt;Incubated in 2023 through the &lt;a href=&quot;https://futo.org/&quot;&gt;FUTO&lt;/a&gt; Fellows Program and accelerated by &lt;a href=&quot;https://ycombinator.com&quot;&gt;Y Combinator&lt;/a&gt; (W24), Trieve is powering search, recommendations, and AI-driven features for businesses across diverse sectors, including e-commerce, ERP, and social media platforms. If you want to chat about building AI features into your product, book a meeting with us &lt;a href=&quot;https://cal.com/nick.k&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For media inquiries, please contact: &lt;a href=&quot;mailto:media@trieve.ai&quot;&gt;media@trieve.ai&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>announcements</category><author>Nick Khami</author></item><item><title>We are thrilled to announce that Trieve is being acquired by Mintlify! 🎉</title><link>https://trieve.ai/blog/trieve-is-being-acquired-by-mintlify/</link><guid isPermaLink="true">https://trieve.ai/blog/trieve-is-being-acquired-by-mintlify/</guid><pubDate>Wed, 16 Jul 2025 21:28:00 GMT</pubDate><content:encoded>&lt;h1&gt;We are thrilled to announce that Trieve is being acquired by Mintlify! 🎉&lt;/h1&gt;
&lt;p&gt;We started Trieve with a mission: to make it easy for developers to build AI applications that can retrieve and reason over large amounts of data. Our inspiration came from the challenges we faced in building relevancy-optimized retrieval for our first product, a tool for litigation attorneys.&lt;/p&gt;
&lt;p&gt;We are incredibly proud of what we&apos;ve accomplished. The numbers speak for themselves: over 150M search queries, 2.6M AI conversations, and a community of over 5,400 active users.&lt;/p&gt;
&lt;p&gt;Mintlify was one of Trieve&apos;s earliest adopters, and we have always been impressed with the quality of their documentation. Their search experience, powered by our technology, set a standard for excellence.&lt;/p&gt;
&lt;p&gt;Mintlify&apos;s vision extends our own. They are dedicated to empowering builders, and we are excited to contribute to that mission. Denzell and I will be joining the Mintlify team to continue creating powerful tools for developers.&lt;/p&gt;
&lt;h2&gt;What does this mean for existing Trieve users?&lt;/h2&gt;
&lt;p&gt;The Trieve Cloud service will be sunset on November 1st, 2025. We are committed to supporting our users through this transition and will assist them in migrating to other platforms or to a self-hosted version of Trieve. All data will be exportable via the API, and we will be available to assist throughout the migration process.&lt;/p&gt;
&lt;p&gt;The open-source Trieve infrastructure will continue to be available and maintained. We will carry on managing the codebase and welcome contributions from the community.&lt;/p&gt;
&lt;p&gt;For any questions or assistance, please join us on &lt;a href=&quot;https://discord.gg/a9CGxRuwGv&quot;&gt;Discord&lt;/a&gt; or email us at &lt;a href=&quot;mailto:support@trieve.com&quot;&gt;support@trieve.com&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Open Source Considerations&lt;/h2&gt;
&lt;p&gt;From the outset, we were committed to ensuring that Trieve could operate without reliance on third-party services. Our goal was to create a sustainable open-source project that would remain accessible and usable, independent of the commercial entity behind it.&lt;/p&gt;
&lt;p&gt;While Trieve was a commercial endeavor, we chose the Business Source License (BSL) to protect our users from sudden, restrictive license changes. As part of this acquisition, we are moving to the MIT license to fully embrace the open-source model, ensuring the code can be used by anyone for any purpose.&lt;/p&gt;
&lt;p&gt;The repository is available at &lt;a href=&quot;https://github.com/devflowinc/trieve&quot;&gt;github.com/devflowinc/trieve&lt;/a&gt;, and we encourage contributions and the involvement of new maintainers from the community.&lt;/p&gt;
&lt;h2&gt;Looking Forward&lt;/h2&gt;
&lt;p&gt;We are excited about the future and the opportunity to build great things at Mintlify. Please do not hesitate to reach out with questions or for assistance with the migration. Our priority is to ensure a smooth transition for all users.&lt;/p&gt;
&lt;p&gt;Mintlify is hiring! If you are interested in joining us, please visit their &lt;a href=&quot;https://mintlify.com/careers&quot;&gt;careers page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So long, and thanks for all the searches! 🚀&lt;/p&gt;
&lt;p&gt;- The Trieve Team&lt;/p&gt;
</content:encoded><category>announcements</category><category>news</category><author>Nicholas Khami</author></item><item><title>Guide for Self-Hosting Trieve on a VPS</title><link>https://trieve.ai/blog/trieve-self-hosting-on-vps/</link><guid isPermaLink="true">https://trieve.ai/blog/trieve-self-hosting-on-vps/</guid><pubDate>Thu, 12 Sep 2024 11:53:00 GMT</pubDate><content:encoded>&lt;h2&gt;1. Introduction&lt;/h2&gt;
&lt;p&gt;This guide provides comprehensive instructions for self-hosting Trieve. By following these steps, you&apos;ll be able to set up your own Trieve instance on a Hetzner Cloud server.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt; Because this guide is CPU-only, semantic search and ingest will be SLOW: 2+ seconds per semantic search and ~10 chunks/s on ingest. Fulltext SPLADE and BM25 search types will remain fast, but running the embedding servers on GPUs is required for a latency-sensitive setup. See our &lt;a href=&quot;https://docs.trieve.ai/self-hosting/aws#3-create-values-yaml&quot;&gt;AWS&lt;/a&gt; or &lt;a href=&quot;https://docs.trieve.ai/self-hosting/gcp&quot;&gt;GCP&lt;/a&gt; guides for more information, or contact us at &lt;a href=&quot;mailto:humans@trieve.ai&quot;&gt;humans@trieve.ai&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;2. Creating the Server&lt;/h2&gt;
&lt;h3&gt;2.1 Prerequisites&lt;/h3&gt;
&lt;p&gt;Before beginning the server setup, ensure you have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A domain name with access to its DNS configuration&lt;/li&gt;
&lt;li&gt;A Hetzner Cloud account&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2.2 Server Setup Steps&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Log in to the Hetzner Cloud Console at &lt;a href=&quot;https://console.hetzner.cloud&quot;&gt;https://console.hetzner.cloud&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Create a new project for your Trieve instance.&lt;/li&gt;
&lt;li&gt;Create a public IP address:
&lt;ul&gt;
&lt;li&gt;Choose the same location as your intended server location.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/hetzner-2.2-image.webp&quot; alt=&quot;create primary hetzner IP&quot; /&gt;&lt;/p&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Configure your domain:
&lt;ul&gt;
&lt;li&gt;Add an A record pointing to the public IP address you just created.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/self-hosting-dns-records.webp&quot; alt=&quot;DNS record setup&quot; /&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Host&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;IP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;api&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;HETZNER-PUBLIC-IP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;auth&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;HETZNER-PUBLIC-IP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dashboard&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;HETZNER-PUBLIC-IP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chat&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;HETZNER-PUBLIC-IP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;search&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;HETZNER-PUBLIC-IP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;analytics&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;HETZNER-PUBLIC-IP&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
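&lt;p&gt;Once the records are published, it can help to confirm that all six hostnames resolve to the Hetzner IP before continuing. The following is a minimal Python sketch, not part of the Trieve tooling: the &lt;code&gt;check_dns&lt;/code&gt; helper and its &lt;code&gt;resolve&lt;/code&gt; parameter are hypothetical names introduced here for illustration, and the only library call assumed is the standard &lt;code&gt;socket.gethostbyname&lt;/code&gt;.&lt;/p&gt;

```python
import socket

# Subdomains taken from the DNS table above.
SUBDOMAINS = ["api", "auth", "dashboard", "chat", "search", "analytics"]


def check_dns(domain, expected_ip, resolve=socket.gethostbyname):
    """Map each hostname to True when it resolves to expected_ip.

    `resolve` is injectable (hypothetical parameter) so the check can be
    exercised without real network access.
    """
    results = {}
    for sub in SUBDOMAINS:
        host = f"{sub}.{domain}"
        try:
            results[host] = resolve(host) == expected_ip
        except OSError:
            # socket.gaierror is an OSError: the record is missing
            # or has not propagated yet.
            results[host] = False
    return results
```

&lt;p&gt;Calling &lt;code&gt;check_dns(&quot;YOUR-DOMAIN.COM&quot;, &quot;HETZNER-PUBLIC-IP&quot;)&lt;/code&gt; returns a dictionary of hostname to boolean. A &lt;code&gt;False&lt;/code&gt; entry may simply mean that DNS propagation has not finished, so it is worth retrying after a few minutes.&lt;/p&gt;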
&lt;ol start=&quot;5&quot;&gt;
&lt;li&gt;Add an SSH key to your Hetzner account:
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;To view your existing public key in terminal on your local machine, use:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat ~/.ssh/id_ed25519.pub
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you don&apos;t have an SSH key, generate one with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh-keygen -t ed25519
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/ssh-keygen.webp&quot; alt=&quot;ssh keygen screenshot&quot; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Go to the &quot;Security&quot; tab in the Hetzner Cloud Console.&lt;/li&gt;
&lt;li&gt;Click on &quot;Add SSH key&quot; and paste your public key.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/hetzner-ssh-key.webp&quot; alt=&quot;hetzner adding ssh key&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For detailed instructions, refer to Hetzner&apos;s community guides for Linux and Windows:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://community.hetzner.com/tutorials/howto-ssh-key&quot;&gt;Linux: Setting up an SSH key&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;https://community.hetzner.com/tutorials/how-to-generate-ssh-key-putty&quot;&gt;Windows: Generate SSH key using PuTTYgen&lt;/a&gt;&lt;/p&gt;
&lt;ol start=&quot;6&quot;&gt;
&lt;li&gt;Create a new network:
&lt;ul&gt;
&lt;li&gt;Go to the &quot;Networks&quot; tab in the Hetzner Cloud Console.&lt;/li&gt;
&lt;li&gt;Add a new network with the IP range &lt;code&gt;192.168.1.0/24&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/hetzner-network-setup.webp&quot; alt=&quot;hetzner create network page&quot; /&gt;&lt;/p&gt;
&lt;ol start=&quot;7&quot;&gt;
&lt;li&gt;Create a new server:
&lt;ul&gt;
&lt;li&gt;Go to the &quot;Servers&quot; tab and click &quot;Add Server&quot;.&lt;/li&gt;
&lt;li&gt;Choose the same location as your public IP address.&lt;/li&gt;
&lt;li&gt;Select Ubuntu 24.04 as the operating system.&lt;/li&gt;
&lt;li&gt;Choose a server size (minimum 8vCPU/16GB-RAM, recommended 8vCPU/32GB-RAM).&lt;/li&gt;
&lt;li&gt;In networking settings:
&lt;ul&gt;
&lt;li&gt;Select your public IP.&lt;/li&gt;
&lt;li&gt;Check &quot;Private networks&quot; and select the network you created.&lt;/li&gt;
&lt;li&gt;Uncheck public IPv6.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Select your SSH key.&lt;/li&gt;
&lt;li&gt;In the Cloud config section, paste the configuration shown below.&lt;/li&gt;
&lt;li&gt;Replace &lt;code&gt;{PASTE HERE YOUR SSH KEY}&lt;/code&gt; with your actual SSH public key.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;#cloud-config
users:
  - name: trieve
    groups: users, admin, docker
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - {PASTE HERE YOUR SSH KEY}
packages:
  - fail2ban
  - firewalld
  - jq
  - caddy
  - make
package_update: true
package_upgrade: true
write_files:
  - path: /etc/docker/daemon.json
    content: |
      {
        &quot;ipv6&quot;: false,
        &quot;iptables&quot;: false
      }
    permissions: &apos;0644&apos;
  - path: /etc/fail2ban/jail.local
    content: |
      [sshd]
      enabled = true
      banaction = iptables-multiport
    permissions: &apos;0644&apos;
  - path: /etc/ssh/sshd_config
    content: |
         Protocol 2
         HostKey /etc/ssh/ssh_host_rsa_key
         HostKey /etc/ssh/ssh_host_ecdsa_key
         HostKey /etc/ssh/ssh_host_ed25519_key
         KbdInteractiveAuthentication no
         UsePrivilegeSeparation yes
         KeyRegenerationInterval 3600
         ServerKeyBits 4096
         SyslogFacility AUTH
         LogLevel VERBOSE
         LoginGraceTime 60
         PermitRootLogin no
         StrictModes yes
         PubkeyAuthentication yes
         IgnoreRhosts yes
         HostbasedAuthentication no
         PermitEmptyPasswords no
         ChallengeResponseAuthentication no
         PasswordAuthentication no
         X11Forwarding no
         PrintMotd no
         PrintLastLog yes
         TCPKeepAlive yes
         AcceptEnv LANG LC_*
         Subsystem sftp /usr/lib/openssh/sftp-server
         UsePAM yes
         MaxAuthTries 3
         AuthenticationMethods publickey
         KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
         Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
         MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com
         ClientAliveInterval 300
         ClientAliveCountMax 2
         AllowAgentForwarding yes
         AllowTcpForwarding no
         PermitUserEnvironment no
         AllowUsers trieve
runcmd:
  - curl -fsSL https://get.docker.com | sh
  - newgrp docker
  - systemctl enable fail2ban
  - systemctl restart docker
  - firewall-cmd --zone=public --add-masquerade --permanent
  - firewall-cmd --permanent --zone=trusted --add-interface=docker0
  - firewall-cmd --reload
  - firewall-cmd --zone=public --add-port=80/tcp --permanent
  - firewall-cmd --zone=public --add-port=443/tcp --permanent
  - firewall-cmd --zone=public --add-port=22/tcp --permanent
  - firewall-cmd --permanent --new-zone=docker
  - firewall-cmd --permanent --zone=docker --add-source=172.16.0.0/12
  - firewall-cmd --permanent --zone=docker --set-target=ACCEPT
  - iptables -I DOCKER-USER -i docker0 ! -o docker0 -j DROP
  - mkdir -p /etc/iptables/
  - iptables-save | tee /etc/iptables/rules.v4
  - firewall-cmd --reload
  - systemctl restart docker
  - su - trieve -c &quot;git clone https://github.com/devflowinc/trieve.git&quot;
  - sed -i &apos;s/KC_HOSTNAME=\&quot;localhost\&quot;/KC_HOSTNAME=\&quot;\&quot;/&apos; /home/trieve/trieve/.env.example
  - su - trieve -c &quot;cd trieve &amp;amp;&amp;amp; cp .env.example .env &amp;amp;&amp;amp; docker compose up -d &amp;amp;&amp;amp; sleep 3 &amp;amp;&amp;amp; docker compose down&quot;
  - reboot
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/hetzner-create-server-1.webp&quot; alt=&quot;creating hetzner server part 1&quot; /&gt;
&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/hetzner-create-server-2.webp&quot; alt=&quot;creating hetzner server part 2&quot; /&gt;
&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/hetzner-create-server-3.webp&quot; alt=&quot;creating hetzner server part 3&quot; /&gt;&lt;/p&gt;
&lt;ol start=&quot;8&quot;&gt;
&lt;li&gt;
&lt;p&gt;Create the server and wait for the initialization process to complete (approximately 3 minutes).
If you want to see the logs during cloud-init initialization, log into the server via SSH and execute:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo -s
journalctl -f
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Connect to your server via SSH:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh trieve@YOUR_SERVER_IP
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/hetzner-formation-monitoring.webp&quot; alt=&quot;ssh to see formation logs&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;3. Trieve Configuration on the Server&lt;/h2&gt;
&lt;h3&gt;3.1 Caddy Configuration&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Log in as root:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo -s
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Clear the default Caddy configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;echo &amp;gt; /etc/caddy/Caddyfile
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Edit the Caddyfile:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nano /etc/caddy/Caddyfile
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Replace &lt;code&gt;YOUR-DOMAIN.COM&lt;/code&gt; with your actual domain in the following configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dashboard.YOUR-DOMAIN.COM {
    reverse_proxy localhost:5173
}
search.YOUR-DOMAIN.COM {
    reverse_proxy localhost:5174
}
chat.YOUR-DOMAIN.COM {
    reverse_proxy localhost:5175
}
analytics.YOUR-DOMAIN.COM {
    reverse_proxy localhost:5176
}
api.YOUR-DOMAIN.COM {
    reverse_proxy localhost:8090
}
auth.YOUR-DOMAIN.COM {
    reverse_proxy localhost:8080
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
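&lt;p&gt;Validate the Caddy configuration (&lt;code&gt;caddy fmt&lt;/code&gt; only reformats the file, while &lt;code&gt;caddy validate&lt;/code&gt; loads it and reports configuration errors):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;caddy validate --config /etc/caddy/Caddyfile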
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check that your DNS A records resolve to &lt;code&gt;YOUR_SERVER_IP&lt;/code&gt; (the Hetzner server&apos;s public IP):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ping api.YOUR-DOMAIN.COM
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the hostnames resolve to a different IP address:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Verify the configuration on the domain provider&apos;s side.&lt;/li&gt;
&lt;li&gt;Restart the server so that it picks up the latest DNS changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reload Caddy:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;systemctl reload caddy.service
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Verify certificate creation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;journalctl -u caddy | grep &quot;successfully&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/hetzner-certificate-generation.webp&quot; alt=&quot;verify caddyfile content&quot; /&gt;&lt;/p&gt;
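&lt;p&gt;The six site blocks in the Caddyfile differ only in subdomain and port, so the file can also be generated instead of edited by hand. Here is a small Python sketch under that assumption. The &lt;code&gt;render_caddyfile&lt;/code&gt; function is a hypothetical helper written for this guide; the port mapping is copied from the configuration in this section.&lt;/p&gt;

```python
# Subdomain to local port mapping, copied from the Caddyfile in this section.
UPSTREAMS = {
    "dashboard": 5173,
    "search": 5174,
    "chat": 5175,
    "analytics": 5176,
    "api": 8090,
    "auth": 8080,
}


def render_caddyfile(domain):
    """Return Caddyfile text with one reverse_proxy site block per subdomain."""
    blocks = []
    for sub, port in UPSTREAMS.items():
        # Doubled braces in an f-string produce literal { and }.
        blocks.append(
            f"{sub}.{domain} {{\n    reverse_proxy localhost:{port}\n}}"
        )
    return "\n".join(blocks) + "\n"
```

&lt;p&gt;Printing &lt;code&gt;render_caddyfile(&quot;YOUR-DOMAIN.COM&quot;)&lt;/code&gt; and redirecting the output to &lt;code&gt;/etc/caddy/Caddyfile&lt;/code&gt; would be one way to use it. The generated file should still be checked and reloaded as described in the steps of this section.&lt;/p&gt;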
&lt;h3&gt;3.2 Environment Variables Configuration&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Switch to the trieve user and navigate to the Trieve directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;su - trieve
cd /home/trieve/trieve
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Edit the .env file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nano .env
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Modify the following variables or add them to the end of the file. Replace &lt;code&gt;YOUR-DOMAIN.COM&lt;/code&gt; with your actual domain name in all the listed environment variables:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;KC_HOSTNAME=&quot;auth.YOUR-DOMAIN.COM&quot;
KC_PROXY=edge
VITE_API_HOST=https://api.YOUR-DOMAIN.COM/api
VITE_SEARCH_UI_URL=https://search.YOUR-DOMAIN.COM
VITE_CHAT_UI_URL=https://chat.YOUR-DOMAIN.COM
VITE_ANALYTICS_UI_URL=https://analytics.YOUR-DOMAIN.COM
VITE_DASHBOARD_URL=https://dashboard.YOUR-DOMAIN.COM
OIDC_AUTH_REDIRECT_URL=&quot;https://auth.YOUR-DOMAIN.COM/realms/trieve/protocol/openid-connect/auth&quot;
OIDC_ISSUER_URL=&quot;https://auth.YOUR-DOMAIN.COM/realms/trieve&quot;
BASE_SERVER_URL=&quot;https://api.YOUR-DOMAIN.COM&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/trieve-env-config.webp&quot; alt=&quot;trieve env setup&quot; /&gt;&lt;/p&gt;
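&lt;p&gt;The &quot;modify or append&quot; rule from step 3 can be sketched in Python as follows. This is an illustration rather than an official Trieve script: &lt;code&gt;build_overrides&lt;/code&gt; and &lt;code&gt;apply_overrides&lt;/code&gt; are hypothetical helper names, and the keys and values mirror the variable list in this section, including which values are quoted.&lt;/p&gt;

```python
def build_overrides(domain):
    """Return the domain-specific variables listed in this section."""
    return {
        "KC_HOSTNAME": f'"auth.{domain}"',
        "KC_PROXY": "edge",
        "VITE_API_HOST": f"https://api.{domain}/api",
        "VITE_SEARCH_UI_URL": f"https://search.{domain}",
        "VITE_CHAT_UI_URL": f"https://chat.{domain}",
        "VITE_ANALYTICS_UI_URL": f"https://analytics.{domain}",
        "VITE_DASHBOARD_URL": f"https://dashboard.{domain}",
        "OIDC_AUTH_REDIRECT_URL": (
            f'"https://auth.{domain}/realms/trieve/protocol/openid-connect/auth"'
        ),
        "OIDC_ISSUER_URL": f'"https://auth.{domain}/realms/trieve"',
        "BASE_SERVER_URL": f'"https://api.{domain}"',
    }


def apply_overrides(env_text, overrides):
    """Replace existing KEY=... lines and append the keys that are missing."""
    remaining = dict(overrides)
    out = []
    for line in env_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Existing variable: rewrite it in place.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            # Comments, blank lines, and unrelated variables pass through.
            out.append(line)
    for key, value in remaining.items():
        # Variables not present in the file go at the end.
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

&lt;p&gt;Reading &lt;code&gt;.env&lt;/code&gt;, passing its text through &lt;code&gt;apply_overrides&lt;/code&gt;, and writing the result back keeps every unrelated setting untouched. Keeping a backup copy of &lt;code&gt;.env&lt;/code&gt; first is a sensible precaution.&lt;/p&gt;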
&lt;h3&gt;3.3 Running Trieve&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Start the Trieve application:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose -f docker-compose-cpu-embeddings.yml up -d
docker compose up -d
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Monitor the logs during startup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose logs -f
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If everything is configured correctly, the logs should look roughly like this:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/running-trieve-logs.webp&quot; alt=&quot;trieve logs view&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;4. Keycloak Configuration&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Access the Keycloak admin console at &lt;code&gt;https://auth.YOUR-DOMAIN.COM/admin/master/console/#/trieve/clients/list&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Log in with the default credentials:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Username: &lt;code&gt;admin&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Password: &lt;code&gt;aintsecure&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Select the &quot;vault&quot; client and add the following configurations:&lt;/p&gt;
&lt;p&gt;Valid redirect and Valid post logout redirect URIs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;https://api.YOUR-DOMAIN.COM/*
https://dashboard.YOUR-DOMAIN.COM/*
https://chat.YOUR-DOMAIN.COM/*
https://search.YOUR-DOMAIN.COM/*
https://analytics.YOUR-DOMAIN.COM/*
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Web origins:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;+
http://localhost:8090
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/self-hosting-guide/keycloak-config-preview.webp&quot; alt=&quot;keycloak config preview&quot; /&gt;&lt;/p&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Navigate to &lt;code&gt;https://dashboard.YOUR-DOMAIN.COM/&lt;/code&gt; and create a new account when prompted.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;5. Changing Default Passwords (OPTIONAL, BUT RECOMMENDED)&lt;/h2&gt;
&lt;p&gt;When configuring Trieve, it&apos;s crucial to change all default passwords to ensure the security of your self-hosted instance. This section will guide you through changing the passwords for various components of the Trieve stack. You can choose whether to do it through a script or manually.&lt;/p&gt;
&lt;h3&gt;5.1 Changing via script&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Export the new password:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export NEW_PASSWORD=&quot;WRITE HERE YOUR NEW PASSWORD&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run the following &lt;code&gt;sed&lt;/code&gt; one-liner from the Trieve directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;NEW_PASSWORD=&quot;$NEW_PASSWORD&quot; &amp;amp;&amp;amp; sed -i &apos;s/^MINIO_ROOT_PASSWORD=.*/MINIO_ROOT_PASSWORD=&quot;&apos;&quot;$NEW_PASSWORD&quot;&apos;&quot;/; s/^REDIS_PASSWORD=.*/REDIS_PASSWORD=&quot;&apos;&quot;$NEW_PASSWORD&quot;&apos;&quot;/; s|^REDIS_URL=.*|REDIS_URL=&quot;redis://:&apos;&quot;$NEW_PASSWORD&quot;&apos;@localhost:6379&quot;|; s|^DATABASE_URL=.*|DATABASE_URL=&quot;postgres://postgres:&apos;&quot;$NEW_PASSWORD&quot;&apos;@localhost:5432/trieve&quot;|; s/^SALT=.*/SALT=&quot;&apos;&quot;$NEW_PASSWORD&quot;&apos;&quot;/; s/^S3_SECRET_KEY=.*/S3_SECRET_KEY=&quot;&apos;&quot;$NEW_PASSWORD&quot;&apos;&quot;/; s/^CLICKHOUSE_PASSWORD=.*/CLICKHOUSE_PASSWORD=&apos;&quot;$NEW_PASSWORD&quot;&apos;/&apos; .env &amp;amp;&amp;amp; sed -i &apos;s/POSTGRES_PASSWORD:.*/POSTGRES_PASSWORD: &apos;&quot;$NEW_PASSWORD&quot;&apos;/; s/KEYCLOAK_ADMIN_PASSWORD=.*/KEYCLOAK_ADMIN_PASSWORD=&apos;&quot;$NEW_PASSWORD&quot;&apos;/; s/KC_DB_PASSWORD=.*/KC_DB_PASSWORD=&apos;&quot;$NEW_PASSWORD&quot;&apos;/; s/CLICKHOUSE_PASSWORD=.*/CLICKHOUSE_PASSWORD=&apos;&quot;$NEW_PASSWORD&quot;&apos;/&apos; docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
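&lt;p&gt;One caveat with the script approach: the &lt;code&gt;sed&lt;/code&gt; expressions use &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;|&lt;/code&gt; as delimiters, &lt;code&gt;&amp;amp;&lt;/code&gt; has a special meaning in a &lt;code&gt;sed&lt;/code&gt; replacement, and the password is also embedded in the Redis and Postgres URLs, so a password containing such characters can break the substitution or the resulting URL. A conservative option is a long, purely alphanumeric password. The Python sketch below generates one with the standard &lt;code&gt;secrets&lt;/code&gt; module. The &lt;code&gt;generate_password&lt;/code&gt; function name and the default length of 32 are choices made for this example, not Trieve requirements.&lt;/p&gt;

```python
import secrets
import string

# Letters and digits only, so the value is safe inside the sed expressions
# and inside the redis:// and postgres:// URLs.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length=32):
    """Return a random alphanumeric password of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

&lt;p&gt;The output of &lt;code&gt;generate_password()&lt;/code&gt; can be pasted into the &lt;code&gt;export NEW_PASSWORD=...&lt;/code&gt; line from step 1. Note that the script sets every credential to the same value. Using a different password per service, as in the manual procedure, limits the impact if one of them leaks.&lt;/p&gt;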
&lt;h3&gt;5.2 Changing by hand&lt;/h3&gt;
&lt;h4&gt;5.2.1 Updating the .env File&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Navigate to the Trieve directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd /home/trieve/trieve
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Open the .env file for editing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nano .env
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Update the following variables with strong, unique passwords:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;MINIO_ROOT_PASSWORD&lt;/code&gt;: Change from &quot;rootpassword&quot; to a secure password for the Minio root user.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;REDIS_PASSWORD&lt;/code&gt;: Replace &quot;thisredispasswordisverysecureandcomplex&quot; with a new, complex password.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;S3_SECRET_KEY&lt;/code&gt;: Change &quot;ssssssssssssssssssssTTTTTTTTTTTTTTTTTTTT&quot; to a new secret key.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SALT&lt;/code&gt;: Change &quot;goodsaltisveryyummy&quot; to a new random string.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CLICKHOUSE_PASSWORD&lt;/code&gt;: Change from &quot;password&quot; to a secure password.&lt;/li&gt;
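&lt;li&gt;&lt;code&gt;REDIS_URL&lt;/code&gt;: Replace &quot;thisredispasswordisverysecureandcomplex&quot; in the URL with the new &lt;code&gt;REDIS_PASSWORD&lt;/code&gt; value.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DATABASE_URL&lt;/code&gt;: Replace &quot;password&quot; in the URL with the new PostgreSQL password (it must match the &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; set for the &lt;code&gt;db&lt;/code&gt; service in docker-compose.yml).&lt;/li&gt;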
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Save the file and exit the editor.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;5.2.2 Updating Docker Compose Configuration&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Open the docker-compose.yml file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nano docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Update the following passwords in the file:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For the &lt;code&gt;db&lt;/code&gt; service, change the &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; from &quot;password&quot; to a new, secure password.&lt;/li&gt;
&lt;li&gt;For the &lt;code&gt;keycloak&lt;/code&gt; service, change the &lt;code&gt;KEYCLOAK_ADMIN_PASSWORD&lt;/code&gt; from &quot;aintsecure&quot; and &lt;code&gt;KC_DB_PASSWORD&lt;/code&gt; from &quot;password&quot; to a new, secure password.&lt;/li&gt;
&lt;li&gt;For the &lt;code&gt;keycloak-db&lt;/code&gt; service, change the &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; from &quot;password&quot; to a new, secure password (use the same password as for the &lt;code&gt;db&lt;/code&gt; service).&lt;/li&gt;
&lt;li&gt;For the &lt;code&gt;clickhouse-db&lt;/code&gt; service, change the &lt;code&gt;CLICKHOUSE_PASSWORD&lt;/code&gt; from &quot;password&quot; to a new, secure password.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Save the file and exit the editor.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;5.3 Applying the Changes&lt;/h3&gt;
&lt;p&gt;After updating the passwords, you need to restart the Docker containers to apply the changes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Stop the running containers and remove the volumes that were initialized with the old passwords. Removing the volumes also deletes any data stored in them:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;make clean
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Start the containers with the new configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose up -d --force-recreate
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After changing all passwords, verify that all services are running correctly:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker compose ps
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Test the Trieve application to ensure everything is functioning as expected.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Remember to store these new passwords securely, such as in a password manager. Never share them or expose them in public repositories or logs.&lt;/p&gt;
&lt;h2&gt;6. Troubleshooting&lt;/h2&gt;
&lt;p&gt;If you&apos;re unable to access the Keycloak admin panel due to SSL certificate issues, which can occur when the SSL certificates haven&apos;t been properly generated or applied, you can use the following workaround:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Temporarily comment out the &lt;code&gt;KC_HOSTNAME&lt;/code&gt; variable in the .env file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sed -i &apos;s/^KC_HOSTNAME/#KC_HOSTNAME/g&apos; .env
docker compose up -d --force-recreate
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Also temporarily allow TCP forwarding in &lt;code&gt;/etc/ssh/sshd_config&lt;/code&gt; (restart the SSH service afterwards for the change to take effect):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo sed -i &apos;s/AllowTcpForwarding no/AllowTcpForwarding yes/g&apos; /etc/ssh/sshd_config
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set up an SSH tunnel to securely access your server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh -vv -D 1337 -C -N trieve@YOUR_SERVER_IP
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Configure your browser to use a SOCKS5 proxy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Host: &lt;code&gt;localhost&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Port: &lt;code&gt;1337&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Access Keycloak via &lt;code&gt;http://192.168.1.2:8080&lt;/code&gt; and complete the configuration. This allows you to access Keycloak without SSL, enabling you to make necessary changes such as disabling the SSL requirement or updating redirect URIs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After making the necessary changes, restore the &lt;code&gt;KC_HOSTNAME&lt;/code&gt; and &lt;code&gt;AllowTcpForwarding&lt;/code&gt; settings:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sed -i &apos;s/^#KC_HOSTNAME/KC_HOSTNAME/g&apos; .env
docker compose up -d --force-recreate
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;sudo sed -i &apos;s/AllowTcpForwarding yes/AllowTcpForwarding no/g&apos; /etc/ssh/sshd_config
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Remove the SOCKS5 proxy configuration from your browser.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This workaround should only be used temporarily to resolve initial setup issues. Ensure that you properly configure SSL for production use to maintain security.&lt;/p&gt;
&lt;h2&gt;7. FAQ&lt;/h2&gt;
&lt;p&gt;Q: What are the minimum server requirements for running Trieve?
A: The minimum recommended server is 8vCPU/16GB-RAM, while the optimal server is 8vCPU/32GB-RAM.&lt;/p&gt;
&lt;p&gt;Q: How do I update Trieve after installation?
A: To update Trieve, pull the latest Docker images and restart the containers. Specific update instructions may vary depending on the version, so consult the official documentation for the most up-to-date process.&lt;/p&gt;
&lt;p&gt;Q: Is it possible to use a custom SSL certificate instead of Let&apos;s Encrypt?
A: Yes, you can use a custom SSL certificate. You&apos;ll need to modify the Caddy configuration to use your custom certificate instead of the automatic Let&apos;s Encrypt provisioning.&lt;/p&gt;
&lt;p&gt;Q: How can I back up my Trieve instance?
A: To back up your Trieve instance, regularly back up the Docker volumes containing your data and the .env file containing your configuration. Consider using tools like restic or duplicity for automated backups.&lt;/p&gt;
&lt;p&gt;Q: What should I do if I forget the Keycloak admin password?
A: If you forget the Keycloak admin password, you can reset it by accessing the Keycloak container and using the built-in admin CLI. Consult the Keycloak documentation for specific instructions on resetting the admin password.&lt;/p&gt;
</content:encoded><category>tutorials</category><category>explainers</category><author>Marcin Stankiewicz</author></item><item><title>Free Alternative to Algolia Docsearch with AI Chat</title><link>https://trieve.ai/blog/trieve-sitesearch-launch/</link><guid isPermaLink="true">https://trieve.ai/blog/trieve-sitesearch-launch/</guid><pubDate>Thu, 31 Oct 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introducing Trieve Vector Inference&lt;/h2&gt;
&lt;p&gt;Trieve Vector Inference (TVI), our solution for fast, unmetered embedding vector inference in your own cloud or on your own hardware, is now generally available as a standalone product!&lt;/p&gt;
&lt;p&gt;Building AI features at scale exposes two critical limitations of cloud embedding APIs: high latency and rate limits. Modern AI applications require better infrastructure.&lt;/p&gt;
&lt;p&gt;The platform supports any embedding model, whether it’s your own custom model, a private model, or popular open-source options. You get the flexibility to choose the right model for your use case while maintaining complete control over your infrastructure.&lt;/p&gt;
&lt;p&gt;We put together TVI to eliminate these bottlenecks for our own core product. It’s served billions of queries across billions of documents. After requests from others, we’ve sanded it down, written up some docs, and are now making it available for all. You can even get it on &lt;a href=&quot;https://aws.amazon.com/marketplace/pp/prodview-kxk2t4nafpmn4?sr=0-1&amp;amp;ref_=beagle&amp;amp;applicationId=AWSMPContessa&quot;&gt;AWS Marketplace&lt;/a&gt;!&lt;/p&gt;
&lt;h2&gt;So, just how good is TVI?&lt;/h2&gt;
&lt;p&gt;To start, here are our benchmarks:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/trieve-sitesearch-launch/tvi-benchmarks-docs.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;At 1000 RPS, your P99 latency with cloud-based embeddings will range between 23.59 and 27.03 seconds. With Trieve, we’re still measuring in milliseconds. It’s more than an order of magnitude faster.&lt;/p&gt;
&lt;p&gt;Additionally, check out the failed requests. Notice that we ran 30,000 requests with no failures. Without TVI, only ~7,000 get through. If you’re going through SageMaker, it’s around 3,000.&lt;/p&gt;
&lt;h2&gt;TVI is a simple solution that solves two problems&lt;/h2&gt;
&lt;h3&gt;Rate limits&lt;/h3&gt;
&lt;p&gt;Rate limits force you to implement complex batching and queueing. Now you’re spending development time on workarounds instead of building core product. This crucial part of your pipeline should not feel like an exercise. We’ve seen teams build elaborate workarounds:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multiple API keys rotated on a schedule&lt;/li&gt;
&lt;li&gt;Distributed rate limit tracking across microservices&lt;/li&gt;
&lt;li&gt;Complex retry logic with exponential backoff&lt;/li&gt;
&lt;li&gt;Request queuing systems that rival Kafka in complexity&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One customer had an entire Kubernetes cluster dedicated to managing their embedding pipeline - not because of compute needs, but just to handle rate limit coordination across their services. Another built a “request budget” system that required teams to reserve embedding capacity days in advance.&lt;/p&gt;
&lt;p&gt;None of this complexity adds value to your product. It’s pure overhead, stealing engineering time from features that actually matter to your users.&lt;/p&gt;
&lt;h3&gt;High Latency&lt;/h3&gt;
&lt;p&gt;High latency robs your magic. When every embedding request takes 300ms+ round trip, your real-time features aren’t really real-time anymore.&lt;/p&gt;
&lt;p&gt;The cascading effects are brutal:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Search results that lag behind user typing&lt;/li&gt;
&lt;li&gt;Recommendations that feel disconnected from user actions&lt;/li&gt;
&lt;li&gt;Chat experiences with noticeable “thinking” delays&lt;/li&gt;
&lt;li&gt;Batch processes that take hours instead of minutes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Add in retries for rate limits and your “real-time” feature is now consistently 1+ second behind.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/trieve-sitesearch-launch/sama-working-on-low-latency-embeddings.png&quot; alt=&quot;Sama working on low latency embeddings&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;We hope you like the TVI DX&lt;/h3&gt;
&lt;p&gt;Most AI dev tools feel like they are at home in Jupyter notebooks or made for quick prototyping. The platforms hyperscalers and others built to onboard the masses onto AI were not designed to handle large scale.&lt;/p&gt;
&lt;p&gt;Most production-grade software seems to fall into two buckets. It’s either 1) expensive and requires intense engagement with a vendor, and is limited in scope or 2) painful to self-host, let alone begin to productionize. We’ve talked to some folk who spent weeks building and tweaking their own servers. Others who went the enterprise route and now have to maintain annoying shadow infrastructure. Pain!&lt;/p&gt;
&lt;p&gt;Trieve is a dev-first company. This means all engineers: DevOps, frontend, backend, everyone. We really think TVI is one of those rare solutions that works for tinkerers and giants alike. General support is available more than 12 hours per day via email, Discord, Slack, and our office line.&lt;/p&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;You have two options for getting started with TVI. You can 1) buy a license from us and self-host it or 2) deploy it via AWS through AWS Marketplace. Self-hosting gives you maximum control, while AWS Marketplace provides the fastest path to production and still a ton of control.&lt;/p&gt;
&lt;p&gt;Our deployment process is streamlined for both paths:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AWS: Deployment via Helm through Marketplace&lt;/li&gt;
&lt;li&gt;Self-hosted: Docker images and clear documentation for any environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We back TVI with a 15-day integration guarantee - if you can’t get it running in your environment, we’ll refund your fees. We’ve had teams go from zero to production in under an hour.&lt;/p&gt;
&lt;h2&gt;The Future of Vector Inference&lt;/h2&gt;
&lt;p&gt;We’re committed to making vector inference more accessible, faster, and more cost-effective for teams building AI features. Our roadmap includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced monitoring and observability (custom Grafana dashboards coming soon)&lt;/li&gt;
&lt;li&gt;Support for more cloud providers (GCP and Azure coming soon)&lt;/li&gt;
&lt;li&gt;Additional model optimizations (including quantization and pruning)&lt;/li&gt;
&lt;li&gt;Advanced scaling features (automatic horizontal scaling based on load)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Plus, we’re working on some exciting features we can’t talk about yet. If you think today’s latency numbers are good, stay tuned!&lt;/p&gt;
</content:encoded><category>other</category><category>reviews</category><author>Nick Khami</author></item><item><title>Introducing TVI: Embedding and Reranking Infra Built for Kube </title><link>https://trieve.ai/blog/tvi-blog/</link><guid isPermaLink="true">https://trieve.ai/blog/tvi-blog/</guid><content:encoded>&lt;h2&gt;Introducing Trieve Vector Inference&lt;/h2&gt;
&lt;p&gt;Trieve Vector Inference (TVI), our solution for fast, unmetered embedding vector inference in your own cloud or on your own hardware, is now generally available as a standalone product!&lt;/p&gt;
&lt;p&gt;Building AI features at scale exposes two critical limitations of cloud embedding APIs: high latency and rate limits. Modern AI applications require better infrastructure.&lt;/p&gt;
&lt;p&gt;The platform supports any embedding model, whether it’s your own custom model, a private model, or popular open-source options. You get the flexibility to choose the right model for your use case while maintaining complete control over your infrastructure.&lt;/p&gt;
&lt;p&gt;We put together TVI to eliminate these bottlenecks for our own core product. It’s served billions of queries across billions of documents. After requests from others, we’ve sanded it down, written up some docs, and are now making it available for all. You can even get it on &lt;a href=&quot;https://aws.amazon.com/marketplace/pp/prodview-kxk2t4nafpmn4?sr=0-1&amp;amp;ref_=beagle&amp;amp;applicationId=AWSMPContessa&quot;&gt;AWS Marketplace&lt;/a&gt;!&lt;/p&gt;
&lt;h2&gt;So, just how good is TVI?&lt;/h2&gt;
&lt;p&gt;To start, here are our benchmarks:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/tvi-blog/tvi-benchmarks-docs.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;At 1000 RPS, your P99 latency with cloud-based embeddings will range between 23.59 and 27.03 seconds. With Trieve, we’re still measuring in milliseconds. It’s more than an order of magnitude faster.&lt;/p&gt;
&lt;p&gt;Additionally, check out the failed requests. Notice that we ran 30,000 requests with no failures. Without TVI, only ~7,000 get through. If you’re going through SageMaker, it’s around 3,000.&lt;/p&gt;
&lt;h2&gt;TVI is a simple solution that solves two problems&lt;/h2&gt;
&lt;h3&gt;Rate limits&lt;/h3&gt;
&lt;p&gt;Rate limits force you to implement complex batching and queueing. Now you’re spending development time on workarounds instead of building core product. This crucial part of your pipeline should not feel like an exercise. We’ve seen teams build elaborate workarounds:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multiple API keys rotated on a schedule&lt;/li&gt;
&lt;li&gt;Distributed rate limit tracking across microservices&lt;/li&gt;
&lt;li&gt;Complex retry logic with exponential backoff&lt;/li&gt;
&lt;li&gt;Request queuing systems that rival Kafka in complexity&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One customer had an entire Kubernetes cluster dedicated to managing their embedding pipeline - not because of compute needs, but just to handle rate limit coordination across their services. Another built a “request budget” system that required teams to reserve embedding capacity days in advance.&lt;/p&gt;
&lt;p&gt;None of this complexity adds value to your product. It’s pure overhead, stealing engineering time from features that actually matter to your users.&lt;/p&gt;
&lt;h3&gt;High Latency&lt;/h3&gt;
&lt;p&gt;High latency robs your magic. When every embedding request takes 300ms+ round trip, your real-time features aren’t really real-time anymore.&lt;/p&gt;
&lt;p&gt;The cascading effects are brutal:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Search results that lag behind user typing&lt;/li&gt;
&lt;li&gt;Recommendations that feel disconnected from user actions&lt;/li&gt;
&lt;li&gt;Chat experiences with noticeable “thinking” delays&lt;/li&gt;
&lt;li&gt;Batch processes that take hours instead of minutes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Add in retries for rate limits and your “real-time” feature is now consistently 1+ second behind.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/src/assets/images/blog-posts/tvi-blog/sama-working-on-low-latency-embeddings.png&quot; alt=&quot;Sama working on low latency embeddings&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;We hope you like the TVI DX&lt;/h3&gt;
&lt;p&gt;Most AI dev tools feel like they are at home in Jupyter notebooks or made for quick prototyping. The platforms hyperscalers and others built to onboard the masses onto AI were not designed to handle large scale.&lt;/p&gt;
&lt;p&gt;Most production-grade software seems to fall into two buckets. It’s either 1) expensive and requires intense engagement with a vendor, and is limited in scope or 2) painful to self-host, let alone begin to productionize. We’ve talked to some folk who spent weeks building and tweaking their own servers. Others who went the enterprise route and now have to maintain annoying shadow infrastructure. Pain!&lt;/p&gt;
&lt;p&gt;Trieve is a dev-first company. This means all engineers: DevOps, frontend, backend, everyone. We really think TVI is one of those rare solutions that works for tinkerers and giants alike. General support is available more than 12 hours per day via email, Discord, Slack, and our office line.&lt;/p&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;You have two options for getting started with TVI. You can 1) buy a license from us and self-host it or 2) deploy it via AWS through AWS Marketplace. Self-hosting gives you maximum control, while AWS Marketplace provides the fastest path to production and still a ton of control.&lt;/p&gt;
&lt;p&gt;Our deployment process is streamlined for both paths:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AWS: Deployment via Helm through Marketplace&lt;/li&gt;
&lt;li&gt;Self-hosted: Docker images and clear documentation for any environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We back TVI with a 15-day integration guarantee - if you can’t get it running in your environment, we’ll refund your fees. We’ve had teams go from zero to production in under an hour.&lt;/p&gt;
&lt;h2&gt;The Future of Vector Inference&lt;/h2&gt;
&lt;p&gt;We’re committed to making vector inference more accessible, faster, and more cost-effective for teams building AI features. Our roadmap includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced monitoring and observability (custom Grafana dashboards coming soon)&lt;/li&gt;
&lt;li&gt;Support for more cloud providers (GCP and Azure coming soon)&lt;/li&gt;
&lt;li&gt;Additional model optimizations (including quantization and pruning)&lt;/li&gt;
&lt;li&gt;Advanced scaling features (automatic horizontal scaling based on load)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Plus, we’re working on some exciting features we can’t talk about yet. If you think today’s latency numbers are good, stay tuned!&lt;/p&gt;
</content:encoded><category>explainers</category></item><item><title>Trieve&apos;s New Usage-Based Pricing</title><link>https://trieve.ai/blog/usage-based-pricing/</link><guid isPermaLink="true">https://trieve.ai/blog/usage-based-pricing/</guid><pubDate>Wed, 02 Apr 2025 10:31:00 GMT</pubDate><content:encoded>&lt;h1&gt;&lt;strong&gt;Our New Usage-Based Pricing&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;Trieve is growing. While effective at first, the tier-based pricing strategy was causing pain for our customers and the team. We’ve also expanded our free tier to make it more generous and builder-friendly.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Restrictive and Large Pricing Tiers&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Tiering was designed to control concerns about our largest cost centers: &lt;em&gt;Chunks Stored&lt;/em&gt;, &lt;em&gt;File Storage&lt;/em&gt;, &lt;em&gt;AI Messages&lt;/em&gt;, and &lt;em&gt;Datasets&lt;/em&gt;. However, the floors, ceilings, and gaps between our tiers proved to be restrictive for our customer base.&lt;/p&gt;
&lt;p&gt;As dataset counts hockey-stick, companies attempting to scale, especially those using multi-tenant setups like Conduit and Vapi, saw friction managing their usage to avoid jumping into the next tier by exceeding chunk, message, or dataset limits.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;New Features that original pricing didn’t include&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Trieve has grown so fast in the past year that our pricing table wasn’t able to keep up with the full breadth and depth of our product. Users found new features already baked in as we launched them, and the tier-based pricing worked well enough by limiting only chunk and dataset counts.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We added a first-party integration to &lt;a href=&quot;https://chunkr.ai/&quot;&gt;&lt;strong&gt;chunkr&lt;/strong&gt;&lt;/a&gt;, and our very own VLM-based &lt;a href=&quot;https://pdf2md.trieve.ai/&quot;&gt;&lt;strong&gt;pdf2md&lt;/strong&gt;&lt;/a&gt; service.&lt;/li&gt;
&lt;li&gt;We maintain our own web crawler called &lt;a href=&quot;https://github.com/devflowinc/firecrawl-simple&quot;&gt;&lt;strong&gt;firecrawl-simple&lt;/strong&gt;&lt;/a&gt; for seamless and performant web scraping directly into a Trieve search index.&lt;/li&gt;
&lt;li&gt;We created a search component generator to prototype and create drop-in discovery components.&lt;/li&gt;
&lt;li&gt;We turbocharged our analytics dashboard and routes, added voice search, and the list goes on.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;Pricing Table&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Without further ado, our pricing table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Product&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Free Tier&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Users&lt;/td&gt;
&lt;td&gt;First 5 Users free&lt;/td&gt;
&lt;td&gt;$5 / User&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#platform-fee&quot;&gt;&lt;strong&gt;Platform Fee&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;$5 / mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Storage Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#storage-charge&quot;&gt;&lt;strong&gt;Chunk Storage&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1000 Chunks (11 MB)&lt;/td&gt;
&lt;td&gt;$132 / 1M chunks ($12.07 / GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#file-storage&quot;&gt;&lt;strong&gt;File Storage&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;10 GB&lt;/td&gt;
&lt;td&gt;$0.046 / GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#datasets&quot;&gt;&lt;strong&gt;Datasets&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2 datasets&lt;/td&gt;
&lt;td&gt;$0.05 / dataset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Ingestion (Resets at end of month)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#search-and-write-tokens&quot;&gt;&lt;strong&gt;Write Tokens&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;First 3M tokens / mo free&lt;/td&gt;
&lt;td&gt;$0.028 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#file-ocr&quot;&gt;&lt;strong&gt;File OCR&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;First 100 / mo free&lt;/td&gt;
&lt;td&gt;$0.01 / Page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#web-crawling&quot;&gt;&lt;strong&gt;Web Crawling&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;First 10 pages / mo free&lt;/td&gt;
&lt;td&gt;$0.00086 / Page Crawled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#bytes-ingested&quot;&gt;&lt;strong&gt;Bytes Ingested&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;First 1 GB&lt;/td&gt;
&lt;td&gt;$2 / GB ingested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Per Search charges&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#search-and-write-tokens&quot;&gt;&lt;strong&gt;Search Tokens&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;First 3M tokens / mo free&lt;/td&gt;
&lt;td&gt;$0.028 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#message-tokens&quot;&gt;&lt;strong&gt;Message Tokens&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;First 263,000 tokens / mo free&lt;/td&gt;
&lt;td&gt;$3.528 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;#analytics&quot;&gt;&lt;strong&gt;Analytic Events&lt;/strong&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;First 1M events / mo free&lt;/td&gt;
&lt;td&gt;$0.0001 / event&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
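&lt;p&gt;To make the table concrete, here is a rough TypeScript sketch that estimates a monthly bill for the ingestion and per-search items. The rates and free allowances are copied from the table above; the function names, the shape of the usage object, and the assumptions that the platform fee always applies and that each item is billed linearly past its free allowance are illustrative, and actual invoices may round or aggregate differently.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Illustrative sketch only: rates come from the pricing table, everything
// else (names, linear overage billing, always-on platform fee) is assumed.
type Meter = { free: number; unit: number; price: number };

// Flat monthly platform fee in USD, taken from the table.
const PLATFORM_FEE = 5;

// price is in USD per unit of usage beyond the free allowance.
const METERS: { [name: string]: Meter } = {
  writeTokens: { free: 3_000_000, unit: 1_000_000, price: 0.028 },
  searchTokens: { free: 3_000_000, unit: 1_000_000, price: 0.028 },
  messageTokens: { free: 263_000, unit: 1_000_000, price: 3.528 },
  ocrPages: { free: 100, unit: 1, price: 0.01 },
  crawledPages: { free: 10, unit: 1, price: 0.00086 },
  gbIngested: { free: 1, unit: 1, price: 2 },
  analyticsEvents: { free: 1_000_000, unit: 1, price: 0.0001 },
};

// Cost of one meter: only usage beyond the free allowance is billed.
function overageCost(used: number, meter: Meter): number {
  const billable = Math.max(0, used - meter.free);
  return (billable / meter.unit) * meter.price;
}

// Sum the platform fee and the overage cost of every meter in the usage object.
function estimateMonthlyBill(usage: { [name: string]: number }): number {
  let total = PLATFORM_FEE;
  for (const name of Object.keys(usage)) {
    const meter = METERS[name];
    if (meter === undefined) {
      throw new Error(`Unknown meter: ${name}`);
    }
    total += overageCost(usage[name], meter);
  }
  return total;
}

// 10M write tokens and 600 OCR pages in one month.
const bill = estimateMonthlyBill({ writeTokens: 10_000_000, ocrPages: 600 });
console.log(bill.toFixed(2)); // prints 10.20
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Storage items (chunks, files, datasets) and user seats are left out to keep the sketch short; they could be added as further entries in the same way.&lt;/p&gt;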
&lt;h2&gt;&lt;strong&gt;Pricing Breakdown&lt;/strong&gt;&lt;/h2&gt;
&lt;h3&gt;&lt;strong&gt;Platform Fee&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;To account for the drastic price decreases, we feel a base platform fee for a full all-in-one solution is now needed. This also gives us the flexibility to expand to multiple tiers in the future for even better bulk cost savings.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Chunk Storage&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Chunk storage is the largest cost center. We hold all the vectors, payload indices, and metadata in RAM with 2x replication. Holding all of these in RAM allows us to provide the lowest possible latency for all of our searches.&lt;/p&gt;
&lt;p&gt;We measure the size of a single chunk in Trieve to be &lt;strong&gt;&lt;code&gt;(1536 * 4 bytes/vector) + (256 * 4 bytes/sparse vector) + (4096 bytes/payload) = 11,264 bytes&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;We host &lt;strong&gt;&lt;code&gt;n4-standard-64&lt;/code&gt;&lt;/strong&gt; machines and use &lt;strong&gt;&lt;code&gt;hyperdisk-balanced&lt;/code&gt;&lt;/strong&gt; NVMe SSDs for our vector db storage layer. The cost per GB is &lt;strong&gt;&lt;code&gt;$8.65/GB of RAM + $0.02/GB = $8.67/GB&lt;/code&gt;&lt;/strong&gt;. We charge a ~40% markup, which comes out to $12.07 / GB.&lt;/p&gt;
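&lt;p&gt;The chunk-size arithmetic above can be reproduced with a few lines of TypeScript. The dimensions and byte counts are the ones quoted in this section; the function names and the decimal-gigabyte conversion (1 GB = 1e9 bytes) are assumptions made for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Sketch of the per-chunk size estimate quoted above.
// Both dense and sparse vector entries are counted at 4 bytes per value.
const BYTES_PER_VALUE = 4;

function chunkSizeBytes(denseDims: number, sparseDims: number, payloadBytes: number): number {
  return denseDims * BYTES_PER_VALUE + sparseDims * BYTES_PER_VALUE + payloadBytes;
}

// Storage footprint in decimal gigabytes (assumed: 1 GB = 1e9 bytes).
function storageGb(chunkCount: number, bytesPerChunk: number): number {
  return (chunkCount * bytesPerChunk) / 1e9;
}

const perChunk = chunkSizeBytes(1536, 256, 4096);
console.log(perChunk); // 11264
console.log(storageGb(1_000_000, perChunk)); // 11.264
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The figure covers a single copy of the data; the 2x replication mentioned above is not included in it.&lt;/p&gt;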
&lt;h3&gt;&lt;strong&gt;File Storage&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;S3 charges $0.023/GB. We charge $0.046/GB which is a 100% upcharge on S3 storage.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Datasets&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;We charge $0.05/dataset. Every dataset created adds a new index into our vector db’s HNSW index. Each dataset also gets an ingestion queue for files. Crawls and chunks have equal priority.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Search and Write Tokens&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;OpenAI’s &lt;strong&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/strong&gt; charges $0.02/M tokens. We charge a 40% markup to embed the input, so that’s &lt;strong&gt;&lt;code&gt;$0.02 * 1.4 (40% markup) = $0.028/M tokens&lt;/code&gt;&lt;/strong&gt;. Both searches and writes are charged the same fee.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;File OCR&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Past the free first 100 pages, we charge $0.01/page. PDF pages average 800 tokens.&lt;/p&gt;
&lt;p&gt;For our pdf2md OCR ingestion using gpt-4o, our raw cost is &lt;strong&gt;&lt;code&gt;800 tokens * (0.00001 LLM cost/token) = $0.008/page&lt;/code&gt;&lt;/strong&gt;. Accounting for our markup, it comes out to &lt;strong&gt;$0.01 / page&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Web Crawling&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;We price at the industry standard level which is around $86/100,000 pages or &lt;strong&gt;$0.00086/page&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Bytes Ingested&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The first 1 GB of bytes ingested is free. Overages at the end of the month are $2/GB. This is to account for networking costs and match other databases’ write charges.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Message Tokens&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;As a basis, we charge per token. We allow proxying OpenRouter or OpenAI through Trieve. Embedding and search steps are bundled into the cost of an AI message in Trieve.&lt;/p&gt;
&lt;p&gt;For each message, we create an embedding and conduct an LLM inference step. OpenAI’s &lt;strong&gt;&lt;code&gt;gpt-4o-mini&lt;/code&gt;&lt;/strong&gt; model charges us $0.6/1M tokens and &lt;strong&gt;&lt;code&gt;text-embedding-3-small&lt;/code&gt;&lt;/strong&gt; charges $0.02/1M tokens.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Analytics&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Trieve provides analytics for all searches and chat messages automatically. The first 1,000,000 searches and messages a month are free and have a 5 year retention period. Any additional events are $0.0001/event.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Trieve Enterprise&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;The prices above are for our public cloud service.&lt;/p&gt;
&lt;p&gt;We offer dedicated cloud services that come with SLA guarantees and all of the enterprise goodies. If you require a custom quote for a large-scale use case, please contact us. We are happy to provide bulk discount quotes for massive usage.&lt;/p&gt;
</content:encoded><category>announcements</category><author>Nick Khami</author></item><item><title>Announcing Trieve&apos;s New TypeScript SDK!</title><link>https://trieve.ai/blog/we-have-a-new-js-sdk/</link><guid isPermaLink="true">https://trieve.ai/blog/we-have-a-new-js-sdk/</guid><pubDate>Tue, 10 Sep 2024 13:38:00 GMT</pubDate><content:encoded>&lt;p&gt;If you have used Trieve in a JavaScript application, you probably know that you need to make most of your calls to Trieve using fetch. While this approach is good, it&apos;s not ideal, and we want to provide users with an easier way to use our APIs.&lt;/p&gt;
&lt;p&gt;Well, behind the scenes we have been working on making Trieve easier to use than ever in JavaScript applications and that includes making a new JavaScript SDK that makes it much simpler to integrate Trieve into any application.&lt;/p&gt;
&lt;p&gt;First things first, you can install the new &lt;a href=&quot;https://www.npmjs.com/package/trieve-ts-sdk&quot;&gt;&lt;code&gt;trieve-ts-sdk&lt;/code&gt;&lt;/a&gt; with your favorite package manager:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;yarn add trieve-ts-sdk
# or
npm install trieve-ts-sdk
# or
pnpm install trieve-ts-sdk
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let&apos;s see how it works, taking a search call as an example.&lt;/p&gt;
&lt;p&gt;Previously, you would need to do something like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;fetch(&apos;https://api.trieve.ai/api/chunk/search&apos;, {
  method: &apos;POST&apos;,
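  headers: {
    // A JSON body needs an explicit content type.
    &apos;Content-Type&apos;: &apos;application/json&apos;,
    &apos;TR-Dataset&apos;: &apos;dc6f3b0d-cf21-412b-9d16-fb7ade090365&apos;,
    Authorization: &apos;tr-********************************&apos;,
  },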
  body: JSON.stringify({
    query: &apos;Sonic the Hedgehog&apos;,
  }),
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While this method works, it&apos;s not the cleanest approach. You need to keep the documentation open next to your code editor, as there are no types to assist you in making your function calls. Now, with the new SDK, you can call it like so:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { TrieveSDK } from &apos;trieve-ts-sdk&apos;;

export const trieve = new TrieveSDK({
  apiKey: &apos;&amp;lt;your-api-key&amp;gt;&apos;,
  datasetId: &apos;&amp;lt;dataset-to-use&amp;gt;&apos;,
});

const results = await trieve.search({
  query: &apos;Sonic the Hedgehog&apos;,
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the help of the exported types, it&apos;s also much easier to build a more complex search that includes, for example, filters:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Reuses the trieve client created in the previous snippet.

const results = await trieve.search({
  query: &apos;Sonic the Hedgehog&apos;,
  search_type: &apos;hybrid&apos;,
  filters: {
    must: [
      {
        field: &apos;metadata.rating&apos;,
        range: {
          gt: 80,
        },
      },
    ],
    must_not: [
      {
        field: &apos;metadata.console&apos;,
        match: [&apos;gba&apos;, &apos;wii&apos;],
      },
    ],
  },
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;https://cdn.trieve.ai/blog/we-have-a-new-js-sdk/fully-typed-trieve-ts-sdk.webp&quot; alt=&quot;screenshot of typed Trieve SDK&quot; /&gt;&lt;/p&gt;
&lt;p&gt;And it&apos;s not just methods for chunks. We have functions for most of our API. Want to stream a RAG completion? We&apos;ve got that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;const reader = await trieve.createMessageReader({
  topic_id: id || currentTopic,
  new_message_content: currentQuestion,
  llm_options: {
    completion_first: true,
  },
});
handleReader(reader);
&lt;/code&gt;&lt;/pre&gt;
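&lt;p&gt;The snippet above does not show what &lt;code&gt;handleReader&lt;/code&gt; does. One possible shape is sketched below, assuming the reader follows the standard &lt;code&gt;ReadableStreamDefaultReader&lt;/code&gt; interface and yields UTF-8 bytes. The function body and the &lt;code&gt;onChunk&lt;/code&gt; callback are hypothetical, not part of the SDK.&lt;/p&gt;

```typescript
// Hypothetical implementation of handleReader; the post does not show its body.
// Assumes `reader` is a standard ReadableStreamDefaultReader yielding
// Uint8Array chunks of UTF-8 text.
async function handleReader(
  reader: ReadableStreamDefaultReader,
  onChunk: (text: string) => void = () => {},
) {
  const decoder = new TextDecoder();
  let fullText = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries.
    const text = decoder.decode(value, { stream: true });
    fullText += text;
    onChunk(text); // e.g. append the new text to the chat UI
  }

  // Flush any bytes the decoder is still holding.
  fullText += decoder.decode();
  return fullText;
}
```

&lt;p&gt;Passing a callback lets the UI render text as it arrives, while the returned promise resolves with the complete message once the stream ends.&lt;/p&gt;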
&lt;p&gt;We also created &lt;a href=&quot;https://ts-sdk.trieve.ai/&quot;&gt;comprehensive docs&lt;/a&gt; so that all these functions are easy for you to find whether you use TypeScript or not.&lt;/p&gt;
&lt;p&gt;Okay, the last step is to install it and get to building search and RAG in your application!&lt;/p&gt;
</content:encoded><category>announcements</category><category>news</category><author>nikkitaFTW (Sara V)</author></item></channel></rss>