
The Future of Work: Commanding Armies of Parallel LLMs

Directing 200 LLMs to simultaneously tackle a single task is the future of work.

May 22, 2025

Massive parallel LLM inference is underrated.

If you use AI today, you’re likely:

  1. Asking ChatGPT or Claude to do something for you
  2. Navigating away to thumb through another task for just long enough
  3. Navigating back to judge the output and then either:
    a. Accepting it, implementing the solution, and moving on
    b. Rejecting it and trying to reason with the AI again, and again, and again…

This is a waste of time! The future of work looks like StarCraft or Age of Empires.

Imagine you have an allocation of 10,000 requests—rather than a single prompt—at 100,000 tokens per second. (For context, the average GPT interaction uses 50-150 tokens per turn.) You’re directing this computational firepower to solve problems and create solutions, acting as the commander with agentic systems as your units. If the point of battle is to win, you need asymmetric advantages. But this isn’t just about sheer firepower or brute force—though that’s part of it. The real power lies in your ability to direct and manage these forces strategically to maximize their effectiveness.

Like any real working environment, you don’t spread capacity evenly across 200 different tasks. You provision agents to match the problem’s nature. Some tasks stand alone, while others form clusters: interconnected webs where complexity and complication (two distinct beasts) hide beneath the surface.

Yes, each LLM has some probability of finding a solution. But you don’t just want one correct solution per problem. You want enough samples to get multiple correct answers, then mix and match them into one extremely high-quality solution. Think of it like Monte Carlo sampling: you’re hedging against randomness as you explore the solution space.
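Here’s a minimal sketch of that sample-then-merge loop, assuming the `openai` Python SDK (v1+) and a placeholder model name: fan out n high-temperature samples in parallel, then have one low-temperature call stitch the strongest parts together.

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def sample_once(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you like
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # high variance: each sample explores differently
    )
    return resp.choices[0].message.content

async def solve(prompt: str, n: int = 20) -> str:
    # Monte Carlo step: n independent draws from the solution space.
    candidates = await asyncio.gather(*(sample_once(prompt) for _ in range(n)))
    # Merge step: one judge call mixes and matches the best parts.
    merged = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Combine the strongest parts of these candidate "
                       "solutions into one answer:\n\n" + "\n---\n".join(candidates),
        }],
        temperature=0.2,  # low variance for the synthesis
    )
    return merged.choices[0].message.content

# asyncio.run(solve("Write a regex that matches ISO 8601 dates."))
```

The merge step is the part most people skip; without it you’re just picking one lottery ticket instead of combining winners.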

Data labeling is a fantastic base case example

As a familiar and persistent example, we built a demo app that categorizes images of clothing using parallel LLM calls. As a search company, being able to structure arbitrary datasets across different sources greatly improves our ability to index and manipulate information into awesome AI experiences for our customers’ customers.

But here’s what’s actually cool: you could photograph all your belongings and have AI instantly sort which items are worth more than $50. With humans, you’d need dozens of people working in parallel to finish in a reasonable time.
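This isn’t the demo’s actual source, but the pattern behind it is simple enough to sketch: one vision call per image, all fired concurrently, each constrained to a fixed tag list. The model name and the assumption that images are reachable by URL are mine, not the demo’s.

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()
TAGS = ["Outerwear", "Tops", "Bottoms", "Footwear", "Headwear", "Accessories"]

async def label_image(url: str) -> tuple[str, str]:
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Label this clothing item with exactly one of: "
                         f"{', '.join(TAGS)}. Reply with the tag only."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }],
        temperature=0,  # labeling wants consistency, not creativity
    )
    return url, resp.choices[0].message.content.strip()

async def label_all(urls: list[str]) -> dict[str, str]:
    # The whole dataset is one gather away: 200 images take roughly
    # as long as the slowest single call.
    return dict(await asyncio.gather(*(label_image(u) for u in urls)))
```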

Google spent years tricking hundreds of millions of people into labeling data through CAPTCHAs. Now you can leverage the years of human effort that went into this and accomplish similar labeling tasks in a few hours, for far less money. As you might also assume, this will get faster and cheaper as LLMs improve.

[Embedded demo: Clothing Image Labeler, powered by Trieve. Customize a tag set (Outerwear, Tops, Bottoms, Footwear, Headwear, Accessories, plus finer-grained tags like Denim Jacket or Sneakers), then upload your own PNG, JPG, or WEBP images or use the provided sample images, and each image is labeled automatically.]

Parallel compute, all of the time

We should be using parallel compute for everything AI. We should be supervisors, not spectators.

Right now, as a software engineer, I open my IDE, prompt an agent to edit code, then watch it work. Strong WALL-E listless-human vibes: disengaged, passive.

I don’t want this. I want to be in flow state, hyper-engaged and forward-leaning. I want to feel and produce like Ender commanding an entire fleet.

It should be trivial to provision multiple LLMs with high variance to tackle tasks in parallel. If AI can generate product photos, I want to pick 3 styles and deploy 6 agents, 2 per style, generating image sets simultaneously. If something’s going wrong, I can zoom in and intervene while they work. The feedback I give one agent can be immediately applied to the others, allowing them to adjust their outputs in real time.
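As a toy sketch of what that fan-out could look like: three styles, two agents each, and a shared feedback list that every subsequent generation re-reads, so a correction given to one agent steers all six. `generate_image` is a hypothetical stub standing in for a real image API.

```python
import asyncio

STYLES = ["studio white", "lifestyle outdoor", "flat lay"]
feedback: list[str] = []  # the commander's running notes, shared by every agent

async def generate_image(style: str, notes: str) -> str:
    # Hypothetical stub; swap in a real image-generation call here.
    await asyncio.sleep(1)
    return f"<image: {style} | notes: {notes or 'none'}>"

async def agent(style: str, agent_id: int, shots: int = 4) -> list[str]:
    images = []
    for shot in range(shots):
        # Each shot re-reads the shared feedback, so a note you add
        # while agents run immediately steers all of them.
        img = await generate_image(style, "; ".join(feedback))
        images.append(f"agent-{agent_id} shot-{shot}: {img}")
    return images

async def campaign() -> list[list[str]]:
    # 3 styles x 2 agents = 6 units working simultaneously.
    tasks = [agent(style, i) for style in STYLES for i in range(2)]
    return await asyncio.gather(*tasks)

# While campaign() runs, intervening is one line:
# feedback.append("less saturation, keep the logo visible")
```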

This is similar to a strategy some people employ on Fiverr: hiring multiple freelancers for the same project, judging the quality of their work, and picking the best candidate.

New tools are needed

You can sort of replicate this with git worktrees for programming, but it’s clunky. We need new software designed for managing parallelized general intelligence.
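For the curious, the clunky version is scriptable today. The sketch below shells out to real `git worktree` commands to give each agent an isolated checkout on its own branch; the `run-agent` CLI is a hypothetical stand-in for whatever coding agent you actually use.

```python
import subprocess

def spawn_agents(task: str, n: int = 4) -> list[subprocess.Popen]:
    """Give each of n coding agents its own worktree so they can edit
    the same repo in parallel without trampling each other."""
    procs = []
    for i in range(n):
        path, branch = f"../agent-{i}", f"agent-{i}"
        # Real git: create a new branch checked out in a fresh worktree.
        subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
        # Hypothetical agent CLI, launched non-blocking in that worktree.
        procs.append(subprocess.Popen(["run-agent", "--task", task], cwd=path))
    return procs
```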

What we have now feels like a horseless carriage: we’re mimicking old patterns instead of embracing new possibilities. The future UX for AI isn’t watching a single agent work. It’s commanding hundreds simultaneously.

Maybe these tools will be vertical-specific. Maybe they’ll be general-purpose. Either way, I’m excited to see what we build and get it in your hands.