A weekend ‘vibe code’ hack by Andrej Karpathy quietly sketches the missing layer of enterprise AI orchestration

By Eric | November 30, 2025


This weekend, Andrej Karpathy, the former director of AI at Tesla and a founding member of OpenAI, decided he wanted to read a book. But he did not want to read it alone. He wanted to read it accompanied by a committee of artificial intelligences, each offering its own perspective, critiquing the others, and eventually synthesizing a final answer under the guidance of a “Chairman.”

To make this happen, Karpathy wrote what he called a “vibe code project”: a piece of software written quickly, largely by AI assistants, intended for fun rather than function. He posted the result, a repository called “LLM Council,” to GitHub with a stark disclaimer: “I’m not going to support it in any way… Code is ephemeral now and libraries are over.”
Yet, for technical decision-makers across the enterprise landscape, looking past the casual disclaimer reveals something far more significant than a weekend toy. In a few hundred lines of Python and JavaScript, Karpathy has sketched a reference architecture for the most critical, undefined layer of the modern software stack: the orchestration middleware sitting between corporate applications and the volatile market of AI models.
As companies finalize their platform investments for 2026, LLM Council offers a stripped-down look at the “build vs. buy” reality of AI infrastructure. It demonstrates that while the logic of routing and aggregating AI models is surprisingly simple, the operational wrapper required to make it enterprise-ready is where the true complexity lies.
How the LLM Council works: Four AI models debate, critique, and synthesize answers
To the casual observer, the LLM Council web application looks almost identical to ChatGPT. A user types a query into a chat box. But behind the scenes, the application triggers a sophisticated, three-stage workflow that mirrors how human decision-making bodies operate.
First, the system dispatches the user’s query to a panel of frontier models. In Karpathy’s default configuration, this includes OpenAI’s GPT-5.1, Google’s Gemini 3.0 Pro, Anthropic’s Claude Sonnet 4.5, and xAI’s Grok 4. These models generate their initial responses in parallel.
In the second stage, the software performs a peer review. Each model is fed the anonymized responses of its counterparts and asked to evaluate them based on accuracy and insight. This step transforms the AI from a generator into a critic, forcing a layer of quality control that is rare in standard chatbot interactions.
Finally, a designated “Chairman LLM” — currently configured as Google’s Gemini 3 — receives the original query, the individual responses, and the peer rankings. It synthesizes this mass of context into a single, authoritative answer for the user.
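In code, that pipeline is short. Below is a minimal sketch of the three-stage flow, assuming an OpenAI-compatible client pointed at an aggregator endpoint; the model identifiers, prompts, and helper names are illustrative rather than Karpathy’s actual implementation.

```python
# Hypothetical sketch of the three-stage council flow. Model IDs, prompts,
# and helper names are illustrative, not taken from Karpathy's repository.
import asyncio

from openai import AsyncOpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so one client serves all providers.
client = AsyncOpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]
CHAIRMAN_MODEL = "google/gemini-3-pro"

async def ask(model: str, prompt: str) -> str:
    resp = await client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

async def council(query: str) -> str:
    # Stage 1: dispatch the query to every council member in parallel.
    answers = await asyncio.gather(*(ask(m, query) for m in COUNCIL_MODELS))

    # Stage 2: each member reviews the anonymized, numbered set of answers.
    numbered = "\n\n".join(f"Response {i + 1}:\n{a}" for i, a in enumerate(answers))
    review_prompt = (
        f"Question: {query}\n\n{numbered}\n\n"
        "Rank these responses by accuracy and insight, and explain briefly."
    )
    reviews = await asyncio.gather(*(ask(m, review_prompt) for m in COUNCIL_MODELS))

    # Stage 3: the chairman synthesizes answers and rankings into one reply.
    chairman_prompt = (
        f"Question: {query}\n\nCandidate answers:\n{numbered}\n\n"
        "Peer reviews:\n" + "\n\n".join(reviews) +
        "\n\nWrite a single, final answer for the user."
    )
    return await ask(CHAIRMAN_MODEL, chairman_prompt)
```

Because the candidate answers are numbered rather than attributed, a reviewer cannot knowingly favor its own output, which is what makes the models’ willingness to rank a peer’s response above their own notable.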
Karpathy noted that the results were often surprising. “Quite often, the models are surprisingly willing to select another LLM’s response as superior to their own,” he wrote on X (formerly Twitter). He described using the tool to read book chapters, observing that the models consistently praised GPT-5.1 as the most insightful while rating Claude the lowest. However, Karpathy’s own qualitative assessment diverged from his digital council; he found GPT-5.1 “too wordy” and preferred the “condensed and processed” output of Gemini.
FastAPI, OpenRouter, and the case for treating frontier models as swappable components
For CTOs and platform architects, the value of LLM Council lies not in its literary criticism, but in its construction. The repository serves as a primary document showing exactly what a modern, minimal AI stack looks like in late 2025.
The application is built on a “thin” architecture. The backend uses FastAPI, a modern Python framework, while the frontend is a standard React application built with Vite. Data storage is handled not by a complex database, but by simple JSON files written to the local disk.
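In concrete terms, the entire persistence story can fit in a handful of lines. The sketch below shows the pattern under stated assumptions: the route path, request schema, and file layout are invented for illustration, and council() stands in for the three-stage flow sketched earlier.

```python
# Minimal sketch of a "thin" FastAPI backend with JSON-on-disk storage.
# Route, schema, and file layout are illustrative assumptions.
import json
import uuid
from pathlib import Path

from fastapi import FastAPI
from pydantic import BaseModel

from council_sketch import council  # hypothetical module: the three-stage flow above

app = FastAPI()
DATA_DIR = Path("conversations")
DATA_DIR.mkdir(exist_ok=True)

class Query(BaseModel):
    text: str

@app.post("/api/chat")
async def chat(query: Query) -> dict:
    answer = await council(query.text)
    record = {"query": query.text, "answer": answer}
    # No database: each conversation is just a JSON file on local disk.
    (DATA_DIR / f"{uuid.uuid4()}.json").write_text(json.dumps(record, indent=2))
    return record
```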
The linchpin of the entire operation is OpenRouter, an API aggregator that normalizes the differences between various model providers. By routing requests through this single broker, Karpathy avoided writing separate integration code for OpenAI, Google, and Anthropic. The application does not know or care which company provides the intelligence; it simply sends a prompt and awaits a response.
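The broker pattern is visible in the request shape itself. The sketch below calls OpenRouter’s OpenAI-compatible chat completions endpoint directly; only the model string distinguishes one provider from another (the function name and parameters are illustrative).

```python
# Sketch of the single-broker pattern: one request shape, any provider.
# Assumes OpenRouter's OpenAI-compatible /chat/completions endpoint.
import requests

def complete(model: str, prompt: str, api_key: str) -> str:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# The caller never touches provider-specific SDKs; only the model string changes:
#   complete("anthropic/claude-sonnet-4.5", "Summarize chapter one.", key)
#   complete("google/gemini-3-pro", "Summarize chapter one.", key)
```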
This design choice highlights a growing trend in enterprise architecture: the commoditization of the model layer. By treating frontier models as interchangeable components that can be swapped by editing a single line in a configuration file — specifically the COUNCIL_MODELS list in the backend code — the architecture protects the application from vendor lock-in. If a new model from Meta or Mistral tops the leaderboards next week, it can be added to the council in seconds.
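In practice, that swap is a one-line configuration edit, along these lines (the identifiers are illustrative OpenRouter-style IDs, not necessarily the repository’s exact values):

```python
# Adding a new council member requires no new integration code,
# only another entry in the configuration list (IDs illustrative).
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
    "mistralai/mistral-large",  # hypothetical addition after a leaderboard shake-up
]
```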
What’s missing from prototype to production: Authentication, PII redaction, and compliance
While the core logic of LLM Council is elegant, it also serves as a stark illustration of the gap between a “weekend hack” and a production system. For an enterprise platform team, cloning Karpathy’s repository is merely step one of a marathon.
A technical audit of the code reveals the missing “boring” infrastructure that commercial vendors sell for premium prices. The system lacks authentication; anyone with access to the web interface can query the models. There is no concept of user roles, meaning a junior developer has the same access rights as the CIO.
Furthermore, the governance layer is nonexistent. In a corporate environment, sending data to four different external AI providers simultaneously triggers immediate compliance concerns. There is no mechanism here to redact Personally Identifiable Information (PII) before it leaves the local network, nor is there an audit log to track who asked what.
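To make the gap tangible, here is a deliberately naive sketch of what a first pass at that governance layer could involve: regex-based redaction and an append-only audit log. The patterns and helper names are assumptions; production systems rely on dedicated PII-detection and identity tooling.

```python
# Illustrative sketch of the governance layer the prototype lacks: naive
# regex-based PII redaction plus an append-only audit log. Patterns and
# helper names are assumptions, not part of Karpathy's project.
import json
import re
import time

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    # Replace obvious identifiers before the text leaves the local network.
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

def audit(user: str, query: str, path: str = "audit.log") -> None:
    # Record who asked what, and when, in an append-only log.
    entry = {"ts": time.time(), "user": user, "query": query}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

def gated_query(user: str, raw: str) -> str:
    audit(user, raw)
    return redact(raw)  # only the redacted text is sent to external providers
```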
Reliability is another open question. The system assumes the OpenRouter API is always up and that the models will respond in a timely fashion. It lacks the circuit breakers, fallback strategies, and retry logic that keep business-critical applications running when a provider suffers an outage.
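A minimal sketch of that missing reliability wrapper, reusing the hypothetical complete() helper from the broker example above, might look like this; the retry count and backoff schedule are arbitrary illustrations.

```python
# Sketch of the reliability wrapper the prototype omits: bounded retries with
# exponential backoff, then graceful degradation to a fallback model. Reuses
# the hypothetical complete() helper sketched earlier; thresholds illustrative.
import time

def complete_with_fallback(
    prompt: str,
    api_key: str,
    primary: str = "openai/gpt-5.1",
    fallback: str = "google/gemini-3-pro",
    retries: int = 3,
) -> str:
    for attempt in range(retries):
        try:
            return complete(primary, prompt, api_key)
        except Exception:
            time.sleep(2 ** attempt)  # back off before retrying the primary
    # Primary exhausted: degrade to the fallback model instead of failing outright.
    return complete(fallback, prompt, api_key)
```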
These absences are not flaws in Karpathy’s code — he explicitly stated he does not intend to support or improve the project — but they define the value proposition for the commercial AI infrastructure market.
Platforms like LangChain and AWS Bedrock, along with various AI gateway startups, are essentially selling the “hardening” around the core logic that Karpathy demonstrated. They provide the security, observability, and compliance wrappers that turn a raw orchestration script into a viable enterprise platform.
Why Karpathy believes code is now “ephemeral” and traditional software libraries are obsolete
Perhaps the most provocative aspect of the project is the philosophy under which it was built. Karpathy described the development process as “99% vibe-coded,” implying he relied heavily on AI assistants to generate the code rather than writing it line-by-line himself.
“Code is ephemeral now and libraries are over, ask your LLM to change it in whatever way you like,” he wrote in the repository’s documentation.
This statement marks a radical shift in software engineering philosophy. Traditionally, companies build internal libraries and abstractions to manage complexity, maintaining them for years. Karpathy is suggesting a future where code is treated as “promptable scaffolding”: disposable, easily rewritten by AI, and not meant to last.
For enterprise decision-makers, this poses a difficult strategic question. If internal tools can be “vibe coded” in a weekend, does it make sense to buy expensive, rigid software suites for internal workflows? Or should platform teams empower their engineers to generate custom, disposable tools that fit their exact needs for a fraction of the cost?
When AI models judge AI: The dangerous gap between machine preferences and human needs
Beyond the architecture, the LLM Council project inadvertently shines a light on a specific risk in automated AI deployment: the divergence between human and machine judgment.
Karpathy’s observation that his models preferred GPT-5.1, while he preferred Gemini, suggests that AI models may have shared biases. They might favor verbosity, specific formatting, or rhetorical confidence that does not necessarily align with human business needs for brevity and accuracy.
As enterprises increasingly rely on “LLM-as-a-Judge” systems to evaluate the quality of their customer-facing bots, this discrepancy matters. If the automated evaluator consistently rewards “wordy and sprawled” answers while human customers want concise solutions, the metrics will show success while customer satisfaction plummets. Karpathy’s experiment suggests that relying solely on AI to grade AI is a strategy fraught with hidden alignment issues.
What enterprise platform teams can learn from a weekend hack before building their 2026 stack
Ultimately, LLM Council acts as a Rorschach test for the AI industry. For the hobbyist, it is a fun way to read books. For the vendor, it is a threat, proving that the core functionality of their products can be replicated in a few hundred lines of code.
But for the enterprise technology leader, it is a reference architecture. It demystifies the orchestration layer, showing that the technical challenge is not in routing the prompts, but in governing the data.
As platform teams head into 2026, many will likely find themselves staring at Karpathy’s code, not to deploy it, but to understand it. It proves that a multi-model strategy is not technically out of reach. The question remains whether companies will build the governance layer themselves or pay someone else to wrap the “vibe code” in enterprise-grade armor.
