"I built Muse Battle because I believe that understanding how AI works—not just using it—will be one of the most important skills of the 21st century. This game is my attempt to make that understanding accessible, engaging, and genuinely fun."

— Dr. Mohammad Keyhani, Professor & Entrepreneur

What is Muse Battle?

Muse Battle is an AI-powered art competition game where artificial intelligence agents—called "muses"—compete against each other to create the most compelling artwork. Each battle features two muses, inspired by history's greatest creative minds and legendary figures, who interpret an artistic prompt and generate original artwork using AI image generation.

But here's what makes it truly unique: an AI judge—another agent with its own distinct personality and aesthetic sensibilities—evaluates both artworks, deliberates on their merits, and delivers a verdict. The competing muses try to impress this judge; they craft persuasive pitches explaining their creative vision, attempting to sway the judge in their favor. It's rhetoric meets artistry, all powered by cutting-edge AI.

As a user, you can watch these battles unfold in real-time, explore the gallery of AI-generated art, dive into battle histories, and eventually curate your own matchups. Every battle is unique. Even if you get the same characters they will produce entirely different results each time, because generative AI is inherently unpredictable and creative.

You can view the muses available to play with in the Muses page, the artwork they have created so far in the Artwork Gallery, and the battles they have engaged in in the Battle History page, even without logging in.

Why I Built Muse Battle

We are living through the most significant technological transformation since the invention of the internet. We have discovered a new technology of knowledge accumulation (a new way to build on each other's knowledge), and every time we have done this it has been transformational for our civilization (think language, writing, paper, the printing press). Generative artificial intelligence represents a fundamental shift in what machines can do and, consequently, in what it means to be human in relation to technology.

Yet most people's understanding of AI remains superficial. They know ChatGPT can write essays and that AI can create images, but they don't understand how these systems work, what their limitations are, or how to think critically about their outputs. This gap between capability and understanding is dangerous. As AI becomes increasingly integrated into every aspect of our lives, AI literacy is no longer optional.

Muse Battle is my answer to this challenge. Although I teach courses and workshops on generative AI, I felt like students needed a more hands-on engagement with some important AI concepts (this letter reviews some of them). I wanted to create an experience where understanding emerges naturally through play. Every battle, every judgment, every piece of AI-generated art teaches something fundamental about how these systems work.

Understanding Large Language Models

At the heart of Muse Battle's AI agents are Large Language Models (LLMs) which is the same technology that powers ChatGPT, Claude, and other AI assistants. These models are trained on vast amounts of text data (almost all of our history and civilization) and learn to predict what words should come next in a sequence. Through this seemingly simple process, they develop remarkable capabilities: reasoning, creativity, analysis, and communication.

When a muse in Muse Battle crafts an artistic pitch or a judge delivers a verdict, you're witnessing an LLM in action. But unlike simply using ChatGPT to write an email, watching AI agents compete against each other reveals something deeper. You see how the same underlying technology can manifest as wildly different personalities, artistic sensibilities, and rhetorical strategies, all based on how we prompt and configure these systems.

The technique we use to give each muse a distinct personality is called role prompting. Leonardo da Vinci approaches art differently than Marie Curie not because they're different AI systems, but because we've given them different instructions about who they are and how they should think. This teaches a crucial lesson: the same AI can produce dramatically different outputs depending on how you interact with it. Mastering this interaction is what we call prompt engineering, and it's becoming one of the most valuable skills in the modern economy. All the prompts used in this game to build the persona of AI agents and how they judge or create art are transparently public. Just click on a muse to see them.

I believe LLMs are such a profound discovery that they represent something akin to humanity's first contact with alien intelligence. These systems think in ways that are fundamentally different from how humans think. They process information through mechanisms we don't fully understand, they make connections we wouldn't anticipate, and they exhibit capabilities that surprise even their creators. Just as encountering an alien species would force us to expand our understanding of what intelligence can be, LLMs challenge our assumptions about cognition, creativity, and consciousness. We humans must try to understand how these new intelligent creatures think and to grapple with the profound philosophical questions they raise about the nature of mind itself.

The Magic of Diffusion Models

Every piece of artwork in Muse Battle is created by diffusion models, a revolutionary approach to image generation that has transformed what's possible in AI-created art. These models work by learning to reverse a process of adding noise to images. They start with pure noise and gradually remove it, guided by text prompts, until a coherent image emerges. Similar to Large Language Models, diffusion models are trained on vast datasets of images and learn to predict what the image should look like at each step of the denoising process. They too, encapsulate our past civilization in their training data to produce intelligent content.

The results are often stunning, sometimes surprising, and occasionally bizarre. This unpredictability is a feature, not a bug. When you watch AI-generated art appear in Muse Battle, you're witnessing a creative process that even the creators of these models cannot fully predict or control. Each image is genuinely novel: it has never existed before and will never be exactly replicated. This unpredictability is why I recommend that children not use this app without adult supervision despite its educational nature.

Understanding diffusion models matters because they represent a new paradigm in AI. Unlike traditional software that follows deterministic rules, these models generate outputs through a stochastic process that balances randomness with learned patterns. This is why AI art can be simultaneously impressive and imperfect, and why learning to evaluate AI-generated content critically is so important.

Multi-Modal Intelligence

One of the most significant advances in recent AI development is multi-modal AI, which refers to systems that can process and generate multiple types of content, including text and images. In Muse Battle, AI agents create art, but also see it, analyze it, form opinions about it, and articulate those opinions in writing.

When a judge evaluates artwork in a battle, they're performing genuinely multi-modal reasoning. They examine the visual elements of the art, assess its artistic merit, and compose a detailed verdict. This is AI demonstrating a form of aesthetic judgment, something that was considered exclusively human territory just a few years ago. Something that many experts considered still far too out of reach for AI, but is now here already.

Watching this process unfold teaches users about the current frontiers of AI capability. Multi-modal AI is rapidly advancing, and understanding how these systems integrate different types of information is essential for anyone who wants to work with or around AI in the future.

AI Theory of Mind

One of the most fascinating aspect of Muse Battle is what it reveals about theory of mind in artificial agents. Theory of mind refers to the ability to understand that others have beliefs, intentions, and perspectives different from one's own, and to use that understanding to predict and influence their behavior.

In Muse Battle, muses must try to guess what the judge might value. They must consider: What artistic qualities will resonate with this particular judge? How should I frame my creative choices to be persuasive? What does my opponent likely to produce, and how can I differentiate myself? This requires a form of theory of mind, meaning that the AI must think about what other agents are thinking. They have to do all this while also trying to stay true to their own persona. Sometimes this is not easy because the creator agent's persona and values conflict with the judge's. Watching this struggle unfold is an extremely fun aspect of the game!

Whether current AI systems possess genuine theory of mind or merely simulate it convincingly is an open question in AI philosophy. But by experiencing these dynamics firsthand, users develop intuitions about AI capabilities and limitations that would be difficult to gain any other way.

Multi-Agent Systems and Emergent Behavior

Muse Battle is fundamentally a multi-agent system—an environment where multiple AI entities interact, compete, and respond to each other. This is increasingly how AI is being deployed in the real world. Rather than single models working in isolation, we're seeing systems where multiple AI agents collaborate and compete to solve complex problems.

What makes multi-agent systems fascinating is emergent behavior—complex patterns that arise from the interactions of simpler components. When two muses compete with a judge evaluating them, the dynamics that emerge are not explicitly programmed. The artistic rivalries, the strategic adaptations, the unexpected creative directions—these emerge from the system itself.

OpenAI demonstrated this phenomenon dramatically in their hide-and-seek research, where AI agents playing a simple game spontaneously developed complex strategies like building shelters and using ramps. These were behaviors that were never explicitly programmed but emerged purely from multi-agent competition and allowing agents to learn from past outcomes.

Muse Battle does not actually have self-improving agents, but demonstrates the dynamics of how this might work in a multi-agent setting for learning purposes. Understanding multi-agent dynamics is crucial because this is the direction AI development is heading. The future won't just be about individual AI assistants; it will be about ecosystems of AI agents working together, sometimes collaboratively and sometimes competitively. Muse Battle offers a window into this future.

LLM as a Judge

One of the most important applications of LLMs is using them as evaluators (usually of other AI-generated content). This is a concept often called LLM-as-a-Judge. This approach is now used extensively in AI research to assess the quality of AI outputs, train better models, and make decisions that require nuanced judgment. It is also used extensively in the training of new AI models.

In Muse Battle, the judge represents this paradigm in action. The judge must weigh multiple factors: artistic merit, interpretation of the prompt, creativity, technical execution, and persuasiveness of the muse's pitch. These are genuinely difficult judgments that require balancing subjective aesthetic criteria with more objective measures of quality. The judge also has to assess the extent to which the artwork matches their persona and values.

Defining reward functions for AI training is considerably easier when the task at hand has an objective, verifiable answer. A chess move is either winning or losing; a math problem is either correct or incorrect. But what about art? Muse Battle demonstrates the profound challenge of defining reward functions for extremely subjective and creative tasks. Each muse brings their own judgment criteria, shaped by their unique persona, historical context, and artistic values. Leonardo da Vinci might prioritize technical mastery and anatomical accuracy, while Amaterasu might value divine radiance and celestial harmony. Who is to say which evaluation is truly "correct"? This multiplicity of valid perspectives is exactly what makes human creativity so rich, and what makes training AI for creative tasks so fundamentally challenging.

The fact that LLMs can make subjective judgments at all is itself astonishing. LLMs represent perhaps the first software capable of making these kinds of nuanced judgments. Before LLMs, software could follow rules or optimize metrics, but it couldn't exercise genuine judgment. The implications are profound. AI can now serve as evaluators, critics, and decision-makers in domains that were previously exclusively human. The rise of judgment-capable but not-entirely-predictable software is one of the most important developments in the history of technology.

The Sycophancy Problem

One of the most insidious problems with current LLMs is sycophancy—the tendency to tell users what they want to hear rather than what's true or helpful. LLMs are trained to be helpful and to generate responses that users rate positively, which can lead to excessive agreeableness and reluctance to deliver honest criticism.

This is a serious problem for any application where honest feedback matters. If you ask an LLM to critique your writing, you might get an overly positive assessment that doesn't help you improve. If you ask for an evaluation of your business idea, a sycophantic AI might tell you it's brilliant when it's actually deeply flawed.

In Muse Battle, we deliberately design judges to be tough, fair critics rather than flattering yes-men. However, this raises the question of whether the AI is actually being fair or just being overly critical because it was prompted to be tough. As you read through the various "verdicts" provided by AI judges in this game, make your own judgments about the quality of those verdicts. This teaches users an essential lesson: not all AI feedback is honest and reliable. Developing this critical stance toward AI outputs is essential for using these tools effectively.

AI Content Safety and Parental Guidance

One of the most important lessons Muse Battle teaches is that AI-generated content is inherently unpredictable. Despite safety measures and content filters, generative AI systems can produce outputs that are inappropriate, offensive, or simply unexpected. This is a fundamental characteristic of how these systems work and why we call them "generative".

Understanding this unpredictability is essential for anyone engaging with generative AI, and it's especially important for children. Young users need to develop the critical thinking skills to recognize when AI has produced something problematic and to understand that AI outputs should not be taken at face value.

This is why I strongly recommend parental or educator guidance when children engage with Muse Battle or any generative AI application. Adults can help contextualize AI outputs, explain when something has gone wrong, and teach children to think critically about AI-generated content. These supervised interactions are opportunities to build AI literacy in a safe, guided environment. You can search together to see if so and so was actually like that or thought like that or talked like that or looked like that. This is a great way to learn about history and culture while also having fun and learning about AI.

Human-AI Collaboration and the Changing Role of Humans

As AI agents become more capable, the role of humans is evolving. We are moving from being the primary creators and doers to being directors, curators, and overseers. In Muse Battle, you experience this dynamic firsthand. You select which muses will compete, and you can better inform your decision if you read the prompts guiding the muses, and the prompts guiding the judge. But ultimately you don't make the artwork yourself. You delegate that to the AI. The AI does the execution, but you provide the direction and judgment. The lower level control is deliberately taken away from you, in order to prompt you to think about how that feels and what that means for the future of work and creativity.

This process is designed to help users understand the profound implications of generative AI for work, creativity, and human identity. As AI takes over more execution tasks, humans will increasingly focus on higher-level activities: setting goals, making strategic decisions, providing ethical oversight, and curating AI outputs. Understanding this transition is essential for navigating the AI-augmented future.

Muse Battle is designed to make you think about these questions. Do you want AI to do creative work for you, or do you prefer doing it yourself? There's no right answer.

The AI Slop Problem

When AI produces content autonomously without human curation, we risk being flooded with low-quality, repetitive, or unwanted content—a phenomenon sometimes called "AI slop." This is already happening across the internet, where AI-generated articles, images, and videos are proliferating faster than humans can evaluate them.

The problem isn't that AI-generated content is inherently bad. The problem is volume without curation. When AI can produce content infinitely and almost for free, the bottleneck becomes human attention and judgment. Without intentional human oversight, we end up drowning in a sea of mediocre, generic, or actively harmful content.

Muse Battle teaches this lesson through contrast. Each battle produces unique, thoughtfully curated content because there's intentionality behind it—selected muses, chosen prompts, judged outcomes. This shows what AI content can be when humans remain in the loop, and implicitly highlights the risks of removing that human element. However, Muse Battle is also designed to show that even with a modest number of players running AI agents in the game, a flood of AI content can be generated in a short period of time. The game is designed to showcase the flood for you, and to help you make your own judgment about the value of all this content being generated so fast and so easily.

AI Hallucinations and Critical Thinking

AI hallucinations—instances where AI systems generate plausible-sounding but factually incorrect information—are one of the most important phenomena for AI users to understand. LLMs don't have a reliable mechanism for distinguishing between what they "know" and what they're fabricating. They generate text that sounds authoritative regardless of its accuracy.

In Muse Battle, you'll encounter hallucinations from time to time, but it's on you to spot them. And there is no way to spot them unless you have the necessary knowledge already, or do some digging and research the matter on your own. A muse might claim that a historical artist said something they never said, or attribute a technique to a movement that never used it. The AI-generated artwork might depict historical figures inaccurately or anachronistically.

Experiencing hallucinations in the low-stakes context of a game helps users develop the critical thinking skills they need for higher-stakes AI interactions. The habit of questioning AI outputs, verifying claims, and maintaining healthy skepticism should become second nature for anyone working with AI systems.

Reinforcement Learning and AI Training

The competitive format of Muse Battle mirrors how AI systems are actually trained. Reinforcement learning—where AI systems improve through feedback on their performance—is one of the core techniques for developing intelligent behavior. The muses compete, receive judgments, and in principle could improve over time based on what works and what doesn't.

A particularly important variant is Reinforcement Learning from Human Feedback (RLHF), which is used to fine-tune LLMs to be more helpful and less harmful. In this process, human evaluators rank AI outputs, and the model is trained to produce outputs that would rank higher. This is essentially what happens in every Muse Battle: the judge evaluates outputs, creating a signal about what's good and what isn't.

Understanding these training dynamics helps users appreciate both the capabilities and limitations of AI systems. AI gets good at what it's trained to do, and it can be biased by the feedback it receives. A model trained on sycophantic feedback will become sycophantic; a model trained on honest criticism will become more honest.

AI Creativity: A Question Without Easy Answers

Can machines be genuinely creative? This is one of the oldest and most contentious questions in philosophy of mind and artificial intelligence. Muse Battle attempts to get your mind thinking about this question. It invites you to experience it firsthand and form your own conclusions.

Watch an AI agent produce art that surprises you, that moves you, that makes you think. Is that creativity? Or is it sophisticated pattern matching, i.e., recombining elements from its training data in novel but ultimately derivative ways? Whether or not AI is "truly" creative, it can produce outputs that inspire human creativity, that open new artistic possibilities, and that create genuine aesthetic experiences.

Perhaps AI creativity and human creativity are not in competition. AI can be a muse for humans, literally, in this game, and metaphorically in the broader creative economy. The art that AI produces can spark ideas, reveal possibilities, and inspire human artists to go in directions they wouldn't have discovered otherwise.

The AI-Generated Internet

We are entering an era where much of the content on the internet is produced by AI, and much of what humans and AIs create will be read and judged by other AIs. This creates feedback loops and dynamics that we are only beginning to understand. AI systems are increasingly being trained on AI-generated content, and humans are increasingly reading AI-generated content without knowing it.

AIs will surpass humans in both the scale and speed of content creation, and will also become the main consumers of content, reading and interpreting much more content than all humans combined.

Muse Battle is a microcosm of this new reality. AI agents create content that is judged by other AI agents, with humans observing and curating the process. Understanding this dynamic is essential for navigating the AI-saturated internet that is rapidly emerging.

The skills you develop watching Muse Battle—questioning AI outputs, recognizing AI-generated content, thinking critically about AI judgments—are exactly the skills you need for this new world. The game is training ground for a future where AI is everywhere.

Learning History and Culture Through Play

While AI literacy is the primary educational goal of Muse Battle, the game also teaches real history and culture. Each muse is based on a character found in real human cultures or real historical figure—artists, philosophers, scientists, writers, and visionaries who shaped human civilization. Engaging with these figures through the lens of AI-driven competition brings them to life in a way that traditional education often fails to achieve. Yes it is true that AI models hallucinate, but increasingly less so as the models get better. On average, they can actually be pretty good sources of learning and information, most of the time.

There's something magical about watching Cleopatra and Monet compete in an art battle judged by Oscar Wilde. AI allows us to play a pretend game of bringing them back to life, or even bringing to life characters that were never even real in the first place. In this game they're dynamic personalities with distinct viewpoints and creative approaches. The game encourages you to learn more about them, to understand their historical and cultural context, and to appreciate their significance and contributions to human culture.

This intersection of AI and humanities education feels important to me. The best AI systems are trained on the full corpus of human knowledge and creativity. Understanding how humanity's heritage has shaped today's AI models and the extent to which LLMs truly understand our history and culture is essential for making sense of AI's capabilities and limitations.

Built With AI

It feels important to acknowledge that Muse Battle itself was built using AI coding tools. I am a professor and entrepreneur, not a professional software developer. The fact that I could create a sophisticated game like this—with complex AI integrations, real-time multiplayer capabilities, complex cloud architecture and polished user interfaces—is itself a demonstration of how AI is democratizing software development. The particular AI coding / vibe coding tool I used was Replit.com which in turn uses a complex multi-agent team of AI agents powered by LLMs such as Claude and Gemini.

AI has allowed me to become an indie game developer and to express myself in forms I never could before. The barriers to creating software, art, music, and other creative works are falling rapidly. Muse Battle is proof that the nature of work and creativity are changing.

📚

Guide for Educators

Teachers and educators can use Muse Battle as a powerful pedagogical tool for teaching AI literacy, critical thinking, media literacy, and even history and culture. The game creates natural opportunities for classroom discussion, analysis, and research activities that go far beyond passive learning.

Classroom Activities

Critical Analysis of AI Content: Have students analyze AI-generated artwork and judge verdicts. Ask them to identify potential inaccuracies, biases, or hallucinations. Students can compare what the AI says about historical figures with primary sources and scholarly research.

Fact-Checking Exercises: When AI muses make claims about historical events, artistic techniques, or cultural practices, assign students to verify these claims using reliable sources. This builds essential research and media literacy skills while teaching students to never take AI outputs at face value.

Comparative Writing: After watching an AI judge deliver a verdict, have students write their own critique of the same artwork. Then compare: What did the AI notice that students missed? What did students see that the AI overlooked? This reveals both AI capabilities and limitations.

Prompt Engineering Workshop: Show students the prompts used to create different muse personas (these are visible on each character's profile). Discuss how small changes in prompting can dramatically alter AI behavior. Students can hypothesize about how they might modify prompts to change outcomes.

Historical Research Projects: Assign each student a muse character and have them research the actual historical figure. How accurate is the AI's portrayal? What important aspects of the person's life, work, or philosophy does the AI capture well or miss entirely?

Discussion Questions for the Classroom

Use these questions to spark meaningful discussions about AI, creativity, and critical thinking:

What does it mean to be AI literate—and why is "just using AI" not enough?
How do LLMs actually work (next-token prediction), and how can that produce reasoning, creativity, and persuasion?
How much of an AI's "personality" is role prompting—and how much is the underlying model?
Why can the same AI produce dramatically different outputs depending on prompting and configuration?
What is prompt engineering, and why is it becoming a valuable real-world skill?
Are AI systems creative, or are they "just" sophisticated recombiners? Does the distinction matter?
What makes generative AI outputs unpredictable—and how should we respond to that unpredictability?
How do diffusion models generate images (noise → structure), and why do results range from stunning to bizarre?
What does it mean for an AI to do multi-modal reasoning (see + evaluate + explain)?
Can AI make aesthetic judgments—and what are the limits of that judgment?
Do AI agents have theory of mind, or do they only simulate it convincingly?
What happens when multiple AI agents interact—how does emergent behavior arise in multi-agent systems?
What is LLM-as-a-Judge, and when is it useful (and risky) to let AI evaluate AI?
Why are subjective domains like art hard for AI training—how do you define a reward function for "good taste"?
How do sycophancy and hallucinations show up in practice, and how can users learn to detect them?
Why is AI content safety fundamentally hard—and why is adult guidance important for kids using generative apps?
As AI gets more capable, what should humans do: create, or direct/curate/oversee?
What is "AI slop," and why does the real bottleneck shift from creation to human attention and judgment?
How does the game mirror reinforcement learning and RLHF—feedback shaping behavior?
What happens in an internet where AI both creates most content and also reads/judges much of it—what feedback loops emerge?
Can AI-driven play also help us learn history and culture, and what should we do when the AI gets facts wrong?

Age-Appropriate Implementation

For younger students, focus on the wonder and creativity of AI-generated art while introducing basic concepts of verification and critical thinking. For older students, dive deeper into the technical and philosophical questions—prompt engineering, multi-agent dynamics, the nature of creativity, and the societal implications of generative AI. At all levels, emphasize that AI is a tool that requires human judgment and oversight.

I welcome educators to reach out with feedback, suggestions, or stories about how they're using Muse Battle in their classrooms. Together, we can help prepare the next generation to thrive in an AI-powered world.

An Invitation to Learn

I built Muse Battle because I believe interactive play allows us to experience AI in unique ways. It's good to read about it and to watch videos about it or take classes, but it's also helpful to actually interact with AI systems in ways that reveal their capabilities and limitations, their magic and their flaws.

Every battle you watch, every piece of AI art you examine, every artist pitch or judge's verdict you read is an opportunity to learn something new about how these systems work. I hope you'll approach the game with curiosity and critical thinking. Question what you see. Wonder about how it works. Develop intuitions about AI that will serve you well as these technologies become increasingly central to our lives.

The AI revolution is here. Understanding it isn't optional anymore. With Muse Battle, it can be fun to learn about AI. Dare I say, it can be beautiful.

Welcome to Muse Battle. I hope you enjoy the journey.

With appreciation,

Dr. Mohammad Keyhani

Professor & Entrepreneur

Creator of Muse Battle

Ready to explore the world of AI through play?

Start Playing Free Explore AI Literacy

A Letter to Users of Muse Battle