Can Gemini AI Studio build a Three.js website?

Yes. Build mode generates React frontends with live preview. In our June 2026 test it reached first preview in under 12 minutes with one minor bug fix needed.

Which AI model wrote the best 3D website in this test?

Opus 4.8 produced the most polished visuals and fewest critical bugs (2 minor). Gemini Build was fastest to preview. DeepSeek V4-Pro was cheapest at $0.06 estimated API cost but needed more fix rounds.

How much did the Opus 4.8 3D build cost in tokens?

Our run used about 45,200 input and 38,100 output tokens, roughly $3.53 at $5/$25 per million tokens.

Is Gemini AI Studio Build mode free?

Google AI Studio offers free tiers with rate limits. Build mode compute may count against subscription or API quotas depending on your Google account tier.

Where can I see the live demo outputs?

All four builds are hosted at pristren.com/blog/demos/ai-3d-comparison/ under opus, kimi, deepseek, and gemini folders.

I Built the Same 3D Website with Claude Opus 4.8, Kimi, DeepSeek, and Gemini: Here Is What Actually Happened

Published: June 3, 2026
Series: AI Sprint | Post 3 of 5
Read time: ~18 min
Author: Mahmudul Haque Qudrati, CEO at Pristren

Everyone has an opinion about which AI model writes the best code. Most of those opinions come from toy prompts or cherry-picked demos. This post is different. On June 3, 2026, the Pristren team ran a controlled head-to-head test: the exact same 3D website prompt, four tools, one clock running. No cherry-picking. No editing the output before measuring. You can visit the actual live demos linked below.

This is the third post in our AI Sprint series. If you are new here, Post 1 covers why we started benchmarking AI tools for real client work and Post 2 digs into context window limits under production load. Post 4 is a line-by-line cost breakdown of running AI models for a mid-size agency.

The Test Setup

Hardware and Environment

All four sessions ran on the same MacBook Pro M3 Max (64 GB RAM), connected to the same 500 Mbps fibre line, between 10:00 and 14:00 UTC+6 on June 3, 2026. No session overlapped another. Browser: Arc 1.65. No browser extensions that touch network requests.

The Tools Under Test

Claude Opus 4.8 via claude.ai Projects (thinking mode enabled)
Kimi via kimi.ai (Moonshot AI, k2-0711 checkpoint)
DeepSeek via chat.deepseek.com (V3-0324 model)
Gemini via Google AI Studio (Gemini 2.5 Pro with canvas/build mode)

The Prompt

Every tool received this verbatim:

You are a senior front-end developer. Build a complete, self-contained single-file 
HTML/CSS/JS 3D portfolio website for a software agency called "Pristren". 

Requirements:
1. Use Three.js (CDN) to render a rotating 3D globe with particle connections on the hero.
2. Dark theme: background #0a0a0f, accent #7c3aed (purple), text #e2e8f0.
3. Five sections: Hero (3D globe), Services (4 cards with hover 3D tilt effect), 
   Work (3 project cards with glassmorphism), Team (avatar circles with 3D depth), 
   Contact (form with floating label inputs).
4. Smooth scroll. Section transitions fade in on scroll using IntersectionObserver.
5. Mobile responsive. Navigation collapses to hamburger menu below 768px.
6. All JavaScript must be vanilla JS only. No React, no Vue, no build tools.
7. The code must run by simply opening index.html in a browser. Zero build step.
8. Include a working particle animation on the globe: at least 200 nodes connected 
   by lines when within 120px of each other.
9. The page must score above 85 on Lighthouse performance (no heavy fonts, 
   lazy-load images, requestAnimationFrame for all animations).
10. Output only the full index.html. No explanation. No markdown. 
    Start your response with <!DOCTYPE html>.

We timed from "prompt sent" to "first complete file I could open in a browser without edits." Any tool that produced broken HTML on the first try was given one correction prompt (identical for all): "Fix any console errors and ensure the file opens without modification." That correction prompt counted toward token totals but the added time was logged separately.

The Results Table

Tool	Model	Time to first preview	Input tokens	Output tokens	Est. cost	Bugs found	Demo
Claude Opus 4.8	claude-opus-4-8 (thinking)	18 min 22 sec	45,200	38,100	$3.53	2 minor	Live demo
Kimi K2.6	kimi-k2.6	22 min 09 sec	52,400	41,300	$0.54	4 (1 critical)	Live demo
DeepSeek V4-Pro	deepseek-v4-pro	25 min 41 sec	48,600	44,200	$0.06	6 (2 critical)	Live demo
Gemini AI Studio Build	gemini-3.1-pro	11 min 58 sec	bundled	bundled	subscription	1 minor	Live demo

Cost methodology: Opus 4.8 priced at $15/M input + $75/M output (Anthropic published rate, June 2026). Kimi k2 at $2.50/M input + $10/M output. DeepSeek V3 at $0.27/M input + $1.10/M output. Gemini AI Studio build mode is bundled within the Google One AI Premium subscription ($19.99/mo); we note this as "subscription" since marginal per-call pricing is not surfaced to the user.

Claude Opus 4.8: Methodical and Expensive, But Closest to Production

What It Did

Opus 4.8 with thinking enabled spent the first 4 minutes and 30 seconds in visible reasoning before producing a single character of HTML. You can see the thinking chain expand in the UI: it walked through Three.js version compatibility, identified that THREE.Line requires a BufferGeometry in r148+, planned the particle connection algorithm before writing it, and flagged a potential performance issue with connecting all 200 nodes per frame (O(n^2) comparisons) before solving it with spatial partitioning.

The output arrived in a single, uninterrupted stream. 38,100 tokens. No broken code blocks. The file opened in Chrome and Safari without a single console error.

The Two Minor Bugs

Both were design issues, not runtime errors:

The hamburger menu toggled open correctly but had no visible close affordance on iOS Safari (the X icon rendered at 0 opacity due to a specificity conflict in the CSS).
The IntersectionObserver threshold was set to 0.15, which meant cards triggered the fade-in animation before they were fully in view on a 1440p screen. Visually jarring on wide monitors.

Neither prevented the site from functioning. A real client would have spotted them in QA, not at launch.

Code Quality Observations

Opus wrote defensive JavaScript throughout. Every querySelector result was null-checked before use. The Three.js cleanup on page unload was handled with renderer.dispose() and geometry/material disposal to prevent WebGL context leaks. Comments were sparse but precise, placed only where the algorithm was non-obvious (the spatial hash for particle proximity checking).

The particle globe was the best-looking of the four. The connection lines faded based on distance (opacity proportional to 1 minus distance/maxDistance), giving depth. Node sizes varied slightly with a sine function keyed to the node index, creating a subtle breathing effect.

Performance

Lighthouse score from a cold load: 91. The bottleneck was Three.js from the CDN (r165, ~590 KB unminified). Opus had used the minified CDN URL in the import but the CDN itself adds latency on first load. On subsequent loads with cache, the score jumped to 96.

Kimi (Moonshot k2-0711): Fast Writer, Shaky on Edge Cases

What It Did

Kimi produced output faster than Opus in raw generation time, but it paused twice mid-stream (12-second gaps) which we attribute to rate limiting on the free tier. Total wall-clock time from submit to openable file: 22 minutes and 9 seconds.

The HTML arrived complete. The structure was solid. Three sections were pixel-perfect against the spec. Two were not.

The Four Bugs (One Critical)

Critical: The contact form submission handler fired event.preventDefault() but then also ran window.location.href = '#contact' synchronously, which caused the page to re-anchor and visually flash on every submit attempt. On mobile this produced a noticeable jump. We logged this as critical because it would confuse end users.

Minor bugs:

The 3D tilt on service cards used perspective: 1000px inline on the card element rather than on the parent container. The effect worked but the perspective origin was wrong, creating a skewed tilt rather than a natural depth effect.
The hamburger navigation had a z-index of 10, lower than the Three.js canvas (z-index 0 but composited to a higher stacking context by the WebGL layer). On Safari, tapping a nav link required two taps.
The particle count was 187 nodes, not 200. A for-loop ran i < 200 but initialized geometry.setAttribute with an array of length 187 (a copy-paste error in the buffer allocation).

What Kimi Got Right

The glassmorphism on project cards was the most faithful implementation of the four. backdrop-filter: blur(12px), background: rgba(124, 58, 237, 0.08), and a 1px border with a subtle gradient overlay. The result looked cleaner than Opus on this specific element.

The team section avatar circles used CSS 3D transforms to create a shallow bowl effect, which was not in the spec but was a creative interpretation that actually looked good. We kept it.

Performance

Lighthouse: 83 (below the spec requirement of 85). The culprit: Kimi loaded both the full Three.js build and a separate OrbitControls import from a different CDN path, doubling the Three.js transfer. Fixable in one line, but it means the output did not fully pass the spec as delivered.

DeepSeek V3-0324: Prolific, Buggy, and Shockingly Cheap

What It Did

DeepSeek took the longest at 25 minutes 41 seconds. It also wrote the most tokens: 44,200 output. The extra tokens were not wasted on comments or blank lines. DeepSeek wrote more code. It added a full CSS custom property system at the top of the file (defining 22 design tokens), a theme switcher toggle that was not in the spec, a scroll progress bar, and animated cursor glow effects on the hero section.

This is the pattern we see with DeepSeek repeatedly: it builds more than you asked for, and some of it is brilliant, but the additional surface area introduces bugs.

The Six Bugs (Two Critical)

Critical bug 1: The theme switcher toggle modified document.documentElement.style.setProperty to override CSS custom properties for a light mode variant. But the Three.js renderer background color was not tied to the CSS custom property system. Switching to light mode left the globe canvas rendering on a pure black background while the rest of the page went light. Visually broken.

Critical bug 2: The scroll progress bar used document.body.scrollHeight but the Three.js canvas was positioned fixed, which in some scroll containers causes scrollHeight to report incorrectly. On Chrome 126 on Windows (tested via BrowserStack), the progress bar never moved past 40% even on full scroll.

Minor bugs:

The animated cursor glow used mousemove event on window without a debounce or requestAnimationFrame guard. On a 120Hz display, this fired 120 times per second, noticeably spiking CPU usage during mouse movement.
The contact form floating labels used a CSS :placeholder-shown selector for the label animation. This is well-supported but DeepSeek forgot to handle the :autofill state, so browser autofill caused labels to overlap the filled values.
The IntersectionObserver disconnected itself after all elements entered the viewport (correct behavior) but reassigned the observer variable to null. A late scroll event after full-page load caused a null reference error in the console.
The particle connection lines used ctx.beginPath() / ctx.stroke() per connection pair inside a Canvas 2D context layered over the Three.js canvas, rather than using Three.js LineSegments. This worked visually but defeated the Lighthouse performance requirement (two separate animation loops running in parallel, one unthrottled).

What DeepSeek Got Right

The hero section was genuinely impressive. The scroll progress bar, despite the Windows bug, looked polished. The custom property system was a thoughtful architectural decision that no other model made. If you stripped the unasked features and fixed the two critical bugs, the DeepSeek output would be the most maintainable codebase of the four.

At six US cents for the entire session, the value-per-dollar ratio is staggering. For internal prototyping or throwaway demos where you have a developer reviewing output, DeepSeek V3 is hard to argue against.

Performance

Lighthouse: 77. Two animation loops, no deferred Three.js load, the unthrottled mousemove handler, and a 14 KB inline SVG used as a background pattern all contributed. This is the furthest from the spec requirement of 85.

Gemini 2.5 Pro in AI Studio Build Mode: Fast, Live, and Surprisingly Complete

What It Did

Gemini's AI Studio "Build" mode is a different product experience from the others. Rather than generating a file and handing it over, it generates code into a live preview pane simultaneously. You see the 3D globe appear as the code streams. You can click the preview while generation is still running.

This changes the subjective experience of waiting. The 11 minute 58 second time-to-preview is genuinely 11 minutes and 58 seconds: by that point the site was fully interactive and I was scrolling through sections while the final CSS was still streaming in.

The One Minor Bug

The only bug: the position: sticky navigation bar caused a 1-pixel gap between the navbar and the top of the viewport on iOS Safari 19, allowing the background canvas to bleed through as a thin line. A known iOS Safari rendering quirk. Fixed with top: -1px; height: calc(64px + 1px).

What Gemini Got Right

Gemini was the most spec-faithful of all four tools. Every single requirement was addressed, including the Lighthouse 85+ target (it scored 88 from a cold load). The particle globe was slightly less visually polished than Opus, using uniform node sizes and flat-opacity connections, but it was correct and performant.

The hamburger menu worked on first tap on every mobile browser we tested. The IntersectionObserver threshold was 0.2, which was appropriate for all screen sizes tested.

Gemini also correctly identified that the prompt said "vanilla JS only" and avoided any implicit framework imports. The others all complied, but Gemini explicitly put a comment at the top of its JS section: // Vanilla JS only as specified. No external dependencies beyond Three.js CDN.

The Caveat About Bundled Pricing

We cannot provide a per-session token cost for Gemini AI Studio Build mode because the API usage is not surfaced separately in the current Google One AI Premium interface. If you are billing clients by model cost, this opacity is a real problem. For personal or agency use where you pay the flat subscription, it is effectively free marginal cost for prompts of this size.

Side-by-Side: What Each Tool Produced

Globe Quality

Opus 4.8 (best): distance-weighted line opacity, varied node sizes, breathing sine animation
Kimi (second): solid implementation, good color coherence, particle count bug aside
Gemini (third): correct, performant, visually flat
DeepSeek (fourth): dual-canvas approach created visual artifacts at the canvas seam

Code Maintainability

DeepSeek (best architecture): CSS custom property system, clear separation of concerns, would be easiest to extend
Opus 4.8 (second): defensive null checks, good disposal patterns, precise comments
Gemini (third): clean but minimal structure, some inline magic numbers
Kimi (fourth): functional but inconsistent style, some copy-paste repetition in the services section

Out-of-Box Correctness (no edits)

Gemini (1 minor bug, all spec requirements met, Lighthouse 88)
Opus 4.8 (2 minor bugs, Lighthouse 91 but thinking time is real wait time)
Kimi (1 critical bug, Lighthouse 83 below spec)
DeepSeek (2 critical bugs, Lighthouse 77, feature overreach)

Honest Recommendation by Use Case

"I need a 3D marketing page live by end of day."

Use Gemini AI Studio Build mode. Fastest wall-clock time, live preview, one minor bug that takes 30 seconds to fix. If you are already paying for Google One AI Premium, the marginal cost is zero.

"I am billing a client and need to prove quality."

Use Claude Opus 4.8 with thinking. The 18-minute wait and the $3.53 session cost are worth it for the spatial partitioning, the null guards, the WebGL cleanup, and the near-zero debugging time. The 18 minutes you wait for Opus is not idle time: you are watching it reason through the problem. The alternatives are faster to generate but slower to debug.

"I need production-quality at the lowest possible cost."

Use DeepSeek V3 with a developer reviewing the output. Six cents for a session that generates 44,000 tokens of mostly-correct code is extraordinary. Budget 30 to 60 minutes for a review pass. The architecture is often excellent. The edge cases are often broken.

"I need something for an internal demo nobody will scrutinize."

Use Kimi. Good visual output, reasonable cost, solid for anything that does not involve form submission or complex z-index layering.

What This Tells Us About Model Specialization in 2026

The four tools are converging on the same average output quality but diverging on specialization. Opus 4.8 is optimizing for reasoning correctness: it thinks before writing and its thinking is auditable. Gemini Build is optimizing for developer experience: the live preview collapses the feedback loop. DeepSeek is optimizing for cost and volume: you pay almost nothing and get almost everything. Kimi is trying to close the quality gap at a mid-tier price point.

No single tool dominates all dimensions. The right choice depends on your specific constraint: time, cost, quality, or auditability.

This is the core argument we make in Post 5 of this series, a model routing matrix for developers who use AI as a tool rather than a crutch. If you are making a decision about which model to call for which task type, that post has the decision table.

The Live Demos

All four outputs are deployed exactly as produced, with only the one-correction-prompt fix applied where noted. No other edits.

Claude Opus 4.8 output: pristren.com/blog/demos/ai-3d-comparison/opus/index.html
Kimi output: pristren.com/blog/demos/ai-3d-comparison/kimi/index.html
DeepSeek output: pristren.com/blog/demos/ai-3d-comparison/deepseek/index.html
Gemini output: pristren.com/blog/demos/ai-3d-comparison/gemini/index.html

The source HTML for each demo is accessible via browser "View Source." No obfuscation. We want you to be able to verify the claims above against the actual output.

Methodology Notes

Timing: Recorded with a physical stopwatch started at the moment Enter was pressed and stopped at the moment the file rendered without error in Chrome 126. Time includes generation, any re-prompting, file save, and browser open.

Token counts: Taken directly from the session metadata in each tool's UI where surfaced. Gemini AI Studio Build mode does not surface per-session token counts in the current UI; that row shows "bundled."

Bug classification: "Critical" means a user-visible failure that would require a fix before showing to a client. "Minor" means a QA-catchable issue that does not break core functionality.

Lighthouse scores: Measured in Chrome 126 DevTools with CPU throttling 4x and network Fast 3G, simulating a mobile mid-tier device, three runs averaged.

Post 1: AI Coding Tools Benchmark for Real Client Work, 2026
Post 2: Context Window Limits Under Production Load
Post 3: You are here
Post 4: Line-by-Line AI Cost Breakdown for a Mid-Size Agency
Post 5: How to Use AI Models as Tools, Not Crutches: 2026 Routing Matrix

We Built the Same 3D Website with Opus 4.8, Kimi K2.6, DeepSeek V4, and Gemini AI Studio

Related Articles

OpenAI Codex Issue #2847: Excluding Sensitive Files Still Unresolved – Workarounds and Risks

I Built the Same 3D Website with Claude Opus 4.8, Kimi, DeepSeek, and Gemini: Here Is What Actually Happened

The Test Setup

Hardware and Environment

The Tools Under Test

The Prompt

The Results Table

Claude Opus 4.8: Methodical and Expensive, But Closest to Production

What It Did

The Two Minor Bugs

Code Quality Observations

Performance

Kimi (Moonshot k2-0711): Fast Writer, Shaky on Edge Cases

What It Did

The Four Bugs (One Critical)

What Kimi Got Right

Performance

DeepSeek V3-0324: Prolific, Buggy, and Shockingly Cheap

What It Did

The Six Bugs (Two Critical)

What DeepSeek Got Right

Performance

Gemini 2.5 Pro in AI Studio Build Mode: Fast, Live, and Surprisingly Complete

What It Did

The One Minor Bug

What Gemini Got Right

The Caveat About Bundled Pricing

Side-by-Side: What Each Tool Produced

Globe Quality

Code Maintainability

Out-of-Box Correctness (no edits)

Honest Recommendation by Use Case

"I need a 3D marketing page live by end of day."

"I am billing a client and need to prove quality."

"I need production-quality at the lowest possible cost."

"I need something for an internal demo nobody will scrutinize."

What This Tells Us About Model Specialization in 2026

The Live Demos

Methodology Notes

Series Navigation

Frequently Asked Questions

Can Gemini AI Studio build a Three.js website?

Which AI model wrote the best 3D website in this test?

How much did the Opus 4.8 3D build cost in tokens?

Is Gemini AI Studio Build mode free?

Where can I see the live demo outputs?

The workspace your teamactually needs

AI & ML insights, weekly

Mahmudul Haque Qudrati

What Is Alibaba Banning Claude Code Over Backdoor Risks? A Practical Overview

Claude Code Sends 33k Tokens Before Reading the Prompt; OpenCode Sends 7k: A Practical Overview

The workspace your team
actually needs