I Built the Same 3D Website with Claude Opus 4.8, Kimi, DeepSeek, and Gemini: Here Is What Actually Happened
Published: June 3, 2026
Series: AI Sprint | Post 3 of 5
Read time: ~18 min
Author: Mahmudul Haque Qudrati, CEO at Pristren
Everyone has an opinion about which AI model writes the best code. Most of those opinions come from toy prompts or cherry-picked demos. This post is different. On June 3, 2026, the Pristren team ran a controlled head-to-head test: the exact same 3D website prompt, four tools, one clock running. No cherry-picking. No editing the output before measuring. You can visit the actual live demos linked below.
This is the third post in our AI Sprint series. If you are new here, Post 1 covers why we started benchmarking AI tools for real client work and Post 2 digs into context window limits under production load. Post 4 is a line-by-line cost breakdown of running AI models for a mid-size agency.
The Test Setup
Hardware and Environment
All four sessions ran on the same MacBook Pro M3 Max (64 GB RAM), connected to the same 500 Mbps fibre line, between 10:00 and 14:00 UTC+6 on June 3, 2026. No session overlapped another. Browser: Arc 1.65. No browser extensions that touch network requests.
The Tools Under Test
- Claude Opus 4.8 via claude.ai Projects (thinking mode enabled)
- Kimi via kimi.ai (Moonshot AI, k2-0711 checkpoint)
- DeepSeek via chat.deepseek.com (V3-0324 model)
- Gemini via Google AI Studio (Gemini 2.5 Pro with canvas/build mode)
The Prompt
Every tool received this verbatim:
You are a senior front-end developer. Build a complete, self-contained single-file
HTML/CSS/JS 3D portfolio website for a software agency called "Pristren".
Requirements:
1. Use Three.js (CDN) to render a rotating 3D globe with particle connections on the hero.
2. Dark theme: background #0a0a0f, accent #7c3aed (purple), text #e2e8f0.
3. Five sections: Hero (3D globe), Services (4 cards with hover 3D tilt effect),
Work (3 project cards with glassmorphism), Team (avatar circles with 3D depth),
Contact (form with floating label inputs).
4. Smooth scroll. Section transitions fade in on scroll using IntersectionObserver.
5. Mobile responsive. Navigation collapses to hamburger menu below 768px.
6. All JavaScript must be vanilla JS only. No React, no Vue, no build tools.
7. The code must run by simply opening index.html in a browser. Zero build step.
8. Include a working particle animation on the globe: at least 200 nodes connected
by lines when within 120px of each other.
9. The page must score above 85 on Lighthouse performance (no heavy fonts,
lazy-load images, requestAnimationFrame for all animations).
10. Output only the full index.html. No explanation. No markdown.
Start your response with <!DOCTYPE html>.
We timed from "prompt sent" to "first complete file I could open in a browser without edits." Any tool that produced broken HTML on the first try was given one correction prompt (identical for all): "Fix any console errors and ensure the file opens without modification." That correction prompt counted toward token totals but the added time was logged separately.
The Results Table
| Tool | Model | Time to first preview | Input tokens | Output tokens | Est. cost | Bugs found | Demo |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 | claude-opus-4-8 (thinking) | 18 min 22 sec | 45,200 | 38,100 | $3.53 | 2 minor | Live demo |
| Kimi | k2-0711 | 22 min 09 sec | 52,400 | 41,300 | $0.54 | 4 (1 critical) | Live demo |
| DeepSeek | V3-0324 | 25 min 41 sec | 48,600 | 44,200 | $0.06 | 6 (2 critical) | Live demo |
| Gemini AI Studio Build | Gemini 2.5 Pro | 11 min 58 sec | bundled | bundled | subscription | 1 minor | Live demo |
Cost methodology: Opus 4.8 priced at $15/M input + $75/M output (Anthropic published rate, June 2026). Kimi k2 at $2.50/M input + $10/M output. DeepSeek V3 at $0.27/M input + $1.10/M output. Gemini AI Studio build mode is bundled within the Google One AI Premium subscription ($19.99/mo); we note this as "subscription" since marginal per-call pricing is not surfaced to the user.
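The arithmetic behind the three per-token estimates is easy to reproduce. The sketch below uses the rates and token counts exactly as listed in this post; estimateCost is an illustrative helper name, not part of any vendor SDK.

```javascript
// Per-million-token rates in USD, copied from the cost methodology above.
const RATES = {
  opus: { input: 15, output: 75 },
  kimi: { input: 2.5, output: 10 },
  deepseek: { input: 0.27, output: 1.1 },
};

// Token counts copied from the results table.
const SESSIONS = {
  opus: { input: 45200, output: 38100 },
  kimi: { input: 52400, output: 41300 },
  deepseek: { input: 48600, output: 44200 },
};

// Hypothetical helper: cost in USD for one session.
function estimateCost(tokens, rate) {
  return (tokens.input * rate.input + tokens.output * rate.output) / 1e6;
}

for (const name of Object.keys(SESSIONS)) {
  console.log(name, estimateCost(SESSIONS[name], RATES[name]).toFixed(4));
}
```

This works out to roughly 3.5355 for Opus, 0.544 for Kimi, and 0.0617 for DeepSeek, in line with the table once rounded to cents (the Opus figure sits right on the boundary between $3.53 and $3.54).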
Claude Opus 4.8: Methodical and Expensive, But Closest to Production
What It Did
Opus 4.8 with thinking enabled spent the first 4 minutes and 30 seconds in visible reasoning before producing a single character of HTML. You can see the thinking chain expand in the UI: it walked through Three.js version compatibility, identified that THREE.Line requires a BufferGeometry in r148+, planned the particle connection algorithm before writing it, and flagged a potential performance issue with connecting all 200 nodes per frame (O(n^2) comparisons) before solving it with spatial partitioning.
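For readers unfamiliar with the technique, here is a minimal sketch of spatial partitioning for proximity checks. It is not the code Opus produced; it is one common approach (a uniform grid hash with cell size equal to the connection distance), and findClosePairs is a hypothetical name. Points are assumed to be plain [x, y, z] arrays.

```javascript
// Hypothetical helper: returns index pairs [i, j] (i < j) whose points lie
// within maxDistance of each other, without comparing every pair.
function findClosePairs(points, maxDistance) {
  const cellKey = (cx, cy, cz) => `${cx},${cy},${cz}`;
  const cellOf = (v) => Math.floor(v / maxDistance);

  // Bucket each point index by the grid cell that contains it.
  const grid = new Map();
  points.forEach(([x, y, z], index) => {
    const key = cellKey(cellOf(x), cellOf(y), cellOf(z));
    if (!grid.has(key)) grid.set(key, []);
    grid.get(key).push(index);
  });

  const pairs = [];
  const maxDistanceSq = maxDistance * maxDistance;
  points.forEach(([x, y, z], i) => {
    const cx = cellOf(x);
    const cy = cellOf(y);
    const cz = cellOf(z);
    // Only the 27 surrounding cells can hold a point within maxDistance.
    for (let dx = -1; dx <= 1; dx++) {
      for (let dy = -1; dy <= 1; dy++) {
        for (let dz = -1; dz <= 1; dz++) {
          const bucket = grid.get(cellKey(cx + dx, cy + dy, cz + dz));
          if (!bucket) continue;
          for (const j of bucket) {
            if (j <= i) continue; // emit each pair once
            const [px, py, pz] = points[j];
            const distSq = (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2;
            if (distSq <= maxDistanceSq) pairs.push([i, j]);
          }
        }
      }
    }
  });
  return pairs;
}
```

With 200 nodes the brute-force version is 19,900 distance checks per frame; the grid only pays off noticeably as node counts grow, so treat this as an illustration of the idea rather than a required optimization at this scale.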
The output arrived in a single, uninterrupted stream. 38,100 tokens. No broken code blocks. The file opened in Chrome and Safari without a single console error.
The Two Minor Bugs
Both were design issues, not runtime errors:
- The hamburger menu toggled open correctly but had no visible close affordance on iOS Safari (the X icon rendered at 0 opacity due to a specificity conflict in the CSS).
- The IntersectionObserver threshold was set to 0.15, which meant cards triggered the fade-in animation before they were fully in view on a 1440p screen. Visually jarring on wide monitors.
Neither prevented the site from functioning. A real client would have spotted them in QA, not at launch.
Code Quality Observations
Opus wrote defensive JavaScript throughout. Every querySelector result was null-checked before use. The Three.js cleanup on page unload was handled with renderer.dispose() and geometry/material disposal to prevent WebGL context leaks. Comments were sparse but precise, placed only where the algorithm was non-obvious (the spatial hash for particle proximity checking).
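A cleanup routine of the kind described might look roughly like the sketch below. This is not the Opus output; disposeScene is a hypothetical helper, and scene and renderer are assumed to be a THREE.Scene and THREE.WebGLRenderer created elsewhere in the page. Textures are left out for brevity.

```javascript
// Hypothetical helper: release GPU resources held by a Three.js scene.
function disposeScene(scene, renderer) {
  scene.traverse((object) => {
    if (object.geometry) object.geometry.dispose();
    if (object.material) {
      // An object may carry a single material or an array of them.
      const materials = Array.isArray(object.material)
        ? object.material
        : [object.material];
      materials.forEach((material) => material.dispose());
    }
  });
  renderer.dispose();
}

// Browser usage (scene and renderer come from the page's own setup code):
// window.addEventListener('pagehide', () => disposeScene(scene, renderer));
```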
The particle globe was the best-looking of the four. The connection lines faded based on distance (opacity proportional to 1 minus distance/maxDistance), giving depth. Node sizes varied slightly with a sine function keyed to the node index, creating a subtle breathing effect.
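The two effects reduce to a pair of small formulas. The sketch below is illustrative only: the function names, amplitude, and speed are assumptions, not values read from the Opus output.

```javascript
// Opacity falls off linearly: 1 at distance 0, 0 at maxDistance and beyond.
function lineOpacity(distance, maxDistance) {
  return Math.max(0, 1 - distance / maxDistance);
}

// Node size oscillates around baseSize. The node index offsets the phase so
// neighbouring nodes do not pulse in lockstep. amplitude and speed are
// illustrative defaults.
function nodeSize(index, timeSeconds, baseSize = 1, amplitude = 0.15, speed = 1.5) {
  return baseSize * (1 + amplitude * Math.sin(timeSeconds * speed + index));
}
```

A line between two nodes 60px apart with a 120px limit would be drawn at half opacity, and node sizes stay within 15% of the base size with these defaults.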
Performance
Lighthouse score from a cold load: 91. The bottleneck was fetching Three.js r165 from the CDN (~590 KB unminified). Opus referenced the minified build in its import, but the CDN round trip still adds latency on first load. On subsequent loads with a warm cache, the score rose to 96.
Kimi (Moonshot k2-0711): Fast Writer, Shaky on Edge Cases
What It Did
Kimi produced output faster than Opus in raw generation time, but it paused twice mid-stream (12-second gaps), which we attribute to rate limiting on the free tier. Total wall-clock time from submit to openable file: 22 minutes and 9 seconds.
The HTML arrived complete. The structure was solid. Three sections were pixel-perfect against the spec. Two were not.
The Four Bugs (One Critical)
Critical: The contact form submission handler called event.preventDefault() but then also set window.location.href = '#contact' synchronously, which caused the page to re-anchor and visually flash on every submit attempt. On mobile this produced a noticeable jump. We logged this as critical because it would confuse end users.
Minor bugs:
- The 3D tilt on service cards used perspective: 1000px inline on the card element rather than on the parent container. The effect worked but the perspective origin was wrong, creating a skewed tilt rather than a natural depth effect.
- The hamburger navigation had a z-index of 10 but still sat beneath the Three.js canvas in practice: the canvas had z-index 0, yet its WebGL layer was composited into a higher stacking context. On Safari, tapping a nav link required two taps.
- The particle count was 187 nodes, not 200. The for-loop ran while i < 200, but the array passed to geometry.setAttribute was allocated for only 187 nodes (a copy-paste error in the buffer allocation).
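The particle count mismatch is the kind of bug that disappears when the loop bound and the buffer size come from one constant. A minimal sketch, not taken from any of the four outputs; the constant names and the sphere layout are illustrative assumptions.

```javascript
const NODE_COUNT = 200; // the spec's minimum
const COMPONENTS_PER_NODE = 3; // x, y, z

// Allocate from the same constant the loop uses, so the two cannot drift apart.
const positions = new Float32Array(NODE_COUNT * COMPONENTS_PER_NODE);
for (let i = 0; i < NODE_COUNT; i++) {
  // Placeholder layout: points spread evenly over a unit sphere.
  const theta = Math.acos(1 - (2 * (i + 0.5)) / NODE_COUNT);
  const phi = Math.PI * (1 + Math.sqrt(5)) * i;
  positions[i * 3] = Math.sin(theta) * Math.cos(phi);
  positions[i * 3 + 1] = Math.sin(theta) * Math.sin(phi);
  positions[i * 3 + 2] = Math.cos(theta);
}

// In the page this array would be handed to Three.js, for example:
// geometry.setAttribute('position', new THREE.BufferAttribute(positions, COMPONENTS_PER_NODE));
const nodeCount = positions.length / COMPONENTS_PER_NODE;
```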
What Kimi Got Right
The glassmorphism on project cards was the most faithful implementation of the four. backdrop-filter: blur(12px), background: rgba(124, 58, 237, 0.08), and a 1px border with a subtle gradient overlay. The result looked cleaner than Opus on this specific element.
The team section avatar circles used CSS 3D transforms to create a shallow bowl effect, which was not in the spec but was a creative interpretation that actually looked good. We kept it.
Performance
Lighthouse: 83 (below the spec requirement of 85). The culprit: Kimi loaded both the full Three.js build and a separate OrbitControls import from a different CDN path, doubling the Three.js transfer. Fixable in one line, but it means the output did not fully pass the spec as delivered.
DeepSeek V3-0324: Prolific, Buggy, and Shockingly Cheap
What It Did
DeepSeek took the longest at 25 minutes 41 seconds. It also wrote the most tokens: 44,200 output. The extra tokens were not wasted on comments or blank lines. DeepSeek wrote more code. It added a full CSS custom property system at the top of the file (defining 22 design tokens), a theme switcher toggle that was not in the spec, a scroll progress bar, and animated cursor glow effects on the hero section.
This is the pattern we see with DeepSeek repeatedly: it builds more than you asked for, and some of it is brilliant, but the additional surface area introduces bugs.
The Six Bugs (Two Critical)
Critical bug 1: The theme switcher toggle called document.documentElement.style.setProperty to override CSS custom properties for a light mode variant. But the Three.js renderer background color was not tied to the CSS custom property system. Switching to light mode left the globe canvas rendering on a pure black background while the rest of the page went light. Visually broken.
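One way to keep the canvas in step with the theme is to read the background token back from CSS and pass it to the renderer after every toggle. This is a sketch under assumptions: syncRendererBackground is a hypothetical helper, --bg is an assumed custom property name (the post does not list DeepSeek's token names), and renderer is assumed to be a THREE.WebGLRenderer.

```javascript
// Hypothetical helper: copy a CSS custom property into the renderer's clear color.
// rootStyle is anything with getPropertyValue, such as a computed style object.
function syncRendererBackground(renderer, rootStyle, propertyName = '--bg') {
  // Custom property values can come back with surrounding whitespace.
  const color = rootStyle.getPropertyValue(propertyName).trim();
  if (color) renderer.setClearColor(color);
  return color;
}

// Browser usage, called right after the theme toggle updates the properties:
// syncRendererBackground(renderer, getComputedStyle(document.documentElement));
```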
Critical bug 2: The scroll progress bar used document.body.scrollHeight but the Three.js canvas was positioned fixed, which in some scroll containers causes scrollHeight to report incorrectly. On Chrome 126 on Windows (tested via BrowserStack), the progress bar never moved past 40% even on full scroll.
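A common alternative is to compute progress from the root scrolling element rather than document.body. We have not verified that this resolves the Chrome on Windows behavior described above; the sketch only shows the calculation, and scrollProgress is a hypothetical helper.

```javascript
// Hypothetical helper: fraction of the page scrolled, clamped to [0, 1].
function scrollProgress(scrollTop, scrollHeight, viewportHeight) {
  const scrollable = scrollHeight - viewportHeight;
  if (scrollable <= 0) return 0; // page is no taller than the viewport
  return Math.min(1, Math.max(0, scrollTop / scrollable));
}

// Browser usage (bar is an assumed reference to the progress element):
// const el = document.scrollingElement || document.documentElement;
// bar.style.transform =
//   `scaleX(${scrollProgress(el.scrollTop, el.scrollHeight, el.clientHeight)})`;
```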
Minor bugs:
- The animated cursor glow used a mousemove listener on window without a debounce or requestAnimationFrame guard. On a 120Hz display, this fired 120 times per second, noticeably spiking CPU usage during mouse movement.
- The contact form floating labels used a CSS :placeholder-shown selector for the label animation. This is well-supported but DeepSeek forgot to handle the :autofill state, so browser autofill caused labels to overlap the filled values.
- The IntersectionObserver disconnected itself after all elements entered the viewport (correct behavior) but reassigned the observer variable to null. A late scroll event after full-page load caused a null reference error in the console.
- The particle connection lines used ctx.beginPath() / ctx.stroke() per connection pair inside a Canvas 2D context layered over the Three.js canvas, rather than using Three.js LineSegments. This worked visually but defeated the Lighthouse performance requirement (two separate animation loops running in parallel, one unthrottled).
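For the first item in that list, a requestAnimationFrame guard is only a few lines. The sketch below is one way to write it, not DeepSeek's code: rafThrottle is a hypothetical helper, and the schedule parameter exists so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: run callback at most once per animation frame,
// always with the most recent arguments.
function rafThrottle(callback, schedule = (fn) => requestAnimationFrame(fn)) {
  let latestArgs = null;
  let scheduled = false;
  return (...args) => {
    latestArgs = args; // keep only the newest event
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      scheduled = false;
      callback(...latestArgs);
    });
  };
}

// Browser usage (moveGlow is an assumed function that positions the glow):
// window.addEventListener('mousemove', rafThrottle((e) => moveGlow(e.clientX, e.clientY)));
```

However many mousemove events arrive between frames, the glow update runs once per frame with the latest pointer position.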
What DeepSeek Got Right
The hero section was genuinely impressive. The scroll progress bar, despite the Windows bug, looked polished. The custom property system was a thoughtful architectural decision that no other model made. If you stripped the unasked features and fixed the two critical bugs, the DeepSeek output would be the most maintainable codebase of the four.
At six US cents for the entire session, the value-per-dollar ratio is staggering. For internal prototyping or throwaway demos where you have a developer reviewing output, DeepSeek V3 is hard to argue against.
Performance
Lighthouse: 77. Two animation loops, no deferred Three.js load, the unthrottled mousemove handler, and a 14 KB inline SVG used as a background pattern all contributed. This is the furthest from the spec requirement of 85.
Gemini 2.5 Pro in AI Studio Build Mode: Fast, Live, and Surprisingly Complete
What It Did
Gemini's AI Studio "Build" mode is a different product experience from the others. Rather than generating a file and handing it over, it generates code into a live preview pane simultaneously. You see the 3D globe appear as the code streams. You can click the preview while generation is still running.
This changes the subjective experience of waiting. The 11 minute 58 second time-to-preview is genuinely 11 minutes and 58 seconds: by that point the site was fully interactive and I was scrolling through sections while the final CSS was still streaming in.
The One Minor Bug
The only bug: the position: sticky navigation bar caused a 1-pixel gap between the navbar and the top of the viewport on iOS Safari 19, allowing the background canvas to bleed through as a thin line. A known iOS Safari rendering quirk. Fixed with top: -1px; height: calc(64px + 1px).
What Gemini Got Right
Gemini was the most spec-faithful of all four tools. Every single requirement was addressed, including the Lighthouse 85+ target (it scored 88 from a cold load). The particle globe was slightly less visually polished than Opus, using uniform node sizes and flat-opacity connections, but it was correct and performant.
The hamburger menu worked on first tap on every mobile browser we tested. The IntersectionObserver threshold was 0.2, which was appropriate for all screen sizes tested.
Gemini also correctly identified that the prompt said "vanilla JS only" and avoided any implicit framework imports. The others all complied, but Gemini explicitly put a comment at the top of its JS section: // Vanilla JS only as specified. No external dependencies beyond Three.js CDN.
The Caveat About Bundled Pricing
We cannot provide a per-session token cost for Gemini AI Studio Build mode because the API usage is not surfaced separately in the current Google One AI Premium interface. If you are billing clients by model cost, this opacity is a real problem. For personal or agency use where you pay the flat subscription, it is effectively free marginal cost for prompts of this size.
Side-by-Side: What Each Tool Produced
Globe Quality
- Opus 4.8 (best): distance-weighted line opacity, varied node sizes, breathing sine animation
- Kimi (second): solid implementation, good color coherence, particle count bug aside
- Gemini (third): correct, performant, visually flat
- DeepSeek (fourth): dual-canvas approach created visual artifacts at the canvas seam
Code Maintainability
- DeepSeek (best architecture): CSS custom property system, clear separation of concerns, would be easiest to extend
- Opus 4.8 (second): defensive null checks, good disposal patterns, precise comments
- Gemini (third): clean but minimal structure, some inline magic numbers
- Kimi (fourth): functional but inconsistent style, some copy-paste repetition in the services section
Out-of-Box Correctness (no edits)
- Gemini (1 minor bug, all spec requirements met, Lighthouse 88)
- Opus 4.8 (2 minor bugs, Lighthouse 91 but thinking time is real wait time)
- Kimi (1 critical bug, Lighthouse 83 below spec)
- DeepSeek (2 critical bugs, Lighthouse 77, feature overreach)
Honest Recommendation by Use Case
"I need a 3D marketing page live by end of day."
Use Gemini AI Studio Build mode. Fastest wall-clock time, live preview, one minor bug that takes 30 seconds to fix. If you are already paying for Google One AI Premium, the marginal cost is zero.
"I am billing a client and need to prove quality."
Use Claude Opus 4.8 with thinking. The 18-minute wait and the $3.53 session cost are worth it for the spatial partitioning, the null guards, the WebGL cleanup, and the near-zero debugging time. The 18 minutes you wait for Opus is not idle time: you are watching it reason through the problem. The alternatives are faster to generate but slower to debug.
"I need production-quality at the lowest possible cost."
Use DeepSeek V3 with a developer reviewing the output. Six cents for a session that generates 44,000 tokens of mostly-correct code is extraordinary. Budget 30 to 60 minutes for a review pass. The architecture is often excellent. The edge cases are often broken.
"I need something for an internal demo nobody will scrutinize."
Use Kimi. Good visual output, reasonable cost, solid for anything that does not involve form submission or complex z-index layering.
What This Tells Us About Model Specialization in 2026
The four tools are converging on the same average output quality but diverging on specialization. Opus 4.8 is optimizing for reasoning correctness: it thinks before writing and its thinking is auditable. Gemini Build is optimizing for developer experience: the live preview collapses the feedback loop. DeepSeek is optimizing for cost and volume: you pay almost nothing and get almost everything. Kimi is trying to close the quality gap at a mid-tier price point.
No single tool dominates all dimensions. The right choice depends on your specific constraint: time, cost, quality, or auditability.
This is the core argument we make in Post 5 of this series, a model routing matrix for developers who use AI as a tool rather than a crutch. If you are making a decision about which model to call for which task type, that post has the decision table.
The Live Demos
All four outputs are deployed exactly as produced, with only the one-correction-prompt fix applied where noted. No other edits.
- Claude Opus 4.8 output: pristren.com/blog/demos/ai-3d-comparison/opus/index.html
- Kimi output: pristren.com/blog/demos/ai-3d-comparison/kimi/index.html
- DeepSeek output: pristren.com/blog/demos/ai-3d-comparison/deepseek/index.html
- Gemini output: pristren.com/blog/demos/ai-3d-comparison/gemini/index.html
The source HTML for each demo is accessible via browser "View Source." No obfuscation. We want you to be able to verify the claims above against the actual output.
Methodology Notes
Timing: Recorded with a physical stopwatch started at the moment Enter was pressed and stopped at the moment the file rendered without error in Chrome 126. Time includes generation, any re-prompting, file save, and browser open.
Token counts: Taken directly from the session metadata in each tool's UI where surfaced. Gemini AI Studio Build mode does not surface per-session token counts in the current UI; that row shows "bundled."
Bug classification: "Critical" means a user-visible failure that would require a fix before showing to a client. "Minor" means a QA-catchable issue that does not break core functionality.
Lighthouse scores: Measured in Chrome 126 DevTools with CPU throttling 4x and network Fast 3G, simulating a mobile mid-tier device, three runs averaged.
Series Navigation
- Post 1: AI Coding Tools Benchmark for Real Client Work, 2026
- Post 2: Context Window Limits Under Production Load
- Post 3: You are here
- Post 4: Line-by-Line AI Cost Breakdown for a Mid-Size Agency
- Post 5: How to Use AI Models as Tools, Not Crutches: 2026 Routing Matrix