This document records the DeepSeek failures observed while generating long frontend implementation prompts in /zh/create/new, the current mitigations, and the recommended next architecture. It is a maintenance note for src/lib/generation/llm.ts, src/lib/generation/workflow.ts, src/app/api/generate/workflow/stream/route.ts, and src/components/project-workflow.tsx.
Current UI note: references below to the historical Code node now map to the 代码 / Code tab inside the Preview window. The SSE code step and stored code artifacts still exist, but the canvas no longer renders a separate Code card.
What Failed
The failures were not one single bug. They were a chain of model-routing, output-format, and preview-runtime issues.
Wrong provider fallback
When the UI selected deepseek-v4-pro, the server could still fall back to PackyAPI after an empty DeepSeek response if Packy credentials existed. That made the UI confusing: the selected model was DeepSeek, but the error said PackyAPI.
DeepSeek empty response in JSON mode
The workflow used response_format: { type: "json_object" } for intent and brief generation. With very long prompts, DeepSeek sometimes returned an empty message.content, especially when the prompt asked for large structured output.
Full prompt sent as one oversized HTML request
For prompts such as TOONHUB and Prisma, the user pasted a complete production prompt. The old direct path sent the full prompt to DeepSeek and asked for a complete single-file HTML document in one response. That increased the chance of empty responses, truncation, and low-quality output.
JSON brief parse failure
A later mitigation split the flow into a compressed brief plus CSS/body/script parts, but the compressed brief was still requested as JSON. DeepSeek sometimes returned prose or truncated text, so parseJsonObject failed with:
Model response was not valid JSON. The model response may be empty or truncated; retry or increase LLM_MAX_TOKENS.
Preview blank despite Code being generated
In one Prisma run, html_code existed, but the generated body was only:
<div id="root"></div>
The model interpreted the request as a React/Vite project scaffold instead of translating the component into iframe-ready HTML. Code appeared in the then-separate Code node, but Preview was blank because the iframe had no visible markup.
Current Handling
The current code intentionally still calls DeepSeek. Local deterministic fallback is not the normal path.
Provider selection is explicit
deepseek-* model names force the DeepSeek provider. Empty-response fallback to PackyAPI is no longer implicit. A fallback model only runs when LLM_EMPTY_RESPONSE_FALLBACK_MODEL is explicitly configured.
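The routing rule above can be sketched as two small pure functions. This is a minimal illustration, not the code in src/lib/generation/llm.ts: the names providerForModel and emptyResponseFallback are hypothetical, and treating every non-DeepSeek model as PackyAPI is an assumption made only to keep the example short.

```typescript
type Provider = "deepseek" | "packy";

type ProviderEnv = Record<string, string | undefined>;

// Hypothetical helper: deepseek-* names always resolve to the DeepSeek provider.
// Mapping everything else to "packy" is an assumption for this sketch.
function providerForModel(model: string): Provider {
  return model.toLowerCase().startsWith("deepseek-") ? "deepseek" : "packy";
}

// Returns the model to try after an empty response, or null when no fallback
// was configured explicitly. Credentials alone never enable a fallback.
function emptyResponseFallback(selectedModel: string, env: ProviderEnv): string | null {
  const configured = (env["LLM_EMPTY_RESPONSE_FALLBACK_MODEL"] ?? "").trim();
  if (configured === "" || configured === selectedModel) {
    return null;
  }
  return configured;
}
```

With an empty environment, emptyResponseFallback returns null, so an empty DeepSeek response surfaces as a DeepSeek error rather than a PackyAPI one.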
DeepSeek empty-response retries
chatCompletion retries DeepSeek JSON failures without response_format, and retries non-JSON empty responses with reduced max tokens.
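A rough sketch of that retry decision follows. The request shape, the planRetry name, the halving step, and the 512-token floor are all assumptions for illustration; the note only states that JSON failures are retried without response_format and that non-JSON empty responses are retried with reduced max tokens.

```typescript
interface ChatRequest {
  model: string;
  maxTokens: number;
  responseFormat?: { type: "json_object" };
}

type FailureKind = "empty" | "invalid_json";

// Assumed floor below which another retry is not worth attempting.
const MIN_RETRY_TOKENS = 512;

// Returns the request to send next, or null when no retry should happen.
function planRetry(request: ChatRequest, failure: FailureKind): ChatRequest | null {
  if (request.responseFormat !== undefined) {
    // JSON-mode failure: retry as a plain request with the same token budget.
    return { model: request.model, maxTokens: request.maxTokens };
  }
  if (failure === "empty") {
    // Plain request came back empty: retry with a smaller budget (halving is an assumption).
    const reduced = Math.floor(request.maxTokens / 2);
    return reduced >= MIN_RETRY_TOKENS ? { ...request, maxTokens: reduced } : null;
  }
  return null;
}
```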
Full implementation prompts skip intent clarification
looksLikeFullPrompt detects long production prompts and routes them directly into generation rather than spending another DeepSeek call on intent.
Direct prompt generation is split
Long full prompts now use a multi-step model path:
DeepSeek compresses the original prompt into a short plain-text implementation brief.
DeepSeek generates CSS only.
DeepSeek generates visible body HTML only.
DeepSeek generates small inline JavaScript only.
The server assembles those model-generated parts into a single preview HTML document.
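The final assembly step might look roughly like the sketch below. The PreviewParts shape and the assemblePreviewHtml name are hypothetical; the real assembler in src/lib/generation/workflow.ts may add more head content.

```typescript
interface PreviewParts {
  title: string;
  css: string;
  body: string;
  script?: string;
}

// Escape text that is placed inside the <title> element.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Joins the model-generated parts into one iframe-ready document.
// The script element is omitted when no script part was generated.
function assemblePreviewHtml(parts: PreviewParts): string {
  const script =
    parts.script !== undefined && parts.script.trim() !== ""
      ? `<script>\n${parts.script}\n</script>`
      : "";
  return [
    "<!doctype html>",
    '<html lang="en">',
    "<head>",
    '<meta charset="utf-8">',
    '<meta name="viewport" content="width=device-width, initial-scale=1">',
    `<title>${escapeHtml(parts.title)}</title>`,
    `<style>\n${parts.css}\n</style>`,
    "</head>",
    "<body>",
    parts.body,
    script,
    "</body>",
    "</html>",
  ]
    .filter((line) => line !== "")
    .join("\n");
}
```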
The brief step no longer requires JSON
The direct full-prompt brief is now plain text, not JSON. This avoids breaking the whole run when DeepSeek returns useful prose that is not valid JSON.
Blank preview guard
The body step explicitly rejects React/Vite placeholders such as #root, createRoot, ReactDOM, and implementation commentary. The server also checks for root-only or near-empty preview HTML and asks DeepSeek to regenerate the body/script if needed.
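A guard of this kind could be approximated with a few string checks, as in the sketch below. The function names, the placeholder patterns, and the 20-character threshold are assumptions; regex-based tag stripping is only a heuristic and is not a substitute for a rendered check.

```typescript
// Patterns that suggest a React/Vite scaffold instead of visible markup.
const REACT_PLACEHOLDER_PATTERNS: RegExp[] = [
  /id=["']root["']/i,
  /\bcreateRoot\b/,
  /\bReactDOM\b/,
];

function hasReactPlaceholder(bodyHtml: string): boolean {
  return REACT_PLACEHOLDER_PATTERNS.some((pattern) => pattern.test(bodyHtml));
}

// Returns the contents of <body>, or the whole input when no body tag is found.
function extractBody(html: string): string {
  const match = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  return match?.[1] ?? html;
}

// Strips scripts, styles, and tags, then collapses whitespace.
function visibleText(bodyHtml: string): string {
  return bodyHtml
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// True when the body has no media elements and almost no visible text.
function isNearEmptyPreview(html: string, minVisibleChars = 20): boolean {
  const body = extractBody(html);
  const hasMedia = /<(img|video|svg|canvas|picture)\b/i.test(body);
  return !hasMedia && visibleText(body).length < minVisibleChars;
}
```

A body consisting only of the root div from the Prisma run is flagged by both checks, which is the case that should trigger a body/script regeneration.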
2026-05-16 speed and quality optimization
Direct full-prompt generation now sends CSS and body generation in parallel after the plain-text brief. JavaScript generation is skipped by default for static prompts that do not mention interactions, carousel state, tabs, forms, scroll-linked behavior, or similar runtime needs. The assembled HTML head also preserves Google Font links found in the source prompt. The body request receives a mustPreserve payload with visible copy, media URLs, font URLs, and colors; if important quoted visible text is missing from the assembled preview, the body is repaired once before returning.
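The parallel CSS/body step and the static-prompt shortcut could be sketched as follows. PartGenerator stands in for whatever function actually calls DeepSeek, and the keyword list in RUNTIME_HINTS is an assumption based on the triggers named above; the real detection may be broader.

```typescript
type PartName = "css" | "body" | "script";

// Stand-in for the function that sends one part request to the model.
type PartGenerator = (part: PartName, brief: string) => Promise<string>;

// Assumed keyword list: interactions, carousels, tabs, forms, scroll behavior.
const RUNTIME_HINTS = /\b(interact\w*|carousel|tabs?|forms?|scroll\w*)\b/i;

function needsScriptStep(prompt: string): boolean {
  return RUNTIME_HINTS.test(prompt);
}

async function generateParts(
  prompt: string,
  brief: string,
  generate: PartGenerator,
): Promise<{ css: string; body: string; script: string }> {
  // CSS and body do not depend on each other, so both requests start together.
  const [css, body] = await Promise.all([generate("css", brief), generate("body", brief)]);
  // The script call is skipped entirely for prompts with no runtime hints.
  const script = needsScriptStep(prompt) ? await generate("script", brief) : "";
  return { css, body, script };
}
```

Because Promise.all rejects as soon as either call fails, a production version would likely wrap each call so that only the failed part is retried.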
2026-05-16 React/Vite full-prompt path
DeepSeek full implementation prompts now default to a real React/Vite artifact path when WORKFLOW_REACT_VITE_ENABLED is not false. This path bypasses the plain-text brief and the CSS/body/script split. The server sends the original full prompt directly to the model and, by default, uses a component-sharded strategy: DeepSeek generates src/components/Hero.tsx, src/components/ContentPrimary.tsx, src/components/ContentSecondary.tsx, and src/index.css in smaller stage calls while the server writes the tiny src/App.tsx and src/components/ContentSections.tsx composition shells. The server then creates a temporary Vite project with fixed whitelisted dependencies, runs vite build, inlines the built CSS/JS back into html_code, and stores the source files in project_files.
Build failures get a repair pass. Vite syntax errors are now parsed for failing file paths, so repair rewrites only the broken target file(s) instead of asking DeepSeek to rewrite the whole project. Component output defaults to 6144 tokens with focused hero/primary-content/secondary-content payloads; repair defaults to 8192 tokens. This avoids both the older 3072-token truncation and the too-large all-content call that caused [react-vite:content-component] empty responses. If a component-stage model call still fails, the workflow can fall back to the src/App.tsx + src/index.css split strategy and then run targeted repair. The same split fallback applies when component-sharded repair still fails.
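Targeted repair depends on pulling file paths out of the build log. The sketch below assumes esbuild-style lines of the form path:line:column: ERROR: message; the exact log format that the server parses is not documented here, so the regular expression is illustrative only. The label helper reproduces the stage-label shape used for repair failures.

```typescript
// Extracts unique project-relative source paths from a build log.
// Assumed line shape: "/abs/path/src/components/Hero.tsx:12:5: ERROR: ..."
function failingFilesFromBuildLog(log: string): string[] {
  const pattern = /(src\/[\w./-]+\.(?:tsx|ts|jsx|js|css)):\d+:\d+/g;
  const files: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(log)) !== null) {
    const file = match[1];
    if (file !== undefined && !files.includes(file)) {
      files.push(file);
    }
  }
  return files;
}

// Builds a stage label such as
// [react-vite:repair-vite-build-failed-src-components-Hero-tsx].
function repairStageLabel(filePath: string): string {
  const slug = filePath.replace(/[^A-Za-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return `[react-vite:repair-vite-build-failed-${slug}]`;
}
```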
Preview QA is intentionally softer for DeepSeek. Missing visible text or media URLs are stored in qaReport as quality warnings instead of blocking a built preview, because the follow-up [react-vite:repair-preview-qa-failed] call can itself return an empty response on long prompts. Only an empty preview remains a hard failure. DeepSeek QA repair is opt-in with WORKFLOW_REACT_VITE_DEEPSEEK_QA_REPAIR=true; all QA repair can be disabled with WORKFLOW_REACT_VITE_QA_REPAIR=false. This keeps repair output small enough for DeepSeek and avoids losing already-good generated files.
If build repair still fails, the workflow returns a real error instead of falling back to a fake local page. Stage failures are labeled [react-vite:hero-component], [react-vite:content-primary-component], [react-vite:content-secondary-component], [react-vite:index-css], [react-vite:repair-vite-build-failed-src-components-Hero-tsx], and so on.
Set WORKFLOW_REACT_VITE_STRATEGY=split or WORKFLOW_REACT_VITE_SINGLE_CALL=true only when explicitly testing the older larger-output modes.
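The warning-versus-failure split and the two QA repair flags can be expressed compactly. In this sketch the QaFindings and QaReport shapes are hypothetical (the real qaReport fields are not listed in this note), and allowing QA repair by default for non-DeepSeek providers is an assumption.

```typescript
type QaEnv = Record<string, string | undefined>;

interface QaFindings {
  previewIsEmpty: boolean;
  missingText: string[];
  missingMediaUrls: string[];
}

// Hypothetical report shape for illustration.
interface QaReport {
  blocking: boolean;
  warnings: string[];
}

// Only an empty preview blocks the run; missing content is downgraded to warnings.
function buildQaReport(findings: QaFindings): QaReport {
  const warnings = [
    ...findings.missingText.map((text) => `missing visible text: ${text}`),
    ...findings.missingMediaUrls.map((url) => `missing media URL: ${url}`),
  ];
  return { blocking: findings.previewIsEmpty, warnings };
}

// QA repair is off for DeepSeek unless explicitly enabled, and the global
// switch disables it for every provider.
function qaRepairAllowed(isDeepSeek: boolean, env: QaEnv): boolean {
  if (env["WORKFLOW_REACT_VITE_QA_REPAIR"] === "false") {
    return false;
  }
  if (isDeepSeek) {
    return env["WORKFLOW_REACT_VITE_DEEPSEEK_QA_REPAIR"] === "true";
  }
  return true; // assumption: other providers keep QA repair on by default
}
```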
2026-05-16 template-conversion bypass for DeepSeek
Website mode originally converted long full prompts into SiteGenerationTemplate through another JSON model call before code generation. That reintroduced the same DeepSeek empty-response failure, in which the JSON-mode call returns nothing and is retried without response_format. DeepSeek and very long full prompts now bypass that model JSON conversion. The server extracts a lightweight planning template from the original prompt, keeps the original prompt as the authoritative Full Prompt body, and then sends that Full Prompt into the React/Vite code-generation model calls. Set WORKFLOW_FORCE_MODEL_TEMPLATE_CONVERSION=true only when explicitly testing model-based template conversion.
Why Quality Is Still Not Good Enough
The current mitigation improves reliability, but it is not the best product architecture.
The main quality issue is that the system is asking a text model to translate a detailed React/Tailwind/framer-motion spec into standalone HTML/CSS/JS without rendering feedback. The model can satisfy the prompt at the text level and still miss visual fidelity, layout density, exact animation behavior, or component hierarchy.
Specific weaknesses:
No rendered self-check before returning Preview.
The legacy HTML fallback still has CSS/body/script drift, but DeepSeek full-prompt runs now avoid that path by default.
The React/Vite path keeps the full prompt whenever it is under the server safety limit; only over-limit prompts are head+tail truncated and marked in qaReport.promptTruncated. The model request no longer carries duplicated server-extracted must-preserve facts; those facts are used for QA instead.
There is no rubric-based evaluation against the original prompt.
The Agent does not repair layout issues from screenshots or DOM inspection.
The UI waits for the whole backend run before showing meaningful intermediate code/preview feedback.
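The head+tail truncation mentioned in the list above for over-limit React/Vite prompts can be sketched as below. The function name, the marker text, and the even head/tail split are assumptions; only the promptTruncated flag comes from this note.

```typescript
interface TruncationResult {
  text: string;
  promptTruncated: boolean;
}

// Keeps the start and end of an over-limit prompt and drops the middle.
// The result never exceeds `limit` as long as `limit` is at least the marker length.
function truncateHeadTail(
  prompt: string,
  limit: number,
  marker = "\n[... prompt truncated ...]\n",
): TruncationResult {
  if (prompt.length <= limit) {
    return { text: prompt, promptTruncated: false };
  }
  const budget = Math.max(limit - marker.length, 0);
  const headLength = Math.ceil(budget / 2);
  const tailLength = budget - headLength;
  const head = prompt.slice(0, headLength);
  const tail = tailLength > 0 ? prompt.slice(prompt.length - tailLength) : "";
  return { text: head + marker + tail, promptTruncated: true };
}
```

Keeping both ends preserves the opening requirements and the closing acceptance notes, which long production prompts often place at the very end.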
Better Architecture
The better solution is not more prompt patching. It is a staged generation-and-QA pipeline.
Recommended v2 flow:
Plan step
Model returns a compact implementation plan, not code:
sections
assets
required interactions
animation requirements
responsive breakpoints
acceptance criteria
Keep this as JSON only if the schema is small and validated with Zod. Otherwise store it as plain text plus extracted fields.
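A small, strictly validated plan schema might look like the sketch below. The note recommends Zod; to stay dependency-free this example swaps in a hand-written type guard instead, and the camelCase field names are assumptions derived from the list above.

```typescript
interface ImplementationPlan {
  sections: string[];
  assets: string[];
  requiredInteractions: string[];
  animationRequirements: string[];
  responsiveBreakpoints: string[];
  acceptanceCriteria: string[];
}

const PLAN_KEYS = [
  "sections",
  "assets",
  "requiredInteractions",
  "animationRequirements",
  "responsiveBreakpoints",
  "acceptanceCriteria",
] as const;

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns a validated plan, or null when the model output is not usable JSON.
// A null result is the signal to fall back to plain text plus extracted fields.
function parsePlan(raw: string): ImplementationPlan | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const record = data as Record<string, unknown>;
  const plan: Partial<ImplementationPlan> = {};
  for (const key of PLAN_KEYS) {
    const value = record[key];
    if (!isStringArray(value)) {
      return null;
    }
    plan[key] = value;
  }
  return plan as ImplementationPlan;
}
```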
Code step with a real target format
Pick one target per product mode:
For instant iframe previews: standalone HTML/CSS/JS.
For shippable app output: real React/Vite files in a workspace.
Mixing "React project prompt" with "standalone iframe preview" is the source of many quality problems.
Render step
Save the generated HTML to a temporary preview target and render it with Playwright or the Browser plugin.
QA step
Programmatically inspect:
iframe/body is non-empty
console has no blocking errors
key text exists
referenced media URLs are present
mobile and desktop screenshots are not blank
layout has no obvious overflow
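Once a browser run has collected its observations, the checks listed above reduce to a pure function. The sketch below assumes the observations were already gathered (for example by Playwright) into a hypothetical RenderObservation object; it does not drive a browser itself.

```typescript
// Hypothetical snapshot of one rendered viewport.
interface RenderObservation {
  viewport: "mobile" | "desktop";
  bodyText: string;
  consoleErrors: string[];
  mediaUrls: string[]; // URLs found in the rendered DOM
  screenshotIsBlank: boolean;
  scrollWidth: number;
  clientWidth: number;
}

interface QaExpectation {
  keyText: string[];
  mediaUrls: string[];
}

// Returns a list of human-readable failures; an empty list means the render passed.
function checkRender(observation: RenderObservation, expected: QaExpectation): string[] {
  const failures: string[] = [];
  const prefix = observation.viewport;
  if (observation.bodyText.trim() === "") {
    failures.push(`${prefix}: body is empty`);
  }
  if (observation.consoleErrors.length > 0) {
    failures.push(`${prefix}: ${observation.consoleErrors.length} console error(s)`);
  }
  for (const text of expected.keyText) {
    if (!observation.bodyText.includes(text)) {
      failures.push(`${prefix}: missing text "${text}"`);
    }
  }
  for (const url of expected.mediaUrls) {
    if (!observation.mediaUrls.includes(url)) {
      failures.push(`${prefix}: missing media ${url}`);
    }
  }
  if (observation.screenshotIsBlank) {
    failures.push(`${prefix}: screenshot is blank`);
  }
  if (observation.scrollWidth > observation.clientWidth) {
    failures.push(`${prefix}: horizontal overflow`);
  }
  return failures;
}
```

The returned list doubles as the compact failure report that a repair step can send back to the model.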
Repair step
Feed a compact failure report back to the model:
missing content
console errors
screenshot or DOM notes
exact files/HTML excerpts to modify
Then regenerate only the broken part.
Versioned artifacts
Store model plan, generated code, QA result, repair result, token/time metadata, and final HTML in workflow_run_versions.
Faster Response Plan
Before the 2026-05-16 changes, direct full-prompt generation was slow because it could require four sequential DeepSeek calls: brief, CSS, body, script. Long Prisma runs took roughly 4-6 minutes.
Short-term improvements:
Start streaming node progress earlier, before all model calls finish.
Generate CSS and body in parallel after the plain-text brief.
Skip JS generation when the prompt does not require interaction beyond CSS animations.
Shrink the source-prompt excerpts sent on the direct path and lower the token caps.
Add per-step timeout and retry only the failed step, not the full run.
Cache the plain-text brief for identical prompts.
Implemented on 2026-05-16:
Direct-path SSE now activates Full Prompt, Design System, and Preview Code-tab progress before model generation finishes, so the user sees immediate workflow progress.
CSS and body part generation run in parallel.
React/Vite component-sharded generation runs the Hero, content-section (ContentPrimary and ContentSecondary), and CSS calls in parallel by default.
Static prompts skip the JS model call unless WORKFLOW_DIRECT_SKIP_SCRIPT_WHEN_STATIC=false.
A best-effort quality repair runs when required visible quoted text is missing, unless WORKFLOW_DIRECT_QUALITY_REPAIR=false.
React/Vite repair output is capped lower by default (WORKFLOW_REACT_VITE_REPAIR_MAX_TOKENS=4096) because repair now returns only changed files.
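Several of the switches above share the same "on unless set to false" semantics, plus one numeric cap. A minimal sketch of reading them follows; the helper names are hypothetical, and the environment is passed in as a plain object so the functions stay testable.

```typescript
type Env = Record<string, string | undefined>;

// Default-on flag: only the literal value "false" (any case) disables it.
function enabledUnlessFalse(env: Env, name: string): boolean {
  return (env[name] ?? "").trim().toLowerCase() !== "false";
}

// Numeric setting with a fallback for missing or invalid values.
function positiveIntOrDefault(env: Env, name: string, fallback: number): number {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}
```

For example, enabledUnlessFalse(process.env, "WORKFLOW_DIRECT_SKIP_SCRIPT_WHEN_STATIC") stays true when the variable is unset, and positiveIntOrDefault(process.env, "WORKFLOW_REACT_VITE_REPAIR_MAX_TOKENS", 4096) returns the documented default in that case.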
Medium-term improvements:
Use a faster model for brief/planning and keep DeepSeek V4 Pro for code only.
Let the user choose "Fast draft" vs "High fidelity".
For full React prompts, generate real React files and run a Vite build instead of forcing standalone HTML. This is now the default DeepSeek full-prompt path; the next step is rendered browser QA with screenshots and console inspection.
Add a background job model so the UI can return immediately and update progress asynchronously.
Recommended default:
Fast draft: one model call that returns standalone HTML, followed by automated blank-page QA.
High fidelity: plan -> code -> render -> QA -> repair, with visible progress and a longer expected runtime.
Operational Notes
Failed runs are stored in workflow_runs.error.
Draft output is stored in workflow_run_versions.html_code. React/Vite source artifacts are additionally stored in workflow_run_versions.project_files, with artifact_kind, build_log, and qa_report.
A run can have generated code in the Preview Code tab but a blank Preview iframe if html_code contains a shell without visible body markup.
Existing failed or blank runs do not repair automatically. The user must regenerate after the fix, or a future repair action must be added.
Keep LLM_EMPTY_RESPONSE_FALLBACK_MODEL blank unless an explicit fallback policy is desired.