Building with AI Agents: My Current Workflow
A repeatable, agent-assisted loop I use for new or existing projects: lean by design, with specs and tasks scoped per feature and visual checks baked in.
Three months ago, I decided to build my daughter a simple reminders app: something personal to help her manage basic reminders at school and home for a medical condition. I thought, “Perfect! I’ll use AI tools and have this done in a weekend.” I started with Cursor. Then tried Copilot. Then bounced between half a dozen other tools, each promising to make development effortless.
Two weeks later, I had four abandoned repositories, inconsistent UI patterns, half-implemented features, and an app that still wasn’t ready. Every tool delivered working code quickly, but none maintained the vision I had in my head. Each iteration felt like starting over because I hadn’t given the AI the guardrails it needed to stay on track.
That failure taught me something: AI agents are incredible at execution but terrible at maintaining your vision without structure. Then I discovered agent workflows with JetBrains Junie. Instead of jumping straight into code, I started with context documents: design principles and style guides that would constrain what the agent could do. I broke features into small, independently shippable tasks. I added quality gates at every step.
The reminders app started working. Not perfectly at first, but consistently. Features stayed coherent. The UI remained predictable. Most importantly, I could pick up where I left off without having to decipher what past-me was thinking.
That experience sent me down a rabbit hole. I studied Harper Reed’s collaborative planning approach, which taught me to slow down and ask better questions before any code gets written. Patrick Ellis’s structured development methods showed me how to break work into right-sized chunks that ship independently. His agent workflow patterns revealed exactly where to add quality gates that catch issues early.
What emerged is the workflow I’m sharing here: refined through building features, client projects, and yes, eventually finishing my daughter’s app. It works consistently across both greenfield projects and legacy codebases because it’s not about the tools; it’s about the structure you bring to the conversation.
This isn’t revolutionary; it’s just combining proven techniques that work for both greenfield projects and legacy codebases.
The key insights I’ve borrowed and adapted:
- Setting clear guardrails upfront: Context docs that constrain scope and style before any code gets written
- Breaking work into right-sized chunks: Features become manageable, independently shippable tasks
- Building quality checks directly into the process: Automated compliance plus visual verification catch issues early
Why This Approach Works
Before diving into the details, here’s why this workflow consistently delivers better results than ad-hoc AI development:
- Consistency: Same structured approach works for greenfield projects and legacy codebases
- Tool-Agnostic: Works with Cursor, Claude Code, Copilot, or any AI coding assistant; the structured approach matters more than the tool
- Quality Guardrails: Context docs constrain scope and style before any code gets written
- Traceability: Specifications and task lists are organized by feature in /features/, while implementation code lives in its natural location within your project structure
- Built-in Quality Control: Automated compliance checks plus visual verification catch issues early
- Human Oversight: Final judgment call remains with you, not the AI
Workflow at a Glance
🏗️ One-Time Setup (~30 minutes total)
- Set the guardrails (15-30 mins): Assemble shared context docs so the agent understands your style, architecture, and constraints
🔄 Per-Feature Workflow (~30min-2hrs per feature)
- Draft features collaboratively (30-60 mins): Iterate on research, specs, and a right-sized build plan before touching code
- Generate lightweight artifacts (10-15 mins): Create spec.md and tasks.md that capture intent while keeping implementation changes in their natural repo locations
- Execute with oversight (1-2 hours): Let the agent work through tasks while enforcing compliance checks and visual verification
- Close the loop (15-30 mins): Address findings with human judgment, document outcomes, and perform final review before merge
Sound like a lot? The upfront investment pays dividends, especially when you’re building multiple features or working with a team.
The Detailed Workflow
Now that you understand the benefits, let’s walk through each phase.
🏗️ One-Time Setup
1. Deep Research & Context Setup (15-30 minutes)
Start by creating your project’s guardrails: two key documents that will guide every AI interaction going forward. Think of this as laying down the non-negotiables that prevent scope creep and maintain consistency.
Create these files in /context/:
- design-principles.md: architectural/process constraints
- style-guide.md: tone, formatting, naming, visuals
Why this matters: Without these guardrails, you’ll end up re-explaining your preferences to the AI every single time. Do it once, reference it forever.
The Two-Step Research Process
Step 1: Gather Deep Research (10-15 minutes active time + 5-30 minutes AI research time)
Use a tool with deep research capabilities to gather comprehensive, well-sourced information. You’ll spend 10-15 minutes crafting your research prompt and starting the research, then the AI agent works autonomously for 5-30 minutes (depending on the tool) to browse sources and synthesize findings into cited reports. Choose one:
- ChatGPT Deep Research: OpenAI’s research agent that browses hundreds of sources and creates analyst-level reports with citations (ChatGPT Pro, 100 queries/month)
- Gemini Deep Research: Google’s AI research assistant powered by Gemini 2.5 that creates multi-page reports with 1 million token context window (Gemini Advanced subscribers)
- Perplexity Deep Research: AI search engine that performs dozens of searches and reads hundreds of sources, completing most research in under 3 minutes (free with limited queries, unlimited with Pro)
Choose based on speed (Perplexity for 3-minute results), depth (Gemini’s 1M token context for complex research), or integration (ChatGPT Pro if already subscribed).
Design principles research prompt:
Research the design principles that should guide <TOPIC> (e.g., “React component architecture” or “API design for SaaS platforms”). Scan reputable sources like framework docs, architecture guides, and accessibility specs. I need comprehensive findings on:
- Common constraints and architectural patterns
- Best practices with rationale
- Do’s and don’ts with examples
- Accessibility and inclusive design considerations
Provide detailed findings with source citations.
Style guide research prompt:
Research how to communicate about <TOPIC> (e.g., “developer documentation” or “fintech product copy”). Review trustworthy sources including official docs, product copy examples, and inclusive language guides. I need comprehensive findings on:
- Voice and tone standards for this domain
- Vocabulary conventions and terminology
- Formatting and structure patterns
- Accessibility and inclusive language requirements
Provide detailed findings with source citations.
Step 2: Refine Into Context Documents (5-10 minutes)
Take the research results and use your AI coding assistant (Claude Code, Cursor, etc.) to distill them into actionable context documents.
Design principles refinement prompt:
Using the research below, create a concise design-principles.md file with:
- A “# Design Principles for <TOPIC>” heading
- A short purpose statement (2-3 sentences)
- 3–7 principles, each with Name, Why it matters, and How to enforce sub-bullets
- At least one principle focusing on accessibility or inclusive design
- A ‘Reference Links’ section with the URLs from the research
Keep it concise (under 400 words), actionable, and avoid duplicating overlapping items.
[Paste your research results here]
Style guide refinement prompt:
Using the research below, create a concise style-guide.md file with:
- A “# Style Guide for <TOPIC>” heading
- A short voice-and-tone overview plus accessibility commitment statement
- Sections for Voice, Tone, Vocabulary, Formatting, and Accessibility & Inclusive Language (each with concise bullet guidance and examples)
- A checklist of 5–7 do/don’t items showing compliant vs. non-compliant wording
- A ‘Reference Links’ section citing every source
Keep the whole document under 450 words and make accessibility guidance explicit, not a footnote.
[Paste your research results here]
🔄 Per-Feature Workflow
2. Feature Planning (30-60 minutes)
Now comes the collaborative part. Instead of jumping straight into code, you’ll work with the AI to thoroughly understand what you’re building. This is where Harper Reed’s LLM codegen workflow becomes invaluable because the “ask me one question at a time” approach prevents miscommunication and ensures you’re both aligned before any code gets written.
The goal: Get from vague idea to crystal-clear specification that any developer (human or AI) could implement confidently. This phase produces one spec.md and one tasks.md per feature (not per project), organized in /features/<feature-name>/.
Your folder structure for each feature:
/context/
  design-principles.md
  style-guide.md
/features/
  <feature-name>/
    spec.md
    tasks.md
Note: Only planning artifacts live under /features/<feature-name>/. Implementation code stays in its natural location within your project structure.
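If you want every feature to start from the same skeleton, you can script this scaffolding. The sketch below is purely optional and makes a few assumptions (Node with ES modules, the tsx runner, and the folder names used in this article):

```ts
// scaffold-feature.ts — hypothetical helper to create the planning folder for a new feature.
// Usage: npx tsx scaffold-feature.ts dark-mode-toggle
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

const featureName = process.argv[2];
if (!featureName) {
  console.error("Usage: npx tsx scaffold-feature.ts <feature-name>");
  process.exit(1);
}

const featureDir = join("features", featureName);
await mkdir(featureDir, { recursive: true });

// The "wx" flag refuses to overwrite existing files, so re-running the script is safe
for (const file of ["spec.md", "tasks.md"]) {
  const heading = file === "spec.md" ? `# Feature: ${featureName}\n` : `# Tasks: ${featureName}\n`;
  await writeFile(join(featureDir, file), heading, { flag: "wx" }).catch(() => {});
}

console.log(`Planning artifacts ready under ${featureDir}/`);
```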
Harper-style prompts I use
Idea validation prompt (one-question loop)
Ask me one question at a time so we can develop a thorough, step-by-step spec for this idea. Each question should build on my previous answers. End when you can compile a developer-ready specification. Only one question at a time.
Spec compilation prompt
Here’s the idea: <IDEA>. Now compile our findings into a comprehensive, developer-ready specification: requirements, architecture, data handling, error paths, and a pragmatic verification plan.
Blueprint → tasks prompt
Draft a step-by-step build blueprint. Break it into small, iterative chunks that integrate continuously. Refine until the steps are right-sized (no big jumps). Produce a sequence of prompts a code-gen LLM can follow, ending with wiring everything together.
3. Iterative Review with the Agent (5-15 minutes)
Don’t accept the first draft. Have the AI challenge its own work; this catches edge cases, over-engineering, and unclear requirements before they become problems in your codebase.
Iterative review prompt
Review this spec for missing edge cases, risky assumptions, and over-engineered sections. Challenge anything unclear, then suggest the smallest set of revisions that would make it production-ready.
4. Save the Plan (2-3 minutes)
Once you’re happy with the spec, save it as spec.md under your feature folder. This becomes your single source of truth.
Save the plan prompt
Now that we’ve wrapped up the brainstorming process, can you compile our findings into a comprehensive, developer-ready specification? Include all relevant requirements, architecture choices, data handling details, error handling strategies, and a testing plan so I can drop the result straight into spec.md.
Example spec.md output:
# Feature: Dark Mode Toggle
## Requirements
- Add a toggle button in the site header to switch between light and dark themes
- Persist user preference in localStorage
- Apply theme immediately without page reload
- Support system preference detection on first visit
## Architecture
- Use CSS custom properties for theme colors
- Create a React component (ThemeToggle.tsx) for the toggle UI
- Implement useTheme hook for theme state management
- Add data-theme attribute to document root
## Implementation Details
- Theme colors: Define --bg-primary, --text-primary, --accent variables
- Storage key: 'user-theme-preference'
- Default behavior: Respect prefers-color-scheme media query
- Transition: Apply smooth 200ms transitions on theme changes
## Error Handling
- Fallback to light theme if localStorage is unavailable
- Handle invalid stored values gracefully
## Verification
- Test toggle functionality in header
- Verify persistence across page reloads
- Check system preference detection
- Validate smooth transitions between themes
Note: This example is intentionally simplified for clarity. Complex features may require additional sections like API contracts, database schema changes, migration strategies, or detailed edge case handling. Adjust the level of detail based on your feature’s complexity.
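To make the execution phase concrete, here is a minimal sketch of the useTheme hook this spec describes, using the details listed above (storage key, data-theme attribute, system preference fallback). Treat it as illustrative rather than the finished implementation:

```tsx
// useTheme.ts — illustrative sketch of the hook described in the example spec
import { useCallback, useEffect, useState } from "react";

type Theme = "light" | "dark";

const STORAGE_KEY = "user-theme-preference"; // storage key named in the spec

function getInitialTheme(): Theme {
  try {
    const stored = localStorage.getItem(STORAGE_KEY);
    if (stored === "light" || stored === "dark") return stored; // ignore invalid stored values
  } catch {
    // localStorage unavailable: fall back to system preference, then light
  }
  return window.matchMedia("(prefers-color-scheme: dark)").matches ? "dark" : "light";
}

export function useTheme() {
  const [theme, setTheme] = useState<Theme>(getInitialTheme);

  useEffect(() => {
    // Apply the theme via a data-theme attribute on the document root
    document.documentElement.setAttribute("data-theme", theme);
    try {
      localStorage.setItem(STORAGE_KEY, theme);
    } catch {
      // Persistence is best-effort; the in-memory theme still applies
    }
  }, [theme]);

  const toggleTheme = useCallback(
    () => setTheme((current) => (current === "dark" ? "light" : "dark")),
    []
  );

  return { theme, toggleTheme };
}
```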
5. Task List (10-15 minutes)
Break your spec into bite-sized, actionable tasks. The key is making each task independently shippable. No “part 1 of 3” nonsense that leaves your codebase in a broken state.
What to prune:
- Merge trivial steps that don’t justify separate tasks
- Drop low-value work (e.g., comprehensive tests for a simple blog post)
- Ensure each task delivers working functionality
Create task list prompt
Convert the spec into a Markdown tasks.md checklist I can tick off as I work; keep each task independently shippable and note any waiting-on dependencies.
Example tasks.md output:
# Tasks: Dark Mode Toggle
## Setup
- [ ] Define CSS custom properties for light and dark themes in global stylesheet
- [ ] Create ThemeToggle component file structure
## Core Implementation
- [ ] Implement useTheme hook with localStorage persistence
- [ ] Add theme toggle button to site header with sun/moon icons
- [ ] Apply data-theme attribute to document root on theme change
- [ ] Add system preference detection on first visit
## Polish & Verification
- [ ] Add smooth 200ms transitions for theme changes
- [ ] Test theme persistence across page reloads
- [ ] Verify system preference detection works correctly
- [ ] Test toggle accessibility (keyboard navigation, screen readers)
Note: This example is intentionally simplified for clarity. Complex features may require more granular tasks, dependency tracking between tasks, or specific testing checkpoints. Adjust the number and specificity of tasks based on your feature’s complexity and your team’s workflow. Both examples (spec.md and tasks.md) use the same dark mode toggle feature to demonstrate how specs translate into actionable tasks.
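For a sense of what one finished task looks like, here is a rough sketch of the header toggle button from the checklist above. The import path and icon choice are assumptions, and it leans on the useTheme hook sketched earlier:

```tsx
// ThemeToggle.tsx — rough sketch of the "toggle button in the header" task
import { useTheme } from "../../hooks/useTheme";

export function ThemeToggle() {
  const { theme, toggleTheme } = useTheme();
  const isDark = theme === "dark";

  return (
    <button
      type="button"
      onClick={toggleTheme}
      // aria-pressed plus a descriptive label cover the keyboard and screen-reader task
      aria-pressed={isDark}
      aria-label={isDark ? "Switch to light theme" : "Switch to dark theme"}
    >
      {isDark ? "🌙" : "☀️"}
    </button>
  );
}
```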
6. Execute (1-2 hours, depending on scope)
Now the fun part: let the AI do the heavy lifting while you focus on oversight. The agent works through your task list, marking progress as it goes. Implementation changes go where they belong in your repo structure, not in some artificial “feature folder.”
What to watch for: The AI should reference your context docs frequently and ask clarifying questions when it hits ambiguity.
What Your Project Looks Like After Several Features
Here’s what a mature project structure looks like after implementing multiple features:
my-project/
├── context/
│ ├── design-principles.md # Architectural constraints & guidelines
│ └── style-guide.md # Voice, tone, accessibility standards
│
├── features/ # Planning artifacts only
│ ├── user-authentication/
│ │ ├── spec.md # Requirements & architecture decisions
│ │ └── tasks.md # ✅ All tasks completed
│ ├── product-catalog/
│ │ ├── spec.md # Search, filtering, pagination spec
│ │ └── tasks.md # ✅ All tasks completed
│ ├── shopping-cart/
│ │ ├── spec.md # State management & persistence
│ │ └── tasks.md # 🔄 In progress (3/5 tasks done)
│ └── dark-mode-toggle/
│ ├── spec.md # Theme switching & preferences
│ └── tasks.md # ✅ All tasks completed
│
├── src/ # Actual implementation
│ ├── components/
│ │ ├── auth/
│ │ │ ├── LoginForm.tsx # From user-authentication feature
│ │ │ └── SignUpForm.tsx # From user-authentication feature
│ │ ├── catalog/
│ │ │ ├── ProductCard.tsx # From product-catalog feature
│ │ │ ├── SearchBar.tsx # From product-catalog feature
│ │ │ └── FilterPanel.tsx # From product-catalog feature
│ │ ├── cart/
│ │ │ ├── CartSidebar.tsx # From shopping-cart feature
│ │ │ └── CartItem.tsx # From shopping-cart feature
│ │ └── ui/
│ │ ├── ThemeToggle.tsx # From dark-mode-toggle feature
│ │ └── Button.tsx # Shared component
│ ├── hooks/
│ │ ├── useAuth.ts # From user-authentication feature
│ │ ├── useCart.ts # From shopping-cart feature
│ │ └── useTheme.ts # From dark-mode-toggle feature
│ ├── pages/
│ │ ├── auth.tsx # From user-authentication feature
│ │ ├── catalog.tsx # From product-catalog feature
│ │ └── checkout.tsx # From shopping-cart feature
│ └── styles/
│ └── themes.css # From dark-mode-toggle feature
│
├── tests/
│ ├── auth.test.ts # From user-authentication feature
│ ├── catalog.test.ts # From product-catalog feature
│ └── theme.test.ts # From dark-mode-toggle feature
│
└── docs/
├── api-integration.md # From user-authentication feature
└── accessibility-report.md # From dark-mode-toggle feature
Key observations:
- Context docs stay the guardrails so every feature starts from the same playbook.
- Implementation lands where it belongs: code and tests sit next to the modules they change.
- Planning artifacts travel with the feature under /features/<feature>/, giving handoffs durable docs without cluttering the repo.
- Progress is legible at a glance thanks to the ✅/🔄 markers in each feature’s tasks.md.
Execute tasks prompt
For the current feature, load /features/<feature-name>/spec.md, /features/<feature-name>/tasks.md, and the context docs (/context/design-principles.md, /context/style-guide.md). Start with the first unchecked task in tasks.md, execute it end-to-end (apply code changes, run the relevant checks), and tick it off. Loop until every remaining item is done, pausing only if something is ambiguous.
7. Compliance & Visual Verification (15-30 minutes)
Here’s where most AI development workflows fall short because they skip quality control. Don’t make that mistake. Run systematic checks against your guardrails and verify the visual implementation matches your expectations.
Automated compliance checks (AI-assisted):
- Have your AI agent review code changes against /context/design-principles.md
- Check style guide adherence against /context/style-guide.md
- The agent analyzes diffs and reports violations with severity levels
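If your agent can’t read the repo directly, a tiny helper like the hypothetical one below can bundle the diff with your context docs so the whole review fits in a single paste (the branch name and paths are assumptions):

```ts
// collect-review-input.ts — hypothetical helper that bundles the diff and context docs for review
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

const diff = execSync("git diff main...HEAD", { encoding: "utf8" });
const principles = readFileSync("context/design-principles.md", "utf8");
const styleGuide = readFileSync("context/style-guide.md", "utf8");

// Print one document you can paste into the agent alongside the compliance prompt
console.log(
  ["## Design principles", principles, "## Style guide", styleGuide, "## Diff under review", diff].join("\n\n")
);
```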
Visual verification (requires running your app):
- Install and wire up Playwright MCP (a Model Context Protocol server that bridges your agent to Playwright) so the agent can drive the browser, grab screenshots, and inspect console/network logs autonomously
- This approach, popularized by Patrick Ellis, allows agents to verify responsive design, accessibility, and user experience independently
- The agent can validate visual changes, catch layout issues, and verify interactive features without manual intervention
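Whether the agent drives the browser through Playwright MCP or you commit the check as an ordinary Playwright test, the verification itself looks roughly like this sketch (the URL, button label, and screenshot path are assumptions tied to the dark mode example):

```ts
// theme.visual.spec.ts — sketch of a visual/behavioral check for the dark mode toggle
import { test, expect } from "@playwright/test";

test("dark mode toggle applies the theme and survives a reload", async ({ page }) => {
  await page.goto("http://localhost:3000");

  // Flip the toggle and confirm the data-theme attribute changes on the document root
  await page.getByRole("button", { name: /switch to dark theme/i }).click();
  await expect(page.locator("html")).toHaveAttribute("data-theme", "dark");

  // Capture a screenshot the agent (or a human) can inspect for layout issues
  await page.screenshot({ path: "screenshots/dark-mode.png", fullPage: true });

  // The preference should persist across reloads via localStorage
  await page.reload();
  await expect(page.locator("html")).toHaveAttribute("data-theme", "dark");
});
```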
Compliance & visual verification prompt
Review the latest diffs and explain how they align with /context/design-principles.md and /context/style-guide.md. For each guideline that is upheld or broken, note it explicitly. List any recommendations grouped by severity (High, Medium, Low, Nitpick) and propose the smallest fix for each.
8. Review & Iterate (15-30 minutes)
Based on your compliance findings, make targeted fixes. Focus on high and medium priority issues first. Don’t get lost in nitpicks unless you have time to spare.
Review & iterate prompt
Here are the compliance findings (High, Medium, Low, Nitpick). Fix everything High and Medium in one pass, addressing the most impactful Low items if they’re quick wins. After applying changes, summarize what was fixed and note which lower-severity items remain (with reasons) before handing the branch back.
9. Human in the Loop (15-30 minutes)
This is your moment. No AI can replace human judgment for user experience, accessibility nuances, and business logic validation. Take a final pass through the implementation, then ship with confidence.
Human handoff prompt
Prepare a concise handoff for the human reviewer: summarize the code changes, cite how they comply with /context/design-principles.md and /context/style-guide.md, list the manual verification steps already run (screenshots, accessibility checks, etc.), and flag any remaining risks or follow-ups. End with a checklist the human can work through before committing.
Common Pitfalls (And How to Avoid Them)
🚫 “The AI keeps going off-track”
Solution: Your context docs aren’t specific enough. Spend more time in step 1 defining constraints and examples. I learned this the hard way when my agent kept choosing different UI libraries for each component. Turns out “use modern React patterns” isn’t specific enough.
🚫 “Tasks are too big and break things”
Solution: Break them down further. Each task should be shippable independently, even if it’s just a small improvement.
🚫 “The specs keep changing mid-implementation”
Solution: Spend more time in the planning phase (steps 2-3). Better to iterate on the spec than on the code.
🚫 “Quality is inconsistent”
Solution: Don’t skip the compliance checks (step 7). Automate what you can, but always do the human review (step 9).
🚫 “This feels like too much overhead”
Solution: Start with a simple feature to build the habit. The workflow scales down for small changes and up for complex features.
Getting Started: Your First Feature
Ready to try this workflow? Here’s how to ease into it:
🚀 1. Start Small (This week)
Pick a simple feature you’ve been putting off (maybe a UI component or a small utility function). Use this as your training ground.
📋 2. Set Up Your Context (30 minutes)
Follow the Two-Step Research Process from the One-Time Setup section: use a deep research tool (ChatGPT Deep Research, Gemini, or Perplexity) to gather comprehensive findings, then refine those results into /context/design-principles.md and /context/style-guide.md with your AI coding assistant. Even basic versions will help.
💬 3. Practice the Planning Loop (Next feature)
Use Harper’s “one question at a time” approach for your next feature planning session. You’ll be surprised how much clarity it brings. Patrick Ellis’s walkthroughs on agent workflow governance are perfect companions here; borrow his checklists to keep your loop grounded in reality.
🔄 4. Build the Habit (Ongoing)
After 3-4 features using this workflow, you’ll start to see the patterns. The upfront investment becomes automatic, and the quality improvements compound.
Ready to try it? Pick a real feature and see how this workflow fits your codebase. You’ll gain clearer specifications, better documentation, and more predictable outcomes, even if you only adopt parts of the workflow.
Key Takeaways
- Structure beats tooling: This workflow succeeds with any AI assistant. Consistent process matters more than which tool you choose
- Invest upfront in context docs: Design principles and style guides prevent scope creep and maintain consistency across all AI interactions
- Plan collaboratively before coding: Harper Reed’s “one question at a time” approach catches edge cases and misalignments early
- Break work into right-sized chunks: Each task should be independently shippable, not “part 1 of 3” that leaves code broken
- Build quality checks into the process: Automated compliance reviews plus visual verification catch issues before they ship
- Keep human judgment in the loop: AI handles heavy lifting, but final calls on UX, accessibility, and business logic stay with you
- Preserve planning, not implementation: Store specs and tasks in /features/, but let code live in its natural project structure
- Start small and build the habit: Try one feature to learn the rhythm; the upfront investment compounds quickly