Intent Layer for Auto Mode

The Intent Layer enables intelligent, automatic model selection and execution when users enable "auto mode" on the canvas. It analyzes the user prompt and canvas context, then automatically selects and configures the optimal AI model for the task.

Overview

When auto mode is enabled, the system follows this flow:

  1. Intent Analysis - LLM analyzes user prompt and canvas context

  2. Model Search - Azure AI Search finds compatible models

  3. Model Selection - LLM selects optimal model and generates parameters

  4. Execution - Calls execute-node API with configured parameters
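
The four steps above can be sketched as a chain of pure functions. Everything here is illustrative: the type names, stub rules, and in-memory catalog are assumptions, not the actual implementation (step 4 is omitted because it is a network call to execute-node).

```typescript
// Hypothetical sketch of the auto-mode pipeline; names and shapes are
// illustrative assumptions, not the actual implementation.
type OutputType = "image" | "video" | "3d" | "text";

interface IntentAnalysis {
  outputType: OutputType;
  searchQuery: string;
}

interface ModelCandidate {
  id: string;
  outputType: OutputType;
}

// Step 1: analyze the prompt (stubbed here as a keyword rule).
function analyzeIntent(prompt: string): IntentAnalysis {
  const outputType: OutputType = /video/i.test(prompt) ? "video" : "image";
  return { outputType, searchQuery: prompt.toLowerCase() };
}

// Step 2: find compatible models (stubbed as an in-memory filter).
function searchModels(intent: IntentAnalysis, catalog: ModelCandidate[]): ModelCandidate[] {
  return catalog.filter((m) => m.outputType === intent.outputType);
}

// Step 3: select a model (stubbed as "first match").
function selectModel(candidates: ModelCandidate[]): ModelCandidate | undefined {
  return candidates[0];
}

// Steps 1-3 chained; step 4 (execute-node) is omitted here.
function runPipeline(prompt: string, catalog: ModelCandidate[]): ModelCandidate | undefined {
  return selectModel(searchModels(analyzeIntent(prompt), catalog));
}

const catalog: ModelCandidate[] = [
  { id: "fal-ai_flux-pro-v1-1-ultra", outputType: "image" },
  { id: "veo2", outputType: "video" },
];
const selected = runPipeline("Create a short video of a sunset", catalog);
// → { id: "veo2", outputType: "video" }
```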

Architecture

Canvas (Auto Mode)
        │
        ▼
  /api/auto-mode
        │
        ▼
IntentOrchestrator
    ├── IntentAnalyzer (GPT-4.1 Nano)
    ├── ModelSearchService (Azure AI Search)
    └── execute-node API

Components

1. IntentAnalyzer (intent-analyzer.ts)

Uses Azure OpenAI GPT-4.1 Nano to:

  • Infer output type if not specified by user

  • Analyze required model capabilities

  • Generate semantic search queries

  • Select optimal model from search results

  • Generate API parameters for selected model

Key Methods:

  • analyzeIntent(context) - Step 1: Analyze user intent

  • selectModelAndGenerateCall(intent, models, context) - Step 2: Select model
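
The real analyzeIntent calls GPT-4.1 Nano, but the Error Handling section notes a rule-based fallback. A minimal sketch of what such a fallback might look like (the keyword rules and the IntentContext shape are assumptions):

```typescript
// Illustrative rule-based fallback for intent analysis; the production path
// uses the LLM, and these keyword heuristics are assumptions.
interface IntentContext {
  prompt: string;
  selectedType?: string; // caller-specified output type, if any
}

function inferOutputType(ctx: IntentContext): string {
  if (ctx.selectedType) return ctx.selectedType; // never override an explicit choice
  const p = ctx.prompt.toLowerCase();
  if (/\b3d\b|mesh/.test(p)) return "3d";
  if (/video|animation|transition/.test(p)) return "video";
  return "image"; // default for generation prompts
}
```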

2. ModelSearchService (model-search-service.ts)

Integrates with Azure AI Search to:

  • Search models by semantic queries and filters

  • Apply user tier constraints (free/pro/enterprise)

  • Filter by capabilities, performance, cost

  • Provide fallback models when search fails

Key Methods:

  • searchForIntent(intentAnalysis, userTier) - Search optimized for intent

  • searchModels(query, filters, options) - Generic search interface
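
Azure AI Search accepts OData `$filter` expressions, so searchForIntent presumably translates the intent and tier constraints into one. A hedged sketch (the index field names `output_type` and `cost_tier` are assumptions about the schema):

```typescript
// Hypothetical helper that builds an Azure AI Search OData filter string from
// intent and tier constraints; field names are assumed, not confirmed.
interface SearchFilters {
  outputType: string;
  costTiers: string[]; // cost tiers allowed for the user's plan
}

function buildFilter(f: SearchFilters): string {
  const tierClause = f.costTiers.map((t) => `cost_tier eq '${t}'`).join(" or ");
  return `output_type eq '${f.outputType}' and (${tierClause})`;
}

const filter = buildFilter({ outputType: "image", costTiers: ["economy"] });
// → "output_type eq 'image' and (cost_tier eq 'economy')"
```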

3. IntentOrchestrator (intent-orchestrator.ts)

Main coordinator that:

  • Validates requests

  • Orchestrates the full pipeline

  • Handles errors and fallbacks

  • Calls execute-node API

  • Tracks performance metrics

Key Methods:

  • processAutoModeRequest(request) - Main entry point

  • validateRequest(request) - Input validation
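
The kind of checks validateRequest might perform can be sketched as follows. The field names mirror the request example in the API Usage section, but the specific rules are assumptions:

```typescript
// Illustrative input validation; returns a list of problems, empty = valid.
interface AutoModeRequest {
  prompt: string;
  selectedType?: string;
  inputNodes?: { id: string; type: string; url: string }[];
  canvasInfo?: { userId: string; userTier: string };
}

function validateRequest(req: AutoModeRequest): string[] {
  const errors: string[] = [];
  if (!req.prompt || req.prompt.trim().length === 0) {
    errors.push("prompt is required");
  }
  for (const node of req.inputNodes ?? []) {
    if (!node.url) errors.push(`input node ${node.id} is missing a url`);
  }
  return errors; // empty array means the request is valid
}
```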

API Usage

Auto Mode Request

POST /api/auto-mode
{
  "prompt": "Create a professional headshot photo",
  "selectedType": "image", // Optional - LLM will infer if not provided
  "inputNodes": [
    {
      "id": "input-1",
      "type": "image",
      "url": "https://example.com/photo.jpg",
      "metadata": { "width": 1024, "height": 1024 }
    }
  ],
  "canvasInfo": {
    "userId": "user-123",
    "projectId": "project-456",
    "nodeId": "node-789",
    "userTier": "pro"
  }
}
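
From the client side, the request body above can be assembled and POSTed like this. The helper is illustrative; only the endpoint path comes from this document:

```typescript
// Builds the auto-mode request body; omitting selectedType lets the LLM infer it.
function buildAutoModeBody(prompt: string, selectedType?: string) {
  return {
    prompt,
    ...(selectedType ? { selectedType } : {}), // omitted → LLM infers the type
    inputNodes: [],
  };
}

// Usage (network call shown for illustration only):
// const res = await fetch("/api/auto-mode", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildAutoModeBody("Create a sunset landscape", "image")),
// });

const body = buildAutoModeBody("Create a sunset landscape");
// → { prompt: "Create a sunset landscape", inputNodes: [] }
```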

Auto Mode Response

{
  "success": true,
  "data": {
    "intentAnalysis": {
      "outputType": "image",
      "confidence": 0.95,
      "reasoning": "User wants to create a professional headshot...",
      "searchQuery": "professional headshot portrait high quality",
      "requiredCapabilities": {
        "inputTypes": ["text"],
        "outputType": "image"
      }
    },
    "availableModels": [
      {
        "id": "fal-ai_flux-pro-v1-1-ultra",
        "display_name": "FLUX Pro Ultra",
        "quality_score": 9.5,
        "speed_tier": "medium"
      }
    ],
    "modelSelection": {
      "selectedModel": {
        "id": "fal-ai_flux-pro-v1-1-ultra",
        "displayName": "FLUX Pro Ultra",
        "qualityScore": 9.5
      },
      "apiParameters": {
        "prompt": "professional headshot of a business person in a modern office",
        "guidance_scale": 7.5,
        "num_inference_steps": 50
      },
      "selectionReasoning": "Selected for high quality portrait generation",
      "confidence": 0.9
    },
    "executionResult": {
      "success": true,
      "data": {
        "imageUrl": "https://generated-image-url.jpg"
      }
    }
  }
}

Configuration

Environment Variables

# Azure OpenAI (for intent analysis)
AZURE_OPENAI_4_1_KEY=your-azure-openai-key
AZURE_OPENAI_ENDPOINT=https://your-resource.cognitiveservices.azure.com

# Azure AI Search (for model search)
AZURE_SEARCH_SERVICE=eidos-mvp
AZURE_SEARCH_API_KEY=your-search-key

API Keys Configuration

The system uses the GPT-4.1 Nano deployment configured in config/api-keys.ts:

AZURE_OPENAI: {
  API_KEY: process.env.AZURE_OPENAI_4_1_KEY,
  ENDPOINT: "https://joyce-resource.cognitiveservices.azure.com",
  DEPLOYMENT_NAME: "gpt-4.1-nano",
  API_VERSION: "2025-01-01-preview",
  MODEL: "gpt-4.1-nano"
}

Usage Scenarios

1. Text-to-Image (No Context)

{
  prompt: "Create a sunset landscape",
  selectedType: "image",
  inputNodes: []
}
// → Selects FLUX or Imagen for high-quality landscape

2. Image-to-Image (Style Transfer)

{
  prompt: "Make this look like a Van Gogh painting",
  inputNodes: [{ type: "image", url: "photo.jpg" }]
}
// → Selects FLUX I2I or style transfer model

3. Multi-Image to Video

{
  prompt: "Create smooth transition between these images",
  selectedType: "video",
  inputNodes: [
    { type: "image", url: "img1.jpg" },
    { type: "image", url: "img2.jpg" }
  ]
}
// → Selects Veo2 or Kling for image-to-video

4. Auto Type Inference

{
  prompt: "Turn this photo into a 3D model",
  inputNodes: [{ type: "image", url: "object.jpg" }]
}
// → LLM infers outputType: "3d", selects Trellis or Hunyuan3D

5. Budget-Conscious Selection

{
  prompt: "Quick logo design",
  canvasInfo: { userTier: "free" }
}
// → Selects economy tier models with good speed

User Tier Constraints

Free Tier

  • Economy cost tier models only

  • Max 30s latency

  • Basic quality threshold (≥6.0)

Pro Tier

  • Standard cost tier models

  • Quality threshold ≥7.0

  • Balanced speed/quality

Enterprise Tier

  • All cost tiers available

  • Premium quality threshold (≥8.0)

  • No latency restrictions
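
The tier rules above can be expressed as data. The thresholds come from this document; the constraint object's shape, and the assumption that each tier also includes the cheaper tiers below it, are illustrative:

```typescript
// Tier constraints as a lookup table; null means "no restriction".
interface TierConstraints {
  costTiers: string[];
  maxLatencyMs: number | null;
  minQualityScore: number;
}

const TIER_CONSTRAINTS: Record<string, TierConstraints> = {
  free: { costTiers: ["economy"], maxLatencyMs: 30_000, minQualityScore: 6.0 },
  // "balanced speed/quality" for pro has no stated ms cap, so null is assumed.
  pro: { costTiers: ["economy", "standard"], maxLatencyMs: null, minQualityScore: 7.0 },
  enterprise: { costTiers: ["economy", "standard", "premium"], maxLatencyMs: null, minQualityScore: 8.0 },
};
```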

Testing

Run the comprehensive test suite:

npm run test:intent-layer
# or
node lib/intent/test-intent-layer.js

Test scenarios include:

  • Text-to-image generation

  • Image style transfer

  • Multi-modal inputs

  • Type inference

  • User tier constraints

  • Performance preferences

Error Handling

The system includes robust error handling:

  1. LLM Failures - Fallback to rule-based analysis

  2. Search Failures - Fallback to basic model search

  3. No Models Found - Try broader search criteria

  4. Execution Failures - Return partial results with error info
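
The fallback ladder above amounts to wrapping each fallible stage in a try/catch that degrades to a cheaper strategy. A minimal sketch of the structure (the stage internals here are stand-ins):

```typescript
// Generic primary/fallback wrapper: run the preferred strategy, and on any
// failure degrade gracefully instead of failing the whole request.
function withFallback<T>(primary: () => T, fallback: () => T): T {
  try {
    return primary();
  } catch {
    return fallback();
  }
}

// Example: an LLM analysis stage that fails, falling back to a rule-based stub.
const outputType = withFallback<string>(
  () => { throw new Error("LLM timeout"); }, // simulated LLM failure
  () => "image",                             // rule-based fallback result
);
// → "image"
```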

Performance

Typical processing times:

  • Intent Analysis: 500-1500ms

  • Model Search: 100-300ms

  • Model Selection: 500-1200ms

  • Total: ~1-3 seconds (excluding model execution)

Monitoring

Each request includes detailed metadata:

{
  metadata: {
    processingTimeMs: 2150,
    steps: [
      { step: "intent_analysis", duration: 890, success: true },
      { step: "model_search", duration: 120, success: true },
      { step: "model_selection", duration: 780, success: true },
      { step: "model_execution", duration: 15400, success: true }
    ]
  }
}
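
A small helper over this metadata shape can recover the pipeline latency excluding model execution, which is how the Performance section counts time. The helper itself is illustrative:

```typescript
// Sums step durations, excluding model execution, from request metadata.
interface StepMetric { step: string; duration: number; success: boolean }

function pipelineLatency(steps: StepMetric[]): number {
  return steps
    .filter((s) => s.step !== "model_execution")
    .reduce((sum, s) => sum + s.duration, 0);
}

const latency = pipelineLatency([
  { step: "intent_analysis", duration: 890, success: true },
  { step: "model_search", duration: 120, success: true },
  { step: "model_selection", duration: 780, success: true },
  { step: "model_execution", duration: 15400, success: true },
]);
// → 1790 (ms)
```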

Future Enhancements

  1. User Feedback Loop - Learn from user selections

  2. Caching - Cache intent analysis for similar prompts

  3. A/B Testing - Test different selection strategies

  4. Advanced Context - Consider project history, user preferences

  5. Multi-Step Workflows - Chain multiple models automatically


The Intent Layer is now ready for production auto mode!
