Intent Layer for Auto Mode

The Intent Layer enables intelligent, automatic model selection and execution when users enable "auto mode" on the canvas. It analyzes the user prompt and canvas context, then automatically selects and configures the optimal AI model for the task.

Overview

When auto mode is enabled, the system follows this flow:

  1. Intent Analysis - LLM analyzes user prompt and canvas context

  2. Model Search - Azure AI Search finds compatible models

  3. Model Selection - LLM selects optimal model and generates parameters

  4. Execution - Calls execute-node API with configured parameters
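
The four steps above can be sketched as a chain of pure functions. Everything here is illustrative: the type names, stub rules, and in-memory catalog are assumptions, not the actual implementation (step 4 is omitted because it is a network call to execute-node).

```typescript
// Hypothetical sketch of the auto-mode pipeline; names and shapes are
// illustrative assumptions, not the actual implementation.
type OutputType = "image" | "video" | "3d" | "text";

interface IntentAnalysis {
  outputType: OutputType;
  searchQuery: string;
}

interface ModelCandidate {
  id: string;
  outputType: OutputType;
}

// Step 1: analyze the prompt (stubbed here as a keyword rule).
function analyzeIntent(prompt: string): IntentAnalysis {
  const outputType: OutputType = /video/i.test(prompt) ? "video" : "image";
  return { outputType, searchQuery: prompt.toLowerCase() };
}

// Step 2: find compatible models (stubbed as an in-memory filter).
function searchModels(intent: IntentAnalysis, catalog: ModelCandidate[]): ModelCandidate[] {
  return catalog.filter((m) => m.outputType === intent.outputType);
}

// Step 3: select a model (stubbed as "first match").
function selectModel(candidates: ModelCandidate[]): ModelCandidate | undefined {
  return candidates[0];
}

// Steps 1-3 chained; step 4 (execute-node) is omitted here.
function runPipeline(prompt: string, catalog: ModelCandidate[]): ModelCandidate | undefined {
  return selectModel(searchModels(analyzeIntent(prompt), catalog));
}

const catalog: ModelCandidate[] = [
  { id: "fal-ai_flux-pro-v1-1-ultra", outputType: "image" },
  { id: "veo2", outputType: "video" },
];
const selected = runPipeline("Create a short video of a sunset", catalog);
// → { id: "veo2", outputType: "video" }
```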

Architecture

Canvas (Auto Mode)
        │
        ▼
  /api/auto-mode
        │
        ▼
IntentOrchestrator
    ├── IntentAnalyzer (GPT-4.1 Nano)
    ├── ModelSearchService (Azure AI Search)
    └── execute-node API

Components

1. IntentAnalyzer (intent-analyzer.ts)

Uses Azure OpenAI GPT-4.1 Nano to:

  • Infer output type if not specified by user

  • Analyze required model capabilities

  • Generate semantic search queries

  • Select optimal model from search results

  • Generate API parameters for selected model

Key Methods:

  • analyzeIntent(context) - Step 1: Analyze user intent

  • selectModelAndGenerateCall(intent, models, context) - Step 2: Select model
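
The real analyzeIntent calls GPT-4.1 Nano, but the Error Handling section notes a rule-based fallback. A minimal sketch of what such a fallback might look like (the keyword rules and the IntentContext shape are assumptions):

```typescript
// Illustrative rule-based fallback for intent analysis; the production path
// uses the LLM, and these keyword heuristics are assumptions.
interface IntentContext {
  prompt: string;
  selectedType?: string; // caller-specified output type, if any
}

function inferOutputType(ctx: IntentContext): string {
  if (ctx.selectedType) return ctx.selectedType; // never override an explicit choice
  const p = ctx.prompt.toLowerCase();
  if (/\b3d\b|mesh/.test(p)) return "3d";
  if (/video|animation|transition/.test(p)) return "video";
  return "image"; // default for generation prompts
}
```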

2. ModelSearchService (model-search-service.ts)

Integrates with Azure AI Search to:

  • Search models by semantic queries and filters

  • Apply user tier constraints (free/pro/enterprise)

  • Filter by capabilities, performance, cost

  • Provide fallback models when search fails

Key Methods:

  • searchForIntent(intentAnalysis, userTier) - Search optimized for intent

  • searchModels(query, filters, options) - Generic search interface
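
Azure AI Search accepts OData `$filter` expressions, so searchForIntent presumably translates the intent and tier constraints into one. A hedged sketch (the index field names `output_type` and `cost_tier` are assumptions about the schema):

```typescript
// Hypothetical helper that builds an Azure AI Search OData filter string from
// intent and tier constraints; field names are assumed, not confirmed.
interface SearchFilters {
  outputType: string;
  costTiers: string[]; // cost tiers allowed for the user's plan
}

function buildFilter(f: SearchFilters): string {
  const tierClause = f.costTiers.map((t) => `cost_tier eq '${t}'`).join(" or ");
  return `output_type eq '${f.outputType}' and (${tierClause})`;
}

const filter = buildFilter({ outputType: "image", costTiers: ["economy"] });
// → "output_type eq 'image' and (cost_tier eq 'economy')"
```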

3. IntentOrchestrator (intent-orchestrator.ts)

Main coordinator that:

  • Validates requests

  • Orchestrates the full pipeline

  • Handles errors and fallbacks

  • Calls execute-node API

  • Tracks performance metrics

Key Methods:

  • processAutoModeRequest(request) - Main entry point

  • validateRequest(request) - Input validation
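
The kind of checks validateRequest might perform can be sketched as follows. The field names mirror the request example in the API Usage section, but the specific rules are assumptions:

```typescript
// Illustrative input validation; returns a list of problems, empty = valid.
interface AutoModeRequest {
  prompt: string;
  selectedType?: string;
  inputNodes?: { id: string; type: string; url: string }[];
  canvasInfo?: { userId: string; userTier: string };
}

function validateRequest(req: AutoModeRequest): string[] {
  const errors: string[] = [];
  if (!req.prompt || req.prompt.trim().length === 0) {
    errors.push("prompt is required");
  }
  for (const node of req.inputNodes ?? []) {
    if (!node.url) errors.push(`input node ${node.id} is missing a url`);
  }
  return errors; // empty array means the request is valid
}
```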

API Usage

Auto Mode Request

POST /api/auto-mode
{
  "prompt": "Create a professional headshot photo",
  "selectedType": "image", // Optional - LLM will infer if not provided
  "inputNodes": [
    {
      "id": "input-1",
      "type": "image",
      "url": "https://example.com/photo.jpg",
      "metadata": { "width": 1024, "height": 1024 }
    }
  ],
  "canvasInfo": {
    "userId": "user-123",
    "projectId": "project-456",
    "nodeId": "node-789",
    "userTier": "pro"
  }
}
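
From the client side, the request body above can be assembled and POSTed like this. The helper is illustrative; only the endpoint path comes from this document:

```typescript
// Builds the auto-mode request body; omitting selectedType lets the LLM infer it.
function buildAutoModeBody(prompt: string, selectedType?: string) {
  return {
    prompt,
    ...(selectedType ? { selectedType } : {}), // omitted → LLM infers the type
    inputNodes: [],
  };
}

// Usage (network call shown for illustration only):
// const res = await fetch("/api/auto-mode", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildAutoModeBody("Create a sunset landscape", "image")),
// });

const body = buildAutoModeBody("Create a sunset landscape");
// → { prompt: "Create a sunset landscape", inputNodes: [] }
```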

Auto Mode Response

{
  "success": true,
  "data": {
    "intentAnalysis": {
      "outputType": "image",
      "confidence": 0.95,
      "reasoning": "User wants to create a professional headshot...",
      "searchQuery": "professional headshot portrait high quality",
      "requiredCapabilities": {
        "inputTypes": ["text"],
        "outputType": "image"
      }
    },
    "availableModels": [
      {
        "id": "fal-ai_flux-pro-v1-1-ultra",
        "display_name": "FLUX Pro Ultra",
        "quality_score": 9.5,
        "speed_tier": "medium"
      }
    ],
    "modelSelection": {
      "selectedModel": {
        "id": "fal-ai_flux-pro-v1-1-ultra",
        "displayName": "FLUX Pro Ultra",
        "qualityScore": 9.5
      },
      "apiParameters": {
        "prompt": "professional headshot of a business person in a modern office",
        "guidance_scale": 7.5,
        "num_inference_steps": 50
      },
      "selectionReasoning": "Selected for high quality portrait generation",
      "confidence": 0.9
    },
    "executionResult": {
      "success": true,
      "data": {
        "imageUrl": "https://generated-image-url.jpg"
      }
    }
  }
}

Configuration

Environment Variables

# Azure OpenAI (for intent analysis)
AZURE_OPENAI_4_1_KEY=your-azure-openai-key
AZURE_OPENAI_ENDPOINT=https://your-resource.cognitiveservices.azure.com

# Azure AI Search (for model search)
AZURE_SEARCH_SERVICE=eidos-mvp
AZURE_SEARCH_API_KEY=your-search-key

API Keys Configuration

The system uses the GPT-4.1 Nano deployment configured in config/api-keys.ts:

AZURE_OPENAI: {
  API_KEY: process.env.AZURE_OPENAI_4_1_KEY,
  ENDPOINT: "https://joyce-resource.cognitiveservices.azure.com",
  DEPLOYMENT_NAME: "gpt-4.1-nano",
  API_VERSION: "2025-01-01-preview",
  MODEL: "gpt-4.1-nano"
}

Usage Scenarios

1. Text-to-Image (No Context)

{
  prompt: "Create a sunset landscape",
  selectedType: "image",
  inputNodes: []
}
// → Selects FLUX or Imagen for high-quality landscape

2. Image-to-Image (Style Transfer)

{
  prompt: "Make this look like a Van Gogh painting",
  inputNodes: [{ type: "image", url: "photo.jpg" }]
}
// → Selects FLUX I2I or style transfer model

3. Multi-Image to Video

{
  prompt: "Create smooth transition between these images",
  selectedType: "video",
  inputNodes: [
    { type: "image", url: "img1.jpg" },
    { type: "image", url: "img2.jpg" }
  ]
}
// → Selects Veo2 or Kling for image-to-video

4. Auto Type Inference

{
  prompt: "Turn this photo into a 3D model",
  inputNodes: [{ type: "image", url: "object.jpg" }]
}
// → LLM infers outputType: "3d", selects Trellis or Hunyuan3D

5. Budget-Conscious Selection

{
  prompt: "Quick logo design",
  canvasInfo: { userTier: "free" }
}
// → Selects economy tier models with good speed

User Tier Constraints

Free Tier

  • Economy cost tier models only

  • Max 30s latency

  • Basic quality threshold (≥6.0)

Pro Tier

  • Standard cost tier models

  • Quality threshold ≥7.0

  • Balanced speed/quality

Enterprise Tier

  • All cost tiers available

  • Premium quality threshold (≥8.0)

  • No latency restrictions
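
The tier rules above can be expressed as data. The thresholds come from this document; the constraint object's shape, and the assumption that each tier also includes the cheaper tiers below it, are illustrative:

```typescript
// Tier constraints as a lookup table; null means "no restriction".
interface TierConstraints {
  costTiers: string[];
  maxLatencyMs: number | null;
  minQualityScore: number;
}

const TIER_CONSTRAINTS: Record<string, TierConstraints> = {
  free: { costTiers: ["economy"], maxLatencyMs: 30_000, minQualityScore: 6.0 },
  // "balanced speed/quality" for pro has no stated ms cap, so null is assumed.
  pro: { costTiers: ["economy", "standard"], maxLatencyMs: null, minQualityScore: 7.0 },
  enterprise: { costTiers: ["economy", "standard", "premium"], maxLatencyMs: null, minQualityScore: 8.0 },
};
```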

Testing

Run the comprehensive test suite:

npm run test:intent-layer
# or
node lib/intent/test-intent-layer.js

Test scenarios include:

  • Text-to-image generation

  • Image style transfer

  • Multi-modal inputs

  • Type inference

  • User tier constraints

  • Performance preferences

Error Handling

The system includes robust error handling:

  1. LLM Failures - Fallback to rule-based analysis

  2. Search Failures - Fallback to basic model search

  3. No Models Found - Try broader search criteria

  4. Execution Failures - Return partial results with error info
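
The fallback ladder above amounts to wrapping each fallible stage in a try/catch that degrades to a cheaper strategy. A minimal sketch of the structure (the stage internals here are stand-ins):

```typescript
// Generic primary/fallback wrapper: run the preferred strategy, and on any
// failure degrade gracefully instead of failing the whole request.
function withFallback<T>(primary: () => T, fallback: () => T): T {
  try {
    return primary();
  } catch {
    return fallback();
  }
}

// Example: an LLM analysis stage that fails, falling back to a rule-based stub.
const outputType = withFallback<string>(
  () => { throw new Error("LLM timeout"); }, // simulated LLM failure
  () => "image",                             // rule-based fallback result
);
// → "image"
```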

Performance

Typical processing times:

  • Intent Analysis: 500-1500ms

  • Model Search: 100-300ms

  • Model Selection: 500-1200ms

  • Total: ~1-3 seconds (excluding model execution)

Monitoring

Each request includes detailed metadata:

{
  metadata: {
    processingTimeMs: 2150,
    steps: [
      { step: "intent_analysis", duration: 890, success: true },
      { step: "model_search", duration: 120, success: true },
      { step: "model_selection", duration: 780, success: true },
      { step: "model_execution", duration: 15400, success: true }
    ]
  }
}
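
A small helper over this metadata shape can recover the pipeline latency excluding model execution, which is how the Performance section counts time. The helper itself is illustrative:

```typescript
// Sums step durations, excluding model execution, from request metadata.
interface StepMetric { step: string; duration: number; success: boolean }

function pipelineLatency(steps: StepMetric[]): number {
  return steps
    .filter((s) => s.step !== "model_execution")
    .reduce((sum, s) => sum + s.duration, 0);
}

const latency = pipelineLatency([
  { step: "intent_analysis", duration: 890, success: true },
  { step: "model_search", duration: 120, success: true },
  { step: "model_selection", duration: 780, success: true },
  { step: "model_execution", duration: 15400, success: true },
]);
// → 1790 (ms)
```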

Future Enhancements

  1. User Feedback Loop - Learn from user selections

  2. Caching - Cache intent analysis for similar prompts

  3. A/B Testing - Test different selection strategies

  4. Advanced Context - Consider project history, user preferences

  5. Multi-Step Workflows - Chain multiple models automatically


The Intent Layer is now ready for production auto mode!
