Intent Layer for Auto Mode
The Intent Layer enables intelligent, automatic model selection and execution when users enable "auto mode" on the canvas. It analyzes the user prompt and canvas context, then automatically selects and configures the optimal AI model for the task.
Overview
When auto mode is enabled, the system follows this flow:
Intent Analysis - LLM analyzes user prompt and canvas context
Model Search - Azure AI Search finds compatible models
Model Selection - LLM selects optimal model and generates parameters
Execution - Calls execute-node API with configured parameters
Architecture
Canvas (Auto Mode)
↓
/api/auto-mode
↓
IntentOrchestrator
├── IntentAnalyzer (GPT-4.1 Nano)
├── ModelSearchService (Azure AI Search)
└── execute-node API
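A minimal sketch of this pipeline is shown below. Only the class and method names come from the components documented in the next section; the import paths, the request shape, and the callExecuteNode helper are illustrative assumptions.
// Illustrative pipeline sketch; class/method names come from this doc, the
// paths, request shape, and callExecuteNode helper are assumptions.
import { IntentAnalyzer } from "./intent-analyzer";
import { ModelSearchService } from "./model-search-service";

async function runAutoModePipeline(request: any) {
  const analyzer = new IntentAnalyzer();
  const modelSearch = new ModelSearchService();

  // 1. Intent Analysis - LLM analyzes the prompt and canvas context
  const intent = await analyzer.analyzeIntent(request);

  // 2. Model Search - Azure AI Search finds compatible models
  const models = await modelSearch.searchForIntent(intent, request.canvasInfo.userTier);

  // 3. Model Selection - LLM picks a model and generates API parameters
  const selection = await analyzer.selectModelAndGenerateCall(intent, models, request);

  // 4. Execution - hand the configured call to the execute-node API
  const executionResult = await callExecuteNode(selection);

  return { intentAnalysis: intent, availableModels: models, modelSelection: selection, executionResult };
}

async function callExecuteNode(selection: any) {
  // Hypothetical helper: the endpoint path and payload shape are assumptions.
  const res = await fetch("/api/execute-node", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(selection),
  });
  return res.json();
}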
Components
1. IntentAnalyzer (intent-analyzer.ts)
Uses Azure OpenAI GPT-4.1 Nano to:
Infer output type if not specified by user
Analyze required model capabilities
Generate semantic search queries
Select optimal model from search results
Generate API parameters for selected model
Key Methods:
analyzeIntent(context) - Step 1: Analyze user intent
selectModelAndGenerateCall(intent, models, context) - Step 2: Select model
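For example, when the user does not pick an output type, analyzeIntent is the step that infers it. A hedged sketch (the constructor call and the exact return fields are assumptions; the field names mirror the intentAnalysis object in the response example later in this doc):
import { IntentAnalyzer } from "./intent-analyzer";

// Sketch only: constructor and return shape are assumptions.
async function inferIntent() {
  const analyzer = new IntentAnalyzer();
  const context = {
    prompt: "Turn this photo into a 3D model",
    selectedType: undefined, // not specified - the LLM infers the output type
    inputNodes: [{ id: "input-1", type: "image", url: "https://example.com/object.jpg" }],
    canvasInfo: { userId: "user-123", projectId: "project-456", nodeId: "node-789", userTier: "pro" },
  };

  const intent = await analyzer.analyzeIntent(context); // Step 1
  console.log(intent.outputType);  // e.g. "3d"
  console.log(intent.searchQuery); // semantic query passed to Azure AI Search
  return intent;
}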
2. ModelSearchService (model-search-service.ts)
Integrates with Azure AI Search to:
Search models by semantic queries and filters
Apply user tier constraints (free/pro/enterprise)
Filter by capabilities, performance, cost
Provide fallback models when search fails
Key Methods:
searchForIntent(intentAnalysis, userTier) - Search optimized for intent
searchModels(query, filters, options) - Generic search interface
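A short sketch of the two entry points. The filter and option field names below are assumptions about the search shape; the method names, tier value, and semantic query come from this doc:
import { ModelSearchService } from "./model-search-service";

// Sketch only: filter/option field names are assumptions.
async function demoModelSearch(intentAnalysis: any) {
  const search = new ModelSearchService();

  // Intent-driven search: constraints derived from the analysis plus the user tier
  const candidates = await search.searchForIntent(intentAnalysis, "pro");

  // Generic search: the caller supplies the semantic query, filters, and options directly
  const manual = await search.searchModels(
    "professional headshot portrait high quality", // semantic query
    { outputType: "image", costTier: "standard" }, // assumed filter shape
    { top: 5 }                                     // assumed options shape
  );

  return { candidates, manual };
}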
3. IntentOrchestrator (intent-orchestrator.ts)
Main coordinator that:
Validates requests
Orchestrates the full pipeline
Handles errors and fallbacks
Calls execute-node API
Tracks performance metrics
Key Methods:
processAutoModeRequest(request) - Main entry point
validateRequest(request) - Input validation
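The /api/auto-mode route is expected to be a thin wrapper around the orchestrator. A minimal sketch, assuming a Next.js App Router handler (the handler style, import alias, and validation result shape are assumptions; the two method names come from this doc):
// Hypothetical /api/auto-mode route handler; only processAutoModeRequest and
// validateRequest are documented here, the rest is an assumed Next.js setup.
import { NextResponse } from "next/server";
import { IntentOrchestrator } from "@/lib/intent/intent-orchestrator";

export async function POST(req: Request) {
  const body = await req.json();
  const orchestrator = new IntentOrchestrator();

  // Reject malformed canvas requests before running the pipeline
  const validation = orchestrator.validateRequest(body); // assumed { valid, error } shape
  if (!validation.valid) {
    return NextResponse.json({ success: false, error: validation.error }, { status: 400 });
  }

  // Runs intent analysis → model search → model selection → execution
  const result = await orchestrator.processAutoModeRequest(body);
  return NextResponse.json(result);
}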
API Usage
Auto Mode Request
POST /api/auto-mode
{
"prompt": "Create a professional headshot photo",
"selectedType": "image", // Optional - LLM will infer if not provided
"inputNodes": [
{
"id": "input-1",
"type": "image",
"url": "https://example.com/photo.jpg",
"metadata": { "width": 1024, "height": 1024 }
}
],
"canvasInfo": {
"userId": "user-123",
"projectId": "project-456",
"nodeId": "node-789",
"userTier": "pro"
}
}
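From the canvas client this is a plain POST; a minimal sketch (the endpoint and body fields come from this doc, the wrapper function itself is illustrative):
// Client-side sketch: post the auto-mode request and unwrap the result.
async function requestAutoMode(prompt: string) {
  const res = await fetch("/api/auto-mode", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt,
      inputNodes: [],
      canvasInfo: { userId: "user-123", projectId: "project-456", nodeId: "node-789", userTier: "pro" },
    }),
  });
  const json = await res.json();
  if (!json.success) throw new Error("Auto mode request failed");
  // data holds intentAnalysis, availableModels, modelSelection, executionResult
  return json.data;
}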
Auto Mode Response
{
"success": true,
"data": {
"intentAnalysis": {
"outputType": "image",
"confidence": 0.95,
"reasoning": "User wants to create a professional headshot...",
"searchQuery": "professional headshot portrait high quality",
"requiredCapabilities": {
"inputTypes": ["text"],
"outputType": "image"
}
},
"availableModels": [
{
"id": "fal-ai_flux-pro-v1-1-ultra",
"display_name": "FLUX Pro Ultra",
"quality_score": 9.5,
"speed_tier": "medium"
}
],
"modelSelection": {
"selectedModel": {
"id": "fal-ai_flux-pro-v1-1-ultra",
"displayName": "FLUX Pro Ultra",
"qualityScore": 9.5
},
"apiParameters": {
"prompt": "professional headshot of a business person in a modern office",
"guidance_scale": 7.5,
"num_inference_steps": 50
},
"selectionReasoning": "Selected for high quality portrait generation",
"confidence": 0.9
},
"executionResult": {
"success": true,
"data": {
"imageUrl": "https://generated-image-url.jpg"
}
}
}
}
Configuration
Environment Variables
# Azure OpenAI (for intent analysis)
AZURE_OPENAI_4_1_KEY=your-azure-openai-key
AZURE_OPENAI_ENDPOINT=https://your-resource.cognitiveservices.azure.com
# Azure AI Search (for model search)
AZURE_SEARCH_SERVICE=eidos-mvp
AZURE_SEARCH_API_KEY=your-search-key
API Keys Configuration
The system uses GPT-4.1 Nano as configured in config/api-keys.ts:
AZURE_OPENAI: {
API_KEY: process.env.AZURE_OPENAI_4_1_KEY,
ENDPOINT: "https://joyce-resource.cognitiveservices.azure.com",
DEPLOYMENT_NAME: "gpt-4.1-nano",
API_VERSION: "2025-01-01-preview",
MODEL: "gpt-4.1-nano"
}
Usage Scenarios
1. Text-to-Image (No Context)
{
prompt: "Create a sunset landscape",
selectedType: "image",
inputNodes: []
}
// → Selects FLUX or Imagen for high-quality landscape
2. Image-to-Image (Style Transfer)
{
prompt: "Make this look like a Van Gogh painting",
inputNodes: [{ type: "image", url: "photo.jpg" }]
}
// → Selects FLUX I2I or style transfer model
3. Multi-Image to Video
{
prompt: "Create smooth transition between these images",
selectedType: "video",
inputNodes: [
{ type: "image", url: "img1.jpg" },
{ type: "image", url: "img2.jpg" }
]
}
// → Selects Veo2 or Kling for image-to-video
4. Auto Type Inference
{
prompt: "Turn this photo into a 3D model",
inputNodes: [{ type: "image", url: "object.jpg" }]
}
// → LLM infers outputType: "3d", selects Trellis or Hunyuan3D
5. Budget-Conscious Selection
{
prompt: "Quick logo design",
canvasInfo: { userTier: "free" }
}
// → Selects economy tier models with good speed
User Tier Constraints
Free Tier
Economy cost tier models only
Max 30s latency
Basic quality threshold (≥6.0)
Pro Tier
Standard cost tier models
Quality threshold ≥7.0
Balanced speed/quality
Enterprise Tier
All cost tiers available
Premium quality threshold (≥8.0)
No latency restrictions
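One way to picture these constraints is as a tier-to-filter lookup that the search layer applies; a sketch under the assumption that constraints are plain filter fields (the thresholds and the economy/standard tiers come from the lists above; the "premium" tier name and field names are illustrative):
// Illustrative tier → constraint mapping; thresholds come from the lists above,
// field names are assumptions about the search filter shape.
type UserTier = "free" | "pro" | "enterprise";

interface TierConstraints {
  costTiers: string[];        // cost tiers the search may return
  minQualityScore: number;    // quality threshold
  maxLatencySeconds?: number; // omitted = no latency restriction
}

const TIER_CONSTRAINTS: Record<UserTier, TierConstraints> = {
  free:       { costTiers: ["economy"], minQualityScore: 6.0, maxLatencySeconds: 30 },
  pro:        { costTiers: ["standard"], minQualityScore: 7.0 },
  enterprise: { costTiers: ["economy", "standard", "premium"], minQualityScore: 8.0 },
};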
Testing
Run the comprehensive test suite:
npm run test:intent-layer
# or
node lib/intent/test-intent-layer.js
Test scenarios include:
Text-to-image generation
Image style transfer
Multi-modal inputs
Type inference
User tier constraints
Performance preferences
Error Handling
The system includes robust error handling:
LLM Failures - Fallback to rule-based analysis
Search Failures - Fallback to basic model search
No Models Found - Try broader search criteria
Execution Failures - Return partial results with error info
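A sketch of the first fallback in that list, as a pattern (the helper and the rule-based defaults are hypothetical; only the fallback order itself is documented above):
// Illustrative LLM-failure fallback; ruleBasedIntent is a hypothetical
// stand-in for the internal rule-based analysis.
async function analyzeWithFallback(
  context: any,
  analyzer: { analyzeIntent(c: any): Promise<any> }
) {
  try {
    return await analyzer.analyzeIntent(context); // LLM-based analysis
  } catch {
    return ruleBasedIntent(context); // LLM failure → rule-based analysis
  }
}

function ruleBasedIntent(context: any) {
  // Hypothetical defaults: trust the user's selectedType if present,
  // otherwise assume image output and reuse the raw prompt as the query.
  return {
    outputType: context.selectedType ?? "image",
    confidence: 0.5,
    searchQuery: context.prompt,
  };
}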
Performance
Typical processing times:
Intent Analysis: 500-1500ms
Model Search: 100-300ms
Model Selection: 500-1200ms
Total: ~1-3 seconds (excluding model execution)
Monitoring
Each request includes detailed metadata:
{
metadata: {
processingTimeMs: 2150,
steps: [
{ step: "intent_analysis", duration: 890, success: true },
{ step: "model_search", duration: 120, success: true },
{ step: "model_selection", duration: 780, success: true },
{ step: "model_execution", duration: 15400, success: true }
]
}
}
Future Enhancements
User Feedback Loop - Learn from user selections
Caching - Cache intent analysis for similar prompts
A/B Testing - Test different selection strategies
Advanced Context - Consider project history, user preferences
Multi-Step Workflows - Chain multiple models automatically
✅ The Intent Layer is now ready for production auto mode!