Overview
usheru AI is a comprehensive film analytics and insights platform that combines a Next.js frontend with a Python FastAPI backend to provide AI-powered data analysis capabilities. The system enables users to ask natural language questions about film data and receive intelligent insights through a conversational interface.
Key Features
· AI-Powered Chat Interface: Natural language to SQL conversion using LangGraph and OpenAI
· Real-time Analytics: Live data visualization and insights
· User Management: Admin dashboard with user authentication and role management
· Multi-region Support: Configurable data sources and regions
· Streaming Responses: Real-time data processing with Server-Sent Events
· Comprehensive Monitoring: Error tracking, performance monitoring, and observability
Architecture Overview
Purpose & Scope:
Summarize what the system does, its main responsibilities, and its boundaries within the organization or ecosystem.
Main Users / Actors:
Super User:
Invite new users
Define each user’s scope
Global dataset
Region-limited(e.g., Ireland, Germany, a specific country)
Assign per-user quotas
Number of queries/day or month
LLM-token budget/day or month
Select which LLM model each user may access
End User:
Sign in via magic-link or email/password (managed by Supabase)
See only the datasets and remaining quota you’ve been granted
Ask data-driven and predictive questions in natural language
Key Integrations:
List major external dependencies:
Authentication provider (e.g., Auth0, Firebase)
Third-party APIs (e.g., TMDB, Stripe, Google Maps)
Analytics / Monitoring (e.g., GTM, Amazon CloudWatch)
Architecture Summary:
Overview
Description: A unified web portal where authorized users can interact with a global or region-scoped catalog of film data stored in Google BigQuery via an LLM-powered chatbot.
Frontend: Next.js on Vercel
Auth & User Data: Supabase
Chat & Data Logic: Python APIs on EC2
Data Access & Scoping
All film-related tables live in a global BigQuery project
Row-level access enforced in queries based on user scope
Region filters injected automatically into the SQL generated by the LLM agent
Quota & Rate-Limiting
Supabase stores user’s current usage counters (queries, tokens)
Python API enforces:
Per-request check against remaining quota
Increment counters after each successful call
Redis cache for “remaining quota” to reduce Supabase lookups for frequent reads
Web Portal (UI)
Login screen: role detection → redirect to Admin Dashboard or Chatbot UI
Admin Dashboard:
User list, invite links, role/scope editor
Quota overview & adjustment control
Model-assignment management
Money & Token Reporting: ability to check tokens consumed and money spent per client
Chat Interface:
Conversation history
“Remaining quota” badge
Quick-links for common queries (e.g. “Top 10 films in my region”)
Thread Management: start new threads and view previous conversation
Chatbot Features
Data Retrieval
Understands plain-language questions (“How many films did Spielberg direct in the ’90s?”)
Translates into parameterized BigQuery SQL via LangGraph
Runs query, formats results back to user
Predictive Analytics
ML regression endpoints (e.g. “Predict next-year box-office for this franchise”)
Uses pre-trained Python models hosted on EC2
Insight Suggestions
Auto-suggest deeper questions based on data patterns (e.g. “Do you want trend over time?”)
Chatbot Features
Data Retrieval
Understands plain-language questions (“How many films did Spielberg direct in the ’90s?”)
Translates into parameterized BigQuery SQL via LangGraph
Runs query, formats results back to user
Predictive Analytics
ML regression endpoints (e.g. “Predict next-year box-office for this franchise”)
Uses pre-trained Python models hosted on EC2
Insight Suggestions
Auto-suggest deeper questions based on data patterns (e.g. “Do you want trend over time?”)
Backend Architecture
Python API (EC2)
REST/gRPC endpoints for: chat, quota checks, ML predictions, insight jobs
LangGraph orchestrates LLM → SQL mapping
Token tracking middleware
BigQuery:
Film datasets, versioned schema
Supabase
Users table (id, email, role, scope, quotas, assigned LLM model)
Usage table (daily/monthly counters)
Redis:
Cache hot data (remaining quota, frequently-used metadata)
Email
Invitation links, newsletter dispatch
ML & Predictive Modules
Regression models in Python (scikit-learn, TensorFlow, etc.)
Exposed as separate API routes under /predict
Model registry to pick the correct version for each user (if needed)
Background Insights & Notifications
CRON jobs on EC2 (or AWS EventBridge) to run scheduled analyses
Generate summary reports (e.g. “Top growth genres this week”)
Send via email
Monitoring & Logging
LangSmith for LLMs usage monitoring
Sentry for application error tracking
Security & Compliance
OAuth/JWT via Supabase for all API calls
Vercel edge functions enforce route-level ACLs
Integrations & Deployment
Set up CI/CD pipelines on Vercel (frontend) and EC2 deployments
High-Level Diagram
.png?sv=2026-02-06&spr=https&st=2026-05-31T01%3A25%3A44Z&se=2026-05-31T01%3A48%3A44Z&sr=c&sp=r&sig=T%2F4fsKn7oPkdVTOQRkZMugc87jq3JakuL4Rz2lJjsEE%3D)
graph TB
subgraph "User Interface Layer"
A[Web Browser] --> B[Next.js Frontend]
B --> C[React Components]
B --> D[Authentication UI]
B --> E[Analytics Dashboard]
B --> F[Chat Interface]
end
subgraph "API Gateway & CDN"
G[Vercel Edge Network] --> H[Next.js API Routes]
H --> I[CORS Middleware]
H --> J[Request Validation]
end
subgraph "Backend Services"
K[FastAPI Application] --> L[Authentication Middleware]
K --> M[Request Processing]
K --> N[Response Generation]
M --> O[LangGraph Workflow Engine]
O --> P[Intent Detection Agent]
O --> Q[Query Generation Agent]
O --> R[Query Execution Agent]
O --> S[Answer Generation Agent]
end
subgraph "AI & ML Services"
T[OpenAI GPT-4] --> U[Natural Language Processing]
V[LangSmith] --> W[LLM Call Tracing]
X[Sentence Transformers] --> Y[Semantic Search]
end
subgraph "Data Storage Layer"
Z[Supabase PostgreSQL] --> AA[User Profiles]
Z --> BB[Thread Memory]
Z --> CC[Analytics Cache]
DD[Google BigQuery] --> EE[Film Analytics Data]
DD --> FF[Box Office Data]
DD --> GG[User Behavior Data]
HH[CacheTools] --> II[Session Data]
HH --> JJ[Query Results]
HH --> KK[Authentication Tokens]
end
subgraph "External Integrations"
LL[AWS SES] --> MM[Email Notifications]
NN[Sentry] --> OO[Error Tracking]
PP[Vercel Analytics] --> QQ[Performance Monitoring]
end
subgraph "Infrastructure"
RR[Docker Containers] --> SS[Backend Services]
TT[Vercel Platform] --> UU[Frontend Hosting]
VV[Google Cloud] --> WW[BigQuery Service]
XX[Supabase Cloud] --> YY[Database & Auth]
end
A --> G
G --> K
K --> T
K --> V
K --> X
K --> Z
K --> DD
K --> HH
K --> LL
K --> NN
RR --> K
TT --> B
VV --> DD
XX --> ZDetailed component architecture
graph TD
subgraph "Frontend Components (Next.js 14)"
A[App Router] --> B[Layout Components]
A --> C[Page Components]
A --> D[API Routes]
B --> E[Theme Provider]
B --> F[Auth Provider]
B --> G[Error Boundary]
C --> H[Enhanced Dashboard]
C --> I[User Management]
C --> J[Analytics Board]
C --> K[Settings Page]
D --> L[/api/ask]
D --> M[/api/enhanced-chat]
D --> N[/api/fun-fact]
D --> O[/api/analytics-dashboard]
end
subgraph "Backend Services (FastAPI)"
P[Main Application] --> Q[Authentication Middleware]
P --> R[CORS Middleware]
P --> S[Sentry Middleware]
Q --> T[JWT Token Validation]
Q --> U[User Context Extraction]
Q --> V[Permission Checking]
P --> W[API Endpoints]
W --> X[POST /ask-new]
W --> Y[POST /ask-fast]
W --> Z[GET /health]
W --> AA[POST /analytics-dashboard]
end
subgraph "AI Workflow Engine (LangGraph)"
BB[Workflow Orchestrator] --> CC[State Management]
BB --> DD[Agent Coordination]
BB --> EE[Error Handling]
CC --> FF[Conversation State]
CC --> GG[Memory Context]
CC --> HH[User Preferences]
DD --> II[Intent Detection Agent]
DD --> JJ[Query Generation Agent]
DD --> KK[Query Execution Agent]
DD --> LL[Answer Generation Agent]
end
subgraph "Data Management Layer"
MM[Memory Manager] --> NN[Thread Memory]
MM --> OO[Context Analysis]
MM --> PP[Follow-up Detection]
QQ[Schema Loader] --> RR[BigQuery Schema]
QQ --> SS[Table Metadata]
QQ --> TT[Query Optimization]
UU[Cache Manager] --> VV[In memory cache Operations]
UU --> WW[Query Caching]
UU --> XX[Session Storage]
end
subgraph "External Service Integrations"
YY[OpenAI Client] --> ZZ[GPT-4 API]
YY --> AAA[Embedding API]
YY --> BBB[Token Counting]
CCC[Supabase Client] --> DDD[Database Operations]
CCC --> EEE[Authentication]
CCC --> FFF[Real-time Subscriptions]
GGG[BigQuery Client] --> HHH[Query Execution]
GGG --> III[Schema Discovery]
GGG --> JJJ[Cost Management]
end
graph TD
subgraph "Frontend (Next.js)"
A[User Interface] --> B[Authentication]
A --> C[Dashboard]
A --> D[Chat Interface]
A --> E[Admin Panel]
B --> F[Supabase Auth]
C --> G[Analytics Components]
D --> H[Streaming Chat]
E --> I[User Management]
end
subgraph "Backend (FastAPI)"
J[API Endpoints] --> K[Authentication Middleware]
J --> L[Request Processing]
J --> M[Response Generation]
L --> N[LangGraph Workflow]
N --> O[Intent Detection]
N --> P[Query Generation]
N --> Q[Query Execution]
N --> R[Answer Generation]
Q --> S[BigQuery Client]
R --> T[Memory Manager]
end
subgraph "Data Layer"
U[Supabase PostgreSQL] --> V[User Profiles]
U --> W[Thread Memory]
U --> X[Analytics Data]
Y[BigQuery] --> Z[Film Data]
Y --> AA[Analytics Tables]
BB[in-memory Cache] --> CC[Session Data]
BB --> DD[Query Cache]
end
subgraph "External Integrations"
EE[OpenAI API] --> FF[GPT-4 Model]
GG[LangSmith] --> HH[Tracing & Monitoring]
II[Sentry] --> JJ[Error Tracking]
KK[AWS SES] --> LL[Email Notifications]
end
Components
Backend (FastAPI)
Technology Stack:
· FastAPI with Python 3.13
· LangGraph for workflow orchestration
· OpenAI GPT-4 for natural language processing
· BigQuery for data storage and querying
· Supabase for user management and memory
Key Modules:
· `app/main.py`: Main FastAPI application
· `app/asknew/`: New optimized workflow implementation
· `app/core/`: Core functionality (state, graph, memory)
· `app/agents/`: AI processing agents
· `supabase_utils/`: Supabase integration utilities
API Endpoints:
· `POST /ask-new`: Main streaming chat endpoint with LangGraph workflow
· `GET /health`: Health check
· `GET /analytics-dashboard`: Analytics data
· `POST /title-generation`: Dynamic title generation
Data Layer
Supabase PostgreSQL
Core User Management
user_profiles - User information, roles, and preferences
user_regions/datamarts - Many-to-many relationship between users and regions
regions /datamarts- Available data regions/datamarts
Authentication & Authorization
auth.users - Supabase built-in user authentication table
Chat & Memory System
chat_sessions - Chat session management
chat_threads - Individual conversation threads
chat_messages - Individual chat messages
thread_memory_enhanced - Enhanced conversation memory with embeddings
Analytics & Insights
generated_insights - AI-generated insights storage
insight_descriptions - Insight metadata and descriptions
insight_queries - Stored insight queries
user_insight_datamarts - User access to specific datamarts for insights
Configuration & Settings
app_config - Application configuration settings
datamarts - Available datamarts/datasets
active_llm - Active LLM model configuration
Prompts & Customization
additional_prompts - System-wide additional prompts
datamart_additional_prompts - datamart-specific additional prompts
Token Management
user_token_usage - User token consumption tracking
Admin & Logging
admin_actions - Admin action audit log
api_request_logs - API request logging
Suggested Questions
suggested_questions - Pre-defined suggested questions for users
Key Features:
· Row Level Security (RLS) policies
· Real-time subscriptions
· Automatic user profile creation
· Memory management with embeddings
BigQuery
Datasets:
· `usheru_data_mart`: Main film analytics dataset
· Multiple region-specific datasets
· Optimized for analytical queries
Features:
· Cost-controlled queries
· Query validation and dry runs
· Automatic schema detection
· Result caching
Caching Layer
FRONTEND CACHING (TypeScript/Next.js)
Session Token Cache
Location: lib/auth-utils.ts
Type: In-memory variable
TTL: 2 minutes (120,000ms)
Purpose: Reduces repeated Supabase authentication calls
Implementation:
let sessionTokenCache: { token: string | null, timestamp: number } | null = null
const SESSION_TOKEN_CACHE_TTL = 120 * 1000 // 2 minutes
Request Deduplication Cache
Location: lib/backend-api.ts
Type: Map<string, Promise>
TTL: 5 minutes (300,000ms)
Purpose: Prevents duplicate API requests
Implementation:
const ongoingRequests = new Map<string, { promise: Promise<BackendResponse>, timestamp: number }>()
const REQUEST_CACHE_TIMEOUT = 300000 // 5 minutesNext.js API Route Caching
Location: next.config.mjs
Type: HTTP headers
Purpose: Disable caching for streaming responses
Implementation:
headers: [{
key: 'Cache-Control',
value: 'no-store, no-cache, must-revalidate, proxy-revalidate'
}]
BACKEND CACHING (Python/FastAPI)
Authentication Cache
Location: supabase_utils/auth_cache.py
Type: TTLCache (cachetools library)
TTL: 5 minutes
Max Size: 1000 entries
Purpose: Cache authentication results
Implementation:
_auth_cache = TTLCache(maxsize=1000, ttl=300) # 5 minutes
User Profile Cache
Location: supabase_utils/auth_cache.py
Type: TTLCache
TTL: 10 minutes
Max Size: 1000 entries
Purpose: Cache user profile data
Implementation:
_profile_cache = TTLCache(maxsize=1000, ttl=600) # 10 minutes
Token Usage Cache
Location: supabase_utils/auth_cache.py
Type: TTLCache
TTL: 1 minute
Max Size: 1000 entries
Purpose: Cache token usage records
Implementation:
_token_cache = TTLCache(maxsize=1000, ttl=60) # 1 minute
Datamart Access Cache
Location: supabase_utils/auth_cache.py
Type: TTLCache
TTL: 5 minutes
Max Size: 1000 entries
Purpose: Cache datamart access permissions
Implementation:
_datamart_cache = TTLCache(maxsize=1000, ttl=300) # 5 minutesThread Memory Cache
Location: app/core/thread_memory_manager.py
Type: Dictionary
TTL: 5 minutes
Purpose: Cache recent conversation context
Implementation:
self._thread_cache: Dict[str, List[ThreadMemoryItem]] = {}
self._cache_ttl = 300 # 5 minutes
LLM Response Cache
Location: app/core/llm_cache.py
Type: Dictionary with LRU-like eviction
Max Size: 1000 entries
Purpose: Cache LLM responses and configurations
Implementation:
class LLMCache:
def __init__(self):
self.response_cache: Dict[str, Any] = {}
self.config_cache: Dict[str, Any] = {}
self.max_cache_size = 1000
PERSISTENT STORAGE CACHING
Supabase PostgreSQL
Location: thread_memory_enhanced table
Type: Database with indexes
Purpose: Persistent conversation memory
Features:
JSONB storage for flexible data
Vector embeddings for semantic search
Row Level Security (RLS)
Performance-optimized indexes
BigQuery Result Caching
Location: Query execution layer
Type: BigQuery native caching
Purpose: Cache query results
Features:
Automatic result caching
Cost optimization
Query deduplication
Integrations
LangSmith
Purpose: LLM call tracing and monitoring
Configuration:
# LangSmith setup
LANGCHAIN_TRACING_V2 = True
LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com"
LANGCHAIN_API_KEY = os.getenv("LANGCHAIN_API_KEY")
LANGCHAIN_PROJECT = "usheru-sql-agent"Sentry
Purpose: Error tracking and performance monitoring
Features:
Automatic error capture
Performance monitoring
Custom context and tags
Release tracking
Configuration:
# Sentry setup
SENTRY_DSN = os.getenv("SENTRY_DSN")
SENTRY_ENVIRONMENT = os.getenv("SENTRY_ENVIRONMENT", "production")
traces_sample_rate = 0.1
profiles_sample_rate = 0.1AWS SES
Purpose: Email notifications and insights delivery
Features:
Weekly insights emails
User notifications
HTML email templates
Recipient management
API Endpoints
For more details, see usheru TrackAI API
Database Schema
Supabase Tables
user_profiles
CREATE TABLE user_profiles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
email TEXT NOT NULL,
full_name TEXT,
is_admin BOOLEAN DEFAULT FALSE,
is_blocked BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);thread_memory_enhanced
CREATE TABLE thread_memory_enhanced (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
thread_id TEXT NOT NULL,
namespace TEXT NOT NULL,
memory_key TEXT NOT NULL,
memory_data JSONB NOT NULL,
embedding VECTOR(384),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
UNIQUE(user_id, thread_id, namespace, memory_key)
);
BigQuery Schema
Film Analytics Tables
`movies`: Film metadata and information
`box_office`: Revenue and performance data
`ratings`: User and critic ratings
`genres`: Genre classifications
`actors`: Cast and crew information
Sql Query Processing
Query Generation Process:
Natural Language Input : User asks question in plain English
Intent Detection : AI determines if it's a data query, greeting, or out-of-scope
Schema Analysis : System loads relevant BigQuery table schemas
SQL Generation : LangChain + GPT-4 generates BigQuery-compatible SQL
Query Validation : Dry run to validate syntax and estimate cost
Execution : Query runs with safety limits (1GB, 30s timeout)
Result Processing : Data formatted and insights generated
Safety Features:
Query cost limits (1GB per query)
Timeout protection (90 seconds)
Result size limits (100 rows max)
SQL injection prevention
Read-only access enforcement
Example Query Flow:
-- User asks: "What are the top 5 movies by revenue?"
-- Generated SQL:
SELECT
title,
worldwide_gross,
release_year
FROM `project.dataset.box_office`
ORDER BY worldwide_gross DESC
LIMIT 5Performance Optimizations:
Schema caching for faster query generation
Query result caching in In-memory query cache
Parallel processing for complex queries
Cost-aware query optimization
Security & Access
Authentication Flow
User Registration/Login
Supabase Auth handles user authentication
JWT tokens for API access
Row Level Security (RLS) for data access
API Authentication
Bearer token in Authorization header
Token validation on each request
User context extraction
Admin Access
Role-based access control
Admin dashboard restrictions
User management capabilities
Security Features
CORS Configuration : Properly configured for cross-origin requests
Input Validation : Pydantic models for request validation
SQL Injection Prevention : Parameterized queries
Rate Limiting : Request deduplication and caching
Error Handling : Secure error messages without sensitive data
Deployment
📘 For detailed deployment instructions, see:
Observability
LangSmith Integration
Features:
LLM call tracing
Token usage tracking
Performance monitoring
Error analysis
Configuration:
@traceable(name="ask_fast_workflow", tags=["ask-fast", "optimized"])
def process_with_tracing(question: str, session_id: str):
return workflow.process(question, session_id, user_id, datamart)Sentry Monitoring
Features:
Error tracking and alerting
Performance monitoring
Custom context and tags
Release tracking
Implementation:
# Sentry middleware
@sentry_monitor
def process_request(request):
# Request processing logic
pass
# Custom error capture
capture_exception(e, tags={"endpoint": "ask_new"})Logging
Structured Logging:
Request/response logging
Performance metrics
Error tracking
User activity
Performance Optimization
Frontend Optimizations
Code Splitting : Dynamic imports for components
Image Optimization : Next.js image optimization
Caching : Request deduplication and caching
Streaming : Server-Sent Events for real-time updates
Backend Optimizations
Query Optimization : BigQuery query optimization
Caching : cachetools for frequently accessed data
Memory Management : Efficient memory usage
Async Processing : Non-blocking operations
Database Optimizations
Indexing : Proper database indexes
Query Caching : Result caching
Connection Pooling : Efficient connection management
RLS Optimization : Row Level Security performance
Known Risks & Technical Debt
For more details, see usheru TrackAI Troubleshooting.
Appendix
Complete Technology Stack
Frontend Technologies
Framework : Next.js 14 with App Router
Language : TypeScript
UI Library : React 18
Styling : Tailwind CSS
Components : Radix UI
State Management : React Context + Hooks
Authentication : Supabase Auth
Deployment : Vercel
Backend Technologies
Framework : FastAPI (Python 3.13)
AI/ML : LangGraph, LangChain, OpenAI GPT-4
Database : Supabase PostgreSQL, Google BigQuery
Caching : in-memory cachetools, In-Memory (TTLCache)
Monitoring : Sentry, LangSmith
Email : AWS SES
Deployment : Docker, AWS EC2
Infrastructure & DevOps
Frontend Hosting : Vercel Edge Network
Backend Hosting : AWS EC2 with Docker
Database : Supabase Cloud (PostgreSQL)
Analytics : Google BigQuery
CDN : Vercel Edge Functions
Monitoring : Sentry, LangSmith, Vercel Analytics
CI/CD : GitHub Actions (implied)
External Services
AI Services : OpenAI GPT-4, Sentence Transformers
Database Services : Supabase, Google BigQuery
Monitoring : Sentry, LangSmith
Email : AWS SES
Authentication : Supabase Auth
Analytics : Vercel Analytics
System Capabilities Summary
Core Features
Natural Language to SQL: Convert user questions to BigQuery queries
Real-time Chat Interface: Streaming responses with Server-Sent Events
User Authentication: Secure login/signup with role-based access
Analytics Dashboard: Interactive data visualizations
Admin Panel: User management and system administration
Memory System: Conversation context and follow-up detection
Multi-region Support: Configurable data sources
Performance Monitoring: Comprehensive observability
Flow Diagram
![]()

Additional resources
Original architecture document called usheru SQL-Agent Flow
Watch the supporting videos here.