Technical Design Document

AI-Powered Real Estate Search Platform

A comprehensive technical design document outlining the architecture, AI integration, and implementation strategy for Rufio's decision intelligence platform.

Jordan Allen

Document Metadata

Project Name | AI-Powered Real Estate Search Platform
Doc Type | Technical Design Document (TDD)
Audience | Hiring Manager, Senior Engineer, Architect
Status | Active
Last Updated | 2026-01-13
Owner | Jordan Allen
Scope | Covers conversational search agent, preference wizard, LLM-planned search pipeline, multi-modal embeddings, taste learning, match scoring, collaboration. Excludes: Agent-facing CRM, listing management, MLS data ingestion pipeline.
Codebase | Key paths: /app/lib/search, /app/lib/scout, /app/lib/matching, /app/lib/wizard, /src/mastra

1. Problem Statement

Home buyers spend months browsing listings through filter interfaces, yet can't articulate what they want in checkbox form. The core problem: discovery is mismatched to how preferences actually work.

Specific Problems

  • Filter explosion: Stack enough constraints and you get zero results; relax them and you're overwhelmed
  • Preferences emerge through exposure: A buyer thinks they need 4 bedrooms until they see a brilliantly designed 3-bedroom
  • No learning: Viewing 200 listings teaches the system nothing—each session starts fresh
  • Listings don't answer real questions: "Will this work for remote work?" → "4 bed / 3 bath"
  • Visual preferences are inexpressible: "Modern but warm, not sterile" has no filter

Before / After

Before | After
47 listings viewed before shortlist | 12 listings viewed (74% reduction)
4.2 min to first relevant result | 1.3 min (3.2x faster)
Every session starts from scratch | System learns and improves with each interaction
"Modern home" returns random results | 89% semantic query accuracy

2. Goals and Non-Goals

2.1 Goals

  • Enable natural language property search with semantic understanding
  • Learn buyer preferences from both explicit feedback and implicit behavior
  • Deliver explainable recommendations that users can interrogate
  • Provide sub-second search latency for interactive refinement
  • Support multi-stakeholder collaboration (couples, families, agents)
  • Scale to full MLS inventory (millions of listings)

2.2 Non-Goals

  • Not a CRM for agents — focused on buyer experience, not lead management
  • Not a listing platform — consumes MLS data, doesn't manage listings
  • Not a transaction system — stops at discovery, no offers/contracts
  • Not a mortgage calculator — basic affordability only, no loan origination
  • Not optimizing for UI polish in v1 — function over form initially

Phase Scope

Phase | Included | Excluded
v1 | Search, wizard, matching, Scout agent | Voice input, real-time streaming
v1.5 | Collaboration, comparison sessions | Agent marketplace
v2 | MLS integrations, alerts | Offer management

3. System Overview

The system comprises four layers with distinct responsibilities:

3.1 High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      CLIENT LAYER                                │
│    Next.js 15 + React 19 + Tailwind + Radix UI                  │
│    Scout Chat • Wizard • Property Cards • Comparison Trays       │
└─────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                 AGENT LAYER (Mastra + LLM)                       │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐       │
│  │    Scout      │  │   Planner     │  │    Judge      │       │
│  │    Agent      │  │    Agent      │  │    Agent      │       │
│  │  (12 tools)   │  │  (QueryIR)    │  │  (QA loop)    │       │
│  └───────────────┘  └───────────────┘  └───────────────┘       │
│  ┌───────────────┐  ┌───────────────┐                           │
│  │    Match      │  │    Property   │                           │
│  │   Scorer      │  │    Explainer  │                           │
│  └───────────────┘  └───────────────┘                           │
└─────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                    RETRIEVAL LAYER                               │
│    Elasticsearch 9.x: BM25 + kNN + Script Scoring               │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐       │
│  │  Description  │  │   Amenity     │  │   Location    │       │
│  │   Vectors     │  │   Vectors     │  │   Vectors     │       │
│  └───────────────┘  └───────────────┘  └───────────────┘       │
│  ┌───────────────┐  ┌───────────────┐                           │
│  │    Image      │  │    RRF        │                           │
│  │   Vectors     │  │   Fusion      │                           │
│  └───────────────┘  └───────────────┘                           │
└─────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      DATA LAYER                                  │
│    PostgreSQL (Drizzle ORM) • Redis (sessions) • GCS (images)   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  Properties │  │    User     │  │   Scout     │              │
│  │  + Vectors  │  │  Profiles   │  │   Memory    │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└─────────────────────────────────────────────────────────────────┘

3.2 Core Subsystems

Subsystem | Responsibility | Key Files
Preference Wizard | Captures 100+ profile fields via guided flow | /app/lib/wizard/
LLM-Planned Search | Query → Classification → Planning → Execution → QA | /app/lib/search/
Scout Agent | Conversational interface with 12 tools | /app/lib/scout/
Match Scorer | 8-category weighted scoring (0-100) | /app/lib/matching/
Taste Learning | Event capture + feature aggregation + MMR | /app/lib/scout/tools/personalization/

3.3 Request Flow

  1. User completes wizard → Profile stored with computed weights
  2. User queries Scout: "Modern homes with good light under $800K"
  3. Heuristic check: Simple query? Skip LLM classification (saves 500ms)
  4. Planner Agent: Generates QueryChips + vector weights
  5. Vector generation: 4 embeddings (description, amenity, location, image)
  6. ES hybrid search: BM25 + kNN + script scoring
  7. Judge Agent: Evaluates top 3 results; if quality < 0.6, revise query
  8. Taste blending: MMR re-rank with user preference vector
  9. Match scoring: 8-category breakdown per property
  10. Results returned with explanations
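
Step 8 (MMR re-ranking against the user's taste vector) could be sketched roughly as below. This is an illustration under stated assumptions, not the code in personalization/rank.ts: the `Candidate` shape, the use of cosine similarity, and the default `lambda` of 0.7 are all hypothetical.

```typescript
// Hypothetical candidate shape; the real result type is not shown in this document.
interface Candidate {
  id: string;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy MMR: each pick maximizes
//   lambda * sim(candidate, taste) - (1 - lambda) * max sim(candidate, already picked)
function mmrRerank(candidates: Candidate[], taste: number[], topN: number, lambda = 0.7): Candidate[] {
  const remaining = candidates.slice();
  const selected: Candidate[] = [];
  while (selected.length < topN && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const candidate = remaining[i];
      if (!candidate) continue;
      const relevance = cosine(candidate.vector, taste);
      // No redundancy penalty until something has been picked.
      let redundancy = selected.length === 0 ? 0 : -Infinity;
      for (const picked of selected) {
        redundancy = Math.max(redundancy, cosine(candidate.vector, picked.vector));
      }
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    const [next] = remaining.splice(bestIndex, 1);
    if (next) selected.push(next);
  }
  return selected;
}
```

A lower `lambda` favors diversity: a near-duplicate of an already selected listing is pushed down even if it is individually a closer match to the taste vector.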

4. Architecture Overview

4.1 Major Components

Component | Responsibility | Inputs | Outputs | Owns
Preference Wizard | Structured preference capture | User answers | Profile + weights | Question flow, validation
Query Classifier | Complexity + intent detection | Raw query | Complexity score, intent | Classification heuristics
Planner Agent | Query → structured IR | Query + context | QueryChips + weights | Chip schema
Vector Generator | Text/image → embeddings | Text, image URLs | 1024-dim vectors | Embedding API calls
ES Query Builder | IR → Elasticsearch DSL | QueryIR + vectors | ES query | Query construction
Judge Agent | Result quality evaluation | Query + top results | Quality score, revisions | Evaluation criteria
Scout Agent | Conversational interface | User message + scope | Response + actions | Tool orchestration, memory
Match Scorer | Property-profile alignment | Property + profile | Score 0-100 + breakdown | 8 sub-scorers
Taste Engine | Preference learning | User events | Taste vector | Event aggregation, decay

4.2 Multi-Modal Embedding Architecture

Four distinct embedding spaces capture different property aspects:

Space | Dimensions | Source | Captures
Description | 1024 | Listing text | Style, condition, lifestyle fit
Amenity | 1024 | Feature list | Kitchen quality, garage, pool
Location | 1024 | Address + enrichment | Walkability, schools, commute
Image | 1024 | Property photos | Visual style, light, condition

Fusion: RRF (Reciprocal Rank Fusion) with 60% text weight, 40% image weight.

4.3 Communication Patterns

Sync: API → Agent → Tools → Database (request-response within 2s target)

Async: Scout memory persistence, taste event logging (fire-and-forget)

Retry: LLM calls have 3 retries with exponential backoff; ES queries time out at 5s
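
The retry and timeout policy could be wrapped in two small helpers, sketched below. The helper names, the 200 ms base delay, and the doubling factor are assumptions for illustration; only "3 retries with exponential backoff" and the 5s timeout come from this document.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `fn` once, then retries up to `retries` more times on failure,
// doubling the wait between attempts (baseDelayMs, 2x, 4x, ...).
async function withRetry<T>(fn: () => Promise<T>, retries = 3, baseDelayMs = 200): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(baseDelayMs * Math.pow(2, attempt));
    }
  }
  throw lastError;
}

// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Under these assumptions an LLM call would be wrapped as `withRetry(() => callLlm(prompt))` and an ES query as `withTimeout(runEsQuery(dsl), 5000)`, where `callLlm` and `runEsQuery` are placeholders for the real clients. Note that `withTimeout` only stops waiting; it does not cancel the underlying request.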

4.4 LLM-Planned Search Sequence

┌──────┐     ┌───────────┐     ┌─────────┐     ┌───────────┐
│ User │────▶│ Classifier│────▶│ Planner │────▶│ Validator │
└──────┘     └───────────┘     └─────────┘     └───────────┘
                                                    │
┌──────┐     ┌───────────┐     ┌─────────┐     ┌────▼──────┐
│Result│◀────│   Judge   │◀────│   ES    │◀────│  Vectors  │
└──────┘     │ (QA loop) │     │ Search  │     │ Generator │
             └───────────┘     └─────────┘     └───────────┘

If the Judge scores the results below 0.6, the revision handler adjusts the query and retries (max 2 iterations).
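
The QA loop could be sketched as a small driver that takes the search, judge, and revise steps as injected functions. This is a synchronous illustration only (the real agents are presumably async LLM calls); the names, the `JudgeVerdict` shape, and reading "max 2 iterations" as "at most 2 revisions" are assumptions.

```typescript
// Hypothetical verdict shape: quality in the range 0..1.
interface JudgeVerdict {
  quality: number;
}

interface SearchOutcome<R> {
  results: R[];
  revisions: number;
  lowConfidence: boolean; // true when the Judge never accepted a result set
}

function searchWithJudge<Q, R>(
  query: Q,
  search: (q: Q) => R[],
  judge: (q: Q, topResults: R[]) => JudgeVerdict,
  revise: (q: Q, verdict: JudgeVerdict) => Q,
  threshold = 0.6,
  maxRevisions = 2,
): SearchOutcome<R> {
  let current = query;
  let best: { results: R[]; quality: number } = { results: [], quality: -Infinity };
  for (let revisions = 0; ; revisions++) {
    const results = search(current);
    const verdict = judge(current, results.slice(0, 3)); // Judge sees the top 3 only
    if (verdict.quality > best.quality) best = { results, quality: verdict.quality };
    if (verdict.quality >= threshold) return { results, revisions, lowConfidence: false };
    // Out of revisions: return the best attempt seen so far, flagged for a warning.
    if (revisions >= maxRevisions) return { results: best.results, revisions, lowConfidence: true };
    current = revise(current, verdict);
  }
}
```

The `lowConfidence` flag is one way to carry the "best-effort + warning" behavior through to the caller.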

5. Key Design Decisions

Decision Index

ID | Decision | Area | Status
D1 | RRF over linear fusion | Search | Adopted
D2 | 4 embedding spaces | Retrieval | Adopted
D3 | Heuristic-first classification | Latency | Adopted
D4 | 8-category match scoring | Explainability | Adopted
D5 | Event-based taste learning | Personalization | Adopted
D6 | Scope-based agent memory | Context | Adopted

D1: RRF over Linear Fusion

  • Context: Need to combine BM25 lexical scores with kNN vector scores
  • Alternatives: Linear weighted sum, learned fusion weights, interleaving
  • Chosen: Reciprocal Rank Fusion (RRF) with k=60
  • Why: RRF is position-based and robust to score distribution variance. Doesn't require normalization. Well-tested in production search systems.
  • Tradeoffs: Can't tune importance as precisely as learned weights; ignores score magnitude
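
For reference, weighted RRF fits in a few lines. The sketch below is not the code in fusion.ts; the `RankedList` shape and function name are hypothetical, and the per-list `weight` is one plausible way to apply the 60/40 text/image split on top of k=60.

```typescript
// Weighted Reciprocal Rank Fusion:
//   score(doc) = sum over lists of weight / (k + rank), with 1-based ranks.
// Only rank positions are used, never the raw BM25 or kNN scores.
interface RankedList {
  ids: string[]; // document ids, best first
  weight: number; // hypothetical per-list weight, e.g. 0.6 for text, 0.4 for image
}

function rrfFuse(lists: RankedList[], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.ids.forEach((id, index) => {
      const contribution = list.weight / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}
```

Because only ranks enter the formula, the retrievers' scores never need to be normalized against each other, which is the robustness property cited above.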

D2: Four Embedding Spaces

  • Context: Properties have multiple semantic axes (text, visuals, location, features)
  • Alternatives: Single unified embedding, late fusion only
  • Chosen: 4 separate 1024-dim embeddings + RRF fusion
  • Why: Different embedding models excel at different domains. Allows per-axis weighting based on query type.
  • Tradeoffs: 4x embedding cost; more complex indexing; harder to debug

D3: Heuristic-First Classification

  • Context: Most queries are simple ("homes under 500k in Seattle") but LLM classification adds 500ms
  • Alternatives: Always classify via LLM, rule-based only
  • Chosen: Heuristic check first; skip LLM if confidence > 90%
  • Why: 70% of queries are simple. Saves 500ms latency for majority case.
  • Tradeoffs: May misclassify edge cases; heuristics need maintenance
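
A heuristic gate of this kind might look like the sketch below. The regex patterns and confidence values are invented for illustration and are not the heuristics in query-classifier.ts; only the "skip the LLM when confidence exceeds 90%" rule comes from the decision above.

```typescript
interface HeuristicVerdict {
  simple: boolean;
  confidence: number; // 0..1
}

// Hypothetical patterns for structured constraints and subjective language.
const PRICE = /\b(under|below|over|above)\s+\$?\d[\d,.]*\s*[km]?\b/i;
const ROOMS = /\b\d+\s*(bed|bedroom|bath|bathroom)s?\b/i;
const PLACE = /\bin\s+[A-Z][a-z]+/;
const SUBJECTIVE = /\b(modern|cozy|warm|charming|light|quiet|feel|vibe|style)\b/i;

function classifyHeuristically(query: string): HeuristicVerdict {
  const structuredHits = [PRICE, ROOMS, PLACE].filter((re) => re.test(query)).length;
  // Subjective wording, or no recognizable structure, is left to the LLM.
  if (SUBJECTIVE.test(query) || structuredHits === 0) return { simple: false, confidence: 0.5 };
  // Illustrative confidence: two or more structured signals clear the 0.9 bar.
  return { simple: true, confidence: structuredHits >= 2 ? 0.95 : 0.85 };
}

function needsLlmClassification(query: string): boolean {
  const verdict = classifyHeuristically(query);
  return !(verdict.simple && verdict.confidence > 0.9);
}
```

With these toy patterns, "homes under 500k in Seattle" bypasses the LLM, while "Modern homes with good light under $800K" is routed to it because of the subjective terms.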

D4: 8-Category Match Scoring

  • Context: Users need to understand why a property matches (or doesn't)
  • Alternatives: Single score, 3-tier (good/okay/poor), vector similarity only
  • Chosen: 8 weighted categories: Budget, Structure, Location, Schools, Lifestyle, Visual, Investment, Policy
  • Why: Maps to how buyers actually think. Enables filtering by category. Supports partial matches.
  • Tradeoffs: Complex weight tuning; users may disagree with category importance
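
One simple way to combine the eight category scores is a weighted average, sketched below. The document does not say how match-scorer.ts actually aggregates sub-scores, so the formula, type names, and rounding are assumptions; only the category names and the 0-100 range come from the text.

```typescript
const CATEGORIES = [
  "budget",
  "structure",
  "location",
  "schools",
  "lifestyle",
  "visual",
  "investment",
  "policy",
] as const;

type Category = (typeof CATEGORIES)[number];
type CategoryScores = Record<Category, number>; // each sub-score in 0..100
type CategoryWeights = Record<Category, number>; // relative importance, any non-negative scale

// Weighted average of the sub-scores, rounded to an integer in 0..100.
function overallScore(scores: CategoryScores, weights: CategoryWeights): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const category of CATEGORIES) {
    weighted += scores[category] * weights[category];
    totalWeight += weights[category];
  }
  return totalWeight === 0 ? 0 : Math.round(weighted / totalWeight);
}
```

Keeping the per-category scores alongside the total is what makes the breakdown explainable: the same numbers that produce the overall score can be shown to the user.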

D5: Event-Based Taste Learning

  • Context: Preferences should improve without forcing explicit feedback
  • Alternatives: Explicit ratings only, collaborative filtering
  • Chosen: Capture all events (save=1.0, hide=-0.5, view=0.1) + recency decay + feature aggregation
  • Why: Rich signal without friction. Adapts to evolving taste.
  • Tradeoffs: Cold start problem; noisy signals from accidental clicks
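
The aggregation could be sketched as follows, using the event weights listed above. The 30-day half-life, the `TasteSignal` shape, and the feature-map representation are assumptions; the actual decay function in taste.ts is not described here.

```typescript
type EventKind = "save" | "hide" | "view";

// Event weights as stated in D5.
const EVENT_WEIGHTS: Record<EventKind, number> = { save: 1.0, hide: -0.5, view: 0.1 };

// Hypothetical signal shape: features extracted from the property, plus event age.
interface TasteSignal {
  kind: EventKind;
  features: Record<string, number>;
  ageDays: number;
}

// Sums feature values weighted by event type and an exponential recency decay.
function aggregateTaste(events: TasteSignal[], halfLifeDays = 30): Record<string, number> {
  const taste: Record<string, number> = {};
  for (const event of events) {
    const decay = Math.pow(0.5, event.ageDays / halfLifeDays);
    const weight = EVENT_WEIGHTS[event.kind] * decay;
    for (const feature of Object.keys(event.features)) {
      taste[feature] = (taste[feature] ?? 0) + weight * (event.features[feature] ?? 0);
    }
  }
  return taste;
}
```

With a 30-day half-life, a hide from a month ago counts half as much as one from today, so recent behavior dominates without older signals being discarded outright.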

D6: Scope-Based Agent Memory

  • Context: Scout needs different context when discussing a specific property vs. general search
  • Alternatives: Single global thread, ephemeral memory, topic detection
  • Chosen: 6 scopes (global, property, collection, area, compare, tour) with separate threads
  • Why: Clean context separation. No cross-contamination. Enables scope-specific prompts.
  • Tradeoffs: Can't reference across scopes; more threads to manage
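
Scope separation can be as simple as deriving a distinct thread key per scope, as in this sketch. The key format and the assumption that every non-global scope carries an entity id are hypothetical; the six scope names come from the decision above.

```typescript
type Scope =
  | { kind: "global" }
  | { kind: "property" | "collection" | "area" | "compare" | "tour"; id: string };

// Builds a per-user, per-scope thread key so conversations never share context.
function threadKey(userId: string, scope: Scope): string {
  return scope.kind === "global"
    ? `${userId}:global`
    : `${userId}:${scope.kind}:${scope.id}`;
}
```

A key like this would be passed as the thread identifier when loading or persisting agent memory, so a conversation about one property never pulls in messages from another.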

6. Code-Level Mapping

6.1 Directory Structure

/app
  /lib
    /search               # LLM-Planned Search V2
      llm-planned-orchestrator.ts   # Main pipeline
      query-classifier.ts           # Complexity detection
      planner-agent.ts              # QueryIR generation
      chip-validator.ts             # Chip validation
      es-query-builder.ts           # ES DSL construction
      judge-agent.ts                # Result QA
      revision-handler.ts           # Query revision loop
      vector-service.ts             # Embedding generation
      fusion.ts                     # RRF implementation
      visual-cues.ts                # Image query detection
    /scout                # Conversational Agent
      agent.ts                      # Mastra agent definition
      tools/                        # 12 tool implementations
        actions/properties.ts       # save/hide/note
        personalization/taste.ts    # taste vector
        personalization/rank.ts     # MMR blending
    /matching             # Match Scoring
      match-scorer.ts               # 8-category scorer
      sub-score-calculators.ts      # Category implementations
    /wizard               # Preference Wizard
      types.ts                      # Question schemas
      database-sync.ts              # Profile persistence
    /db
      schema/                       # Drizzle schemas
        profiles.ts                 # User profile (100+ fields)
        properties.ts               # Property + vectors
        scout.ts                    # Scout threads/messages
        collaboration.ts            # Comments/sessions
/src
  /mastra
    index.ts                        # Mastra configuration
    tools.ts                        # Tool definitions

6.2 Key File Responsibilities

File | Lines | Responsibility
llm-planned-orchestrator.ts | ~800 | Orchestrates 8-stage search pipeline
agent.ts | ~300 | Scout agent with Mastra memory + tools
match-scorer.ts | ~600 | Computes 0-100 match with breakdown
taste.ts | ~200 | Event aggregation + feature extraction
vector-service.ts | ~400 | Text + multimodal embedding API calls
es-query-builder.ts | ~500 | Builds ES DSL from QueryIR

6.3 Key Interfaces

// QueryIR - Intermediate Representation
interface QueryIR {
  chips: QueryChip[];           // Extracted search parameters
  vectors: QueryVectors;        // 4 embedding types
  weights: WeightProfile;       // Per-vector importance
  hardFilters: ESFilter[];      // Must-match constraints
  softPreferences: string[];    // Nice-to-have features
}

// ScoutTasteEvent - Preference Signal
interface ScoutTasteEvent {
  kind: 'view' | 'save' | 'hide' | 'shortlist' | 'note';
  propertyId: string;
  weight: number;               // Action importance
  meta: PropertyMeta;           // Extracted features
  createdAt: Date;
}

// MatchResult - Scoring Output
interface MatchResult {
  score: number;                // 0-100 overall
  breakdown: CategoryScore[];   // 8 categories
  reasons: string[];            // Human-readable explanations
}

7. Failure Modes & Edge Cases

7.1 Search Failures

Failure | Detection | Mitigation | User Impact
Empty results with filters | Result count = 0 | Progressive filter relaxation | Wider but relevant results
Visual query for vacant land | Property type = land | 95% image weight penalty | Correct ranking
Negated queries ("no pool") | NOT chip detected | Convert to dealBreaker | Hard exclusion
LLM planning timeout | >5s response | Fall back to keyword search | Degraded but working
Judge rejects all results | Max revisions reached | Return best-effort + warning | "May not match intent"
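
Progressive filter relaxation could work along these lines: drop the least important filter and retry until something comes back. The `Filter` shape and its `priority` field are hypothetical, and a real implementation would presumably exempt deal-breakers from relaxation.

```typescript
// Hypothetical filter shape: higher priority means "relax last".
interface Filter {
  field: string;
  priority: number;
}

function searchWithRelaxation<R>(
  filters: Filter[],
  run: (active: Filter[]) => R[],
): { results: R[]; relaxed: string[] } {
  // Sort a copy so the lowest-priority filter sits at the end of the array.
  const active = filters.slice().sort((a, b) => b.priority - a.priority);
  const relaxed: string[] = [];
  let results = run(active);
  while (results.length === 0 && active.length > 0) {
    const dropped = active.pop();
    if (dropped) relaxed.push(dropped.field);
    results = run(active);
  }
  return { results, relaxed };
}
```

Returning the list of relaxed fields lets the UI explain why the results are wider than the original request.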

7.2 Agent Failures

Failure | Detection | Mitigation | User Impact
Tool execution timeout | >10s per tool | Return partial + error | "Action incomplete"
Memory context overflow | >127k tokens | TokenLimiter processor | Older context trimmed
LLM rate limit | 429 response | Backoff + fallback model | Slower but functional
Idempotency violation | Duplicate cmdId | Return cached result | No double-save
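
The idempotency guard can be illustrated with a small cache keyed by cmdId. This sketch uses an in-memory Map purely for illustration; the class name is hypothetical, and a production version would presumably persist results in Redis or PostgreSQL with an expiry.

```typescript
// Runs each command at most once per cmdId and replays the stored result on duplicates.
class IdempotentExecutor<T> {
  private readonly cache = new Map<string, T>();

  execute(cmdId: string, action: () => T): T {
    if (this.cache.has(cmdId)) {
      return this.cache.get(cmdId) as T; // duplicate: return the cached result
    }
    const result = action();
    this.cache.set(cmdId, result);
    return result;
  }
}
```

If the agent retries a save after a timeout, the second call with the same cmdId returns the first result instead of saving twice.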

7.3 Data Integrity

  • Orphaned events: Property deleted but taste events remain → Ignored in aggregation
  • Stale embeddings: Property updated but vectors unchanged → Nightly re-embedding
  • Profile mismatch: Profile updated mid-session → Re-fetch on next search

8. Tradeoffs & Constraints

8.1 Speed vs. Accuracy

The system is optimized for perceived responsiveness over theoretical optimality:

Decision | Rationale
Heuristic-first classification | Most queries are simple ("homes in Greenville under 400k"). Skip LLM classification when regex + keyword detection suffices.
Single-pass planning | Revision loops only trigger when Judge detects poor results. Most queries succeed on first pass.
RRF over learned fusion | RRF with k=60 is robust across query types without training data. Learned fusion would require labeled datasets we don't have.
Pre-computed image embeddings | Multimodal embedding at ingest time, not query time. Trades freshness for latency.

8.2 UX vs. Engineering Cost

Feature | UX Win | Cost | Decision
8-category scoring | Explainable matches ("87% match because...") | 8 separate computations per property | Worth it
Streaming responses | Perceived speed during agent responses | SSE infrastructure | Worth it
Undo capability | User confidence to experiment | Event sourcing for reversibility | Worth it
Real-time collaboration | Couples search together | WebSocket infrastructure | Deferred (polling for now)

8.3 Constraints Accepted

  • Cold start: New users get generic results until wizard completion or interaction history builds. The wizard mitigates this by front-loading preference capture.
  • Photo quality bias: Professional photography scores higher in visual matching. This reflects buyer perception reality.
  • Context window ceiling: Agent memory is capped at 127k tokens. Older conversation context beyond that limit is trimmed by Mastra's TokenLimiter processor.
  • Online-only: All search and scoring requires network. Acceptable for the target use case (active home search).

8.4 Technical Debt

  • Hardcoded RRF weights (60/40): Works well empirically but not A/B tested. Abstraction exists for future tuning.
  • Single embedding provider: Voyage AI only. Interface abstraction exists for provider swapping.
  • Polling-based collaboration: Comments and sessions use polling. WebSocket upgrade planned for real-time sync.

9. Security, Safety & Misuse

9.1 Authentication & Authorization

Layer | Mechanism
Identity provider | Stytch B2B (magic links + OAuth)
Session management | JWT access tokens with refresh rotation
API authorization | Bearer token validation on all routes
Data isolation | PostgreSQL row-level security policies per user

9.2 Data Boundaries

  • Agent memory isolation: Each user's conversation history and preferences are scoped by userId. The agent cannot access other users' data.
  • MLS compliance: Listing data is cached locally per MLS terms. No redistribution or scraping.
  • Profile data: Wizard answers and taste events stored in user-scoped tables with RLS.

9.3 Agent Safety

  • Tool scope: Agent tools can only modify the authenticated user's data (saves, hides, notes).
  • No external actions: Agent cannot send emails, make API calls to external services, or access filesystem.
  • Conversation context: System prompt is fixed; user messages cannot override agent instructions.

9.4 Fair Housing Compliance

  • No protected class filtering: Search does not filter by race, religion, familial status, or other protected classes.
  • School data disclosure: School ratings are shown for transparency but not used for algorithmic steering.
  • Budget-based pricing: Price recommendations based on stated budget range, not demographic inference.

10. Observability & Debug

10.1 What Gets Logged

Data | Where | Why
API requests/errors | Vercel logs | Error tracking and debugging
Search pipeline traces | Console (dev) | See each stage: classify → plan → validate → execute → judge
Agent conversation turns | PostgreSQL (messages table) | Conversation replay for debugging and memory

10.2 Debug Playbook

Symptom | First Check | Common Cause
Empty search results | ES query structure in logs | Over-constrained filters from planner
Slow agent responses | Tool execution duration | Database N+1 or slow ES query
Inconsistent match scores | Profile freshness | Stale cached profile data
Agent confusion | Context window usage | Old context trimmed, missing relevant history

10.3 Search Trace Structure

Each search request logs a trace recording the outcome of each pipeline stage:

{
  traceId: "abc123",
  pipeline: "llm-planned-v2",
  stages: [
    { name: "classify", result: "complex" },
    { name: "plan", filters: 3, semantic: true },
    { name: "validate", passed: true },
    { name: "execute", hits: 47 },
    { name: "judge", revision: false }
  ]
}

When revision: true, the Judge triggered a re-plan, and stages repeat.

11. Evolution & Open Questions

11.1 Planned Improvements

  • Real-time collaboration: WebSocket upgrade to replace polling for comments and comparison sessions. Currently functional but not real-time.
  • Learned fusion weights: A/B test infrastructure to tune RRF weights per query type instead of fixed 60/40.
  • Multi-agent routing: Specialist agents for specific tasks (tour planning, investment analysis) with automatic routing.

11.2 Open Technical Questions

  • Cross-user taste transfer: Can users with similar profiles bootstrap cold-start faster? Requires privacy-preserving similarity computation.
  • Explanation fidelity: Match score reasons are generated post-hoc. How do we verify they reflect actual model behavior?
  • Preference stability: When should the system resist preference drift vs. adapt? A user viewing 10 ranches doesn't necessarily want ranches.
  • Late vs. early fusion: RRF fuses text and image results post-retrieval. Would jointly-trained embeddings perform better?

11.3 Known Limitations

  • US-only: MLS data access is US-specific. International would require different data sources.
  • Residential focus: Commercial properties would need different embedding strategy and scoring dimensions.
  • English only: LLM prompts and agent instructions are English. Internationalization is possible but not implemented.

12. Appendix

A. Search Evaluation Progression

An automated eval suite tracks search quality across 5 persona-based test cases (Family, Investor, Luxury, First-Time, Remote Worker). Results from four successive eval runs during a single development cycle:

Iteration | Pass Rate | MedianOverall@10 | Constraint Violations | Key Fix
1 | 0% | 0.0 | 744 | Baseline - planner generating invalid ES queries
2 | 0% | 0.0 | 0 | Fixed constraint violations (valid queries, no results)
3 | 20% | 58.0 | 0 | Added semantic boosting, Luxury case passes
4 | 100% | 94.4 | 0 | Tuned filter relaxation, all cases pass

Metrics explained:

  • MedianOverall@10: Median match score of top 10 results (0-100 scale from 8-category scorer)
  • Constraint Violations: Properties returned that violate hard constraints (e.g., over budget, wrong city)
  • Pass threshold: MedianOverall@10 ≥ 70, zero constraint violations
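
The metric and pass rule defined above are easy to express directly. In this sketch the function names are hypothetical, `scores` is assumed to hold match scores in rank order, and returning 0 for an empty result set is an assumption.

```typescript
// Median match score of the top 10 results (scores are given best-ranked first).
function medianOverallAt10(scores: number[]): number {
  const top = scores.slice(0, 10).sort((a, b) => a - b);
  if (top.length === 0) return 0; // assumption: no results counts as a median of 0
  const mid = Math.floor(top.length / 2);
  return top.length % 2 === 1
    ? (top[mid] ?? 0)
    : ((top[mid - 1] ?? 0) + (top[mid] ?? 0)) / 2;
}

// A test case passes when the median is at least 70 and no hard constraint is violated.
function passes(scores: number[], constraintViolations: number): boolean {
  return medianOverallAt10(scores) >= 70 && constraintViolations === 0;
}
```

Using the median rather than the mean keeps one outlier result from masking an otherwise weak top 10.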

B. Test Cases

Persona | Query | Final MedianOverall
Family Modern | "modern family home with updated kitchen near good schools" | 93.1
Investor | "rental property with good cash flow potential" | 93.1
Luxury | "luxury home with high-end finishes and gourmet kitchen" | 99.2
First-Time | "move-in ready starter home good value" | 93.1
Remote Worker | "quiet home with dedicated office space and good internet" | 93.1

C. Glossary

Term | Definition
RRF | Reciprocal Rank Fusion - method to combine ranked lists from different retrievers
MedianOverall@10 | Median match score of top 10 search results
SearchPlan | Structured representation compiled from user profile + wizard answers
TasteEvent | User action (save, hide, view, dwell) that signals preference
Scope | Agent conversation context (global, property, collection, area, compare, tour)
Judge | LLM that evaluates search results and decides if revision is needed