Hybrid Search Implementation
IntelliQ implements a hybrid search system that combines keyword-based (lexical) search with semantic (vector) search, so that quiz searches return relevant results even when the user’s wording differs from the quiz text.
What is Hybrid Search?
Hybrid search combines two search methods:
- Keyword Search (Full-Text Search): Matches specific words or phrases in the content; well suited to finding exact matches.
- Semantic Search (Vector Search): Captures the meaning and context of a query, finding conceptually related content even when the keywords don’t match.
Architecture Overview
The hybrid search implementation in IntelliQ follows these steps:
- User Input: The user enters a search query in the UI
- Query Processing: The query is processed in two ways:
  - As text, for keyword search
  - As a vector embedding, for semantic search
- Dual Search Execution: Both searches run inside PostgreSQL, within a single database function
- Result Fusion: The two result lists are combined using Reciprocal Rank Fusion (RRF)
- Result Presentation: The final ranked results are returned to the user
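The steps above can be sketched from the API side in TypeScript. This is a minimal sketch, not IntelliQ’s code: it assumes the server calls OpenAI’s REST embeddings endpoint with fetch and that the database function takes query_text, query_embedding, and match_count arguments. HybridSearchFn, searchQuizzes, embedQuery, and the default of 10 results are hypothetical. Dependencies are passed in so the flow can be exercised without network or database access.

```typescript
// Minimal shape of fetch that this sketch relies on (the global fetch satisfies it).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface QuizHit {
  id: string;
  title: string;
}

// Hypothetical wrapper around the PostgreSQL hybrid search function, which
// runs keyword search, semantic search, and RRF in one call.
type HybridSearchFn = (args: {
  query_text: string;
  query_embedding: number[];
  match_count: number;
}) => Promise<QuizHit[]>;

const EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings";

// Request body for OpenAI's embeddings endpoint.
function buildEmbeddingBody(text: string): string {
  return JSON.stringify({ model: "text-embedding-3-small", input: text });
}

// Pulls the first embedding out of an embeddings API response.
function parseEmbedding(json: unknown): number[] {
  const data = (json as { data?: { embedding?: unknown }[] }).data;
  const embedding = data?.[0]?.embedding;
  if (!Array.isArray(embedding)) throw new Error("Malformed embeddings response");
  return embedding as number[];
}

// Query Processing: turn the query text into a vector embedding.
async function embedQuery(text: string, apiKey: string, fetchImpl: FetchLike): Promise<number[]> {
  const res = await fetchImpl(EMBEDDINGS_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: buildEmbeddingBody(text),
  });
  if (!res.ok) throw new Error(`Embedding request failed with status ${res.status}`);
  return parseEmbedding(await res.json());
}

// User Input -> Query Processing -> Dual Search + Fusion (in the database) -> Results.
async function searchQuizzes(
  rawQuery: string,
  deps: { apiKey: string; fetchImpl: FetchLike; hybridSearch: HybridSearchFn },
  matchCount = 10, // hypothetical default page size
): Promise<QuizHit[]> {
  const query = rawQuery.trim();
  if (query.length === 0) return []; // nothing to search for; skip the embedding call
  const embedding = await embedQuery(query, deps.apiKey, deps.fetchImpl);
  // The same query is sent twice: as text for keyword search, as a vector for semantic search.
  return deps.hybridSearch({ query_text: query, query_embedding: embedding, match_count: matchCount });
}
```

Keeping the embedding call in the API layer means the database function only needs the text and the vector, so fusion stays a single round trip.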
Technical Implementation
Database Function
At the core of our hybrid search is a PostgreSQL function that performs both search types and combines the results.
Key Components Explained
1. Embedding Generation
We use OpenAI’s text-embedding-3-small model to generate embeddings for both quizzes and search queries.
2. Full-Text Search
The full-text search component uses PostgreSQL’s built-in text search features:
- to_tsvector: Converts text into a searchable tsvector (a list of normalized lexemes)
- websearch_to_tsquery: Parses the user’s query, written in web-search-style syntax, into a tsquery
- setweight: Assigns different weights to different fields (title, description, topics)
- ts_rank_cd: Ranks matching results by relevance (cover density ranking)
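A minimal sketch of how the functions listed above fit together, with the SQL kept in TypeScript strings. It assumes a hypothetical quizzes table with text columns title, description, and topics, plus a stored generated tsvector column named fts; IntelliQ’s actual schema may differ. The user’s query is passed as a bind parameter in the { text, values } shape that node-postgres’s client.query accepts.

```typescript
// Hypothetical schema: quizzes(id, title, description, topics, fts, embedding).
// Assumes topics is a text column. The weighted tsvector is stored in fts so it
// can be indexed instead of recomputed per query.
const ADD_FTS_COLUMN_SQL = `
  ALTER TABLE quizzes ADD COLUMN fts tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(topics, '')), 'C')
  ) STORED`;

// Full-text half of the hybrid search: rank matches with ts_rank_cd and number
// them 1, 2, 3, ... (rank_ix) so they can later be fused by position.
const FULL_TEXT_SQL = `
  SELECT id,
         row_number() OVER (
           ORDER BY ts_rank_cd(fts, websearch_to_tsquery('english', $1)) DESC
         ) AS rank_ix
  FROM quizzes
  WHERE fts @@ websearch_to_tsquery('english', $1)
  ORDER BY rank_ix
  LIMIT $2`;

interface ParamQuery {
  text: string;
  values: (string | number)[];
}

// websearch_to_tsquery never raises syntax errors, so raw user input can be
// bound directly as $1 without stripping quotes or operators first.
function fullTextQuery(userQuery: string, limit: number): ParamQuery {
  return { text: FULL_TEXT_SQL, values: [userQuery, limit] };
}
```

With no explicit weight array, ts_rank_cd uses the defaults {0.1, 0.2, 0.4, 1.0} for D, C, B, and A, so under this sketch a match in the title counts for more than a match in the description or topics.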
3. Semantic Search
The semantic search component uses pgvector’s similarity search:
- <#> operator: Returns the negative inner product between two embeddings
- Smaller values therefore indicate higher similarity, so results are ordered ascending
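To make the ordering concrete, here is an in-memory TypeScript equivalent of ORDER BY embedding <#> query. It is an illustration only, since the real ranking happens in PostgreSQL; semanticRank and the Embedded type are hypothetical names. OpenAI embeddings are normalized to length 1, so their inner product equals cosine similarity, and ordering by <#> ascending is the same as ordering by cosine similarity, highest first.

```typescript
// Mirrors pgvector's <#> operator, which returns the NEGATIVE inner product,
// so that ascending order (smallest first) puts the most similar vectors first.
function negativeInnerProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return -dot;
}

interface Embedded {
  id: string;
  embedding: number[];
}

// In-memory equivalent of: ORDER BY embedding <#> query LIMIT k,
// returning 1-based rank_ix values like row_number() would.
function semanticRank(query: number[], items: Embedded[], k: number): { id: string; rank_ix: number }[] {
  return items
    .map((item) => ({ id: item.id, distance: negativeInnerProduct(item.embedding, query) }))
    .sort((x, y) => x.distance - y.distance) // most negative (most similar) first
    .slice(0, k)
    .map((item, i) => ({ id: item.id, rank_ix: i + 1 }));
}

// Example with unit-length vectors: [1, 0] is closest to itself, then to [0.8, 0.6].
const ranked = semanticRank([1, 0], [
  { id: "far", embedding: [0, 1] },
  { id: "near", embedding: [0.8, 0.6] },
  { id: "exact", embedding: [1, 0] },
], 2);
// ranked: [{ id: "exact", rank_ix: 1 }, { id: "near", rank_ix: 2 }]
```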
4. Reciprocal Rank Fusion (RRF)
RRF combines the rankings from both search methods:
- rank_ix: The position of each result in its respective list
- rrf_k: A constant (default: 60) that dampens the advantage of top-ranked positions, so that a single list cannot dominate the fused ranking
- full_text_weight and semantic_weight: Control the relative importance of each search method
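In the usual weighted form of RRF, each result’s fused score is full_text_weight / (rrf_k + its full-text rank_ix) plus semantic_weight / (rrf_k + its semantic rank_ix), and a result found by only one method gets no contribution from the other. The sketch below computes this in memory using the parameter names above; reciprocalRankFusion is a hypothetical name, and this is not IntelliQ’s database function.

```typescript
interface Ranked {
  id: string;
  rank_ix: number; // 1-based position within one result list
}

interface RrfOptions {
  rrf_k?: number;
  full_text_weight?: number;
  semantic_weight?: number;
}

// Weighted Reciprocal Rank Fusion: every list adds weight / (rrf_k + rank_ix)
// for each result it contains; a result missing from a list gets 0 from it.
function reciprocalRankFusion(
  fullText: Ranked[],
  semantic: Ranked[],
  { rrf_k = 60, full_text_weight = 1, semantic_weight = 1 }: RrfOptions = {},
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  const add = (list: Ranked[], weight: number) => {
    for (const { id, rank_ix } of list) {
      scores.set(id, (scores.get(id) ?? 0) + weight / (rrf_k + rank_ix));
    }
  };
  add(fullText, full_text_weight);
  add(semantic, semantic_weight);
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

const fullTextHits: Ranked[] = [{ id: "a", rank_ix: 1 }, { id: "b", rank_ix: 2 }];
const semanticHits: Ranked[] = [{ id: "b", rank_ix: 1 }, { id: "c", rank_ix: 2 }];
const fused = reciprocalRankFusion(fullTextHits, semanticHits);
// "b" appears in both lists, so it ranks first: b, a, c
```

Because RRF looks only at rank positions, it never has to reconcile ts_rank_cd scores with vector distances, which live on different scales. Setting one weight to 0 reduces the result to a single-method ranking.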
API Implementation
Our API endpoint handles the search request, generates the query embedding, and calls the database function.
Frontend Implementation
The frontend provides the search interface: it sends the user’s query to the API endpoint and displays the ranked results.
Benefits of Hybrid Search
- Improved Relevance: Finds both exact keyword matches and conceptually related content
- Better Recall: Captures results that might be missed by either method alone
- Enhanced User Experience: Users find what they’re looking for even if they don’t use exact terminology
- Flexibility: Weights can be adjusted to favor either keyword or semantic search
Performance Considerations
- Indexing: Each search method is backed by a suitable index:
  - GIN index for full-text search
  - HNSW index for vector search
- Asynchronous Embedding Generation: Embeddings are generated in the background to avoid slowing down quiz creation
- Pagination: Results are paginated to limit the amount of data transferred
- Caching: Frequently searched queries could be cached (future enhancement)
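The indexes listed above could be created with statements like the following, kept here as TypeScript migration strings. This is a sketch under assumptions: it presumes a hypothetical quizzes table with an fts tsvector column and an embedding vector column, and pgvector 0.5.0 or later (the first release with HNSW). The operator class matters: a pgvector HNSW index serves only the distance operator its class was built for, so vector_ip_ops pairs with the <#> operator used for semantic search. SEARCH_INDEX_MIGRATIONS and applySearchIndexes are hypothetical names.

```typescript
// Hypothetical migration statements; table, column, and index names are assumptions.
const SEARCH_INDEX_MIGRATIONS: string[] = [
  // GIN index so that fts @@ websearch_to_tsquery(...) can use an index scan.
  `CREATE INDEX IF NOT EXISTS quizzes_fts_idx ON quizzes USING gin (fts)`,
  // HNSW index built with vector_ip_ops so that ORDER BY embedding <#> ... can use it.
  `CREATE INDEX IF NOT EXISTS quizzes_embedding_idx ON quizzes USING hnsw (embedding vector_ip_ops)`,
];

// Executes one SQL statement; in practice this would wrap a database client.
type Execute = (sql: string) => Promise<void>;

// Runs the statements in order; IF NOT EXISTS makes re-running them harmless.
async function applySearchIndexes(execute: Execute): Promise<void> {
  for (const sql of SEARCH_INDEX_MIGRATIONS) {
    await execute(sql);
  }
}
```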
Future Enhancements
- Personalized Ranking: Adjust result ranking based on user preferences and history
- Multi-language Support: Extend search capabilities to multiple languages
- Faceted Search: Allow filtering of search results by various attributes
- Query Expansion: Automatically expand queries to include related terms
- Performance Optimization: Further optimize the search algorithm for larger datasets