
Kuldeep Singh Sidhu

singhsidhukuldeep

AI & ML interests

😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io

Recent Activity

posted an update 2 days ago
Fascinating deep dive into Swiggy's Hermes - their in-house Text-to-SQL solution that's revolutionizing data accessibility!

Hermes enables natural-language querying within Slack, generating and executing SQL queries with an impressive <2 minute turnaround time. The system architecture is particularly intriguing.

Technical Implementation:
- Built on GPT-4 with a Knowledge Base + RAG approach for Swiggy-specific context
- AWS Lambda middleware handles communication between the Slack UI and the Gen AI model
- Databricks jobs orchestrate query generation and execution

Under the Hood:
The pipeline employs a sophisticated multi-stage approach (a minimal sketch follows this post):
1. Metrics retrieval using embedding-based vector lookup
2. Table/column identification through metadata descriptions
3. Few-shot SQL retrieval with vector-based search
4. Structured prompt creation with data snapshots
5. Query validation with automated error correction

Architecture Highlights:
- Compartmentalized by business units (charters) for better context management
- Snowflake integration with seamless authentication
- Automated metadata onboarding with QA validation
- Real-time feedback collection via Slack

What's particularly impressive is how they've solved the data-context challenge through charter-specific implementations, significantly improving query accuracy for well-defined metadata sets.

Kudos to the Swiggy team for democratizing data access across their organization. This is a brilliant example of practical AI implementation solving real business challenges.
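To make the five-stage flow concrete, here is a minimal Python sketch of a Hermes-style pipeline. Everything in it is hypothetical - the embed, VectorIndex, call_llm, and validate_sql names stand in for Swiggy's internal components, which are not public - and the vector lookup is a toy in-memory cosine search.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding (stable within one process); a real system uses a trained encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class VectorIndex:
    """Minimal in-memory cosine-similarity lookup, standing in for a vector store."""
    def __init__(self, items: dict):
        self.keys = list(items)
        self.matrix = np.stack([embed(v) for v in items.values()])

    def top_k(self, query: str, k: int = 3):
        scores = self.matrix @ embed(query)
        return [self.keys[i] for i in np.argsort(-scores)[:k]]

def validate_sql(sql: str):
    """Placeholder check; a real system might EXPLAIN the query against Snowflake."""
    ok = sql.strip().lower().startswith("select")
    return ok, "" if ok else "query must start with SELECT"

def text_to_sql(question: str, metrics, tables, examples, call_llm) -> str:
    relevant_metrics = metrics.top_k(question)   # 1. metrics via embedding lookup
    relevant_tables = tables.top_k(question)     # 2. tables/columns via metadata
    few_shot = examples.top_k(question)          # 3. few-shot SQL retrieval
    prompt = (f"Metrics: {relevant_metrics}\nTables: {relevant_tables}\n"
              f"Examples: {few_shot}\nQuestion: {question}\nSQL:")  # 4. prompt assembly
    sql = call_llm(prompt)
    ok, err = validate_sql(sql)                  # 5. validation with one retry
    if not ok:
        sql = call_llm(prompt + f"\nPrevious attempt failed ({err}). Fixed SQL:")
    return sql

# Stubbed usage - a real deployment would call GPT-4 here:
sql = text_to_sql(
    "weekly orders per city",
    metrics=VectorIndex({"orders": "count of orders placed"}),
    tables=VectorIndex({"fact_orders": "order_id, city, ts"}),
    examples=VectorIndex({"ex1": "SELECT city, COUNT(*) FROM fact_orders GROUP BY 1"}),
    call_llm=lambda p: "SELECT city, COUNT(*) FROM fact_orders GROUP BY city",
)
print(sql)
```

The real system additionally injects data snapshots into the prompt and routes everything through Slack and Lambda, but the retrieve-then-validate shape is the same.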
posted an update 5 days ago
Exciting breakthrough in neural search technology! Researchers from ETH Zurich, UC Berkeley, and Stanford University have introduced WARP - a groundbreaking retrieval engine that achieves remarkable performance improvements in multi-vector search.

WARP brings three major innovations to the table:
- A novel WARP SELECT algorithm for dynamic similarity estimation
- Implicit decompression during retrieval operations
- An optimized two-stage reduction process for efficient scoring

The results are stunning - WARP delivers a 41x reduction in query latency compared to existing XTR implementations, bringing response times down from 6+ seconds to just 171 milliseconds on single-threaded execution. It also achieves a 3x speedup over the current state-of-the-art ColBERTv2 PLAID engine while maintaining retrieval quality.

Under the hood, WARP uses highly optimized C++ kernels and specialized inference runtimes. It employs an innovative compression strategy using k-means clustering and quantized residual vectors (sketched in code below), reducing index sizes by 2-4x compared to baseline implementations. The engine shows excellent scalability, with latency scaling with the square root of dataset size and effective parallelization across multiple CPU threads - achieving a 3.1x speedup with 16 threads.

This work represents a significant step forward in making neural search more practical for production environments. The researchers have made the implementation publicly available for the community.
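The compression idea is easy to illustrate. Below is a generic Python sketch (assuming NumPy and scikit-learn) of centroid-plus-quantized-residual storage; it mirrors the concept described in the paper, not WARP's actual C++ kernels or on-disk format.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress(vectors: np.ndarray, n_centroids: int = 256, bits: int = 4):
    """Store each vector as (centroid id, coarsely quantized residual)."""
    km = KMeans(n_clusters=n_centroids, n_init="auto").fit(vectors)
    codes = km.labels_                                # one centroid id per vector
    residuals = vectors - km.cluster_centers_[codes]  # what the centroid misses
    scale = np.abs(residuals).max() + 1e-9            # uniform scalar quantization
    levels = 2**bits - 1
    quantized = np.round((residuals / scale + 1) / 2 * levels).astype(np.uint8)
    return km.cluster_centers_, codes, quantized, scale

def decompress(centroids, codes, quantized, scale, bits: int = 4):
    """Approximate reconstruction: centroid plus dequantized residual."""
    levels = 2**bits - 1
    residuals = (quantized.astype(np.float32) / levels * 2 - 1) * scale
    return centroids[codes] + residuals

# 10k 128-d vectors: 4 bits/dim plus a centroid id instead of 32-bit floats.
X = np.random.randn(10_000, 128).astype(np.float32)
C, codes, q, s = compress(X)
X_hat = decompress(C, codes, q, s)
print("mean squared reconstruction error:", float(np.mean((X - X_hat) ** 2)))
```

As the post notes, WARP avoids the explicit reconstruction step shown here by decompressing implicitly during retrieval, scoring against the compressed representation directly.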
posted an update 6 days ago
Exciting Research Alert: Remining Hard Negatives for Domain Adaptation in Dense Retrieval

Researchers from the University of Amsterdam have introduced R-GPL, an innovative approach to improving domain adaptation in dense retrievers. The technique enhances the existing GPL (Generative Pseudo Labeling) framework by continuously remining hard negatives during the training process.

Key Technical Insights:
- Leverages domain-adapted models to mine higher-quality hard negatives incrementally, every 30,000 steps during training
- Trains with MarginMSE loss on data triplets (Query, Relevant Doc, Hard Negative Doc) - see the sketch after this post
- Uses mean pooling over hidden states for dense representations, with a 350-token sequence length
- Combines query generation with pseudo-labels from cross-encoder models

Performance Highlights:
- Outperforms baseline GPL on 13/14 BEIR datasets
- Shows significant improvements on 9/12 LoTTE datasets
- Achieves a remarkable 4.4-point gain on the TREC-COVID dataset

Under the Hood:
The system continuously refreshes hard negatives using the very model undergoing domain adaptation. This creates a feedback loop: as the model gets better at identifying relevant documents in the target domain, it produces higher-quality training signals. Analysis reveals that domain-adapted models retrieve documents with higher relevancy scores in their top-100 hard negatives than baseline approaches, confirming the model's enhanced ability to surface challenging but informative training examples.

This research opens new possibilities for efficient dense retrieval systems that can adapt to different domains without requiring labeled training data.
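For readers unfamiliar with the training objective, here is a hedged PyTorch sketch of the two pieces named above - mean pooling and MarginMSE over triplets. The tensor shapes and random inputs are illustrative stand-ins for a real encoder's hidden states, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token hidden states into one dense vector, ignoring padding."""
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

def margin_mse_loss(q, pos, neg, teacher_margin):
    """MarginMSE: push the student's score margin (q.pos - q.neg) toward the
    cross-encoder teacher's pseudo-label margin for the same triplet."""
    student_margin = (q * pos).sum(-1) - (q * neg).sum(-1)
    return F.mse_loss(student_margin, teacher_margin)

# Toy shapes matching the 350-token sequence length mentioned above.
B, T, D = 2, 350, 768
mask = torch.ones(B, T)
q   = mean_pool(torch.randn(B, T, D), mask)   # query representation
pos = mean_pool(torch.randn(B, T, D), mask)   # relevant document
neg = mean_pool(torch.randn(B, T, D), mask)   # hard negative document
print(margin_mse_loss(q, pos, neg, teacher_margin=torch.zeros(B)))
```

R-GPL's contribution then lives in the training loop: every 30,000 steps, the corpus is re-encoded with the current model and the hard negatives in the triplets are replaced with fresh top-ranked non-relevant documents.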

Organizations

MLX Community • Social Post Explorers • C4AI Community

singhsidhukuldeep's activity

upvoted an article 7 months ago

Making LLMs lighter with AutoGPTQ and transformers

• 39
upvoted 2 articles 9 months ago

LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!)

By wolfram • 61

Train custom AI models with the trainer API and adapt them to 🤗

By not-lain • 33