Kuldeep Singh Sidhu
singhsidhukuldeep
AI & ML interests
😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io
Recent Activity
posted an update 2 days ago
Fascinating deep dive into Swiggy's Hermes - their in-house Text-to-SQL solution that's revolutionizing data accessibility!
Hermes enables natural language querying within Slack, generating and executing SQL queries with an impressive <2 minute turnaround time. The system architecture is particularly intriguing:
Technical Implementation:
- Built on GPT-4 with a Knowledge Base + RAG approach for Swiggy-specific context
- AWS Lambda middleware handles communication between Slack UI and the Gen AI model
- Databricks jobs orchestrate query generation and execution
Under the Hood:
The pipeline employs a sophisticated multi-stage approach:
1. Metrics retrieval using embedding-based vector lookup
2. Table/column identification through metadata descriptions
3. Few-shot SQL retrieval with vector-based search
4. Structured prompt creation with data snapshots
5. Query validation with automated error correction
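The staged retrieval flow above can be sketched as follows. This is a minimal illustration only: the knowledge-base contents (`metrics`, `tables`, `few_shot_sql`) are hypothetical, and the hash-based `embed` function stands in for a real embedding model; none of this is Swiggy's actual code.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a real embedding model: hash tokens into a fixed-size vector.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def top_k(query: str, corpus: dict, k: int = 2) -> list:
    # Embedding-based vector lookup (steps 1-3): rank entries by cosine similarity.
    q = embed(query)
    scored = sorted(corpus, key=lambda key: float(q @ embed(corpus[key])), reverse=True)
    return scored[:k]

# Hypothetical charter-specific knowledge base.
metrics = {"orders_per_day": "daily count of delivered orders",
           "avg_delivery_time": "mean minutes from order to delivery"}
tables = {"fact_orders": "order id, status, ordered_at, delivered_at columns",
          "dim_restaurant": "restaurant id, name, city columns"}
few_shot_sql = {"daily orders": "SELECT COUNT(*) FROM fact_orders GROUP BY DATE(ordered_at)"}

def build_prompt(question: str) -> str:
    # Steps 1-4: retrieve relevant metrics, tables, and few-shot SQL examples,
    # then assemble a structured prompt for the SQL-generating LLM.
    parts = ["-- Metrics: " + ", ".join(top_k(question, metrics)),
             "-- Tables: " + ", ".join(top_k(question, tables)),
             "-- Examples: " + "; ".join(few_shot_sql[k] for k in top_k(question, few_shot_sql, 1)),
             "-- Question: " + question]
    return "\n".join(parts)

print(build_prompt("how many orders per day were delivered"))
```

Step 5 (validation with automated error correction) would then run the generated SQL and, on failure, feed the error back to the model for a retry.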
Architecture Highlights:
- Compartmentalized by business units (charters) for better context management
- Snowflake integration with seamless authentication
- Automated metadata onboarding with QA validation
- Real-time feedback collection via Slack
What's particularly impressive is how they've solved the data context challenge through charter-specific implementations, significantly improving query accuracy for well-defined metadata sets.
Kudos to the Swiggy team for democratizing data access across their organization. This is a brilliant example of practical AI implementation solving real business challenges.
posted an update 5 days ago
Exciting breakthrough in neural search technology!
Researchers from ETH Zurich, UC Berkeley, and Stanford University have introduced WARP - a groundbreaking retrieval engine that achieves remarkable performance improvements in multi-vector search.
WARP brings three major innovations to the table:
- A novel WARP SELECT algorithm for dynamic similarity estimation
- Implicit decompression during retrieval operations
- An optimized two-stage reduction process for efficient scoring
The results are stunning - WARP delivers a 41x reduction in query latency compared to existing XTR implementations, bringing response times down from 6+ seconds to just 171 milliseconds on single-threaded execution. It also achieves a 3x speedup over the current state-of-the-art ColBERTv2 PLAID engine while maintaining retrieval quality.
Under the hood, WARP uses highly-optimized C++ kernels and specialized inference runtimes. It employs an innovative compression strategy using k-means clustering and quantized residual vectors, reducing index sizes by 2-4x compared to baseline implementations.
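The compression idea can be sketched in a few lines of NumPy. This is a simplified illustration of k-means residual quantization in general, not WARP's optimized C++ kernels: each vector stores a centroid id plus int8-quantized residuals, and full vectors are only reconstructed ("implicitly decompressed") at scoring time.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 16)).astype(np.float32)

# Toy k-means (a few Lloyd iterations) standing in for real centroid training.
k = 32
centroids = vectors[rng.choice(len(vectors), k, replace=False)]
for _ in range(5):
    assign = np.argmin(((vectors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        members = vectors[assign == c]
        if len(members):
            centroids[c] = members.mean(0)

# Residual quantization: store a centroid id plus residuals scaled into int8,
# shrinking 16 float32 values (64 bytes) to roughly 16 bytes + id + scale.
residuals = vectors - centroids[assign]
scale = np.abs(residuals).max(axis=1, keepdims=True) / 127.0
codes = np.round(residuals / np.where(scale == 0, 1, scale)).astype(np.int8)

# "Implicit decompression": reconstruct only when a vector is actually scored.
reconstructed = centroids[assign] + codes.astype(np.float32) * scale
err = float(np.abs(vectors - reconstructed).max())
```

With this layout each vector drops from 64 bytes to roughly 18, in the same 2-4x ballpark the paper reports for its index sizes.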
The engine also scales well: query latency grows roughly with the square root of dataset size, and it parallelizes effectively across CPU threads, reaching a 3.1x speedup with 16 threads.
This work represents a significant step forward in making neural search more practical for production environments. The researchers have made the implementation publicly available for the community.
posted an update 6 days ago
Exciting Research Alert: Remining Hard Negatives for Domain Adaptation in Dense Retrieval
Researchers from the University of Amsterdam have introduced R-GPL, an innovative approach to improve domain adaptation in dense retrievers. The technique enhances the existing GPL (Generative Pseudo Labeling) framework by continuously remining hard negatives during the training process.
Key Technical Insights:
- The method leverages the domain-adapted model itself to remine higher-quality hard negatives every 30,000 steps during training
- Uses MarginMSE loss for training on data triplets (Query, Relevant Doc, Hard Negative Doc)
- Implements mean pooling over hidden states for dense representations, with a 350-token sequence length
- Combines query generation with pseudo-labels from cross-encoder models
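The two training ingredients above, mean pooling and MarginMSE against cross-encoder pseudo-labels, can be sketched with NumPy. This is a rough illustration with fake hidden states and made-up teacher scores, not the paper's training code; a real setup would use a transformer encoder and backpropagate through these operations.

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Mean pooling over token hidden states, ignoring padding positions.
    m = mask[..., None].astype(np.float32)
    return (hidden * m).sum(axis=-2) / np.clip(m.sum(axis=-2), 1e-9, None)

def margin_mse(sq_pos: float, sq_neg: float, teacher_pos: float, teacher_neg: float) -> float:
    # MarginMSE: match the student's score margin (pos - neg) to the
    # cross-encoder teacher's pseudo-label margin.
    student_margin = sq_pos - sq_neg
    teacher_margin = teacher_pos - teacher_neg
    return float(np.mean((student_margin - teacher_margin) ** 2))

# Toy triplet (query, relevant doc, hard negative doc) with fake hidden states.
rng = np.random.default_rng(0)
seq_len, dim = 350, 8          # 350-token sequences, as in the paper
hidden = rng.normal(size=(3, seq_len, dim)).astype(np.float32)
mask = np.ones((3, seq_len), dtype=np.int64)
q, d_pos, d_neg = mean_pool(hidden, mask)

# Teacher scores here are invented for illustration.
loss = margin_mse(float(q @ d_pos), float(q @ d_neg), teacher_pos=9.1, teacher_neg=2.3)
```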
Performance Highlights:
- Outperforms baseline GPL in 13/14 BEIR datasets
- Shows significant improvements in 9/12 LoTTE datasets
- Achieves remarkable 4.4 point gain on TREC-COVID dataset
Under the Hood:
The system continuously refreshes hard negatives using the model undergoing domain adaptation. This creates a feedback loop where the model gets better at identifying relevant documents in the target domain, leading to higher quality training signals.
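The remining feedback loop can be sketched as follows. `StubRetriever` and its `rank`/`train_step` methods are hypothetical stand-ins invented for this sketch, not the paper's model; only the loop structure (refresh the negatives with the adapting model every 30,000 steps) reflects R-GPL.

```python
class StubRetriever:
    """Hypothetical stand-in for a dense retriever under domain adaptation."""
    def __init__(self):
        self.steps = 0
        self.refreshes = 0
    def rank(self, query, corpus):
        return sorted(corpus)               # deterministic toy ranking
    def train_step(self, queries, hard_negs):
        self.steps += 1                     # real code: MarginMSE update on triplets

def remine_hard_negatives(model, queries, corpus, top_n=50):
    # Use the current (partially adapted) model to retrieve fresh hard negatives:
    # highly ranked documents that are not the known relevant one.
    return {qid: [d for d in model.rank(q, corpus)[:top_n] if d != rel]
            for qid, (q, rel) in queries.items()}

def train_r_gpl(model, queries, corpus, total_steps=90_000, refresh_every=30_000):
    hard_negs = remine_hard_negatives(model, queries, corpus)
    for step in range(total_steps):
        if step and step % refresh_every == 0:
            # Feedback loop: the adapting model supplies its own fresher negatives.
            hard_negs = remine_hard_negatives(model, queries, corpus)
            model.refreshes += 1
        model.train_step(queries, hard_negs)
    return model

queries = {"q1": ("what is a hard negative", "doc_a")}
corpus = ["doc_a", "doc_b", "doc_c"]
trained = train_r_gpl(StubRetriever(), queries, corpus)
```

Plain GPL corresponds to skipping the refresh branch entirely: the negatives mined once at the start are reused for all steps.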
Analysis reveals that domain-adapted models retrieve documents with higher relevancy scores in top-100 hard negatives compared to baseline approaches. This confirms the model's enhanced capability to identify challenging but informative training examples.
This research opens new possibilities for efficient dense retrieval systems that can adapt to different domains without requiring labeled training data.
Organizations
singhsidhukuldeep's activity
Update Request
2
#2 opened 3 months ago by singhsidhukuldeep
The model can be started using vllm, but no dialogue is possible.
3
#2 opened 7 months ago by SongXiaoMao
Adding chat_template to tokenizer_config.json file
1
#3 opened 7 months ago by singhsidhukuldeep
Script request
3
#1 opened 7 months ago by singhsidhukuldeep
Requesting script
#1 opened 7 months ago by singhsidhukuldeep