Retrieval-Augmented Spatial Query System with NVIDIA Embeddings and Google AI Studio

This system uses a Retrieval-Augmented Generation (RAG) approach to dynamically convert natural language queries into SQL statements that operate on a PostGIS database containing vector layers and their metadata.

1. Technologies Used

Google Agent Development Kit (ADK), Google AI Studio (Gemini models), Retrieval-Augmented Generation (RAG), NVIDIA Embedding Models, PostGIS (PostgreSQL with spatial extensions), Layer Metadata Tables, Spatial SQL (e.g., ST_Intersects, ST_Within), Query E

2. Project Description

This architecture forms the foundation of my capstone project, which focuses on building an AI-powered spatial query system using fully open-source technologies. At its core, the system applies a Retrieval-Augmented Generation (RAG) pipeline that combines vector similarity search (powered by NVIDIA embeddings) with LLM-based SQL generation, targeting a PostGIS database of spatial layers and metadata.
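As a rough illustration of the retrieve-then-generate flow described above, the following Python sketch ranks layer metadata against a question and assembles a prompt for the SQL-generating model. Everything in it is an assumption made for illustration: the `LAYER_METADATA` rows, the table names, and the prompt wording are hypothetical, and the `embed` function is a toy bag-of-words stand-in for a real embedding model call rather than the NVIDIA embeddings the project uses.

```python
import math
from collections import Counter

# Hypothetical metadata rows, standing in for the PostGIS layer metadata tables.
LAYER_METADATA = [
    {"table": "parcels", "geom_column": "geom",
     "description": "land parcels with owner and zoning attributes"},
    {"table": "flood_zones", "geom_column": "geom",
     "description": "flood hazard zones polygons with risk level"},
    {"table": "roads", "geom_column": "geom",
     "description": "road centerlines with name and speed limit"},
]


def embed(text):
    """Toy bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, k=2):
    """Return the k metadata rows most similar to the question."""
    q = embed(question)
    ranked = sorted(
        LAYER_METADATA,
        key=lambda row: cosine(q, embed(row["description"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, layers):
    """Assemble the context an LLM would receive to write spatial SQL."""
    schema = "\n".join(
        f"- {row['table']}({row['geom_column']}): {row['description']}"
        for row in layers
    )
    return (
        "You write PostGIS SQL. Use only these layers:\n"
        f"{schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )


question = "which parcels are in flood hazard zones"
layers = retrieve(question)
prompt = build_prompt(question, layers)
```

In a real deployment the similarity search would more likely run against stored embeddings rather than recomputing them per query, and the resulting prompt would be sent to the chosen Gemini model; both steps are left out here because their exact interfaces are not described in this write-up.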

The project is currently in development, and future enhancements will include the integration of GeoServer to dynamically publish query results as WMS/WFS services, enabling rich map-based visualizations. It will also support automated generation of charts, graphs, and statistical summaries, providing a complete analytical interface for spatial data.

In addition to system development, a key goal of the project is to benchmark multiple large language models (LLMs), comparing latency, accuracy, and token cost to identify the most effective models for spatial query interpretation and SQL generation.
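One way such a benchmark could be structured is sketched below in Python. It is only a sketch under stated assumptions: the two model functions are hypothetical stubs that return a SQL string and a token count, and accuracy is approximated by exact match on normalized SQL, which is a crude proxy. Comparing the result sets of executed queries would likely be a more robust measure.

```python
import time


def normalize(sql):
    """Lowercase, trim, drop a trailing semicolon, and collapse whitespace."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def benchmark(models, cases):
    """Run every model on every (question, expected_sql) case and collect
    accuracy, mean latency, and total token usage per model."""
    results = {}
    for name, generate in models.items():
        correct = 0
        total_tokens = 0
        start = time.perf_counter()
        for question, expected_sql in cases:
            sql, tokens_used = generate(question)
            total_tokens += tokens_used
            if normalize(sql) == normalize(expected_sql):
                correct += 1
        elapsed = time.perf_counter() - start
        results[name] = {
            "accuracy": correct / len(cases),
            "mean_latency_s": elapsed / len(cases),
            "total_tokens": total_tokens,
        }
    return results


# Hypothetical stub models; real ones would call an LLM API.
def stub_model_a(question):
    return "SELECT COUNT(*) FROM parcels;", 40


def stub_model_b(question):
    return "SELECT * FROM roads", 25


CASES = [("How many parcels are there?", "select count(*)  from parcels")]

report = benchmark({"model_a": stub_model_a, "model_b": stub_model_b}, CASES)
```

Token cost could then be derived by multiplying `total_tokens` by each provider's published price, which is omitted here because prices vary by model and change over time.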

By leveraging a fully open-source stack, this project aims to deliver a transparent, extensible, and scalable solution for intelligent geospatial analysis.

3. Key Features

4. Why This Project?

Geospatial data is rapidly growing in complexity and volume, yet interacting with it still requires technical expertise in SQL, GIS platforms, and spatial data structures. This project was conceived to bridge the gap between natural language and spatial analysis, enabling users, regardless of technical background, to ask geospatial questions and receive accurate map-based and statistical answers.

By combining open-source geospatial tools, advanced AI models, and a RAG-based architecture, the project demonstrates how AI can simplify spatial workflows, reduce reliance on manual querying, and promote accessibility in data-driven decision-making.

It also serves as a platform to benchmark the performance of leading LLMs in a spatial context, offering insight into their real-world usability in terms of accuracy, latency, and cost. Ultimately, this project addresses a real need for intelligent, user-friendly spatial analytics tools that are scalable, transparent, and open to all.


Created on: April 30, 2025