Over the past six months, Defog has collaborated with pioneering enterprises that have become some of the first in their industries to implement and scale LLMs for SQL generation. As we've transitioned from experimental to practical applications in enterprises, we've realized that even the best-performing model isn't enough. It requires much more to go the last mile, especially to align a model with the nuanced needs of each business. This guide shares key components to productionize LLMs for SQL generation in enterprises.

The Essential Foundations: Metadata and Golden Queries

Effectively productionizing LLMs into your data workflow fundamentally relies on robust metadata and golden queries. These elements serve as crucial teaching tools for the model, enabling it to comprehend your specific database schema, internal business rules, and assumptions.

Golden queries, in particular, function as benchmarks for the desired output, directing the model towards generating SQL statements that accurately mirror complex business queries. Let's consider an example: "What was the total sales amount on January 1, 2024?"

Without Golden Queries:

SELECT total_sales
FROM sales_data
WHERE date = '2024-01-01';

With Golden Queries:

SELECT SUM(amount) AS total_sales
FROM sales_transactions
WHERE transaction_date = '2024-01-01'
	AND status = 'Completed';

The latter example, informed by a golden query, specifies the exact fields and conditions relevant to the business's needs and also incorporates internal conventions and business logic (e.g., filtering by 'Completed' transactions).

The Path to Excellence: From 90% to Near-Perfect Accuracy

Achieving near-perfect accuracy with LLMs in SQL generation is mundane but vital work. The leap from 90% to 99%+ accuracy involves a meticulous process dependent on granular feedback and iterative training. This stage is where the difference between a useful tool and a transformative solution is made.

The challenge is compounded when detailed documentation of business rules is scarce. In such scenarios, dedicating a team member (often supported by an intern) to oversee the model’s learning process has proven effective for our customers.

We have seen the best results by training the model iteratively, rather than with a one-shot approach. By subjecting it to cycles of feedback and refinement, it learns to navigate the intricacies and edge cases unique to your enterprise's data landscape.

As a result, the typical progression in accuracy goes something like this, with the final stage taking the most time:

Base model: ~90% accuracy

Enhanced with high-quality metadata and golden queries: ~95% accuracy

Refined through iterative training and feedback: 99%+ accuracy

Strategic Implementation: Start Small, Aim High

Initiating a pilot implementation requires a strategic approach, even with the dataset you choose. Pick a well-defined, yet challenging dataset to lay the groundwork for success. This helps test the model’s capabilities to the fullest and is crucial for building trust in the tool within your organization.

The pilot should be designed with minimal integrations, focusing on demonstrating value through tangible improvements in workflow efficiency. When the model is performing reliably on your pilot dataset, present it to end-users and let them test it.

AI capabilities increase with more access to data, but so does complexity. If you can prove substantial value with a small dataset, it will be easier to secure resources and scale to other areas and teams down the line.

So much of our attention is on the magic of AI models, but it’s the mundane aspects like metadata, granular feedback, and project leadership that make it a reality.

← More blogs

A Practical Guide to Implementing Text-to-SQL LLMs in Enterprises

The Essential Foundations: Metadata and Golden Queries

The Path to Excellence: From 90% to Near-Perfect Accuracy

Strategic Implementation: Start Small, Aim High