Update

Open-sourcing SQLCoder-70B, the state of the art in text to SQL generation

Making our text to SQL model 30 percentage points more accurate over 5 months

Rishabh Srivastava, Wendy Aw, Wong Jing Ping

Defog Staff

Jan 30, 2024

Open-sourcing SQLCoder-70B

We are thrilled to open-source SQLCoder-70B today. SQLCoder-70B is the largest in the SQLCoder family of models. It gets 93% accuracy on text to SQL tasks and schemas not seen in training, outperforming GPT-4, Claude, and CodeLlama-70B by some margin.

You can download it on Hugging Face, explore it on GitHub, or interact with a live demo on our website.

Creating SQLCoder

SQLCoder-70B was created by fine-tuning CodeLlama over our hand-curated dataset. It is our most capable model, and forms a key component in our agents product that powers complex workflows in healthcare, finance, and government.

We hope that the community will benefit from this powerful open-source model geared towards data analysis.

Results

SQLCoder-70B performs exceptionally well across all categories. It outperforms the general-purpose commercial models in every category except group_by, where gpt-4 scores 94.3 against SQLCoder-70B's 91.4.

| model         | date | group_by | order_by | ratio | join | where |
| ------------- | ---- | -------- | -------- | ----- | ---- | ----- |
| sqlcoder-70b  | 96   | 91.4     | 97.1     | 85.7  | 97.1 | 91.4  |
| sqlcoder-34b  | 80   | 94.3     | 85.7     | 77.1  | 85.7 | 80    |
| gpt-4         | 64   | 94.3     | 88.6     | 74.2  | 85.7 | 80    |
| sqlcoder2-15b | 76   | 80       | 77.1     | 60    | 77.1 | 77.1  |
| sqlcoder-7b   | 64   | 82.9     | 74.3     | 54.3  | 74.3 | 74.3  |
| gpt-3.5       | 68   | 77.1     | 74.2     | 34.3  | 65.7 | 71.4  |
| claude-2      | 52   | 71.4     | 74.3     | 57.1  | 65.7 | 62.9  |
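For readers who want to compare models on a single number, here is a minimal sketch that averages the per-category scores from the table. The post does not say how the 93% headline figure is computed, so treating it as an unweighted mean across the six categories is an assumption; the helper names `mean_score` and `categories_not_led_by` are made up for this example.

```python
# Per-category scores copied from the results table above.
CATEGORIES = ["date", "group_by", "order_by", "ratio", "join", "where"]
SCORES = {
    "sqlcoder-70b": [96, 91.4, 97.1, 85.7, 97.1, 91.4],
    "sqlcoder-34b": [80, 94.3, 85.7, 77.1, 85.7, 80],
    "gpt-4": [64, 94.3, 88.6, 74.2, 85.7, 80],
    "sqlcoder2-15b": [76, 80, 77.1, 60, 77.1, 77.1],
    "sqlcoder-7b": [64, 82.9, 74.3, 54.3, 74.3, 74.3],
    "gpt-3.5": [68, 77.1, 74.2, 34.3, 65.7, 71.4],
    "claude-2": [52, 71.4, 74.3, 57.1, 65.7, 62.9],
}


def mean_score(model: str) -> float:
    """Unweighted mean over the six categories (an assumption, see above)."""
    scores = SCORES[model]
    return sum(scores) / len(scores)


def categories_not_led_by(model: str) -> list[str]:
    """Categories in which some other model scores strictly higher."""
    behind = []
    for i, category in enumerate(CATEGORIES):
        best_other = max(s[i] for name, s in SCORES.items() if name != model)
        if best_other > SCORES[model][i]:
            behind.append(category)
    return behind


if __name__ == "__main__":
    # Print models from highest to lowest mean score.
    for name in sorted(SCORES, key=mean_score, reverse=True):
        print(f"{name:14s} {mean_score(name):5.1f}")
    print("sqlcoder-70b trails in:", categories_not_led_by("sqlcoder-70b"))
```

Under that assumption the mean for sqlcoder-70b comes out at about 93.1, which is consistent with the 93% quoted earlier, and group_by is the only category in which another model scores higher.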

Explore the model

Explore SQLCoder-70B on our demo page, email us at founders AT defog.ai, or say hi on X! We are at @defogdata!

