Etienne Oosthuysen
- Jun 21, 2023
- 4 min read

ChatGPT on your own data - 'Azure OpenAI on your data'

Updated: Jun 28, 2023

In a previous article I described how you can use ChatGPT to analyse your data. Here, I describe how to achieve this securely within Azure. There are a number of things in Microsoft's evolving AI portfolio, which is in various stages (planned, private preview and public preview), that will allow data workers to leverage the power Generative AI and Large Language Models, notably GPT, to analyse your corporate data. These include Copilot for Microsoft 365, which will bring ChatGPT to (amongst others) Excel, and of course all the planned AI functionality in the form of Copilot for Microsoft Fabric. And now there's Azure OpenAI on your data! Sweet!

Introducing Azure OpenAI on your data

Announced today in public preview, is something pretty significant: Azure OpenAI on your data. Now, Azure OpenAI Services allows you to connect to your data sources and leverage the power of Azure Cognitive Services, particularly a Cognitive Search index.

With Azure OpenAI on your data, when a user provides a prompt, two things will happen:

(a) Azure OpenAI on your data, and a Cognitive Search Index, determines what data to retrieve based on your user prompt and the preceding conversation history,

(b) The retrieved data is then appended to the original prompt and sent as a new prompt to GPT (within Azure) which uses this information to provide a completion.

All of this is possible over .txt, .md, .html, Microsoft Word, Microsoft PowerPoint, and PDF. And through an easy to deploy app.

What about security

But what about security as this is after all ChatGPT accessing your data? Not really. Yes, it is ChatGPT (or more accurately, gpt-35-turbo and GPT-4 language models) but it is entirely hosted within Azure, and all data therefore remains within the Azure OpenAI service and therefor within the Azure backbone. This is described in more detail here.

Why is Azure OpenAI on your data different than those other things in Microsoft's evolving AI portfolio?

Copilot is due to release within Microsoft 365 applications and within Fabric in the hopefully not too distant future. This means its Generative AI role will enhance the productivity within those technologies. That's awesome if you are a data worker using those technologies.

Azure OpenAI on your data, on the other hand, will allow you to step beyond the confines those technologies and create something really bespoke, yet in an amazingly simple way.

How easy is this really?

For this test drive, I will use a dataset within my Azure Data Lake. It contains over 16,000 records of video games with sales greater than 100,000 copies for the period 1980 through to 2016 and is stored as .txt in a comma separated format.

The original dataset used is available here.

Here are the steps I followed:

Set-up Azure OpenAI on your data and the Cognitive Search Index

A) In my OpenAI Service I selected BYO Data preview:

B) I then selected my data source - Azure Blob Storage. You can upload a file, use an existing Azure Cognitive Search index, or point to your data in Azure Data Lake (Blob Storage). I am sure each of these options have their own pros and cons, but that is a discussion for another time.

C) I added my data source information and allowed Azure OpenAI on your data to create an Azure Cognitive Search index as part of the set up for me.

Data source and Cognitive Search index settings

D) Once the set up and index completed, all that was left to do was to craft a system message. I used "You are an AI assistant, and you are useful for answering questions from video game sales. The dataset has the columns Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales. All sales are in millions. Please answer questions by parsing through all dialogue."

E) The final step was to play around with the session settings and parameters, after which time I was able to do a test drive within the playground chat session.

Test drive in the playground chat

The first prompt, the response and the code is shown below:

Deploy the app

This literally involved a single page configuration:

The final product

Here is a recording of some of my prompt interactions:

Conclusion

There are clearly some issues, both with accuracy of the results returned (this seemed somewhat minor, but inconsistent), and its ability to return results on the first go. BUT this product is still in public preview, and it seems as if issues are expected, especially when you read MSFT material stating, "If you receive incorrect answers, report it as a quality bug". So, this is simply too early for an unequivocal judgement which will have to wait until later in the public preview and once it goes into general availability in the hopefully not too distant future.

However, if it does what I think it can, then this has the potential to move conversational AI and data analysis forward quite a bit. My gut feel is that Azure OpenAI for your data will provide a skeleton only, and that's okay, as it will remove or expedite some of the lower value work. Yet what is truly exciting is what organisations can build on top of this foundation.

Ref: https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/use-your-data