How will ChatGPT change data analytics?
Updated: Mar 18
ChatGPT, OpenAI and Generative AI are all the rage and, I must admit, pretty impressive. But for the data professionals out there, what is the link with data analytics? Let me be clear: there definitely is a link, and a disruption on the near horizon.
But to understand that, let's first look at some key concepts. I then discuss some examples of the role (aka disruption) that Generative AI, such as ChatGPT, will play in data analytics.
Update 16 March 2023 - Microsoft announced Azure OpenAI, which gives users access to GPT-3 and DALL-E 2 inside their tenancies. More on this soon in a dedicated article.
What is Generative AI?
Generative AI refers to a class of machine learning algorithms that can generate new, previously unseen data that is similar to a given training dataset. These algorithms can be used to generate images, text, audio, or other types of data. There are a few different types of generative models, including:
Generative Adversarial Networks (GANs): GANs consist of two neural networks, a generator and a discriminator, that are trained together in a competitive process. The generator learns to create new data samples, while the discriminator learns to differentiate between the generated samples and the real ones.
Variational Autoencoders (VAEs): VAEs use an encoder-decoder architecture to model the probability distribution of the input data. The encoder maps the input data to a lower-dimensional representation, called a latent space, and the decoder generates new data samples by mapping the points in the latent space back to the original data space.
Autoregressive models: Autoregressive models, such as Transformer-based language models or PixelCNN, generate new data by predicting one element of the data at a time, each conditioned on the elements generated so far.
These models can be used for a variety of tasks, such as image synthesis, text generation, data augmentation, and many other areas where data generation is required.
ChatGPT, developed by OpenAI, is a type of Generative AI; specifically, it is an autoregressive model.
ChatGPT is a pre-trained language model that uses a transformer architecture to generate human-like text and responses to natural language prompts, and it can also perform other natural language understanding tasks. It is trained on a massive corpus of text and can generate text that is similar to the text it was trained on. The model is trained to predict the next word in a sentence given the previous words, which is what makes it autoregressive. It can also be fine-tuned for a specific task such as question answering or language translation.
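To make the "predict the next word given the previous words" idea concrete, here is a deliberately tiny sketch. It is not how ChatGPT works internally: it swaps the transformer for a simple bigram lookup table that only looks at the single previous word, and the function names and the one-sentence corpus are made up for illustration. What it does share with ChatGPT is the generation loop, where each new word is chosen based on what has already been produced.

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Record, for every word, the words that followed it in the training text."""
    words = text.split()
    followers = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        followers[prev].append(nxt)
    return followers

def generate(followers, start, max_words=10, seed=0):
    """Generate text one word at a time, each conditioned on the previous word."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(max_words - 1):
        candidates = followers.get(output[-1])
        if not candidates:  # no known continuation, so stop early
            break
        output.append(rng.choice(candidates))
    return " ".join(output)

# Hypothetical one-sentence "corpus", purely for illustration.
corpus = "data analytics turns raw data into insights and insights into decisions"
model = train_bigram(corpus)
print(generate(model, "data"))
```

A real language model replaces the lookup table with a neural network that scores every possible next token using the whole preceding context, but the word-by-word loop is the same idea.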
What is data analytics?
Data analytics is the process of examining, cleaning, transforming, and modelling data to extract useful information and insights. This can include descriptive statistics, data visualization, and machine learning techniques. Data analytics can be used in various industries, such as finance, healthcare, and marketing, to make data-driven decisions and improve business performance.
What role can Generative AI, including ChatGPT, play in data analytics?
The role of ChatGPT in data analytics ranges from explaining data and insights, to generating very realistic synthetic data, to helping write code.
In data analytics, ChatGPT can be used to generate natural language explanations or summaries of data, making it easier for non-technical stakeholders to understand and act on the insights. It can also be used to automate the process of generating reports, or to generate human-like responses in chatbots or virtual assistants.
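As a rough sketch of what "asking for an explanation of the data" could look like in practice, the snippet below computes a few summary statistics and wraps them in a plain-English prompt. The function names, the prompt wording and the sales figures are all assumptions made up for illustration, and the snippet stops short of calling any model: the resulting text would simply be pasted into ChatGPT or sent through whichever API you have access to.

```python
import statistics

def summarise(values):
    """Compute a few descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

def build_prompt(metric_name, values):
    """Turn the statistics into a prompt asking for a plain-language explanation."""
    stats = summarise(values)
    lines = [f"- {name}: {value}" for name, value in stats.items()]
    return (
        f"Explain the following summary of '{metric_name}' to a non-technical "
        "audience in two or three sentences:\n" + "\n".join(lines)
    )

# Made-up illustrative figures, not real data.
monthly_sales = [120, 135, 128, 160, 310, 142]
print(build_prompt("monthly sales", monthly_sales))
```

Sending summary statistics rather than raw rows keeps the prompt short and avoids sharing record-level data, which may matter depending on your organisation's policies.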
Moreover, ChatGPT can be used to generate synthetic data to augment existing datasets for training machine learning models. This can help to improve the performance of the models and to reduce the amount of real data that needs to be collected. Additionally, generative models can be used to generate new, unseen data samples, which can be used for data exploration and visualization, anomaly detection, and other tasks in data analytics.
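For readers who have not worked with synthetic data before, the sketch below shows the basic idea using a much simpler technique than a generative model: it fits a normal distribution to each numeric column of a small dataset and samples new rows from it. This is a statistical stand-in, not what ChatGPT does, and the function names and example records are hypothetical.

```python
import random
import statistics

def fit_columns(rows):
    """Estimate the mean and standard deviation of each numeric column."""
    params = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        params[col] = (statistics.mean(values), statistics.stdev(values))
    return params

def synthesise(rows, n, seed=0):
    """Draw n synthetic rows, sampling each column from a fitted normal distribution.

    Columns are sampled independently, so correlations between columns are lost.
    """
    rng = random.Random(seed)
    params = fit_columns(rows)
    return [
        {col: round(rng.gauss(mu, sigma), 1) for col, (mu, sigma) in params.items()}
        for _ in range(n)
    ]

# Made-up illustrative records, not real data.
real_rows = [
    {"age": 34, "basket_value": 52.0},
    {"age": 45, "basket_value": 61.5},
    {"age": 29, "basket_value": 38.2},
    {"age": 52, "basket_value": 75.9},
]
for row in synthesise(real_rows, n=3):
    print(row)
```

The appeal of a generative model over a sketch like this is that it can capture relationships between columns and produce realistic text fields, which per-column sampling cannot.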
Even more impressive is that ChatGPT can help you code. I recently tried this by giving it a DAX expression challenge. DAX stands for Data Analysis Expressions, the formula and query language used in Power BI. The answer was pretty much spot on and included an explanation of the query logic, which is impressive when you consider the limited context I gave it.
The code in this instance was correct and only needed tweaking to substitute the actual attribute names. In cases where it is not 100% correct, it can still provide initial code that is immediately available for debugging, which can expedite the solution tremendously.
Conclusion - what are the likely impacts of ChatGPT on data analytics?
Some schools, universities and education authorities have blocked ChatGPT due to the possibility that it could be used to cheat in assessments or homework. But a growing number of supporters compare it to the introduction of calculators in schools, when some felt it would lead to a deterioration of maths skills (it did not). Similarly, satellite navigation did not make us dumber because we no longer had to read maps; it simply made getting from point A to point B much more efficient. ChatGPT and Generative AI are real, they are here, and opposing them will likely be futile. So the trick will be to assess their potential impact and use, and adjust practices and policies accordingly (also see a previous article on ethics in AI).
The same applies to data analytics. For years, data analytics has been moving away from large, monolithic data warehouses and reporting-heavy ecosystems towards something that can quickly bring insights to those who need them. AI, previously a separate field from data analytics, has become an intrinsic part of it, and this is simply the next wave of change, albeit a big one. A tsunami maybe?
(Very) Likely impacts are:
Generative AI models such as ChatGPT may move data analytics even further away from traditional reporting as users may increasingly rely on simply asking the model to interpret large and complex data and the patterns within.
Data engineers and data scientists will increasingly rely on Generative AI to expedite the code and models required within data analytics workloads. It may also put these fields in the hands of people without formal training as data engineers, scientists and statisticians.
Hypothesis and sandbox environments will be easier to spin up due to the ease with which synthetic data can be produced.
From the horse's mouth - what does ChatGPT say?
When asked "how does ChatGPT provide better data analytics?", this was the response:
A user can obtain visual representations of data.
A user can ask for summary statistics of their data.
A user can quickly obtain easy to understand explanations of complex data relationships.
A user can ask for predictions or forecasts based on data.
My prediction: we could well see data analytics being commoditised even further than in recent years, and at an accelerated pace, if ChatGPT becomes more pervasive in data analytics.
In subsequent articles I will show some quick and targeted examples of how ChatGPT can be used in common data analytics tasks. In this article I start to scratch the surface of using ChatGPT to analyse data - Analyzing data using ChatGPT (makingmeaning.info)
As a matter of interest - I used ChatGPT to critique this article
Here is the response:
The article provides a good overview of the concepts of generative AI, specifically ChatGPT, and its applications in data analytics. The author touches on how ChatGPT can be used to generate natural language explanations, synthesize data, and even help with coding. The article also acknowledges the potential impact of ChatGPT and how it may change the way data analytics is practiced.