Demystifying OpenAI concepts - GPT, DALL-E, Codex, Copilot, OpenAI API, Playground, and Prompting
Updated: 4 days ago
The world of artificial intelligence can be complex and overwhelming, especially when it comes to understanding the various concepts related to GPT (Generative Pretrained Transformer) technology. In this blog post, I will demystify the most important concepts related to GPT, including OpenAI, GPT, ChatGPT, OpenAI Service in Azure, Copilot and more. Use this article as a cheat sheet that provide a clear and concise understanding of these essential GPT concepts.
Concepts discussed are listed below with links to the article section if you want to jump ahead. I conclude with a simple image which is my interpretation of how this all fits together:
What or who is OpenAI ?
It is firstly useful to understand what OpenAI is: it is a research institute that specializes in artificial intelligence (AI) and its applications. Their stated goal is to "promote and develop safe and beneficial AI that benefits all of humanity."
Some of the main services they created are:
GPT (Generative Pre-trained Transformer), a large language model (LLM) developed by OpenAI that uses machine learning algorithms to generate human-like text in response to user inputs. It can be used for a wide range of natural language processing (NLP) tasks such as language translation, question answering, and text generation.
Codex - a model that converts natural language into code. This is also known as GitHub CoPilot.
DALL-E - a model that can produce images based on a natural language description.
What is LLM?
A large language model (LLM) is an artificial intelligence system designed to understand and generate human-like text based on a given input. These models are trained on vast amounts of text data, allowing them to learn grammar, syntax, facts, and some reasoning abilities from the patterns present in the data. As a result, they become capable of generating coherent, contextually relevant, and sometimes even creative responses to text inputs.
One of the most well-known large language models is OpenAI's GPT series (Generative Pre-trained Transformer). There are two types of LLM's:
Base LLM - trained to predict the next word, based on its training of large amounts of information. If you for example stated "Once upon a time there was a dog", the LLM may complete this as "Once upon a time there was a dog that lived in a kennel with other dogs.". But if you stated, "what is the capital of South Australia", then it may respond with "what is the weather in South Australia" and "what is the population of South Australia" as those text sets could quite possibly be listed often together in the information on which the LLM was trained.
Instruction tuned LLM - trained for follow instructions. With an instruction tuned LLM, the question "what is the capital of South Australia" will likely receive a response of "The capital of South Australia is Adelaide". An instruction tuned LLM normally starts as a base LLM, and then trained through:
Inputs and outputs via instructions, and then
RLHF (reinforcement learning through human feedback) and learning from this human feedback.
Most applications which use API calls to interface with LLM's have a specific purpose. If you for example created an application for your business that questions from the data of your business, you would for example not want it to respond about chicken recipes. So you would likely want to employ instructed tuned LLM's rather than use base LLM's. And what is important here are clear and specific instructions via your prompts (for more information on this, please see prompt engineering).
What is GPT?
GPT is a large language model (LLM) which is a Generative Pre-trained Transformer, developed by OpenAI. The model learns the relationships between words in a sentence or sequence of text and is pretrained on large amounts of text data and can be fine-tuned on specific NLP tasks, allowing them to perform with high accuracy and efficiency. GPT had a number of iterations:
GPT-1, released in 2018, was the first version of the GPT model. It was trained on a large volume of text data and achieved state-of-the-art performance on several natural language processing tasks.
GPT-2, released in 2019, was is a larger and more powerful version of GPT-1. It was trained on an even larger volume of text data and was able to generate human-like text that is often difficult to distinguish from text written by humans.
GPT-3, released in 2020, with 175 billion parameters, making it one of the largest AI models ever developed. It can perform a wide range of natural language processing tasks, including language translation, question-answering, and text completion, all with impressive performance.
GPT-3.5, is a cross between GPT-3 and GPT-4. One of the main goals here was to increase the model speed.
GPT-4, improvements include a greater ability to process more nuanced instructions, which is an improvement over GPT-3, which often made logic and other reasoning errors when faced with more complex prompts. Another key distinction between GPT-3 and GPT-4 lies in their size. GPT-3 boasts 175 billion parameters, while GPT-4 takes it to, allegedly, 1 trillion parameters.
What are Parameters?
Parameters represent the relationship between input and output through weights which is learned during training. It is essentially the knowledge and understanding the model has acquired from the text data it was trained on which is essential for the model's ability to perform well on specific NLP tasks.
What is ChatGPT and is it the same as GPT?
ChatGPT is a specific implementation of the GPT architecture, which has been fine-tuned and trained on specifically conversational data, such as conversational text data held in online chat logs, so that it could learn how to generate human-like responses to text inputs. ChatGPT is designed to be used for conversational interactions. ChatGPT is therefore well-suited for use in chatbots and other conversational AI applications. In essence, chatGPT allows you to interact with GPT-3.5, or GPT-4 in real-time through a chat interface.
What is OpenAI Service in Azure?
Azure OpenAI is a partnership between Microsoft Azure and OpenAI to provide access to OpenAI's AI models (GPT-3.5, Codex, and DALL-E), including ChatGPT, and tools through Microsoft's cloud computing platform. It allows developers and organizations to leverage and combine the power of OpenAI's AI models, with Azure resources (technologies) for various applications such as natural language processing, computer vision, data analysis, and reinforcement learning, coding, and many other exciting tasks on the near horizon. It is also worth noting the position of these Micosoft Offerings via OpenAI Service in Azure with regards to the OpenAI API which is discussed later on in this article.
What is Copilot, specifically for Microsoft 365 applications?
Built on GPT-4, Copilot is ChatGPT embedded into Microsoft 365 applications, Word, Excel, PowerPoint, Outlook, and Teams, and an orchestration engine working behind the scenes to combine GPT-4, with the Microsoft 365 applications and your business data in the Microsoft Graph (Teams, SharePoint and OneDrive). See this summary in the article What is Copilot for M365.
Note that there are other Copilots too, such as the GitHUB Copilot (see Codex).
What is Microsoft Graph (from the perspective of 'business data')?
Copilot for M365 can work with business data in Microsoft Graph. Graph is a platform and set of APIs that provide access to data and intelligence from Microsoft 365 services, such as SharePoint, OneDrive, and Teams. This means that Copilot can access things like Users and groups, Teams data, Tasks, Files, Mail, Meetings and calendars, etc.
What is Codex?
Codex is a deep learning algorithm which has been trained on a vast amount of publicly available source code and uses natural language processing (NLP) techniques to analyse and understand the context of the code being written.
What is DALL-E?
DALL-E is a model that uses a combination of neural networks and transformers, trained on a large dataset of image-text pairs to generate its images from textual descriptions.
Summary of differences between GPT, Codex and DALL-E:
A Generative Pre-trained Transformer large language model (LLM)
Trained on large volumes of data
An implementation of GPT and uses natural language processing (NLP) techniques to analyse and understand the context of the text
Fine-tuned and trained on specifically conversational data
A deep learning algorithm and also uses natural language processing (NLP) techniques to analyse and understand the context of the code
Trained on a vast amount of publicly available source code
A combination of neural networks and transformers and also uses natural language processing (NLP) techniques to analyse and understand the context of the image
Trained on a large dataset of image-text pairs
What is Prompt Engineering?
Prompt engineering is a technique used in natural language processing (NLP) that involves crafting a specific prompt or set of prompts to guide a language model's generation of text in a desired direction. It is used to guide the model's output towards the desired outcome. Prompt Engineering is important to ensure GPT, DALL-E and Codex are more useful and accurate. One could say that 'prompting' is how you "program" a model.
This is specifically important for instruction tubed LLM's which requires us being very clear and specific. Please see the full article on Prompt Engineering.
What is the OpenAI API?
The OpenAI API supports tasks that involves understanding or generating natural language (GPT), code (Codex), or images (DALL-E). It includes a wide variety of models (as is describe here) as well as the ability for you to fine tune custom ones. Three important concepts of the API are:
Prompts - the way you "program' your model. See Prompt Engineering.
Tokens - text are processed by breaking it down into tokens and 1token is approx. 4 characters. The word 'hamburger' would equate to 3 tokens, and the word 'car' would be 1.
Models - many models are available, each designed for a slightly different purpose and at a different cost. Models are described here.
Does Microsoft offerings (OpenAI Service in Azure) hit the OpenAI API:
It is worth noting that when working within Azure resources, this is what is stated about the OpenAI API: "Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-4, GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure. Azure OpenAI co-develops the APIs with OpenAI, ensuring compatibility and a smooth transition from one to the other. With Azure OpenAI, customers get the security capabilities of Microsoft Azure while running the same models as OpenAI. Azure OpenAI offers private networking, regional availability, and responsible AI content filtering." - What is Azure OpenAI Service? - Azure Cognitive Services | Microsoft Learn
What is the OpenAI Playground?
The playground is simply an easier interface to get to the OpenAI API. It is designed to be accessible to a wide range of users of varying technical capabilities and can therefore be used to develop new ideas and applications for AI technology.
Note that the OpenAI Playground is also available in OpenAI Service in Azure.
What is Microsoft Fabric?
Microsoft Fabric aims to empower data and business professionals alike. It's lake-centric and open, seamlessly integrating data from diverse sources and formats, such as Azure Data Lake, Amazon S3, and Google Storage2, into a single logical data lake.
And with the inclusion of AI Copilot, users gain a smart assistant that can help build data pipelines, generate code, construct machine learning and business models, produce insights, govern data, and even monitor data in real-time, triggering actions and notifications.
See the full article here.
These are only some of the concepts and only high-level descriptions of each. But it will hopefully allow you to navigate this exciting domain, just a little bit easier. Below is my interpretation of how this all fits together.