ChatGPT Code Interpreter – this changes everything

Etienne Oosthuysen
Jul 9, 2023

Updated: Jan 8, 2024

I have been waiting for access to Code Interpreter in ChatGPT for some time and was finally able to activate it this morning (note that it is still only available to a subset of all ChatGPT users). Officially, Code Interpreter, a plugin for ChatGPT, will support data analysis, image conversions, code editing and more. A real sidekick for data analysts.

Important update, 8 Jan 2024: Code Interpreter has been renamed Advanced Data Analysis.

What will Code Interpreter do – let's highlight two exciting capabilities

Code Interpreter can help users write, execute and then test code. Its current focus is Python, but this will be extended in future. This functionality will also be baked into GitHub Copilot.
Code Interpreter can also act as a copilot for data analysts. This is the focus of this article, and it's genuinely exciting.

Be my data analyst!

The initial simple stuff

In the video below I get Code Interpreter to do some basic data analysis:

First, I upload a dataset. My good old, trusted dataset about Video Game Sales across multiple years, genres, publishers, etc. from Kaggle. And then on to some simple stuff.
I ask a simple question of the data.
I then get Code Interpreter to do some initial analysis.
I then use Code Interpreter to analyse the data by another dimension.
I finally get Code Interpreter to visualise the result.
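
For readers who want a feel for what these steps amount to in code, here is a minimal sketch using only the Python standard library. It is not the code that Code Interpreter generated in the video. The sample rows, the column names (Name, Platform, Year, Genre, Publisher, Global_Sales) and the helper functions are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows standing in for the uploaded Kaggle file.
# The column names are an assumption about that file's layout.
SAMPLE_CSV = """Name,Platform,Year,Genre,Publisher,Global_Sales
Game A,Wii,2006,Sports,Publisher X,10.0
Game B,NES,1985,Platform,Publisher X,8.0
Game C,Wii,2008,Racing,Publisher X,6.0
Game D,PS2,2004,Action,Publisher Y,4.0
Game E,PS2,2006,Sports,Publisher Y,2.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting sales to float."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["Global_Sales"] = float(row["Global_Sales"])
    return rows


def total_by(rows, dimension):
    """Sum Global_Sales for each distinct value of the given column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["Global_Sales"]
    return dict(totals)


def text_bar_chart(totals, width=20):
    """Render the totals as a crude text bar chart, largest first."""
    peak = max(totals.values())
    lines = []
    for key, value in sorted(totals.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * value / peak)
        lines.append(f"{key:<12} {bar} {value:.1f}")
    return "\n".join(lines)


rows = load_rows(SAMPLE_CSV)                                       # upload the dataset
best_seller = max(rows, key=lambda r: r["Global_Sales"])["Name"]   # a simple question
by_genre = total_by(rows, "Genre")                                 # initial analysis
by_platform = total_by(rows, "Platform")                           # another dimension
print(text_bar_chart(by_platform))                                 # visualise the result
```

The point of the sketch is only that each prompt in the video maps to a small, checkable operation: load, aggregate, re-aggregate by a different column, then plot.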

And what about accuracy?

On the left are results from Code Interpreter and on the right, those in a manual pivot table. Completely accurate.
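
The side-by-side comparison above was done by eye. For a larger result set, the same check could be scripted. The sketch below assumes both sets of totals have been exported as simple key/value pairs; the figures and the compare_totals helper are hypothetical and are not taken from the screenshots.

```python
import math


def compare_totals(generated, reference, rel_tol=1e-9):
    """Return (key, generated, reference) for every key whose values differ.

    A key that is missing on one side is reported with None on that side.
    """
    mismatches = []
    for key in sorted(set(generated) | set(reference)):
        g = generated.get(key)
        r = reference.get(key)
        if g is None or r is None or not math.isclose(g, r, rel_tol=rel_tol):
            mismatches.append((key, g, r))
    return mismatches


# Hypothetical figures for illustration only.
code_interpreter_totals = {"Action": 4.0, "Racing": 6.0, "Sports": 12.0}
pivot_table_totals = {"Action": 4.0, "Racing": 6.0, "Sports": 12.0}

mismatches = compare_totals(code_interpreter_totals, pivot_table_totals)
print("Completely accurate" if not mismatches else mismatches)
```

An empty list means the two sources agree on every category, which is the scripted equivalent of the manual pivot table check.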

Now something somewhat more complex

In the second video:

I get Code Interpreter to visualise the result again by a second dimension. Note how I watch the code generation in real time.
I then ask Code Interpreter to highlight any notable trends.
I then ask Code Interpreter some topical questions of my data.

And how does it fare with multiple tasks?

Dare I say, can it generate a final report with multiple metrics?

Voila!

Conclusion

Admittedly, I've only tested this on a relatively small dataset of 16K records, but it performed well, demonstrating accuracy and impressive execution. I also tested this on unstructured data and received equally impressive results.

In addition to this test drive, a good stress test would be:

Uploading a larger, more complex dataset with some data quality issues.
Performing more comprehensive and in-depth analysis on such a dataset.
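
Before running such a stress test, it helps to know which data quality issues the dataset actually contains, so that the tool's handling of them can be judged. The sketch below is one way to profile a file up front. The sample rows, the treatment of "N/A" as a missing value and the quality_report helper are all assumptions for illustration.

```python
from collections import Counter


def quality_report(rows, numeric_columns=()):
    """Count blank cells, unparseable numbers and exact duplicate rows."""
    missing = Counter()
    bad_numbers = Counter()
    seen = Counter()
    for row in rows:
        seen[tuple(sorted(row.items()))] += 1
        for column, value in row.items():
            text = (value or "").strip()
            if text == "" or text.upper() == "N/A":
                # Assumption: blank cells and "N/A" both mean "missing".
                missing[column] += 1
            elif column in numeric_columns:
                try:
                    float(text)
                except ValueError:
                    bad_numbers[column] += 1
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {
        "missing": dict(missing),
        "bad_numbers": dict(bad_numbers),
        "duplicate_rows": duplicates,
    }


# Hypothetical rows with deliberately planted problems.
dirty_rows = [
    {"Name": "Game A", "Year": "2006", "Global_Sales": "10.0"},
    {"Name": "Game A", "Year": "2006", "Global_Sales": "10.0"},  # duplicate
    {"Name": "Game B", "Year": "N/A", "Global_Sales": "8.0"},    # missing year
    {"Name": "Game C", "Year": "2008", "Global_Sales": "six"},   # bad number
    {"Name": "", "Year": "2004", "Global_Sales": "4.0"},         # missing name
]

report = quality_report(dirty_rows, numeric_columns=("Year", "Global_Sales"))
print(report)
```

With the planted problems known in advance, it becomes possible to ask Code Interpreter the same questions and see whether it notices and handles them.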

It is of course not yet clear how this model will be integrated into corporate environments such as Azure, or what the plans are to incorporate it into production data pipelines. For instance, there might be a need for the ability to lock in useful queries and automate their outputs over live data pipelines.

However, this does seem like a significant advancement in the realm of data analysis and an exciting indicator of what is to come. A sidekick for data analysts!

A word of caution

Avoid loading confidential data into Code Interpreter, as this is the open ChatGPT platform rather than a private corporate environment. I suspect that a version designed for Azure is likely in the works.