ChatGPT and Bard analyse: Dickens – A Christmas Carol

A Christmas Carol

S-branch traditionally shares a festive message with its followers during the holiday season. Last year, we treated you to a bloopers video from our self-paced online training course, “i2 Analyst’s Notebook – Creating Charts.” This year, we aim to infuse a bit more holiday spirit while highlighting the impactful work undertaken at S-branch.

Back in 2018, I wrote a blog post about analysing and understanding unstructured text. It focussed on using a freely available Natural Language Processing tool to process and analyse some online documents. I then used Power BI and Neo4j to explore this data further. Things have moved on since then, not least with the arrival of the biggest and most hyped technology of the last 12 months: AI chatbots. ChatGPT was released in November 2022 and, if you’re anything like me, you will have been using it ever since. This year I set myself the task of analysing Charles Dickens’ A Christmas Carol using nothing more than a browser, a text editor and freely available online tools.

Desired Result

The question I wanted to ask was: “Who are the characters and how are they linked together?” I wanted to answer it using network analysis visualisation. Remember, I could only use freely available online tools. I chose Gephi, a free and open-source visualisation and exploration tool for all kinds of graphs and networks. Gephi also offers a lightweight browser version called Gephi Lite, which is what I used.

Gephi Lite comes with a few example graph files. One example is a character analysis of Les Misérables. I took this as inspiration. I wanted to achieve something similar without a) reading the Dickens book myself or b) writing a line of code.

Bard

To gather the necessary data, I turned to Google’s experimental tool, Bard. Despite its experimental status, Bard’s ability to read and analyse documents proved invaluable. I pointed Bard at this text and started asking questions.

Bard reading the document

Bard was able to tell me which characters appeared in which chapters, so I asked it to put this information into CSV format. Admittedly, Bard initially summarised the information, but after a quick clarification I got the data I wanted. The other great thing about Bard is that it helpfully puts the data into a Google Sheet.

Generating a CSV
Simplifying the CSV
Generated Google Sheet

Lastly, I asked Bard to add a column, counting how many times a character appears in a chapter. As before, it generated a Google Sheet for me too.

Adding a count to the CSV.

At this stage, I had the data I needed. I now had to convert it into Gephi’s file format (gexf) so that it could be opened in Gephi Lite. The gexf format is essentially XML, so I’d need to get Bard to generate code. I tried at length to get Bard to generate valid gexf, with no luck. Then again, Bard is at an experimental stage and things are improving all the time.
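
For anyone who would rather script this step than prompt for it, the conversion can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not what Bard or ChatGPT produced: the column names (`character`, `chapter`, `count`), the helper `rows_to_gexf` and the sample rows are all made up for illustration. The element names and namespace follow the GEXF 1.2 format.

```python
import csv
import io
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

# Placeholder rows, NOT the data Bard returned: the counts are invented.
SAMPLE_CSV = """character,chapter,count
Scrooge,Stave 1,10
Marley,Stave 1,4
Scrooge,Stave 2,8
"""

def rows_to_gexf(csv_text: str) -> str:
    """Build a character-to-chapter graph as GEXF text (hypothetical helper)."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"mode": "static", "defaultedgetype": "undirected"})
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    seen = set()
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Characters and chapters both become nodes; each is added only once.
        for name in (row["character"], row["chapter"]):
            if name not in seen:
                seen.add(name)
                ET.SubElement(nodes, "node", {"id": name, "label": name})
        # One edge per CSV row, weighted by the appearance count.
        ET.SubElement(edges, "edge", {
            "id": str(i),
            "source": row["character"],
            "target": row["chapter"],
            "weight": row["count"],
        })
    return ET.tostring(gexf, encoding="unicode")

gexf_text = rows_to_gexf(SAMPLE_CSV)
```

Saving `gexf_text` to a file with a `.gexf` extension should give something Gephi Lite can open, although I have only sketched this rather than tested it against every Gephi version.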

ChatGPT

I switched over to ChatGPT and pasted in the data from the CSV/Google Sheet. I then asked it to generate the data in gexf format. It’s worth considering your wording carefully when asking an AI chatbot to generate code. ChatGPT managed to generate a valid gexf file straight away. I copied the resulting code into a text editor (I used Notepad++) and saved it as a gexf file.

ChatGPT generating a gexf file.
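
If you want more reassurance than “it opened in Gephi Lite”, a rough sanity check on chatbot-generated gexf might look like the sketch below. It only tests that the text is well-formed XML, that the root element is `gexf`, and that every edge points at a declared node. It is not full schema validation, and `check_gexf` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def local(tag: str) -> str:
    """Strip the '{namespace}' prefix so the check works for any GEXF version."""
    return tag.rsplit("}", 1)[-1]

def check_gexf(gexf_text: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(gexf_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if local(root.tag) != "gexf":
        problems.append(f"root element is <{local(root.tag)}>, expected <gexf>")

    node_ids = {el.get("id") for el in root.iter() if local(el.tag) == "node"}
    for el in root.iter():
        if local(el.tag) != "edge":
            continue
        # Both ends of every edge must refer to a node that was declared.
        for end in ("source", "target"):
            if el.get(end) not in node_ids:
                problems.append(f"edge {el.get('id')} has unknown {end} {el.get(end)!r}")
    return problems
```

Paste the generated text into `check_gexf(...)` and an empty list means nothing obvious is broken.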

The resulting file in Gephi Lite looked very much like the image below. The only addition I made was colouring the nodes: green for characters and grey for chapters. I asked ChatGPT to do this, but its first attempt failed (although it did still generate a valid gexf). I had to look at the Les Misérables example and “train” ChatGPT on how to generate colour. It learnt quickly and was then able to generate a valid gexf file with colour.

An AI-generated gexf file
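
For reference, node colour in gexf lives in the format’s `viz` extension: each node carries a `viz:color` child with `r`, `g` and `b` attributes, and the root element declares the `viz` namespace. The sketch below shows the shape of it in Python. The exact RGB values and the helper `coloured_node` are my assumptions; the colours used in the chart were only described as green and grey.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"
VIZ_NS = "http://www.gexf.net/1.2draft/viz"

# Assumed RGB values: the chart only used "green" and "grey".
COLOURS = {"character": (0, 160, 0), "chapter": (128, 128, 128)}

def coloured_node(parent, node_id: str, kind: str):
    """Append a node with a viz:color child chosen by its kind (hypothetical helper)."""
    node = ET.SubElement(parent, "node", {"id": node_id, "label": node_id})
    r, g, b = COLOURS[kind]
    ET.SubElement(node, "viz:color", {"r": str(r), "g": str(g), "b": str(b)})
    return node

# The viz prefix must be declared on the root element for the file to parse.
gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "xmlns:viz": VIZ_NS, "version": "1.2"})
graph = ET.SubElement(gexf, "graph", {"defaultedgetype": "undirected"})
nodes = ET.SubElement(graph, "nodes")
coloured_node(nodes, "Scrooge", "character")
coloured_node(nodes, "Stave 1", "chapter")

coloured_gexf = ET.tostring(gexf, encoding="unicode")
```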

Gephi allowed me to run a Force Directed layout on the data. It also has a useful feature where highlighting a chapter highlights the related characters.

Focusing on the characters

I wanted to take the analysis a step further and focus on the characters. For this, I returned to Bard. Bard was able to bring back the original CSV, and I then asked it to present the data in a different way.

Characters mentioned together.
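
One plausible way to derive “characters mentioned together” from per-chapter data is to count every pair of characters that share a chapter. I can’t say this is how Bard worked it out; it is simply a minimal sketch of the idea, and the input mapping below is illustrative rather than the data Bard returned.

```python
from collections import Counter
from itertools import combinations

# Illustrative input only: chapter -> characters appearing in it.
appearances = {
    "Stave 1": ["Scrooge", "Marley", "Bob Cratchit"],
    "Stave 3": ["Scrooge", "Bob Cratchit", "Tiny Tim"],
}

def co_occurrences(chapters: dict[str, list[str]]) -> Counter:
    """Count how many chapters each pair of characters shares."""
    pairs = Counter()
    for names in chapters.values():
        # sorted() makes ("A", "B") and ("B", "A") the same key;
        # set() stops a repeated name from pairing with itself.
        for a, b in combinations(sorted(set(names)), 2):
            pairs[(a, b)] += 1
    return pairs

pair_counts = co_occurrences(appearances)
```

Each key in `pair_counts` is a potential edge between two characters, and the count can serve as the edge weight.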

I then took this data, copied it into ChatGPT and asked it to generate another Gephi file.

ChatGPT generating a character relationship Gephi chart

The resulting Gephi chart looked very much like the one below. It’s worth noting that I did ask ChatGPT to size each node depending on how many connections it had. While it generated the data well, I had to “train” it to use the correct format. It’s worth mentioning that it managed to generate a valid gexf file every time I asked it!

The final generated Gephi file
“Training” ChatGPT
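
Sizing a node by its number of connections boils down to counting how often each name appears in the edge list and writing that into a `viz:size` element. Here is a minimal sketch of that calculation. The edge list is illustrative, and the `base` and `step` values are arbitrary assumptions rather than anything ChatGPT or Gephi prescribes.

```python
from collections import Counter

# Illustrative edge list (character pairs), not the data from the chart.
edges = [
    ("Scrooge", "Marley"),
    ("Scrooge", "Bob Cratchit"),
    ("Scrooge", "Tiny Tim"),
    ("Bob Cratchit", "Tiny Tim"),
]

def degree_sizes(edge_list, base=10.0, step=5.0):
    """Map each node to a size that grows linearly with its number of connections."""
    degree = Counter()
    for source, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    return {name: base + step * count for name, count in degree.items()}

sizes = degree_sizes(edges)

# The gexf fragment to place inside each <node> element.
size_tags = {name: f'<viz:size value="{value}"/>' for name, value in sizes.items()}
```

As with colour, the `viz` namespace has to be declared on the root `gexf` element for these tags to be valid.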

Summary

Reflecting on this journey, it’s evident how technology has advanced in five years. When I wrote the original post on NLP I had to write code and install Java to get similar results. The example above is silly and perhaps a little oversimplified, but I was surprised how quickly I could get results.

While this exercise explored the potential of using Bard for document analysis, it also revealed concerns about consistency. Repeatedly asking the same question yielded varying responses, raising doubts about its suitability for real-world data analysis. Additionally, sharing sensitive data with any AI chatbot is inherently risky. It might therefore be more prudent to use ChatGPT’s script generation capabilities against a prepared data format, rather than giving it the data itself. That way, producing the visualisation wouldn’t require repeatedly exposing actual data to a chatbot.
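
The consistency concern is easy to put a number on. As a rough sketch, two CSV answers to the same prompt can be compared as sets of rows, which shows exactly what one run included that the other did not. The function names and both sample answers here are made up for illustration.

```python
import csv
import io

def csv_rows(csv_text: str) -> set[tuple[str, ...]]:
    """Parse CSV text into a set of rows, skipping the header and blank lines."""
    reader = csv.reader(io.StringIO(csv_text.strip()))
    next(reader, None)  # drop the header row
    return {tuple(cell.strip() for cell in row) for row in reader if row}

def compare_runs(first: str, second: str) -> dict[str, set]:
    """Report the rows that appear in only one of two chatbot answers."""
    a, b = csv_rows(first), csv_rows(second)
    return {"only_in_first": a - b, "only_in_second": b - a}

# Made-up answers to the same prompt, for illustration only.
run_1 = "character,chapter\nScrooge,Stave 1\nMarley,Stave 1\n"
run_2 = "character,chapter\nScrooge,Stave 1\nFred,Stave 1\n"

diff = compare_runs(run_1, run_2)
```

If both sets in `diff` are empty, the two runs agreed row for row, whatever order the rows came back in.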

S-branch is a consultancy company focusing on data analytics software and technology, specifically for detecting fraud and crime. We offer consultancy on products in this space and have helped a number of our clients get the most out of their data. If you need help with your data or would like to get more out of your existing software investment, please contact us.
