Unveiling the Marvel of Generative AI: Where Technology Meets Magic
When science fiction writer Arthur C. Clarke formulated the third of his Three Laws, ‘any sufficiently advanced technology is indistinguishable from magic’, little did we know that technology would soon tread upon the very realms of enchantment. From OpenAI’s ChatGPT to Google’s Bard and Midjourney’s image generator, Generative AI can feel like a modern wonder of the world. But what really lies at its core?
What is Generative AI?
Generative AI is a type of artificial intelligence that generates new content based on patterns in the data it has been trained on. That content can span multiple formats, including text, images, video, speech, and graphics, and it is often produced within seconds. Some systems are built for text, others for images. Those that generate text are called Large Language Models (LLMs); you may have heard this term used to describe OpenAI’s GPT-4 (the model behind ChatGPT) or Google’s Bard. Generative AI can also produce imagery, as we see with image generators such as Midjourney and Stable Diffusion; audio, as with VALL-E; or even code, as we see with Copilot.
I can already imagine you thinking, “if this isn’t magic, I don’t know what is”. Well, a little surprise! Generative AI actually isn’t magic. As a matter of fact, it’s very far from magic (of course, you knew this, right?).
Generative AI is made possible by relatively recent advances in artificial intelligence such as Transformers and Diffusion Models. Yes, I know this technical jargon can sound a bit intimidating, so let’s go through a basic breakdown of how the technology actually works.
How Does Generative AI Work?
∙ The Prompt
Think of a prompt as a query. It’s the first step in using Generative AI and is the part provided by the user. A prompt can come in various forms, such as text, an image, video, or any other input that the AI system has been built to accept. Let’s focus on text here, since that’s the most common today. With generative AI, all you have to do is type in your content or request, even if your input has grammar errors in it. This is possible because AI systems trained on billions or even trillions of words can often make sense of text that would trip up a basic search engine. How you construct your prompt can also affect the quality of your result. For example, you might ask the AI to answer in bullet points, in steps, or as a very short text. There is an emerging discipline called prompt engineering, which covers how prompts can be crafted and refined to get high-quality answers from the AI system. Who knows, prompt engineering might be one of the most important jobs of the future!
∙ The AI Algorithm
Now, I like to consider this stage as where the heavy lifting happens. The algorithm is the heart of Generative AI applications. Think of it like those mathematical formulas you disliked back in the day, only far bigger: a complex mathematical model that has already been trained on vast amounts of data. The learning happens during that training, before you ever type anything. Once the AI system receives your prompt, it runs the prompt through the trained model and draws on the patterns, relationships, and relevant features it picked up from its training data to work out a response.
∙ The Magical Result!
And, voila! Here’s where the magic happens! The Generative AI system has received a prompt, done the heavy lifting in the backend with the model it’s been trained on, and now provides a response that is, most times, human-like. The most interesting part is that Generative AI systems can often make sense of incomplete, incorrect, or even incoherent prompts. For example, if the prompt is just "Once upon a time…", the system can continue it into a captivating story with characters, plot twists, and a conclusion.
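To make the three steps above a bit more concrete, here is a toy sketch in Python. It is not how ChatGPT or any Transformer works internally. It swaps in a far simpler technique, a bigram (word-pair) model, purely to show the shape of the process: learn patterns from training data, take a prompt, and produce a continuation one word at a time. The training text and the names `train` and `generate` are made up for this illustration.

```python
import random
from collections import defaultdict

# Tiny stand-in for "the data it has been trained on" (invented for this sketch).
TRAINING_TEXT = (
    "once upon a time there was a dragon . "
    "once upon a time there was a princess . "
    "the dragon met the princess . "
    "the princess met a wizard ."
)


def train(text):
    """Learn which words follow which: a bigram table, the simplest kind of 'pattern'."""
    words = text.split()
    follows = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)
    return follows


def generate(model, prompt, max_words=12, seed=0):
    """Continue the prompt one word at a time by sampling from the learned followers."""
    rng = random.Random(seed)  # fixed seed so the same prompt gives the same output
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        candidates = model.get(words[-1])
        if not candidates:  # last word never appeared in training: nothing to go on
            break
        words.append(rng.choice(candidates))
    return " ".join(words)


model = train(TRAINING_TEXT)               # the algorithm, trained ahead of time
print(generate(model, "Once upon a time"))  # the prompt goes in, a continuation comes out
```

Because this sketch only knows which word tends to follow which, its "stories" are short and clumsy. Real LLMs learn vastly richer patterns, but the basic loop for text (predict the next piece, append it, repeat) is a reasonable mental model.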
Popular Examples of Generative AI Applications
Now that we know what Generative AI really is, are there other Generative AI applications beyond the popular ChatGPT? Of course, there are! Let’s dive into some of them, including the well-known ChatGPT.
1. Gemini:
Publicly released in March 2023 under the name Bard, Gemini is Google’s response to OpenAI’s ChatGPT: a conversational generative AI chatbot. At the time of writing this article (August 2023), it is built on the Pathways Language Model 2 (PaLM 2), is generally used for text-to-text tasks, and is available in 46 languages and 238 countries and territories.
2. ChatGPT:
Officially released in November 2022, ChatGPT (which I’m sure you already know) is another text-based generative AI chatbot, developed by OpenAI. It’s trained to engage in human-like conversations and respond to requests ranging from code debugging and business plan writing to even some form of basic therapy (tread carefully here though). ChatGPT is built on large language models such as Generative Pre-trained Transformer 3.5 and 4 (GPT-3.5 and GPT-4). One very interesting feature of ChatGPT is that it can keep track of what was said earlier in a conversation and pick up from there.
3. Midjourney:
Unlike Google’s Gemini, Midjourney is for image generation. The first version was released in February 2022, and it uses text prompts to generate artistic images for the user. Midjourney is currently only accessible through a Discord server, where users prompt the AI system and it generates an artistic-looking image in return.
4. Stable Diffusion:
Introduced in August 2022, Stable Diffusion is a deep learning text-to-image generative AI model originally developed through the collaborative efforts of the CompVis group and Runway. It is primarily used to generate photo-realistic images from text prompts.
Parting Thoughts
As you can see, in this captivating realm of Generative AI where technology seems to converge with magic, the potential for disruption is enormous. By changing the way we currently do things, above all how we get started from a blank canvas, Generative AI becomes an invaluable ally for those grappling with the dreaded creative block. Whether it’s crafting a story, generating an artwork, or kickstarting a comprehensive document, Generative AI supports creative expression and opens up new possibilities.