Generative AI - How It Works Under the Hood for an Image Prompt
Generative AI is everywhere, and many countries have begun to dive deep into the field. Yes, Generative AI! Since ChatGPT's arrival, there have been many theories about job cuts: will AI eliminate jobs, or will it create new opportunities for humans? It is a debate worth having.
When you type a prompt into ChatGPT or any other AI-driven chatbot, how does it process your prompt and send back results with such high accuracy? As an end user, all you worry about is typing the question and getting an answer, right? But have you ever stopped to ask yourself what is going on in the backend? How is your prompt processed? Which components are involved in validating your prompt and building the response? In this article, let's dive in.
For the purposes of this article, I will take a simple prompt, "Create an image that shows a horse running on the seashore," and explain how an AI-driven chatbot can be configured using AWS services.
We will cover an architecture built on AWS AI services with a few critical components:
- Amazon Bedrock
-- Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. (A short sketch of querying this model catalog follows this list.)
- AWS Lambda (Serverless)
-- AWS Lambda lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes.
- Amazon S3 (for storage)
-- Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance.
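To make the "single API" idea concrete, here is a minimal sketch that lists the Stability AI foundation models available through Bedrock. It assumes boto3 is installed and AWS credentials with Bedrock access are configured; the region and provider filter value are illustrative assumptions.

```python
import boto3

# Control-plane Bedrock client (model catalog); this is distinct from the
# "bedrock-runtime" client used later to actually invoke a model.
# The region is an assumption for illustration.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the foundation models offered by Stability AI through the single Bedrock API.
response = bedrock.list_foundation_models(byProvider="Stability AI")
for model in response["modelSummaries"]:
    print(model["modelId"], "-", model["modelName"])
```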
The moment you type this "prompt", the application sends it as an "event" through a REST API to an API Gateway configured in AWS, which passes the event to a "Lambda" function. The Lambda function runs a Python program (configured with the "Bedrock - Stability AI" parameters). When the Lambda function invokes Bedrock, Bedrock in turn passes the prompt and the necessary configuration to the Stability AI model.
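Here is a minimal sketch of what such a Lambda handler could look like, assuming an API Gateway proxy integration and the Stability AI SDXL model on Bedrock. The model ID, region, and generation parameters are illustrative assumptions, not the exact configuration behind any particular chatbot.

```python
import json

import boto3

# Runtime Bedrock client used to invoke models; the region is an assumption.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Example Stability AI model ID on Bedrock (an assumption for this sketch).
MODEL_ID = "stability.stable-diffusion-xl-v1"

def lambda_handler(event, context):
    # With an API Gateway proxy integration, the user's prompt arrives
    # as a JSON string in the request body.
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "Create an image that shows a horse running on the seashore.")

    # Pass the prompt and generation parameters to the Stability AI model via Bedrock.
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "text_prompts": [{"text": prompt}],
            "cfg_scale": 10,  # how strictly the model follows the prompt
            "steps": 50,      # number of diffusion steps
        }),
    )

    # The response body is a JSON stream containing the generated image.
    return json.loads(response["body"].read())
```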
Stability AI then processes the prompt and returns the response as an object containing the generated image as a base64-encoded string. This string is decoded back into image bytes before it is stored in Amazon S3. Once that step succeeds, a pre-signed URL is generated and the result is delivered to the user who entered the prompt.
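A sketch of that delivery step is below, assuming the SDXL response shape (an "artifacts" list whose first entry carries a "base64" field) and a hypothetical bucket name. In the handler above, the value returned by the model invocation could be passed straight to this function and the resulting URL sent back in the API response.

```python
import base64
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-generated-images"  # hypothetical bucket name

def deliver_image(model_response: dict) -> str:
    # Stability AI returns the image as a base64-encoded string inside "artifacts".
    image_b64 = model_response["artifacts"][0]["base64"]
    image_bytes = base64.b64decode(image_b64)

    # Store the decoded PNG in S3 under a unique key.
    key = f"images/{uuid.uuid4()}.png"
    s3.put_object(Bucket=BUCKET, Key=key, Body=image_bytes, ContentType="image/png")

    # Generate a time-limited pre-signed URL so the user can download the image.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=3600,  # URL valid for one hour
    )
```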
If you step back, as a user you just type the prompt, but in the backend multiple components are invoked to process it and return results within seconds. Scalability and high-performance computation are critical here, which is why most AI workloads run on GPUs rather than CPUs. By now, I hope you have a high-level picture of how Generative AI serves an image prompt. Many other components are glued together to execute this process, but I highlighted only the few that are critical for this use case. In the next article, we will go through another use case with a detailed architecture review.