Monday, April 1, 2024

Understanding 'Instruct' and 'Chat' Models

In the realm of artificial intelligence, Large Language Models (LLMs) have emerged as veritable powerhouses, capable of processing and generating human-like text with remarkable accuracy. These models, such as OpenAI's GPT series, owe their prowess to rigorous training on colossal datasets harvested from the vast expanse of the internet.

Foundations of LLMs

At the core of LLMs lies an extensive training process that involves exposing the model to billions of words sourced from diverse digital content, ranging from books and articles to websites. This foundational training, performed without human-written annotations or labels, provides the model with a raw representation of language, allowing it to discern intricate patterns and meanings, a methodology known as self-supervised learning.

Versatility and Fine-Tuning

Equipped with this wealth of linguistic knowledge, base LLMs exhibit remarkable versatility, capable of performing an array of language-related tasks, including generating conversational responses and crafting content. However, to further enhance their efficacy in handling specific tasks, a process known as fine-tuning comes into play.

Fine-tuning entails subjecting the pre-trained base model to additional training on smaller, specialized datasets relevant to the desired task. Unlike the initial training phase, these datasets come with labels, featuring examples of model-generated responses aligned with specific prompts. For instance, a prompt querying "What is the capital of England?" might elicit a response like "The capital of England is London."
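
To make this concrete, a fine-tuning dataset is essentially a collection of prompt/response pairs. Here is a minimal sketch of what such records might look like; the field names and the JSON Lines format are illustrative and vary by provider:

import json

# Illustrative fine-tuning records: each example pairs a prompt with the desired,
# human-labeled response. Field names and file format are hypothetical.
fine_tuning_examples = [
    {"prompt": "What is the capital of England?",
     "response": "The capital of England is London."},
    {"prompt": "What is the capital of France?",
     "response": "The capital of France is Paris."},
]

# Many fine-tuning pipelines expect such pairs serialized as JSON Lines.
with open("finetune.jsonl", "w") as f:
    for example in fine_tuning_examples:
        f.write(json.dumps(example) + "\n")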

Understanding the Purpose

It's essential to note that the purpose of this labeled data isn't to impart factual knowledge to the model. Instead, it serves as a guide, instructing the model on the expected response format when presented with certain prompts. Fine-tuning thus adjusts the model's parameters to better align with the nuances and requirements of tasks such as question answering, all while preserving its overarching language understanding.

Introducing 'Instruct' and 'Chat' Models

Within the realm of LLMs, two distinct variants have garnered significant attention: 'Instruct' and 'Chat' models. 'Instruct' models emphasize the process of providing explicit instructions or examples to guide the model's behavior. On the other hand, 'Chat' models focus on fine-tuning for conversational tasks, enabling the model to generate contextually appropriate responses in dialogue settings.
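
In practice, the difference often shows up in how the input is structured. The sketch below is illustrative only; the exact message schema varies from provider to provider:

# 'Instruct' style: a single prompt that states the instruction directly.
instruct_prompt = "Answer the question: What is the capital of Australia?"

# 'Chat' style: a list of role-tagged messages that preserves the dialogue context.
chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Australia?"},
    # Earlier turns can be appended here so the model sees the conversation history.
]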

Unleashing the Potential

With an 'Instruct' fine-tuned model at our disposal, the possibilities are endless. Prompting questions like "What is the capital of Australia?" should ideally yield responses such as "The capital of Australia is Canberra," showcasing the model's ability to comprehend and respond to specific queries with precision and accuracy.

In essence, the evolution of LLMs, coupled with fine-tuning techniques like 'Instruct' and 'Chat' models, heralds a new era of artificial intelligence—one where machines not only understand language but also engage with it in a manner akin to human interaction. As we delve deeper into this fascinating domain, the potential for innovation and discovery knows no bounds.

Wednesday, March 27, 2024

Setting Up a Multizone Resiliency Environment with IBM Cloud and Terraform

 

In this blog post, we will discuss how to set up a multizone resiliency environment using Terraform and IBM Cloud services. We will cover the steps to create Virtual Servers on VPC with the auto-scale feature. The reference architecture that we will provision automatically can be found at: https://cloud.ibm.com/docs/pattern-vpc-vsi-multizone-resiliency?topic=pattern-vpc-vsi-multizone-resiliency-web-app-multi-zone


Step 1: Clone the necessary repositories
To start, we need to clone the required repositories. We will be using the IBM Cloud Terraform modules for the VPC landing zone and autoscaling, with terraform-ibm-landing-zone-vsi-autoscale as the parent composite module.

% git clone git@github.ibm.com:client-solutioning/pattern-vpc-vsi-multizone-resiliency.git


Step 2: List available branches

Before we proceed, let's list all available branches in the repository.


% git branch -a


Clone the virtual server autoscale module, checking out the specific branch called "init-module":

% git clone --branch init-module https://github.com/terraform-ibm-modules/terraform-ibm-landing-zone-vsi-autoscale.git


Step 3: Update IBM Cloud Plugins


You may need to update IBM Cloud plugins; in my case, the VPC infrastructure plugin needed an update.


ibmcloud plugin update vpc-infrastructure




Step 4: Create a Terraform variables file


To pass commonly used parameter values easily, we can create a Terraform variables file called terraform.tfvars. In this file, we can store the API key, the public/private SSH key, and other parameters that we will use throughout the deployment process. Otherwise, you need to type that information repeatedly when you run the `terraform plan` command.


vi terraform.tfvars
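
The exact variable names depend on the modules being used, so treat the snippet below as an illustrative sketch rather than a definitive list; check the repository's variables.tf for the names it actually expects:

# terraform.tfvars (illustrative values only)
ibmcloud_api_key = "<your IBM Cloud API key>"
region           = "us-east"
prefix           = "vpc-vsi-multizone"
ssh_public_key   = "<contents of ~/.ssh/id_rsa_ibmcloud.pub>"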


Step 5: Generate SSH public and private keys


We'll generate SSH public and private keys using the "ssh-keygen" command if they haven't been generated previously. The public key is registered with IBM Cloud and used for SSH access to the provisioned virtual servers.


$ ssh-keygen -t rsa -b 4096 -C "changwoo.jung@ibm.com"

$ pbcopy < ~/.ssh/id_rsa_ibmcloud.pub

  # Copies the contents of the id_rsa_ibmcloud.pub file to your clipboard


(base) changwoojung@Changwoos-MacBook-Pro .ssh % ssh-keygen -t rsa -b 4096 -C "changwoo.jung@ibm.com" 

Generating public/private rsa key pair.

Enter file in which to save the key (/Users/changwoojung/.ssh/id_rsa): /Users/changwoojung/.ssh/id_rsa_ibmcloud

Enter passphrase (empty for no passphrase): 

Enter same passphrase again: 

Your identification has been saved in /Users/changwoojung/.ssh/id_rsa_ibmcloud

Your public key has been saved in /Users/changwoojung/.ssh/id_rsa_ibmcloud.pub


You can find the SSH public key at ~/.ssh/id_rsa_ibmcloud.pub and the SSH private key at ~/.ssh/id_rsa_ibmcloud.



After ensuring all necessary files are in place, we'll initialize Terraform using the command terraform init. Following initialization, we'll proceed to plan and apply our deployment using terraform plan and terraform apply, respectively.


Step 6: Initialize Terraform

Now that we have all the necessary files, let's initialize Terraform.


% terraform init


Step 7: Plan and apply


% terraform plan


% terraform apply


Upon successful execution, the automatically provisioned resources can be viewed in the IBM Cloud Resources list.




Step 8: Find the hostname for the web tier load balancer
Finally, we'll locate the hostname of the web tier load balancer and verify the deployment by opening it in a browser, for example: http://4a3b1bc4-us-east.lb.appdomain.cloud/.
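
If you prefer the CLI to the console, the VPC infrastructure plugin can also list the load balancers along with their hostnames (the exact output columns may vary by plugin version):

% ibmcloud is load-balancers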

With these steps completed, you've successfully established a multizone resiliency environment using IBM Cloud and Terraform.




Tuesday, March 12, 2024

Enhancing Large Language Model (LLM) Development with LangChain: Simplifying Complexities and Streamlining Workflows

 


LangChain (https://github.com/langchain-ai/langchain), an open-source framework, is rapidly gaining recognition as a go-to solution for LLM application development. By offering a streamlined approach to common tasks in LLM applications, LangChain ensures that developers can write more efficient and cleaner code. It does not inherently introduce new capabilities to LLMs but rather simplifies the implementation process. One of the most significant challenges in LLM development lies in handling the intricate "orchestration" required for these models.

LangChain addresses this issue by providing a comprehensive API, enabling developers to manage various aspects more effectively: 
  • Prompt Templating: LangChain simplifies the process of creating and managing prompt templates for diverse use cases, ensuring consistency and reducing manual effort. 

  • Output Parsing: The framework offers built-in functionality for parsing LLM output, allowing developers to extract specific information with ease and accuracy. 

  • Sequence Management: LangChain streamlines the creation and management of a series of calls to multiple LLMs, enabling more efficient workflows and reducing coding overhead. 

  • Session State Maintenance: With LangChain, managing session state between individual LLM calls becomes effortless. This memory-based support ensures that context remains consistent throughout the application flow. 

  • RAG Support: LangChain provides native support for Retrieval-Augmented Generation (RAG) patterns, giving developers greater control over how external knowledge is brought into their applications' responses. 

A typical summarization use case, for example, requires a lot of "orchestration" and "utility" code; LangChain provides an API that simplifies the implementation.
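
As an illustration, a minimal summarization chain might look something like the sketch below. It assumes the langchain-openai package and an OpenAI API key in the environment; swap in whichever model provider you actually use, and note that API details change across LangChain versions:

# A minimal LangChain summarization sketch (illustrative; not the only way to wire this up).
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Prompt templating: the document to summarize is injected into a reusable template.
prompt = ChatPromptTemplate.from_template(
    "Summarize the following text in three sentences:\n\n{text}"
)

llm = ChatOpenAI(model="gpt-3.5-turbo")  # any supported chat model works here

# Sequence management and output parsing: prompt -> model -> plain-string parser.
chain = prompt | llm | StrOutputParser()

summary = chain.invoke({"text": "LangChain is an open-source framework that ..."})
print(summary)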


Expanding the Capabilities of Generative AI with Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) represents an innovative advancement in generative artificial intelligence, enabling us to leverage data "beyond the model's scope." This refers to external information that was not integrated into the model during training. By integrating RAG, we can add essential guardrails to the generated output and minimize instances of hallucination. 

RAG offers valuable applications across various generative AI use cases, such as: 

  • Question-answering systems
  • Text summarization
  • Content generation 

To better grasp RAG's functionality, consider a human interaction analogy. Imagine providing a document to an individual and requesting they generate an answer based on the information within that document.

The two primary components of RAG are: 

  1. Retrieval step: In this stage, we search through extensive knowledge bases (documents, websites, databases, etc.) to identify data relevant to the model's instructions. 

  2. Generation step: Similar to traditional generation use cases, this phase generates a response or content based on the information retrieved during the initial step. The crucial distinction lies in how the retrieved information is utilized – it serves as context and is incorporated into the prompt provided for the generative model.
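
Putting the two steps together, the flow can be sketched in a few lines of Python. The retriever and llm objects below are stand-ins for whatever vector store and model client you actually use:

# Illustrative RAG flow: retrieve relevant passages, then generate with them as context.
def answer_with_rag(question, retriever, llm):
    # 1. Retrieval step: look up passages relevant to the question.
    passages = retriever.search(question, top_k=3)  # hypothetical retriever interface

    # 2. Generation step: place the retrieved passages into the prompt as context.
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm.generate(prompt)  # hypothetical LLM client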

In the realm of vector databases, unstructured data is transformed and stored as numeric representations, commonly referred to as embeddings in AI applications. Embeddings are produced by embedding models such as word2vec. As an illustrative example, run a semantic search for the term "computer" on https://projector.tensorflow.org/: the closest matching words come back with numeric distances, since the search is executed on vector (numeric) data. Feel free to test additional words on that platform.
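
The "closeness" reported by such a search is simply a distance between vectors. Here is a toy sketch; the three-dimensional vectors are made up for illustration, whereas real embedding models produce hundreds or thousands of dimensions:

import math

# Toy embeddings (made-up values); real models such as word2vec use far more dimensions.
embeddings = {
    "computer": [0.9, 0.1, 0.3],
    "laptop":   [0.85, 0.15, 0.35],
    "banana":   [0.1, 0.8, 0.2],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = embeddings["computer"]
for word, vector in embeddings.items():
    print(word, round(cosine_similarity(query, vector), 3))
# "laptop" scores much closer to "computer" than "banana" does.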


Tuesday, March 5, 2024

Harnessing the Power of Generative AI for Business

In the ever-evolving landscape of computer science, Artificial Intelligence (AI) has seen its fair share of enthusiasm and skepticism over the decades. The 2010s brought a wave of excitement: IBM Watson's triumph on Jeopardy! against human champion Ken Jennings in 2011, followed by Google DeepMind's AlphaGo defeating world champion Lee Se-dol at Go in 2016, fueled the belief that AI could be applied far more broadly than previously imagined. At that time, I didn't envision AI being used for general-domain Q&A or problem solving. However, with the emergence of foundation models like ChatGPT in 2022, my perspective broadened significantly. 

While businesses entertain the idea of adopting these capabilities, it is crucial to recognize that enterprise application varies greatly from personal exploration. Security concerns, model understanding, explainability, and data privacy are non-negotiable. To effectively integrate AI into Enterprise applications, it must be designed with four key aspects: Openness, Targeted Application, Trustworthiness, and Empowerment.

  • Openness: Openness highlights the significance of making AI technologies accessible to all. By sharing knowledge, research, and tools, we foster collaboration and innovation. Open AI frameworks, datasets, and algorithms encourage a broader range of individuals and organizations to contribute to AI development.

  • Targeted Application: Targeted application refers to deploying AI in specific problem areas where it can yield significant positive impacts. Sectors such as healthcare, education, climate change, or poverty alleviation offer opportunities for AI optimization and tackling complex challenges. Examples include talent/HR productivity enhancement, customer service through conversational AI, and app modernization using automated code generation and transformation capabilities.

  • Trustworthiness: Trust is essential for successful AI adoption. Transparency and accountability are vital when designing and deploying AI systems. Explainable algorithms enable users to understand decision-making processes, addressing biases and ensuring fairness. Data privacy and security safeguard individuals' rights and prevent misuse.

  • Empowerment: The future of AI lies in empowering individuals rather than replacing them. AI technologies should augment human capabilities, enhancing efficiency and effectiveness. They can foster creativity, productivity, and better decision-making abilities. 

Benchmarking LLMs in the Wild

 

Have you ever wanted to compare the results of different LLM models from the same prompt? Enter https://chat.lmsys.org/, a powerful tool for comparing the performance of different LLM models on the same task. You simply enter a prompt, it is sent to multiple models, and their responses are displayed side-by-side, giving you valuable insight into their relative strengths and weaknesses. 

Here is an example where I used the same prompt from my previous post.


Explain me the following command.

podman run -d -p 3000:8080 --network slirp4netns:allow_host_loopback=true -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main 



Happy modeling! 


Local LLM deployment framework - Ollama

Are you looking for an easy way to run an LLM in your local environment? Ollama seems like a straightforward option. Open-WebUI (formerly known as Ollama-webui) even gives you a ChatGPT-like web interface! To get Ollama up and running, you just need to follow a few simple steps.

Step 1: Install Ollama locally and deploy your preferred LLM. For example, you can run the command

 $ ollama run llama2 

to download the llama2 model. Ollama supports a variety of models available on ollama.com/library.

Step 2: Run Open-WebUI to get a web interface. To do this, first run

$ podman machine start

and then

$ podman run -d -p 3000:8080 --network slirp4netns:allow_host_loopback=true -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

This will start the Open-WebUI container and map port 3000 on the host to port 8080 in the container, allowing incoming HTTP traffic on port 3000 to be forwarded to the container. Now, open your browser and navigate to http://localhost:3000 to access the web interface. 
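
Besides the web UI, Ollama also exposes a local REST API (on port 11434 by default), so you can query the model from a script as well. Here is a minimal sketch, assuming the llama2 model pulled in Step 1 and the requests package installed:

import requests

# Send a single, non-streaming generation request to the local Ollama server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Explain the following command: podman run -d -p 3000:8080 ...",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])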


To validate how my local LLM environment works, I used the above podman run command as input to see whether the model could explain it. So my simple prompt was:

Input: 

Explain me the following command.

podman run -d -p 3000:8080 --network slirp4netns:allow_host_loopback=true -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main 


And here is the response! What would your reaction be to this response?

Response: 

The podman run command is a Docker-like command for running containers in the Podman environment. Here's a breakdown of the options used in the command you provided:
  • -d: Run the container in detached mode, meaning it will run in the background and not interact with the terminal. 
  • -p 3000:8080: Map port 3000 on the host to port 8080 in the container, allowing incoming HTTP traffic on port 3000 to be forwarded to the container.
  • --network slirp4netns:allow_host_loopback=true: Use the slirp4netns network plugin, which allows for host-to-container loopback connections. This is useful when developing applications that need to communicate with other containers or the host machine.
  • -v open-webui:/app/backend/data: Mount a volume from the open-webui directory inside the container to the /app/backend/data directory on the host. This allows data to be shared between the container and the host.
  • --name open-webui: Set the name of the container.
  • -restart always ghcr.io/open-webui/open-webui:main: Restart the container if it crashes or is terminated, using the ghcr.io/open-webui/open-webui:main image. The :main part of the image name indicates that this is the main image for the application. 
The ghcr.io domain is a container registry service provided by Google Cloud. It allows you to store and manage Docker images, including the open-webui image used in the command. 
In summary, the command runs a container using the slirp4netns network plugin, maps port 3000 on the host to port 8080 in the container, mounts a volume from the open-webui directory inside the container to the host's /app/backend/data directory, sets the name of the container, and restarts the container if it crashes or is terminated using the ghcr.io/open-webui/open-webui:main image.

Pretty cool, right?

  • Ollama - https://github.com/ollama/ollama
  • Open-WebUI (formerly known as Ollama-webui) - https://github.com/open-webui/open-webui
By the way, I came across another local LLM studio to "discover, download, and run local LLMs" as well: https://lmstudio.ai/. Fast-moving space, huh?


Some podman-related commands that you may need for debugging:
  • podman machine list
  • podman machine stop
  • podman machine start
  • podman container list
  • podman container rm open-webui
  • ollama serve
  • lsof -i:3000