A Coding Implementation to Build a Conversational Research Assistant with FAISS, LangChain, PyPDF, and TinyLlama-1.1B-Chat-v1.0


RAG-powered conversational research assistants address the limitations of conventional language models by combining them with information retrieval systems. The system searches through specific knowledge bases, retrieves relevant information, and presents it conversationally with proper citations. This approach reduces hallucinations, handles domain-specific knowledge, and grounds responses in retrieved text. In this tutorial, we will demonstrate how to build such an assistant using the open-source model TinyLlama-1.1B-Chat-v1.0 from Hugging Face, FAISS from Meta, and the LangChain framework to answer questions about scientific papers.

First, let's install the necessary libraries:

!pip install langchain-community langchain pypdf sentence-transformers faiss-cpu transformers accelerate einops

Now, let’s import the required libraries: 

import os
import torch
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import pandas as pd 
from IPython.display import display, Markdown

We'll mount Google Drive so we can save the paper in a later step:

from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted")

For our knowledge base, we'll use PDF documents of scientific papers. Let's create a function to load and process these documents:

def load_documents(pdf_folder_path):
    documents = []

    if not pdf_folder_path:
        print("Downloading a sample paper...")
        !wget -q https://arxiv.org/pdf/1706.03762.pdf -O attention.pdf
        pdf_docs = ["attention.pdf"]
    else:
        pdf_docs = [os.path.join(pdf_folder_path, f) for f in os.listdir(pdf_folder_path)
                    if f.endswith('.pdf')]

    print(f"Found {len(pdf_docs)} PDF documents")

    for pdf_path in pdf_docs:
        try:
            loader = PyPDFLoader(pdf_path)
            documents.extend(loader.load())
            print(f"Loaded: {pdf_path}")
        except Exception as e:
            print(f"Error loading {pdf_path}: {e}")

    return documents




documents = load_documents("")
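As a quick optional check, we can confirm how much text was loaded; PyPDFLoader returns one Document per PDF page, each with page-level metadata:

# Optional: verify what was loaded (one Document per PDF page)
print(f"Loaded {len(documents)} pages in total")
if documents:
    print(documents[0].metadata)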

Next, we need to split these documents into smaller chunks for efficient retrieval:

def split_documents(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(chunks)} chunks")
    return chunks


chunks = split_documents(documents)
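Optionally, we can preview the first chunk and its metadata to confirm the splitting looks reasonable:

# Optional: inspect the first chunk produced by the splitter
print(chunks[0].page_content[:300])
print(chunks[0].metadata)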

We'll use sentence-transformers to create vector embeddings for our document chunks:

def create_vector_store(chunks):
    print("Loading embedding model...")
    embedding_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
    )

    print("Creating vector store...")
    vector_store = FAISS.from_documents(chunks, embedding_model)
    print("Vector store created successfully!")
    return vector_store


vector_store = create_vector_store(chunks)
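Before wiring in the language model, it is worth running a quick similarity search directly against the FAISS store to confirm that retrieval works; the query string below is just an example:

# Optional sanity check: fetch the chunks most similar to a sample query
sample_query = "What is multi-head attention?"
for i, doc in enumerate(vector_store.similarity_search(sample_query, k=2)):
    print(f"Result {i + 1}: {doc.page_content[:150]}...")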

Now, let's load an open-source language model to generate responses. We'll use TinyLlama, which is small enough to run on Colab but still powerful enough for our task:

def load_language_model():
    print("Loading language model...")
    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

    try:
        import subprocess
        print("Installing/updating bitsandbytes...")
        subprocess.check_call(["pip", "install", "-U", "bitsandbytes"])
        print("Successfully installed/updated bitsandbytes")
    except Exception:
        print("Could not update bitsandbytes, will proceed without 8-bit quantization")

    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    if torch.cuda.is_available():
        try:
            quantization_config = BitsAndBytesConfig(
                load_in_8bit=True,
                llm_int8_threshold=6.0,
                llm_int8_has_fp16_weight=False
            )

            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto",
                quantization_config=quantization_config
            )
            print("Model loaded with 8-bit quantization")
        except Exception as e:
            print(f"Error with quantization: {e}")
            print("Falling back to standard model loading without quantization")
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto"
            )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float32,
            device_map="auto"
        )

    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0.2,
        top_p=0.95,
        repetition_penalty=1.2,
        return_full_text=False
    )

    from langchain_community.llms import HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=pipe)
    print("Language model loaded successfully!")
    return llm


llm = load_language_model()
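Optionally, we can send the wrapped pipeline a single prompt to make sure generation works before plugging it into the retrieval chain; the prompt is only illustrative:

# Optional smoke test of the LangChain-wrapped pipeline
print(llm.invoke("Briefly explain what a Transformer model is."))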

Now, let's build our assistant by combining the vector store and the language model:
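The test code further below calls a create_research_assistant helper, so here is a minimal sketch of how it can be implemented with the ConversationalRetrievalChain we imported earlier, keeping the chat history in memory. The retriever setting (k=3) and the closure-based interface are illustrative choices rather than the only way to wire this up:

def create_research_assistant(vector_store, llm):
    # Turn the FAISS store into a retriever that returns the top-k chunks
    retriever = vector_store.as_retriever(search_kwargs={"k": 3})

    # Conversational RAG chain that also returns the retrieved source documents
    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True,
    )

    chat_history = []

    def ask(query, return_sources=False):
        result = qa_chain.invoke({"question": query, "chat_history": chat_history})
        answer = result["answer"]
        chat_history.append((query, answer))
        if return_sources:
            return answer, result["source_documents"]
        return answer

    return ask

We also define a small helper that prints each query, the assistant's response, and previews of the retrieved source chunks: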

import textwrap

def format_research_assistant_output(query, response, sources):
    output = f"\n{'=' * 50}\n"
    output += f"USER QUERY: {query}\n"
    output += f"{'-' * 50}\n\n"
    output += f"ASSISTANT RESPONSE:\n{response}\n\n"
    output += f"{'-' * 50}\n"
    output += "SOURCES REFERENCED:\n\n"

    for i, doc in enumerate(sources):
        output += f"Source #{i+1}:\n"
        content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
        wrapped_content = textwrap.fill(content_preview, width=80)
        output += f"{wrapped_content}\n\n"

    output += f"{'=' * 50}\n"
    return output


research_assistant = create_research_assistant(vector_store, llm)


test_queries = [
    "What is the key idea behind the Transformer model?",
    "Explain self-attention mechanism in simple terms.",
    "Who are the authors of the paper?",
    "What are the main advantages of using attention mechanisms?"
]


for query in test_queries:
    response, sources = research_assistant(query, return_sources=True)
    formatted_output = format_research_assistant_output(query, response, sources)
    print(formatted_output)

In this tutorial, we built a conversational research assistant using Retrieval-Augmented Generation with open-source models. RAG enhances language models by integrating document retrieval, reducing hallucination, and ensuring domain-specific accuracy. The guide walks through setting up the environment, processing scientific papers, creating vector embeddings with FAISS and sentence-transformers, and integrating an open-source language model like TinyLlama. The assistant retrieves relevant document chunks and generates responses with citations. This implementation lets users query a knowledge base, making AI-powered research more reliable and efficient for answering domain-specific questions.




