Introduction-to-LLM-and-RAG

Lab: Building a Classification Pipeline Using Generative AI

Objective
The goal of this lab is to build a sentiment classification pipeline with a generative AI model, using the IMDB Movie Ratings Sentiment Analysis dataset available on Kaggle. You will experiment with tailored prompts to produce consistent, usable outputs in JSON format.

General Guidelines

Purpose of the Assignment
Design a pipeline that:
- Accepts a text input (excerpts from movie reviews).
- Generates a sentiment classification (positive or negative) using a generative AI model.
- Returns the response in a structured JSON format.
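
The three steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the prompt wording, the function names, and the `fake_generate` stand-in are all assumptions made so the example runs without an API key. In a real pipeline you would replace `fake_generate` with a call to the provider you choose.

```python
import json
from typing import Callable

# Prompt wording is an assumption; tune it for the model you use.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the movie review below as positive or negative.\n"
    'Answer with JSON only, in the form {{"sentiment": "positive"}} or '
    '{{"sentiment": "negative"}}.\n\n'
    "Review: {review}"
)


def build_prompt(review: str) -> str:
    """Insert the review text into the classification prompt."""
    return PROMPT_TEMPLATE.format(review=review.strip())


def classify(review: str, generate: Callable[[str], str]) -> dict:
    """Send the prompt to the model and return the parsed JSON answer."""
    raw = generate(build_prompt(review))
    result = json.loads(raw)  # raises ValueError if the reply is not valid JSON
    return {"review": review, "sentiment": result["sentiment"]}


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, used only to run the sketch."""
    review = prompt.split("Review:", 1)[1].lower()
    label = "negative" if "boring" in review else "positive"
    return json.dumps({"sentiment": label})


print(classify("A boring, predictable plot.", fake_generate))
```

Passing the model call in as a function keeps the pipeline independent of any one provider, which makes it easy to swap models later.
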
Optional Evaluation
Once the pipeline is built, you can evaluate its quality by comparing its predictions with the true labels in the dataset.
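
One possible way to run this comparison is sketched below in plain Python. The function name `evaluate` and the choice of "positive" as the reference class are assumptions; a library such as scikit-learn offers equivalent metrics if you prefer not to write them yourself.

```python
def evaluate(y_true, y_pred, positive="positive"):
    """Return accuracy, precision, recall and F1 for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    # Guard every division so empty or one-sided inputs do not crash.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


true_labels = ["positive", "negative", "positive", "negative"]
predictions = ["positive", "positive", "negative", "negative"]
print(evaluate(true_labels, predictions))
```
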
Customization Encouraged
This assignment is intentionally open-ended: you are encouraged to explore different approaches and demonstrate creativity.

Resources to Assist You

To succeed in this assignment, familiarize yourself with the following concepts:

Python: for data manipulation and interacting with APIs.
JSON: to structure outputs in a standardized format.
Prompt Engineering for Classification: to design prompts tailored to your task.
Dataset Handling: using tools like Pandas and NumPy.
Model Evaluation: metrics such as precision, recall, and F1-score.
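
As an illustration of the dataset-handling step, the sketch below reads a few labeled reviews from CSV data. It uses only the standard library and a tiny inline sample so that it runs without downloading anything. The column names `text` and `label`, and the 0/1 label encoding, are assumptions: check them against the file you actually download from Kaggle. With Pandas, `pandas.read_csv` would play the same role.

```python
import csv
import io
from itertools import islice

# Tiny inline sample standing in for the real file.
# Column names and the 0/1 encoding are assumptions, not taken from the dataset.
SAMPLE_CSV = '''text,label
"A moving story, beautifully acted.",1
"Two hours I will never get back.",0
"Great soundtrack, weak script, but I enjoyed it.",1
'''

LABEL_NAMES = {"0": "negative", "1": "positive"}


def load_reviews(handle, limit=None):
    """Read (review, sentiment) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    # islice stops after `limit` rows; limit=None reads everything.
    return [(row["text"], LABEL_NAMES[row["label"]]) for row in islice(reader, limit)]


for text, sentiment in load_reviews(io.StringIO(SAMPLE_CSV), limit=2):
    print(sentiment, "|", text)
```

Working on a small `limit` first keeps API costs low while you iterate on the prompt.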

Tips

Iterate Quickly: Test your prompts frequently and adjust them based on the results.
Be Methodical: Start with simple examples before generalizing to the entire dataset.
Handle Unexpected Scenarios: Prepare your code to manage cases where the model returns an unexpected response, such as invalid JSON or a label other than positive or negative.
Explore AI Tools: Some generative models perform better than others for specific tasks. Experiment with different providers if necessary.
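
To make the tip on unexpected responses concrete, here is one possible defensive parser. It is a sketch under assumptions: the function name `parse_sentiment`, the `parsed` flag, and the decision to return the raw reply on failure are illustrative choices, and the simple pattern used here only handles flat JSON objects (a nested object is reported as unparsed).

```python
import json
import re

VALID_LABELS = {"positive", "negative"}


def parse_sentiment(raw: str) -> dict:
    """Extract a validated sentiment from a model reply, or flag it as unparsed."""
    # Models sometimes wrap the JSON in extra prose; take the first {...} span.
    match = re.search(r"\{.*?\}", raw, flags=re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict):
            label = str(data.get("sentiment", "")).strip().lower()
            if label in VALID_LABELS:
                return {"sentiment": label, "parsed": True}
    # Anything else is reported explicitly rather than guessed.
    return {"sentiment": None, "parsed": False, "raw": raw}


replies = [
    '{"sentiment": "Positive"}',
    'Sure! Here is the result: {"sentiment": "negative"} Hope that helps.',
    "I think the review is mostly positive.",
    '{"sentiment": "neutral"}',
]
for reply in replies:
    print(parse_sentiment(reply))
```

Keeping the unparsed replies lets you count them during evaluation and decide whether to retry the request or adjust the prompt.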