Lab: Building a Classification Pipeline Using Generative AI
Objective
The goal of this lab is to build a sentiment classification pipeline that uses a generative AI model and the Kaggle dataset IMDB Movie Ratings Sentiment Analysis. You will experiment with tailored prompts so that the model produces consistent, usable outputs in JSON format.
General Guidelines
- Purpose of the Assignment
Design a pipeline that:
- Accepts a text input (an excerpt from a movie review).
- Classifies its sentiment as positive or negative using a generative AI model.
- Returns the result in a structured JSON format.
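A minimal sketch of such a pipeline follows. It assumes nothing about the provider: the model is represented by a hypothetical `llm` function that takes a prompt string and returns the model's raw text reply, so you can plug in whichever client you use. The prompt wording, the `{"sentiment": ...}` output schema and the helper names (`build_prompt`, `classify_review`, `fake_llm`) are illustrative choices, not part of the assignment.

```python
import json
from typing import Callable

# Hypothetical interface: any function that sends a prompt to a generative
# model and returns its raw text reply. Wrap your provider's client in it.
LLMFunction = Callable[[str], str]

# Doubled braces are literal braces once str.format() is applied.
PROMPT_TEMPLATE = (
    "You are a sentiment classifier for movie reviews.\n"
    'Classify the review below as "positive" or "negative".\n'
    "Answer with JSON only, in exactly this form: "
    '{{"sentiment": "positive"}} or {{"sentiment": "negative"}}.\n\n'
    "Review:\n{review}"
)


def build_prompt(review: str) -> str:
    """Insert the review text into the classification prompt."""
    return PROMPT_TEMPLATE.format(review=review.strip())


def classify_review(review: str, llm: LLMFunction) -> dict:
    """Send one review through the model and return a JSON-ready result."""
    raw = llm(build_prompt(review))
    parsed = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(parsed, dict):
        raise ValueError(f"Expected a JSON object, got: {raw!r}")
    sentiment = str(parsed.get("sentiment", "")).strip().lower()
    if sentiment not in {"positive", "negative"}:
        raise ValueError(f"Unexpected label: {sentiment!r}")
    return {"review": review, "sentiment": sentiment}


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real API call, used only to exercise the pipeline."""
    review_part = prompt.rsplit("Review:", 1)[-1]
    if "loved" in review_part:
        return '{"sentiment": "positive"}'
    return '{"sentiment": "negative"}'


print(json.dumps(classify_review("I loved every minute of it.", fake_llm)))
# {"review": "I loved every minute of it.", "sentiment": "positive"}
```

Keeping the model behind a plain function makes the prompt building, parsing and validation logic testable offline with a stub like `fake_llm` before you spend any API calls.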
- Optional Evaluation
Once the pipeline is built, you can evaluate its quality by comparing its predictions with the true labels in the dataset.
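As a sketch, this comparison reduces to a few counts. The hypothetical `evaluate` function below treats "positive" as the positive class and computes accuracy, precision, recall and F1-score in plain Python. A prediction that is not a valid label (for example `None` after a failed parse) counts as incorrect for accuracy and, when the true label is positive, as a missed positive. If the dataset stores labels as integers, map them to the same strings first. scikit-learn's `precision_recall_fscore_support` and `classification_report` compute the same figures if you prefer a library.

```python
from typing import Optional, Sequence


def evaluate(
    y_true: Sequence[str],
    y_pred: Sequence[Optional[str]],
    positive: str = "positive",
) -> dict:
    """Compare predicted labels with the true labels of the dataset.

    Precision, recall and F1 are computed for the `positive` class;
    accuracy is the share of exact matches over all examples.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["positive", "positive", "negative", "negative"]
print(evaluate(y_true, y_pred))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```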
- Customization Encouraged
This assignment is intentionally open-ended: you are encouraged to explore different approaches and demonstrate creativity.
Resources to Assist You
To succeed in this assignment, consider researching and familiarizing yourself with the following concepts:
- Python: for data manipulation and interacting with APIs.
- JSON: to structure outputs in a standardized format.
- Prompt Engineering for Classification: to design prompts tailored to your task.
- Dataset Handling: using tools like Pandas and NumPy.
- Model Evaluation: metrics such as precision, recall, and F1-score.
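For the prompt engineering concept listed above, one common technique is few-shot prompting: showing the model a handful of labelled examples before the review to classify. The sketch below assumes a hypothetical `build_few_shot_prompt` helper and two illustrative example reviews written for this sketch; in practice you would pick examples from the dataset and compare the results against a zero-shot prompt.

```python
import json
from typing import Sequence, Tuple

# Illustrative examples only; replace them with labelled reviews from the dataset.
FEW_SHOT_EXAMPLES = [
    ("A moving story with superb acting.", "positive"),
    ("Two hours of my life I will never get back.", "negative"),
]


def build_few_shot_prompt(
    review: str,
    examples: Sequence[Tuple[str, str]] = FEW_SHOT_EXAMPLES,
) -> str:
    """Build a prompt: instructions, labelled examples, then the new review."""
    lines = [
        'Classify each movie review as "positive" or "negative".',
        'Reply with a single JSON object such as {"sentiment": "positive"}.',
        "",
    ]
    for text, label in examples:
        lines.append(f"Review: {text}")
        # json.dumps guarantees the example answers are themselves valid JSON.
        lines.append(f"Answer: {json.dumps({'sentiment': label})}")
        lines.append("")
    # The prompt ends on "Answer:" so the model continues with the JSON reply.
    lines.append(f"Review: {review.strip()}")
    lines.append("Answer:")
    return "\n".join(lines)


print(build_few_shot_prompt("Great fun from start to finish."))
```

Showing the exact answer format in the examples tends to make the output shape more consistent than instructions alone, at the cost of a longer prompt per request.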
Tips
- Iterate Quickly: Test your prompts frequently and adjust them based on the results.
- Be Methodical: Start with simple examples before generalizing to the entire dataset.
- Handle Unexpected Scenarios: Make your code robust to cases where the AI returns unexpected responses, such as malformed JSON or a label other than positive or negative.
- Explore AI Tools: Some generative models perform better than others for specific tasks. Experiment with different providers if necessary.
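Generative models do not always follow the requested format: they may wrap the JSON in Markdown code fences, add explanations around it, or invent a label. The defensive sketch below assumes replies contain a flat JSON object with a `sentiment` key; `parse_sentiment` and `classify_with_retry` are hypothetical helpers, and `llm` stands for any function that sends a prompt to your provider and returns the reply text.

```python
import json
import re
from typing import Callable, Optional

VALID_LABELS = {"positive", "negative"}


def parse_sentiment(raw: str) -> Optional[str]:
    """Extract a valid label from a model reply, or return None.

    Tolerates code fences, extra prose around the JSON object and
    inconsistent capitalisation. Assumes a flat (non-nested) object.
    """
    match = re.search(r"\{.*?\}", raw, flags=re.DOTALL)  # first {...} span
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    label = str(data.get("sentiment", "")).strip().lower()
    return label if label in VALID_LABELS else None


def classify_with_retry(
    prompt: str, llm: Callable[[str], str], max_attempts: int = 3
) -> Optional[str]:
    """Query the model up to `max_attempts` times until it returns a usable label."""
    for _ in range(max_attempts):
        label = parse_sentiment(llm(prompt))
        if label is not None:
            return label
    return None  # the caller decides how to count or log the failure


print(parse_sentiment('Sure! {"sentiment": "Negative"} Hope this helps.'))  # negative
print(parse_sentiment("I would say it is mostly positive."))  # None
```

Returning `None` instead of raising lets a batch run continue; counting how often the model failed to produce a usable answer is itself a useful measure when comparing prompts or providers.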