Python Webscrape Grocery Comparison Reordered Part 2

I did the initial video and article on this topic but wasn’t happy with the results, so I decided to look for another solution.

The issue was with looping through and numbering the items. If you number them in the initial list, each instance of a product/shop combination becomes unique, so you end up with rows and rows of items but only one price per line. On reflection, because you know how many shops each product will have a price for (in my case 6), you could use a double loop: assign the number in the outer loop, iterate through the 6 shops in the inner loop, then drop back to the outer loop to update the number.

Considering this method now, it may be a simpler way of doing it.
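A minimal sketch of that double loop, to show where the number gets updated. The product and shop names here are made up for illustration; they are not taken from my script.

```python
# Hypothetical product and shop names, for illustration only.
products = ["milk 2 l", "butter 250 g"]
shops = ["shop_a", "shop_b", "shop_c", "shop_d", "shop_e", "shop_f"]

rows = []
for number, product in enumerate(products, start=1):
    # The number only changes here, once per product.
    for shop in shops:
        # All 6 shop rows for a product share the same numbered label.
        rows.append({"product": f"{number} {product}", "shop": shop})

# Distinct labels: one per product, not one per product/shop pair.
labels = sorted({row["product"] for row in rows})
print(len(rows), labels)
```

Because the label is built in the outer loop, the 6 rows for a product keep one shared name, so they should still collapse onto a single line when pivoted.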

The method I’m using lets the original list, in a file called ‘gxxx.csv’, be put in whatever order I want. The script uses the file to get the product IDs that make up the URL, and concatenates the name, brand, size and unit to make the ‘item’. It then creates another column, called ‘product’, which is the same as the ‘item’ column but with an iterated number at the beginning of the string.
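As a rough sketch of building those two columns with pandas: the DataFrame below stands in for the contents of ‘gxxx.csv’, and its column names and values are assumptions for the example rather than the real file.

```python
import pandas as pd

# Stand-in for the contents of 'gxxx.csv'; column names and values are assumed.
order = pd.DataFrame({
    "id": [101, 102, 103],
    "name": ["milk", "butter", "cheddar"],
    "brand": ["BrandA", "BrandB", "BrandC"],
    "size": [2, 250, 400],
    "unit": ["l", "g", "g"],
})

# 'item' joins name, brand, size and unit into one string.
order["item"] = (
    order["name"] + " " + order["brand"] + " "
    + order["size"].astype(str) + " " + order["unit"]
)

# 'product' is the same string with the row's position in the list in front.
order["product"] = [
    f"{number} {item}" for number, item in enumerate(order["item"], start=1)
]

print(order[["id", "item", "product"]])
```

The number comes from the row order in the file, so reordering the rows in the CSV is all that is needed to reorder the final list.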

The URL is then requested, and the Dataframe is created and pivoted, which gives the original list sorted alphabetically. I then merge this Dataframe with a new Dataframe created from the ‘gxxx.csv’ file, matching on the ‘item’ column. The id and item columns are then deleted, and the ‘product’ column from the ‘gxxx.csv’ file is renamed ‘item’ (only because the PDF creator later in the script expects this column name). Finally, the PDF is created.
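The pivot, merge and rename steps could look something like the sketch below. It skips the web request and the PDF step, and the scraped data, shop names and prices are invented for the example.

```python
import pandas as pd

# Assumed shape of the scraped data: one row per item per shop.
scraped = pd.DataFrame({
    "item": ["milk", "milk", "butter", "butter", "apples", "apples"],
    "shop": ["shop_a", "shop_b"] * 3,
    "price": [1.10, 1.05, 2.00, 2.20, 0.90, 0.95],
})

# One row per item, one column per shop; the pivoted index comes out alphabetical.
prices = scraped.pivot(index="item", columns="shop", values="price").reset_index()

# Stand-in for 'gxxx.csv' once the 'item' and 'product' columns exist.
order = pd.DataFrame({
    "id": [101, 102, 103],
    "item": ["milk", "butter", "apples"],
    "product": ["1 milk", "2 butter", "3 apples"],
})

# Match on 'item', drop the helper columns, and let 'product' take over the 'item' name.
merged = prices.merge(order, on="item")
result = (
    merged.drop(columns=["id", "item"])
    .rename(columns={"product": "item"})
    .sort_values("item")
    .reset_index(drop=True)
)
result = result[["item", "shop_a", "shop_b"]]

print(result)
```

After the rename, the numbered string is what gets sorted, so the rows follow the order of the CSV rather than the alphabet.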

Because the Dataframe is sorted on the ‘item’ column, and each string now starts with its number, the rows come out in numbered order, so you can cluster the data together.
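One thing to watch: the number is part of a string, so the sort compares text, not numeric values. With more than 9 items, ‘10’ would sort ahead of ‘2’. The sketch below shows the effect and one possible fix, zero-padding the number. The padding is my suggestion here, not something the script described above is stated to do.

```python
# Hypothetical numbered items, including one past 9.
numbered = [(1, "milk"), (2, "butter"), (10, "eggs")]

# Plain numbers sort as text, so "10 ..." lands before "2 ...".
plain = sorted(f"{number} {item}" for number, item in numbered)

# Zero-padding to a fixed width keeps text order in line with numeric order.
padded = sorted(f"{number:02d} {item}" for number, item in numbered)

print(plain)   # ['1 milk', '10 eggs', '2 butter']
print(padded)  # ['01 milk', '02 butter', '10 eggs']
```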

This gives me a user-ordered list in which I can cluster together all the items I want.

End comment

This was the solution I originally wanted: a simple list with like items next to each other, so they are easy to compare.

It still is not the most easily readable list but it works for me.

I am wondering about revisiting the script to see if it can be done more efficiently, but maybe that can be done another day.

One method that might be interesting is a Streamlit app with clusters of items that can be selected, and a list created and emailed. You could just have a table in the app, but a PDF file may be more useful if you are offline or have poor Wi-Fi (although that may also stop you downloading the PDF file).