Python Webscrape Grocery Comparison Tool

There is an NZ Grocers app (https://grocer.nz/) that compares prices of products in supermarkets in NZ.
I wanted to develop a list of items that would vary in price and do a price comparison across different local supermarkets to find:
1/ Where the best prices were to be had for specific products
2/ Where some products in short supply would be available.

The original app creates a list, but it is very long and it is hard to compare similar products, so I wanted to reshape the data into a table with each product as a line item and the stores as columns, so you could compare like with like in a shorter list.
I had a few issues with the data output using pandas. After using a pandas pivot the data was ordered alphabetically, which made similar items hard to compare because they were sorted by the first letter of their name: a product called “chicken breast skinless” would be ordered under “C” and another called “skinless chicken breast” under “S”, so they were hard to compare directly in an alphabetically ordered list.
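For illustration, here is a minimal sketch of that pivot step, assuming the results have already been flattened into rows of product, store and price (the column names are mine, not the app's):

```python
import pandas as pd

# Illustrative rows only -- in the real script these come from the API results.
rows = [
    {"productname": "chicken breast skinless", "storename": "Store A", "price": 12.99},
    {"productname": "chicken breast skinless", "storename": "Store B", "price": 11.50},
    {"productname": "skinless chicken breast", "storename": "Store A", "price": 13.20},
]
df = pd.DataFrame(rows)

# One row per product, one column per store. Note the index comes out sorted
# alphabetically by product name, which is what made similar items hard to compare.
table = df.pivot_table(index="productname", columns="storename", values="price")
print(table)
```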
I decided to break the list into separate groups and create a method to build individual URLs for each group of items that I wanted to review. Instead of one long lookup list I ended up with multiple lookup lists that were appended to each other, and the smaller groupings made it easier to compare similar items within the same group, regardless of alphabetical order.
To do this I needed separate files of the product IDs so that I could concatenate them, along with the base URL and the preferred shop list, into multiple GET requests that returned the data in sub-groups. I also needed to add a header to each sub-group so that you could differentiate between the groups.
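A rough sketch of how those per-group requests could be built is below. The endpoint and query parameter names are assumptions (the real grocer.nz API will differ), and each group file is assumed to hold one product ID per line:

```python
import requests

BASE_URL = "https://grocer.nz/api/products"  # assumed endpoint, for illustration only
STORES = ["store-id-1", "store-id-2"]        # preferred shop list (placeholders)

# One file of product IDs per group, so items can be added or removed easily.
GROUP_FILES = {
    "Meat": "meat_ids.txt",
    "Dairy": "dairy_ids.txt",
}

def load_ids(path):
    """Read one product ID per line, keeping them as strings."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def fetch_group(name, path):
    """Build the URL for one group and issue a single GET request."""
    params = {
        "productIds": ",".join(load_ids(path)),  # assumed parameter name
        "stores": ",".join(STORES),              # assumed parameter name
    }
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    return name, resp.json()

# Each group becomes its own request; the results are then appended to each
# other with the group name used as a header row in the final table.
# results = [fetch_group(name, path) for name, path in GROUP_FILES.items()]
```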
Making separate lists allowed items to be added to and removed from each list, to keep the lists relevant.
To make sure the list was always available, I created a PDF table and emailed the resulting file to myself daily using Windows Task Scheduler running on my server. It emails me the current list at 5am each morning, so it is available when I go shopping and lets me see where best to shop.
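The emailing step itself only needs the standard library; the sketch below uses placeholder addresses and SMTP settings, with Task Scheduler simply running the script at 5am:

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

# Placeholder settings -- substitute your own SMTP server, credentials and addresses.
SMTP_HOST = "smtp.example.com"
SENDER = "me@example.com"
RECIPIENT = "me@example.com"
PDF_PATH = Path("price_comparison.pdf")

msg = EmailMessage()
msg["Subject"] = "Daily grocery price comparison"
msg["From"] = SENDER
msg["To"] = RECIPIENT
msg.set_content("Today's price comparison table is attached.")

# Attach the generated PDF table.
msg.add_attachment(
    PDF_PATH.read_bytes(),
    maintype="application",
    subtype="pdf",
    filename=PDF_PATH.name,
)

with smtplib.SMTP(SMTP_HOST, 587) as server:
    server.starttls()
    # server.login("username", "password")  # credentials omitted
    server.send_message(msg)
```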
One anomaly that occurred was that the product IDs were 16 digits long. When the CSV file was opened in Excel, the numbers were converted into scientific notation, and when they were converted back to normal integers Excel rounded the last digit to a ‘0’, so the product ID was wrong and the GET request failed for a number of items.
To resolve this issue I used Bard to write a Python script to simplify the original long GET request URL into simple lists after exporting the results to a CSV.
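As an illustration of the general approach (the URL and parameter names below are made up, not the real grocer.nz query string), splitting the long request URL back into plain text lists keeps the IDs out of Excel altogether:

```python
from urllib.parse import urlparse, parse_qs

# A made-up example of a long request URL with comma-separated IDs and stores.
long_url = (
    "https://grocer.nz/api/products"
    "?productIds=1234567890123456,2345678901234567"
    "&stores=store-id-1,store-id-2"
)

query = parse_qs(urlparse(long_url).query)
product_ids = query["productIds"][0].split(",")
store_ids = query["stores"][0].split(",")

# Write each list to its own file, one ID per line, so the 16-digit IDs stay
# as text and never pass through Excel's number formatting.
with open("product_ids.txt", "w") as f:
    f.write("\n".join(product_ids))
with open("store_ids.txt", "w") as f:
    f.write("\n".join(store_ids))
```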

What I initially thought was a simple exercise ended up being a bit more complex than expected, and I decided to make it a bit more flexible than it needed to be. Overall I’m pleased with the results and think it can be adapted by others reasonably easily, apart from getting the headers correct, which can be a bit fiddly.
You can download the code here.

For the PDF file to work you may need the DejaVuSansCondensed.ttf font file, and you need to give the path to where the file is located.
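Assuming the PDF is generated with the fpdf2 package (which is where DejaVuSansCondensed.ttf is commonly used), registering the font looks roughly like this, with the path pointing at wherever the .ttf file lives:

```python
from fpdf import FPDF  # assuming the fpdf2 package is used to build the PDF table

pdf = FPDF()
pdf.add_page()

# Register the Unicode font -- change fname to the actual location of the file.
pdf.add_font("DejaVu", style="", fname="fonts/DejaVuSansCondensed.ttf")
pdf.set_font("DejaVu", size=10)

pdf.cell(0, 10, "Price comparison table goes here")
pdf.output("price_comparison.pdf")
```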

Ordering items for grouping

The one thing I’m not that happy about is ordering the items. It would be good to be able to insert a number or something at the front of each item so they could be sorted into a grouping that was more convenient.
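A rough sketch of that numeric-prefix idea, purely as an illustration rather than something the tool currently does:

```python
# Prefix each item name with a two-digit sort key, sort on the prefix,
# then strip it again for display.
items = [
    "20 skinless chicken breast",
    "10 chicken breast skinless",
    "30 chicken thighs",
]

ordered = sorted(items, key=lambda s: int(s.split(" ", 1)[0]))
display = [s.split(" ", 1)[1] for s in ordered]
print(display)  # ['chicken breast skinless', 'skinless chicken breast', 'chicken thighs']
```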

I’ll need to think about that to see if there is a better way to do this.

End comment

I am pleased with the result of the process. In some ways it’s not much different from other GET requests that I’ve done for other APIs, or from sending emails to myself with information.

Building the URLs from scratch so that they can be constantly updated is an extra addition to the process.

I’ve since added some extra items to different lists and it works fine.

I did spend a bit of time trying to get the list to have both the product ID and the product name, so it is easy to see items that you may want to add or delete.
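One simple way to do that, sketched here with assumed file and column names, is to keep a small lookup file with both columns and merge the names back on by product ID:

```python
import pandas as pd

# Illustrative lookup of product ID to product name (column names are assumptions).
lookup = pd.DataFrame({
    "productID": ["1234567890123456", "2345678901234567"],
    "productname": ["chicken breast skinless", "skinless chicken breast"],
})
lookup.to_csv("product_lookup.csv", index=False)

# Later, merge the names back onto the price data by product ID,
# keeping the IDs as strings throughout.
# prices = prices.merge(lookup, on="productID", how="left")
```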