Neo4j upload from CSV
I had an excel sheet with some property data. I isolated it to one property for this exercise.
The property has A Site, A block, A Unit and 8 rooms and 3 cupboards (denoted by room name they were in with a letter ‘a’ at the end)
In the first part of the process I set up Nodes for all of these elements.
In this exercise I wanted to take data from a CSV file to populate the Graph Database.
- Ground items such as letter boxes, fences etc, I wanted to link to the site node.
- Exterior works such as roof, exterior cladding and exterior painting etc I wanted to link to the Block node
- The Unit node seems to be a placeholder for the rooms as the unit does not seem to contain any data in the spreadsheet
- All of the elements in the rooms need to be linked to the specific room node.
I used this article as a resource for setting up my code. About half way down there is a video showing the process.
I have spent most of the day trying to get it to run on my machine.
I had a few issues on the setup. Trying to locate the CSV file.
I initially created a directory inside C:\neo4j\files\site.csv, and used the string ‘file:///C:/node4j/files/site.csv’ in the code to test it. Lost of whirling inside http://localhost:7474/browser/ but it would not find it.
I put the path in Firefox browser and it found the file, no problem.
In the end I closed the browser down and re-started it and it couldn’t find the file. It was using a path something like file:///C:/node4j/import/C:/node4j/files/site.csv which messed up the search path, so in the end I put the csv file in the C:/node4j/import directory and used the filepath in the code as:
FROM ‘file:///C:/r1.csv’ AS line
which worked. So now Cypher could find the file. I think it has the neo4j/import in the PATH variable, I may have to check that sometime. Anyway, that worked.
I spent a massive time debugging and so decided that in the Excel File I would filter site, block, each individual room all separately and create a specific CSV file for each of these instances ( you have to start somewhere and I spent so long debugging the import that I thought I would simplify the process (but make it more tedious))
Next I kept on getting error after error in the original code that I had modified, but after constant debugging I have the following code that works:
- MATCH (a:site {code:”DALA190-GRD”})
- LOAD CSV WITH HEADERS
- FROM ‘file:///C:/site.csv’ AS line
- CREATE (e:element {Name:line.Name, Id: toInteger(line.guid)})
- set
e.type = line.Type,
e.baselife = toInteger(line.Baselife),
e.baselifeLower =toInteger(line.BaselifeLower),
e.baselifeUpper =toInteger(line.BaselifeUpper),
e.c1 = toInteger(line.c1),
e.c2 = toInteger(line.c2),
e.c3 = toInteger(line.c3),
e.c4 = toInteger(line.c4),
e.c5 = toInteger(line.c5),
e.r1 = toInteger(line.r1),
e.r2 = toInteger(line.r2),
e.r3= toInteger(line.r3),
e.r4 = toInteger(line.r4),
e.r5= toInteger(line.r5),
e.unit=line.Unit,
e.unitRate= toFloat(line.Unit_Rate),
e.quantity= toFloat(line.Quantity) - CREATE (e)-[r:BELONGS_TO]->(a)
- RETURN r
Going through it:
Item 1. Match identifies the Note that I want to link all the data to. This is the specific site, block or room.
Items 2, 3 is the loading CSV (with headers and pointing to the file
Item 4. This held me up for ages, it creates a node, I think with 2 labels , Name & Id(this being the Guid in the excel sheet). I was trying to get all the data in here, but it would not let me. I then learned that I hat to use the SET command to bring all the other data into the node (going back to this Video in the previous post about neo4j).
Item 5. Hurray, the set command. I copied the original exercise too well and had [] square brackets in the lines e.baselifeLower =toInteger(line.BaselifeLower[1]),thinking the [1] was the number of digits in the integers (it wasn’t it was a reference to a broken up date element and the positions of the elements in the array). [several hours gone on that!!]
Item 6. I also created the relationship between the new element nodes and the site/block/room the elements were to attach to.
Item 7. Just showing the relationships created.
End comment
I think using Cypher rather than Python made sense for the learning and debugging. I was debugging neo4j and not python, so learning about the database rather than the relationsahip betweeen python and the database.
I can now test my code out from Python as the next step.
It took me a long time to get this the way I wanted it and I did a lot of deleting of the Element part of the schema.
The next step is to start writing queries on this data to see how it responds.
One last thing. There are these objects unattached to anything. They are gray, like the “elements” but only have an “id” from neo4j. I haven’t been able to get rid of them so far. I may need to find individual “id”s and get rid of them that way.