Category: Data Extraction

Regular Expressions/Regex for data cleaning

A regular expression, regex or regexp is, in theoretical computer science and formal language theory, a sequence of characters that defines a search pattern. Usually this pattern is then used by string searching algorithms for “find” or “find and replace” operations on strings, or for input validation. From Wikipedia.
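As a rough illustration of the three uses named above (find, find and replace, and input validation), here is a minimal sketch using Python's built-in `re` module. The sample string, the patterns and the `is_valid_code` helper are all made up for this example and are not taken from the post itself.

```python
import re

# Hypothetical messy input, invented purely for illustration.
raw = "  Room  101 ;  Area: 25.5 sqm ; Code: AB-1234  "

# "Find": pull out every number (integer or decimal) in the string.
numbers = re.findall(r"\d+(?:\.\d+)?", raw)

# "Find and replace": collapse runs of whitespace into one space, then trim the ends.
cleaned = re.sub(r"\s+", " ", raw).strip()


def is_valid_code(code: str) -> bool:
    """Input validation: two capital letters, a hyphen, then four digits.

    The format is an assumption chosen for this sketch.
    """
    return re.fullmatch(r"[A-Z]{2}-\d{4}", code) is not None


print(numbers)
print(cleaned)
print(is_valid_code("AB-1234"), is_valid_code("ab-12"))
```

`re.fullmatch` is used for the validation step so that the whole string has to fit the pattern, not just part of it.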

Orange 3. Text Mining basic exploration

A few words of jargon in the Text Mining area. Corpus: In linguistics, a corpus or text corpus is a large and structured set of texts. They are used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. Token: Tokenization is the process of demarcating and

Asset Tiger 2. Testing features Search, Delete, Export/Import & Data Structure

After writing the initial post, which I wrote whilst setting up and populating the database, I had a further play using the mobile app, or rather, I didn’t get very far at all. The company that set up the free asset management site makes bar code tags, so the accessing of data would be by

Asset Tiger 1. free Online Asset Management Service with Free mobile app

The AlternativeTo website popped up Asset Tiger as a free asset management tool when I typed in alternatives to OpenMAINT. It is a cloud-based tool and has a mobile app so that you can use it on your smartphone or tablet, which is good too, as it’s free (the OpenMAINT one isn’t). Asset Tiger

Orange free Data Mining Tool

I was looking through the 101-useful-websites article and came across AlternativeTo.net, and used it to look up alternatives to, say, “Revit” and “AutoCAD” and other tools I use. I then typed in KNIME, which I use for data mining and data analysis, and it came up with Orange as a free alternative. So I looked at some

Excel web scraping and connecting to database. Dynamic updating of Excel imported data

After writing the previous post on web scraping, with its example of using Google Spreadsheet to obtain a web table, I started wondering about Excel. I was aware that it had a web connection, so I decided to explore it. This first video shows how to automate data scraping from websites into Excel: Then

Google searching, free Web Scraping tools and free Extract Tables from PDF tool.

I have been looking at some of the free Data Science and Cognitive Computing courses and was following the Data Journalism: First Steps, Skills and Tools course. The Google Searching video was on improving your searches in Google using the following: quotes to get specific key words, e.g. “data science on construction”; using – sign

Python 8. Jupyter Notebook

There is an interactive tutorial here running Python. This instance is running from another website, so it could be slow. I found this video extremely good for setting up (I used the pip install method, not Anaconda) and for how to use it. The Jupyter Notebook is an open-source web application that allows you