If your business users rely on data that is critical to their success, then you need to give them a pleasant way to manage that data. GUI spreadsheet programs like MS Excel, LibreOffice, or OpenOffice are obvious choices for their ease of use alone. In practice, that ease of use is all but guaranteed to cause problems for them down the road, usually at the worst possible times. Wondering what a good option would be if you just stuck with CSV data, I asked about it here. The replies were informative and helpful, and my takeaway is that the best option for managing tabular data, if you are not bound to the applications mentioned above, is to use a database… no surprise. How do you do that, though, while still making it easy for the business and keeping the cost reasonable? The answer is SQLite.
It stores each database in a single file that you can share and put under version control. It is a real RDBMS. It runs on every major OS. There are graphical management tools. It works well with R.
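As a minimal sketch of what "use a database" can look like in practice, the snippet below uses Python's built-in `csv` and `sqlite3` modules (an assumption; the post itself mentions R) to load a CSV export into a SQLite table whose column types and `CHECK` constraint reject rows that a spreadsheet or raw CSV would silently accept. The `product` table, its columns, and the sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a business user's CSV export.
CSV_TEXT = """sku,description,unit_price
A-100,Widget,4.50
A-101,Gadget,12.00
"""

def load_csv(conn: sqlite3.Connection, text: str) -> int:
    """Load CSV rows into a typed, constrained table; return rows inserted."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS product (
            sku         TEXT PRIMARY KEY,
            description TEXT NOT NULL,
            unit_price  REAL NOT NULL CHECK (unit_price >= 0)
        )
        """
    )
    # Convert the price in Python so a non-numeric value fails loudly
    # instead of being stored as text.
    rows = (
        (r["sku"], r["description"], float(r["unit_price"]))
        for r in csv.DictReader(io.StringIO(text))
    )
    # The context manager commits on success and rolls back the whole
    # load if any row violates a constraint.
    with conn:
        cur = conn.executemany(
            "INSERT INTO product (sku, description, unit_price) VALUES (?, ?, ?)",
            rows,
        )
    return cur.rowcount
```

Passing a file name such as a hypothetical `"inventory.db"` to `sqlite3.connect` instead of `":memory:"` gives the single shareable file described above; a load containing a negative price or a duplicate `sku` raises `sqlite3.IntegrityError` and leaves the table unchanged.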
In practice, there are important details to work out, and it is great to know that there is a realistic, practical way to handle them.
This link via irreal is another “must read” if you’ve never done systems work before (and I say that as a systems person myself, not a data person).
You mean… there is actually work involved?!
This is a valuable article.
A huge amount of effort is spent cleaning data to get it ready for data analysis, but there has been little research on how to make data cleaning as easy and effective as possible. This paper tackles a small, but important, subset of data cleaning: data
Tidy Data is a must-read paper.
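The paper's core rule is that in tidy data each variable forms a column and each observation forms a row. One messy pattern it discusses is column headers that are really values, and the fix is to "melt" those columns into rows. Below is a minimal, dependency-free Python sketch of that operation (for real work, `pandas.melt` does the same job); the `region`/`q1`/`q2` table is made-up example data.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

def melt(rows: Sequence[Row], id_cols: Sequence[str],
         var_name: str = "variable", value_name: str = "value") -> List[Row]:
    """Turn column headers that are really values into rows (wide -> long)."""
    tidy: List[Row] = []
    for row in rows:
        ids = {c: row[c] for c in id_cols}
        for col, val in row.items():
            if col in id_cols:
                continue
            # One output row per (identifier, former column header) pair.
            tidy.append({**ids, var_name: col, value_name: val})
    return tidy

# Hypothetical messy table: "quarter" is a variable, but its values
# are stored in the column headers instead of in a column of their own.
wide = [
    {"region": "North", "q1": 10, "q2": 12},
    {"region": "South", "q1": 7, "q2": 9},
]

tidy_rows = melt(wide, id_cols=["region"], var_name="quarter", value_name="sales")
for r in tidy_rows:
    print(r)
```

Each printed row now holds one observation, for example `{'region': 'North', 'quarter': 'q1', 'sales': 10}`, which is the shape that filtering, grouping, and loading into a database table all expect.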
Last semester (Fall 2011) I taught “ENTERPRISE DATA MODELING” at Carroll University. Carroll is a great school and teaching the class was a lot of fun.
A mentor of mine once shared that “A teacher’s job is to create an environment in which learning is likely to occur.” Thank you for sharing that.
SchemaSpy is a Java-based tool that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships, represented both as HTML links and as entity-relationship diagrams. It’s also designed to help you make sense of the obtuse errors a database sometimes reports when an operation fails because of a constraint.
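SchemaSpy's own implementation is not shown here. As a small SQLite-only sketch of why schema metadata helps with constraint errors: SQLite reports a foreign-key violation only as `FOREIGN KEY constraint failed`, while `PRAGMA foreign_key_check` names the child table, the offending row, and the missing parent table. The `customer`/`orders` schema and the `orphaned_rows` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
    INSERT INTO customer (id, name) VALUES (1, 'Acme');
    -- Foreign-key enforcement is off by default on a new SQLite connection,
    -- so this orphaned order (customer 42 does not exist) is accepted.
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 42);
""")

def orphaned_rows(conn: sqlite3.Connection):
    """Return (child_table, rowid, parent_table) for each FK violation."""
    return [(table, rowid, parent)
            for table, rowid, parent, _fk_id in conn.execute("PRAGMA foreign_key_check")]

print(orphaned_rows(conn))  # [('orders', 11, 'customer')]

# With enforcement switched on, the same mistake is rejected, but the
# message alone does not say which table, row, or value was at fault.
conn.execute("PRAGMA foreign_keys = ON")
fk_error = None
try:
    with conn:
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (12, 99)")
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)
print(fk_error)  # FOREIGN KEY constraint failed
```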
SchemaSpy is an excellent tool in its own right, and worth a look if only for its beautiful use of Graphviz.
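To illustrate the general idea behind those diagrams (read foreign-key metadata, then hand it to Graphviz), here is a rough Python sketch for SQLite using `PRAGMA foreign_key_list`. It is not SchemaSpy's code, which is Java and far more complete; the four-table schema and the helper names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# (child_table, child_column, parent_table, parent_column)
Edge = Tuple[str, str, str, str]

def foreign_keys(conn: sqlite3.Connection) -> List[Edge]:
    """Collect every foreign-key relationship from SQLite's schema metadata."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    edges: List[Edge] = []
    for table in tables:
        # Each row is (id, seq, table, from, to, on_update, on_delete, match).
        # The quoting assumes table names contain no double quotes.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            parent_col = fk[4] if fk[4] is not None else "(primary key)"
            edges.append((table, fk[3], fk[2], parent_col))
    return edges

def to_dot(edges: List[Edge]) -> str:
    """Render child -> parent edges as a Graphviz DOT digraph."""
    lines = ["digraph schema {", "  rankdir=LR;", "  node [shape=box];"]
    for child, child_col, parent, parent_col in edges:
        lines.append(f'  "{child}" -> "{parent}" [label="{child_col} -> {parent_col}"];')
    lines.append("}")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE product  (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)
    );
    CREATE TABLE order_line (
        order_id   INTEGER REFERENCES orders(id),
        product_id INTEGER REFERENCES product(id)
    );
""")

edges = foreign_keys(conn)
print(to_dot(edges))
```

Saving the printed text as `schema.dot` and running `dot -Tsvg schema.dot -o schema.svg` renders the diagram. Tables with no relationships at all are omitted in this sketch; adding one node line per table would include them.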
See the example(s) here.