A database on your desktop?

If your business users depend on data that is critical to their success, you need to give them a pleasant way to manage it. GUI spreadsheet programs like MS Excel, LibreOffice, and OpenOffice are obvious choices for their ease of use alone. In practice, that same ease of use tends to cause problems down the road, usually at the worst possible times. Wondering what a good option would be if you are stuck with CSV data, I posted here. The replies were informative and helpful, and my takeaway is that the best option for managing tabular data, if you are not bound to the applications mentioned above, is to use a database… no surprise. How do you do that, though, while keeping it easy for the business and at a reasonable cost? The answer is SQLite.

It stores each database in an ordinary file that you can share and version. It is a real RDBMS. It runs on every major OS. There are graphical management tools. It works well with R.
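As a rough illustration of moving from CSV to SQLite, here is a minimal sketch using Python's standard `csv` and `sqlite3` modules. The `sales` table, its columns, and the sample rows are hypothetical and only stand in for whatever the business actually tracks.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a spreadsheet export.
CSV_TEXT = """region,product,units
North,Widget,10
South,Widget,7
North,Gadget,3
"""


def load_csv(conn, csv_file):
    """Create the (hypothetical) sales table and load CSV rows into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        " region TEXT NOT NULL,"
        " product TEXT NOT NULL,"
        " units INTEGER NOT NULL CHECK (units >= 0))"
    )
    reader = csv.DictReader(csv_file)
    rows = [(r["region"], r["product"], int(r["units"])) for r in reader]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return len(rows)


def units_by_region(conn):
    """Total units per region, sorted by region name."""
    return conn.execute(
        "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; pass a file path instead
    # to get a database file that can be shared and versioned.
    conn = sqlite3.connect(":memory:")
    load_csv(conn, io.StringIO(CSV_TEXT))
    print(units_by_region(conn))
    conn.close()
```

The `NOT NULL` and `CHECK` constraints are the kind of guard rail a spreadsheet does not give you: a negative or missing `units` value is rejected at load time instead of surfacing later.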

In practice, the details matter, and it is good to know that there is a realistic, practical way to manage them.

Ten Simple Rules for Reproducible Computational Research

This link, via irreal, is another “must read” if you have never done systems work before (I say this as a systems person myself, not a data person).

Tidy Data

A huge amount of effort is spent cleaning data to get it ready for data analysis,
but there has been little research on how to make data cleaning as easy and effective
as possible. This paper tackles a small, but important, subset of data cleaning: data
“tidying”.

— Wickham
Tidy Data is a must-read paper.
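To make “tidying” concrete: in the paper's terms, tidy data has one variable per column and one observation per row. The sketch below reshapes a small wide table into that form. The `melt` helper and the sample data are hypothetical and written in plain Python for illustration; the paper itself works in R.

```python
def melt(rows, id_column, variable_name, value_name):
    """Turn wide rows (one column per measured variable) into long rows
    (one row per id/variable pair)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id is repeated on every output row instead
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy


# Hypothetical wide table: the treatment is encoded in the column names.
WIDE = [
    {"person": "Ann", "treatment_a": 4, "treatment_b": 2},
    {"person": "Bob", "treatment_a": 16, "treatment_b": 11},
]

if __name__ == "__main__":
    for record in melt(WIDE, "person", "treatment", "result"):
        print(record)
```

In the long form, the treatment becomes a value in its own column rather than part of a header, which is what makes filtering and grouping on it straightforward.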

SchemaSpy: A Graphical Database Schema Metadata Browser

SchemaSpy is a Java-based tool that analyzes the metadata of a schema in a database and generates a visual representation of it in a browser-displayable format. It lets you click through the hierarchy of database tables via child and parent table relationships as represented by both HTML links and entity-relationship diagrams. It’s also designed to help resolve the obtuse errors that a database sometimes gives related to failures due to constraints.

This is an excellent tool in its own right, and worth a look if only for its beautiful use of Graphviz.
See the example(s) here.
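For a feel of the kind of schema metadata such a browser works from, here is a minimal sketch that reads foreign-key relationships out of a SQLite database and prints them as Graphviz DOT text. This is not SchemaSpy's implementation; the `customer` and `invoice` tables and both helper functions are hypothetical, and only SQLite's own `sqlite_master` table and `PRAGMA foreign_key_list` are relied on.

```python
import sqlite3


def foreign_keys(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    edges = []
    for table in tables:
        # PRAGMA arguments cannot be bound as parameters, so quote the name.
        quoted = '"' + table.replace('"', '""') + '"'
        for fk in conn.execute(f"PRAGMA foreign_key_list({quoted})"):
            # Row layout: id, seq, table, from, to, on_update, on_delete, match
            edges.append((table, fk[3], fk[2], fk[4]))
    return edges


def to_dot(edges):
    """Render the relationships as a Graphviz digraph, child pointing to parent."""
    lines = ["digraph schema {"]
    for child, child_col, parent, parent_col in edges:
        lines.append(
            f'  "{child}" -> "{parent}" [label="{child_col} -> {parent_col}"];'
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-table schema for the demo.
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id)
        );
        """
    )
    print(to_dot(foreign_keys(conn)))
    conn.close()
```

The printed text can be fed to Graphviz's `dot` command to draw the diagram.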