One of the recurring gripes I hear in the data science, statistics, and computational biology domains is the lack of seamless workflows for cleaning, analysis, and visualization. Currently OpenRefine doesn’t expose a SQL interface, which makes round-tripping with tools like scikit-learn, RStudio, KNIME, pandas, Jupyter, and others cumbersome.
I think we could do better post-4.0 and add a Java SQL interface, or make progress towards alternative storage configurations (something like Apache Gora or more current Java DB technology). This would also make future features such as data joins and multi-way merging a reality without us having to build all of that in ourselves; instead, extensions and external tools could read from and write to whichever DB storage layer is configured.
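
As a rough illustration of the kind of round trip this would enable, here is a minimal sketch. It is not how OpenRefine works today: an in-memory SQLite database (via Python's standard `sqlite3` module) stands in for a SQL-accessible storage layer, and the table names `project_rows` and `taxonomy`, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite DB stands in for a SQL-accessible
# OpenRefine storage layer. OpenRefine does not expose this today.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding the rows of one project.
conn.execute(
    "CREATE TABLE project_rows (id INTEGER PRIMARY KEY, gene TEXT, species TEXT)"
)
conn.executemany(
    "INSERT INTO project_rows (id, gene, species) VALUES (?, ?, ?)",
    [(1, "BRCA1", "human"), (2, "tp53", "human"), (3, "Sox2", "mouse")],
)

# Hypothetical reference table contributed by an external tool or extension.
conn.execute("CREATE TABLE taxonomy (species TEXT PRIMARY KEY, taxon_id INTEGER)")
conn.executemany(
    "INSERT INTO taxonomy (species, taxon_id) VALUES (?, ?)",
    [("human", 9606), ("mouse", 10090)],
)

# Read side: the join is done by the database, not by built-in project code.
joined = conn.execute(
    """
    SELECT r.id, r.gene, t.taxon_id
    FROM project_rows AS r
    JOIN taxonomy AS t ON t.species = r.species
    ORDER BY r.id
    """
).fetchall()

# Write side: an external tool pushes a cleaning step back to the same store.
conn.execute("UPDATE project_rows SET gene = UPPER(gene)")
conn.commit()

cleaned = [row[0] for row in conn.execute("SELECT gene FROM project_rows ORDER BY id")]
```

The point of the sketch is only that once the rows live behind a SQL interface, any tool that speaks SQL (pandas, R, KNIME, and so on) could read, join, and write back without a file export and re-import step.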