Drafting the 2024 User Survey

I started drafting the questions for the 2024 user survey here. I based them on the questions from the 2022 survey, with small modifications. The document is open for comments and suggestions, so please add any question you think is relevant. I would like to have a final version of the questions by the end of the month.

I am also keen to explore how we can better collect feature ideas. Do we want to continue using an open-ended question, or use a platform like allourideas (as suggested in Results of two user surveys for Wikimedia Commons users of OpenRefine)?

We will most likely add a section on Mission, Vision, and Values as we work with the selected consulting company (I will share more on this soon).

Thank you for your feedback.

I am updating the question "High-level tasks you do with OpenRefine" to break it down into two questions:

  • one on the features used. I tried to list the main features, but a second look is welcome. I am not sure how granular we want to go.
  • a second question to understand the larger workflow OpenRefine is part of: what is done before and after OpenRefine.

Someone suggested changing those questions to a Likert scale, which makes sense to me.

@zoecooper, is there any question you want to ask via the survey?

Thank you for tagging me @Martin!

I've left a couple of comments in the Google Doc itself; I hope they're helpful.

A few questions I'm interested in:

  • Do you save the operation history?
  • How do your workflows vary from one project to another? Do you tend to use the same operations/functions on all your projects?
  • Do you have a method for saving your workflows for future reference? And if so, do you share them with others?

@zoecooper, thanks for your feedback. I answered in the document directly.

Regarding your additional questions, should we make them conditional, shown only to those who rated at least 3 out of 5 (i.e. at least occasionally, once per month, or in 50% of their projects) on the following question?

How often do you use the following features? Creating repeatable workflows

Going through the current questions, I realized that the question "How often do you use the following features? Working with very large datasets" is not very precise.

How should we define what constitutes a large dataset? We could measure it either by the number of rows or by file size. Alternatively, we could ask which users have changed OpenRefine's RAM allocation, as a proxy.

@tfmorris @thadguidry @antonin_d @abbe98, are there any questions you want to add to the survey?

I am also interested in hearing from you about the best way to collect feature requests. We can either:

  • ask an open question
  • use a prioritization platform like allourideas (as described here)
  • or use a different approach.

Did we ask:

Is OpenRefine's installation on your work computer done by your IT staff or by yourself?

  • IT Staff
  • Myself
  • N/A (not used at work)

This would tell us quite a bit, such as how many people use it in their job and who controls the installation environment. That will help us with packaging concerns.

The other thing I'd like to see partitioned in the data is the OS used at work versus the OS used personally.

OpenRefine is used at work, and my OS is:

  • Windows
  • Mac
  • Linux
  • N/A (I only use it personally)

OpenRefine is used personally and my OS is:

  • Windows
  • Mac
  • Linux
  • N/A (I only use it at work)

Also, we don't seem to have a way to clearly see whether people use OpenRefine at work or personally as a share of their time. We only asked "mainly", with a single choice. It would be better to simply ask for the percentage of work and personal use through two questions instead:

Percent of time OpenRefine is used at work:

  • 0-25
  • 25-50
  • 50-75
  • 75-100

Percent of time OpenRefine is used personally:

  • 0-25
  • 25-50
  • 50-75
  • 75-100

I really like the way Allourideas works. Simple and smart.

I took a pass through and left some comments in the document, but I felt pretty strongly anchored by the previous surveys, so I probably should have done my own clean-sheet version first and then compared it to what's there.

I agree with Thad's comments about work vs non-work usage. One way to capture that might be to expand the "How often do you use OpenRefine?" question to cover both cases separately.

I like the binary voting methodology of allourideas, but I think the quality of the results will depend heavily on seeding it well and on filtering/post-processing the results. In the previous survey, people added duplicates of pre-existing items, and the reported results didn't filter out low-frequency items.
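To make the post-processing point concrete, here is a minimal sketch of the kind of cleanup described above. It is not how allourideas computes its own scores; it simply assumes the pairwise votes can be exported as (winner, loser) pairs, merges trivially duplicated entries by normalizing case and whitespace, and drops ideas that appeared in too few comparisons. The names `normalize`, `rank_ideas`, and the `min_appearances` cutoff are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def normalize(idea: str) -> str:
    # Collapse case and whitespace so trivially duplicated entries
    # ("Add rows" vs "add  rows ") are counted as one idea.
    # Real duplicates worded differently would still need manual merging.
    return " ".join(idea.lower().split())


def rank_ideas(votes, min_appearances=10):
    """Rank ideas by simple win rate over pairwise (winner, loser) votes.

    Ideas that appeared in fewer than `min_appearances` comparisons are
    dropped, since a win rate over a handful of votes is mostly noise.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        winner_key, loser_key = normalize(winner), normalize(loser)
        wins[winner_key] += 1
        appearances[winner_key] += 1
        appearances[loser_key] += 1
    scores = {
        idea: wins[idea] / count
        for idea, count in appearances.items()
        if count >= min_appearances
    }
    # Highest win rate first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


votes = [("Add rows", "Dark mode")] * 3 + [
    ("dark mode", "add  rows"),
    ("Better recon", "Dark mode"),
]
print(rank_ideas(votes, min_appearances=2))
# [('add rows', 0.75), ('dark mode', 0.2)]
```

Here "Better recon" is dropped because it appeared in only one comparison. The cutoff is a judgment call and would need tuning against the actual vote counts.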

Some other thoughts:

  • inputs/outputs/transformations are important and we should make sure we capture them well
  • do we care more about what users' organizations do or about what their role/function is?
  • do we care about organization size? (maybe not, but it could tell us about the environment users are working in)
  • why is there only a single choice covering all of "for-profit" when OSM gets its own item?
  • similar to the above, I feel that the granularity/range of answers for some of the other questions is off, e.g. do we care more about distinguishing users with 1 vs 6 months of experience than about the entire interval between 2 and 14 years? Previous years' results may help inform how these should be skewed.

I wish we had a "Click here to generate an anonymized usage report for the last 12 months" button. That would allow the survey to be focused on qualitative questions. Perhaps next survey...


Thank you all for your feedback.

I made the following changes to the survey:

  • I rewrote the question about professional vs. non-professional (casual) use to use a slider.
  • I changed the question about where OpenRefine is installed to ask who installed it, with an option for the hosted version. We can infer from the answer whether it is run locally or hosted.

I added the following questions:

  • Which OS are you using?
  • With which browser do you use OpenRefine?
  • A question to capture the input formats
  • A question to capture the output formats generated
  • @zoecooper's questions about how users manage their workflows

I would prefer to know the user's role rather than the organization they work for. For instance, I would rather know whether you are a librarian than whether you work in a public library or a university.

I started a document here to list the suggestions for the allourideas platform. I used answers from the previous survey (minus the Wikimedia Commons-specific ones) and results from my user interviews. However, I feel the list of entries could be endless, given the 407 open feature request issues. Note that with allourideas, we have the option to moderate user suggestions before they are added to the survey.

If I take a step back and group the 407 feature request issues by category (sorted by those with the most thumbs up), the majority seem to fall into two areas:

  1. More power and control for ad hoc editing of the grid in general (inserting rows, adding/removing rows, adding blank columns). It's almost as if many are asking: can you just make OpenRefine work like a spreadsheet sometimes? Is that perhaps a new third mode, Rows/Records/Spreadsheet? It feels weird, but that's what I'm seeing in many of the top-ranked issues.
  2. Reconciliation options and general quality-of-life improvements for reconciliation.

@Martin, sorry, I thought I had replied. Yes, I think that's a good idea!