Drafting the 2024 User Survey

I started drafting the questions for the 2024 user survey here. I based them on the questions from the 2022 survey, with small modifications. The document is open for comments and suggestions. Please add any question you think is relevant. I would like to have a final version of the questions by the end of the month.

I am also keen to explore how we can better collect feature ideas. Do we want to continue using an open-ended question, or use a platform like allourideas (as suggested in Results of two user surveys for Wikimedia Commons users of OpenRefine)?

We will most likely add a section regarding the Mission, Vision, and Values as we work with the selected consulting company (I will share more on this soon).

Thank you for your feedback.

I am updating the "High-level tasks you do with OpenRefine" question to break it down into two questions:

  • one on the features used. I tried to list the main features, but a second look is welcome. I am not sure how granular we want to go.
  • a second question to understand the larger workflow OpenRefine is part of: what is done before and after OpenRefine.

Someone suggested changing those questions to a Likert scale, which makes sense to me.

@zoecooper, is there any question you want to ask via the survey?

Thank you for tagging me @Martin!

I've left a couple of comments in the Google Doc itself; I hope they're helpful.

A few questions I'm interested in:

  • Do you save the operation history?
  • How do your workflows vary from one project to another? Do you tend to use the same operations/functions on all your projects?
  • Do you have a method for saving your workflows for future reference? If so, do you share them with others?

@zoecooper thanks for your feedback, I answered in the document directly.

Regarding your additional questions, should we make them conditional on having answered at least 3 out of 5 (i.e., at least occasionally, once per month, or in 50% of their projects) on the following question:

How often do you use the following features? Creating repeatable workflows

Going through the current questions, I realized that the question "How often do you use the following features? Working with very large datasets" is not very precise.

How should we define what constitutes a large dataset? We can measure it in terms of either the number of rows or the file size. Alternatively, we can ask which users have edited OpenRefine's RAM allocation, as a proxy.

@tfmorris @thadguidry @antonin_d @abbe98, are there any questions you want to add to the survey?

I am also interested in hearing from you about the best way to collect feature requests. We can either:

  • ask an open question
  • use a prioritization platform like allourideas (as described here)
  • or use a different approach.

Did we ask:

Was OpenRefine installed on your work computer by your IT staff or by yourself?

  • IT Staff
  • Myself
  • N/A (not used at work)

This would tell us quite a bit, such as how many people use it in their job and who controls the installation environment. It would also help us with packaging concerns.

The other thing that I'd like to see partitioned in the data is the work environment OS and the personal environment OS.

OpenRefine is used at work with the OS being:

  • Windows
  • Mac
  • Linux
  • N/A (I only use it personally)

OpenRefine is used personally and my OS is:

  • Windows
  • Mac
  • Linux
  • N/A (I only use it at work)

Also, we don't seem to have a way to clearly see whether they use OpenRefine at work or personally as a percentage of time used; we only asked "mainly" with a single choice. It would be better to ask for the percentage of work and personal use through two questions instead:

Percent of time OpenRefine is used at work:

  • 0-25
  • 25-50
  • 50-75
  • 75-100

Percent of time OpenRefine is used personally:

  • 0-25
  • 25-50
  • 50-75
  • 75-100

I really like the way Allourideas works. Simple and smart.

I took a pass through and left some comments in the document, but I was feeling pretty strongly anchored by the previous surveys, so I probably should have done my own clean-sheet version first and then compared it to what's there.

I agree with Thad's comments about work vs non-work usage. One way to capture that might be to expand the "How often do you use OpenRefine?" question to cover both cases separately.

I like the binary voting methodology of allourideas, but I think the quality of the results will depend heavily on seeding it well and on filtering/post-processing the results. In the previous survey, people added duplicates of pre-existing items, and the reported results didn't filter out low-frequency items.

Some other thoughts:

  • inputs/outputs/transformations are important, and we should make sure we capture them well
  • do we care more about what users' organizations do, or about users' roles/functions?
  • do we care about organization size? (maybe not, but it seems like it could tell us about the environment users are working in)
  • why is there only a single choice covering all of "for-profit" when OSM gets its own item?
  • similar to the above, I feel that the granularity/range of answers for some of the other questions is off, e.g. do we care more about the difference between users with 1 vs. 6 months of experience than about the entire interval between 2 and 14 years? Previous years' results may help inform how these should be skewed.

I wish we had a "Click here to generate an anonymized usage report for the last 12 months" button. That would allow the survey to be focused on qualitative questions. Perhaps next survey...


Thank you all for your feedback.

I made the following changes to the survey

  • I rewrote the question regarding professional or non-professional (casual) use to use a slider
  • I replaced the question asking where OpenRefine is installed with one asking who installed it, with an option for the hosted version. We can infer whether it is run locally or hosted from the answer.

I added the following questions

  • Which OS are you using?
  • With which browser do you use OpenRefine?
  • A question to capture the input formats
  • A question to capture the output formats generated
  • @zoecooper's questions regarding how users manage workflows

I would prefer to know the user's role rather than the organization they work for. For instance, I would prefer to know if you are a librarian rather than whether you work in a public library or a university.

I started a document here to list the suggestions for the Allourideas platform. I used answers from the previous survey (minus WikiCommons-specific questions) and results from my user interviews. However, I feel like the list of entries could be endless, given the 407 open feature request issues. Note that with Allourideas, we have the option to moderate user suggestions before they are added to the survey.

The majority of the 407 feature request issues (sorted by those with lots of thumbs up) appear to me, if I take a step back and look holistically through categorical grouping, to fall into two broad areas:

  1. More power and control for ad hoc editing of the grid in general (inserting rows, adding/removing rows, adding blank columns). It's almost as if many are asking: can you just make OpenRefine work like a spreadsheet sometimes? Wondering if that's perhaps a new third mode: Rows/Records/Spreadsheet? Feels weird, but that's what I'm seeing in many of those top-ranked issues.
  2. Reconciliation options and general quality of life improvements for Reconciliation.

@Martin I'm sorry I thought I replied - yes, I think that's a good idea!


It is time for a final review before we finalize the survey. I accepted all the suggested changes, so it's now easier to read. Bocoup will share their questions regarding OpenRefine's mission, vision, and values by next week. The survey is quite long, but I think that is okay since we only run it every two years. I would appreciate feedback from those who have more experience.

We still need to finalize the list of questions we want to pre-seed in the Allourideas platform. I am keen to hear from the development team (@antonin_d @thadguidry @tfmorris @abbe98 ...) on how to phrase those so we get actionable feedback (for example, what does "Better reconciliation" mean?).

By the way, Allourideas is being relaunched with a new design and a lot of generative AI features here: https://all-our-ideas.citizens.is/. I suggest we launch with all the generative AI features disabled.

I would like to start distributing the survey by the end of next week (May 3rd, 2024).

Thank you

Bocoup indicated that they do not plan to use the community survey for their project.

If there are no further comments or suggestions, we will proceed to distribute the survey through LimeSurvey.

When I distributed the final survey for review to a limited group of people, I received feedback challenging its structure and purpose. I believe these concerns are valid, and I want to ensure the survey is useful. My goal is to create an actionable survey for all community members, whether developers, designers, or community managers. Therefore, I am reopening the discussion about the survey design based on the feedback received.

I organized the feedback into four categories.

    1. Why are we doing this survey?
    2. Should we collect more demographics?
    3. Are we looking for current usage vs. what the user wants?
    4. Other suggested questions

Initial Survey Design

You can review the initial survey design in the screenshot of the Google Form survey below. I also copy-pasted the questions into this new Google Doc for those who want to comment inline.

1. Why are we doing this survey?
I personally use the results for the following purposes:

  1. To gain a better understanding of our user demographic when presenting the community to partners or funders.
  2. To understand the general usage of OpenRefine, including which features, extensions, and reconciliation services are being used, in order to see if there are any changes. Over the last 5 surveys, no major shifts have occurred.
  3. To identify individual contributors who may not be vocal on the forum or GitHub. I contacted several of them during my 2023 interviews to recruit them for the advisory committee and invite them to the Barcamp.
  4. The 2022 survey helped to identify feature requests that have been implemented since.

2. Should we collect more demographics?

The survey is light regarding demographic information on age, educational level, and geographic location. If we collect that information, how will we use it?

3. Are we looking for current usage vs what the user wants?
During the initial survey design, part 2 was extended to go into extreme detail to understand how often different features are used and the user's level of expertise with them. The feedback I received is that parts 2 to 4 are overwhelming.

  • Is this information valuable to development or design contributors?
  • If so, how do we make those sections shorter?
  • An alternative would be to anonymously track usage (with the user's consent) directly in OpenRefine. This would be a totally different project, but it should yield more accurate data.

On the other hand, do we want to ignore current usage and focus on what users need and want? For this, we set up a separate survey on the Allourideas platform to collect feature requests. I set it up as the last question, since I am concerned that many users may not complete the survey if we redirect them to AllOurIdeas too early. The feature question opens in a new tab and starts a flow of 20 prompts. By the time participants answer them, they may have moved on and may not return to complete the survey. I am open to any suggestion to address this.

To facilitate feedback, I am creating a poll in Discourse; feel free to comment to add details to your vote. If you're reading this from your email, open the thread in your browser to vote.

What should we focus on?
  • Current usage, via the survey
  • Current usage, via automated collection
  • Feature requests, via Allourideas

4. Other suggested questions

I also received the following questions.

  1. How did you learn to use OpenRefine? (Book, web resource, in-person training, etc.)
  2. What is your preferred way of learning about OpenRefine?
  3. Is it important to you to have the OpenRefine UI translated into your native language?
  4. What languages would you like to see the OpenRefine interface translated into?
  5. Do you prefer paid professional support or free peer support?
  6. Tell us how you use OpenRefine in your workflows
  7. How often do you need to work with non-Latin based alphabets? What additional capabilities do you need to help with this?
  8. How much memory (RAM) do you allocate to OpenRefine? (Leave blank if you use the default): ____ MB
  9. If you've increased the maximum number of text facet choices, what value did you increase it to? (Leave blank if you use the default)
  10. Thinking specifically about OpenRefine as a tool, what do you see as its biggest weakness?
  11. Thinking about the OpenRefine ecosystem as a whole, what do you see as its biggest weakness? Biggest strength?
  12. What feature would you most like to see implemented/improved in OpenRefine?

What do you mean by "automated collection", precisely?

My interest in the outcomes of this survey is to find out what (current) OpenRefine users generally need most urgently, to help decide what further development of the tool should focus on.

I notice, when interacting ad hoc with users in person, that different contributors to OpenRefine see and prioritize very different issues that they consider important to work on. I would like a survey like this to produce solid data around what users really find major impediments (pain points) to their day-to-day use, and the major functionalities that are key to the success of their day-to-day work. I would like to hear from as many users as possible, so not the 100-200 respondents of previous surveys; I would like to see 1000-2000 responses or more.

@Sandra thank you for the review, for chipping in, and for your vote.

What do you mean by "automated collection", precisely?

In a message on the forum or GitHub (which I cannot locate), we mentioned the option of letting users share their OpenRefine usage data. We would receive an anonymous stream of logs indicating which features are used and when. This can often be done via third-party services. It would be a significant effort to deploy the infrastructure, update OpenRefine to share the metrics, and obtain user consent to do so.
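To make the idea concrete, here is a minimal sketch of what one anonymized usage event in such a stream might look like. This is purely hypothetical: OpenRefine has no such telemetry today, and all field and function names below are made up for illustration.

```python
# Hypothetical sketch of an anonymized usage event (no such telemetry
# exists in OpenRefine; names are illustrative only).
import json
import uuid
from datetime import datetime, timezone

def make_usage_event(feature: str, session_id: str) -> dict:
    """Build an anonymous event: no project data, no user identifiers."""
    return {
        "session": session_id,   # random per-session ID, not tied to a user
        "feature": feature,      # e.g. "text-facet", "reconcile"
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# A fresh random session ID is generated each time OpenRefine starts,
# so events can be grouped per session without identifying the user.
session = uuid.uuid4().hex
event = make_usage_event("text-facet", session)
print(json.dumps(event))
```

The key design point is that only the feature name and a random session identifier are sent, which is what would make consent and anonymization tractable.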

Thanks. I totally see how such automated data collection is not trivial in many ways, but I think it would be extremely valuable if collected on a larger scale.

In a comment in the draft questions doc, I also gave the following input (edited a bit further):

I am extremely interested in solid data on what features users currently use, in order to check the observations I made while training many folks over the last years against broader data points. I'm interested in data to check whether the usage and behavior of the users I've supported so far (mainly beginners and small organizations in the cultural sector, and, broadly, Wikimedia and Wikibase users) are very different from other OpenRefine users' usage patterns. Knowing this will help Wikimedia/Wikibase contributors and organizations understand where OpenRefine is closely aligned with the (broad) Wikimedia use case and where it is less so.

As I wrote above, I'm also interested in the question about what users need, but with a focus on urgency and the importance to many of them (pain points, major needs and impediments, major requests that will unlock key functionalities crucial for many: e.g. if 50% of users want to work with large datasets in 50% of their projects but are currently totally unable, that's a huge need and pain point, and if 70% of users want flexible reconciliation with any random dataset but only in less than 5% of their projects, that can be argued to be a less urgent thing to work on). I think the AllOurIdeas part of the survey addresses this?


I will be hosting an office hour next week on 2024-05-28, from 14:00 to 15:00 UTC, to provide another opportunity for feedback on the survey structure.

The meeting will be held in the following room: https://meet.greenhost.net/OpenRefine-community-meetup.

Please react with a :+1: if you plan to attend. If you would like to discuss the survey structure but are unable to make it next week, please let me know, and we can arrange another time to connect.


I may be able to attend - I will be out of office (so no conflicts!) but not sure if I will be away from my computer. Will a calendar invitation also be shared?