At Formtoro, we're obsessed with clean data.

Clean data is the quickest way to unlock insights and automate things.

We're big fans of automation: it means less work, fewer errors, and more time to work on things that matter.

As a smaller company whose team has worked for more than a few small ecommerce companies, we know that time and resources are consistently tight.

Minor changes add up pretty quickly.

What is the difference between clean data and dirty data?

Clean data comes directly from the source, also known as zero-party data. One important caveat is that the data must be finite: no open-ended questions.

An excellent example of clean data is a multiple-choice question.

[Image: example of a multiple-choice question]

Each person who answers that question gives exactly one answer. So in aggregate, we can find out which flavor is preferred among the people who answered.

The caveat is that a user could select multiple answers, but even with multiple responses, they would be finite, meaning the values will always be fixed to the options provided.
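As a rough sketch of why finite answers are easy to work with, the snippet below tallies multiple-choice responses, including responses that select more than one option. The option list, the `tally` helper, and the sample responses are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical fixed option list; the real question's options are an assumption.
OPTIONS = ("vanilla", "chocolate", "strawberry")


def tally(responses, options=OPTIONS):
    """Count selections per option and reject anything outside the fixed list."""
    counts = Counter({option: 0 for option in options})
    for selected in responses:  # each response is a list of selected options
        for choice in selected:
            if choice not in options:
                raise ValueError(f"not a listed option: {choice!r}")
            counts[choice] += 1
    return counts


# Made-up sample: three single answers and one multi-select answer.
responses = [["vanilla"], ["chocolate"], ["chocolate", "strawberry"], ["chocolate"]]
counts = tally(responses)
top_flavor, top_count = counts.most_common(1)[0]
print(top_flavor, top_count)
```

Because every value has to come from the option list, the counts can be compared directly with no reading or manual categorizing.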

Contrast this with dirty data.

Dirty data can come directly from the source as well. The distinction is that the data is not finite and is often open-ended.

An excellent example of dirty data is a short answer question or a product review.

[Image: example of a short-answer response]

Do you get more value out of the second answer than the first? Sometimes, but it comes with a tradeoff, in that the extra information and the format become hard to categorize.

We've come a long way with Natural Language Processing and sentiment analysis, but at the end of the day, we still need to read, comprehend and categorize the feelings of the person.

Why clean data is better

Marketers, by and large, love collecting dirty data. It's their default; the thought process is that more data = better.

The reality is that more dirty data = mess.

Data is never 100% accurate, so you're looking for trends and similarities rather than the exact answers to questions.

You can do more with trends than you can with exact answers at scale, and at the end of the day, we need to protect the finite resource that is time.

How do you collect clean data?

These are some common ways that brands currently collect data:

  • Emails with a link to a survey
  • Post-purchase surveys
  • Quizzes on websites

And where we are starting to create real traction:

  • Sign-ups via popups

How well each of these methods performs comes down to conversion and completion rates.

Sign-ups via popup convert at a 2-7% subscription rate, with 95% of subscribers completing the survey.

Surveys via email convert at around 4% of subscribers, with 70% of people completing the survey.

Post-purchase surveys convert at about 30% of checkouts and an average of 2% of website traffic.

Quizzes, when clicked, have about a 15% sign-up rate.
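To make those rates concrete, here is a back-of-the-envelope calculation for the two methods where both a conversion rate and a completion rate are given above. The audience sizes (10,000 visitors and 1,000 email subscribers) are made-up round numbers, and `completed_surveys` is a hypothetical helper, not a Formtoro feature.

```python
def completed_surveys(audience, conversion_rate, completion_rate):
    """Expected completed surveys = audience x conversion rate x completion rate."""
    return audience * conversion_rate * completion_rate


visitors = 10_000    # assumed number of site visitors (illustrative only)
subscribers = 1_000  # assumed email list size (illustrative only)

# Popup sign-ups: 2-7% of visitors subscribe, 95% of those complete the survey.
popup_low = completed_surveys(visitors, 0.02, 0.95)
popup_high = completed_surveys(visitors, 0.07, 0.95)

# Email surveys: about 4% of subscribers convert, 70% of those complete.
email = completed_surveys(subscribers, 0.04, 0.70)

print(f"Popup: {popup_low:.0f} to {popup_high:.0f} completed surveys")
print(f"Email: {email:.0f} completed surveys")
```

Under these assumed audience sizes, the popup yields roughly 190 to 665 completed surveys and the email survey about 28. Swap in your own traffic and list size to compare the methods.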

So there are many ways to collect clean zero-party data, but the highest-converting ones all happen at the point of intention, usually after someone has already committed to doing something.

Signing up, clicking a quiz button, checking out, and opening an email all require clicking a button.

Each of these is a point of intention, so the user is already committed to taking action. If you can align your ask with a reward that creates forward action at the point of intention, your results will go up.

How do you use clean data?

We discussed earlier that clean data leads to automation. It also leads to personalization, and when you combine the two, you're well on your way to creating a revenue engine.

Clean data lets you drop people into custom flows, retarget them via ads, and better understand who your customers are and what they might like.
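As a minimal sketch of what dropping people into custom flows can look like, the snippet below maps each multiple-choice answer to a flow. The answer options, flow names, and email addresses are all hypothetical, and the mapping is plain Python, not any particular email platform's API.

```python
# Hypothetical mapping from each fixed answer option to an email flow name.
FLOW_BY_ANSWER = {
    "vanilla": "vanilla-welcome-flow",
    "chocolate": "chocolate-welcome-flow",
    "strawberry": "strawberry-welcome-flow",
}


def assign_flow(answer, default="general-welcome-flow"):
    """Return the flow for a multiple-choice answer, or a default if none matches."""
    return FLOW_BY_ANSWER.get(answer, default)


# Made-up subscribers and the answer each one selected at sign-up.
subscribers = {"ana@example.com": "chocolate", "ben@example.com": "vanilla"}
flows = {email: assign_flow(answer) for email, answer in subscribers.items()}
print(flows)
```

Because the answers are limited to a fixed list, a simple lookup covers every case. An open-ended answer would first have to be read and categorized before it could be routed.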

If you can track your data all the way through, you can apply the data points back to how the customer first came to find you and use them to create audiences that you can use for advertising or other activities. Clean data makes for great lookalike audiences.

So when you go to set up your clean data collection flows, keep the above in mind and make sure the data you collect is relevant to the customer journey, not the company journey.