Why bad analysis is worse than no analysis
Head of Research
Data has no natural visual form, so it can be displayed in many different ways. There is no single correct format, although some visualizations will be more relevant than others. The danger is that an incorrect display corrupts the information. This can happen for two reasons:
- deliberate data manipulation;
- inaccuracy due to lack of knowledge.
Misleading visualizations can be divided into several groups, depending on the problem. Let's take a look at each of them.
1. Not starting from “0”.
Time magazine once published an infographic comparing car brands by how many of the vehicles sold over the previous 10 years were still in running condition. The chart was obviously meant to show off Chevrolet's reliability.
However, if the graph were built from zero, the difference between the brands would be perceived differently: the columns would look almost equal. See for yourself.
This graph is no longer so impressive. The difference between the cars is insignificant, although it does exist. Therefore, in my opinion, this was a deliberate manipulation of data. True, they did show a 5% scale, informing readers that this is just the tip of the chart. However, people usually glance at the columns and simply compare them by eye. So this is one of the oldest and most famous examples of how data can be manipulated in charts that are not built from zero.
25 years later, in 2017, Google presented its new smartphones, the Pixel 2 and Pixel 2 XL. One of the advertised advantages was a significantly better camera than in the previous model.
Same story. Looking at the chart, it seems that the quality has become almost 50% better. Now let's build the graph starting from zero.
There is a difference indeed, but it is not the steep and obvious competitive advantage they tried to demonstrate. Conclusion: deliberate manipulation of data.
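The effect of a truncated axis can be put into numbers. The sketch below compares the ratio of two bar heights when the axis starts at zero with the ratio a reader sees when the axis starts at some higher baseline. The function names and the sample values are illustrative assumptions; they are not taken from the Time or Google charts.

```python
def visual_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights for values a and b
    when the value axis starts at `baseline` instead of zero."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


def exaggeration(a: float, b: float, baseline: float) -> float:
    """How many times the truncated axis inflates the apparent ratio a/b."""
    return visual_ratio(a, b, baseline) / visual_ratio(a, b, 0.0)


# Hypothetical values, NOT read from any of the charts discussed here.
best, worst = 98.0, 96.0

print(round(visual_ratio(best, worst), 3))        # ratio with the axis from zero
print(round(visual_ratio(best, worst, 95.0), 3))  # ratio with the axis from 95
print(round(exaggeration(best, worst, 95.0), 3))  # inflation caused by truncation
```

With these made-up numbers the two values differ by about 2%, yet on an axis that starts at 95 one column is drawn three times taller than the other.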
2. Uneven intervals over time.
When you build a line graph, the dates on the axis must be spaced at equal time intervals. For example, if you start at 1995 and the next point is 2000, then the one after that should logically be 2005, and so on. If your data is unevenly distributed over time, the spacing on the axis has to reflect that.
Let's look at a graph from the Russian outlet Meduza, which shows the increase in the proportion of the Orthodox population in Russia.
The trend is visualized from 1991 to 2016. Firstly, the curve starts at an unlabeled point before the first date. Secondly, intervals of the same width contain different numbers of years. The interval between 1991 and 1992 is the same width as the one between 1992 and 1997, and so on. According to Meduza's logic: 1 year = 5 years = 3 years = 2 years = 6 years = 4 years = 1 year = 3 years.
Here’s how this graph should look.
In my version, the scale is preserved, all the points with available data are marked, and the curve starts in 1991.
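Preserving the scale means placing each point in proportion to the time that has actually passed. The sketch below assumes the years can be reconstructed from the sequence of gaps listed above (1, 5, 3, 2, 6, 4, 1, 3 years, starting in 1991); the function name and the axis width of 100 units are illustrative choices.

```python
from itertools import accumulate

# Gaps in years between consecutive points, as listed in the text.
gaps = [1, 5, 3, 2, 6, 4, 1, 3]
start_year = 1991

# Years reconstructed from the gaps (an assumption, since the chart
# itself is not reproduced here). The last one comes out as 2016.
years = list(accumulate(gaps, initial=start_year))


def axis_positions(years: list[int], width: float = 1.0) -> list[float]:
    """Place each year on a horizontal axis of the given width,
    proportionally to elapsed time rather than to its index."""
    first, last = years[0], years[-1]
    if last == first:
        raise ValueError("need at least two distinct years")
    return [width * (y - first) / (last - first) for y in years]


positions = axis_positions(years, width=100.0)
for year, x in zip(years, positions):
    print(year, round(x, 1))
```

On such an axis the gap between 1992 and 1997 takes five times the width of the gap between 1991 and 1992, which is exactly what the equal-width version hides.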
Here is another graph from Meduza.
The intervals here are fine: they are equal. But for some reason the curve continues beyond 2015, and logically the next point should be 2020. The question is how Meduza can predict the future. It is not clear.
For time-series graphs it is important to always have clearly defined dates. Define the start point and the end point. The end point is the end of the observation period, or today's date for real-time information. To show a forecast correctly, use separate colors, dotted lines, footnotes, etc.
This chart also has an oddly constructed vertical scale. As a rule, a scale is built in steps of hundreds, five hundreds or thousands. Here one step equals 750. By that logic they could just as well have made it 756, purely to mock the readers.
Here’s how this graph should look.
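The remark about the vertical scale can be turned into a small check. The sketch below rounds a step up to the nearest "round" value. It assumes the common 1-2-5 convention (steps of 1, 2 or 5 times a power of ten), which is slightly broader than the hundreds, five hundreds and thousands mentioned above; the function names are hypothetical.

```python
def nice_step(raw_step: float) -> float:
    """Round a tick step up to the nearest 1, 2 or 5 times a power of ten.
    The 1-2-5 rule is an assumed convention, not something the chart states."""
    if raw_step <= 0:
        raise ValueError("step must be positive")
    # Find the largest power of ten that does not exceed raw_step.
    magnitude = 1.0
    while magnitude * 10 <= raw_step:
        magnitude *= 10
    while magnitude > raw_step:
        magnitude /= 10
    for multiplier in (1, 2, 5):
        if raw_step <= multiplier * magnitude:
            return multiplier * magnitude
    return 10 * magnitude


def is_nice_step(step: float) -> bool:
    """True if the step already follows the 1-2-5 convention."""
    return nice_step(step) == step


print(is_nice_step(750))  # the step used on the chart
print(nice_step(750))     # the nearest round step above it
print(is_nice_step(500))
```

A step of 750 fails the check and is rounded up to 1000, while 500 passes unchanged.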
3. Incorrect proportions of shares.
Here's a spectacular example of a graph that was not built, but drawn. It seems that no data table was used here; normally, a charting program builds a logical visualization from such a table. This one looks like it was drawn in Adobe Illustrator or another graphics editor. I am not saying that graphics tools are prohibited, but when you use them you need to clearly understand the ratio between numbers and shares. After all, 3% cannot occupy a bigger slice than 6.4%. Judging by this image, though, anything is possible.
So, when you build pie charts, including the so-called donuts, move from the largest values to the smallest. This systematizes the information and makes it easy to comprehend. Start with the largest share, then go around the circle in decreasing order.
I suggest this version.
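Both rules, proportional slices and ordering from largest to smallest, are easy to check before drawing anything. In the sketch below only the 3% and 6.4% shares come from the text; the labels and the remaining share are hypothetical fillers so that the parts add up to 100%.

```python
def slice_angles(shares: dict[str, float]) -> list[tuple[str, float]]:
    """Return (label, sweep angle in degrees) pairs, largest share first.
    Each angle is proportional to the share's part of the total."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [(label, 360.0 * value / total) for label, value in ordered]


# Only 3% and 6.4% are from the article; everything else is made up.
shares = {"A": 3.0, "B": 6.4, "rest (hypothetical)": 90.6}

for label, angle in slice_angles(shares):
    print(f"{label}: {angle:.2f} degrees")
```

Because every angle is computed from the same total, a 3% slice can never come out wider than a 6.4% one.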
Like the previous donut, I found the following graph in a video from "Izvestia". They probably cite "Business Russia" as the source precisely because they saw that the picture was a mess.
If you draw a line and compare how the values 9, 21 and 80 relate to one another, it becomes clear that this graph was not constructed correctly either. In this case I cannot tell whether it was a deliberate manipulation or a simple mistake.
1. The first case, the axis that does not start at zero, is the simplest and most widespread method of deliberate data manipulation. Every year such infographics can be found in well-known campaigns. One of the most recent and high-profile cases is Donald Trump's election campaign. Look at the graphs he posted on Twitter, collected in The Washington Post article. In some cases the information was even distorted to his disadvantage. Interesting, right?
2. Graphs with errors in the time scale are more likely to be the result of a lack of knowledge. After all, they do not show any advantage that someone would want to exaggerate. This is just incorrect information. It is similar to mixing up letters in words: the essence is clear, but the text is written with mistakes. When you have data at uneven intervals, it is better to build columns rather than lines. Then they can stand next to each other. This is a commonly used option, by the way.
3. The wrong proportions of shares on pie charts can be explained both by manipulation and by neglect of the rules of construction. The main thing to keep in mind is that a visualization must demonstrate the data, not "create" it. It should transmit the information, not distort it.