It all started with this tweet. Time was scarce, and I had never created an interactive visualization before. I was supposed to be doing some d3 these holidays, but I never got round to it.
The dataset provided for the contest has 45k+ rows and that was slightly intimidating for a beginner like me.
So I initially decided to work with a small subset of the data: the top 10 found and the top 10 fallen meteorites by size. Using Tilemill to geotag these was simple enough. As you can see above, the red and yellow dots represent the found and fallen meteorites respectively, and the size of each dot reflects the size of the meteorite. The dots differ so much because the actual sizes vary that much, with the biggest weighing in at 60 tons. The problem with this graphic was that the dots were too big to pinpoint the exact coordinates. Certain impact points hid or overlapped others, and no additional information about the meteorites could be presented.
This led me to the second iteration. I imported a slightly better map into Illustrator (via Export as PDF from Tilemill; SVG didn’t work for me) and played around with labelling the meteorites, which resulted in the map below.
Things seemed a little clearer in the above version, but there were still some problems. The scale was skewed a lot if I considered only these values. The yellow dots were nearly invisible and the red ones overshadowed everything. About that time I came across this visualization. That prompted me to play with the impact sizes a bit more. It took several iterations to get the size right. It was a slow process because most of the dots overlapped and obscured sections of the map. Finally I came up with this before I went to sleep just past midnight.
Playing with the opacity above gave me an idea of where the denser areas were. Around this time, while I was discussing these visualizations with @Rasagy, @Hashnuke asked if I wanted the reverse geocoded locations so that I could perhaps map the countries in a sort of choropleth. He said he would write a script and run it against a reverse geocoding service like Google. So we set Sunday as the day to do this.
I continued to work in Tilemill and decided to export the map and host it on Mapbox. I also decided to learn Wax so that I could build an interactive visualization. By then, most of the visualization that you see at the end was forming in my head. With 2 days left to go for submissions at Visualizing.org, I decided to give it a go.
Playing around with Wax, I ran into a bug that appeared when the map was fullscreen, which led me to file my first issue on Github. After that I got stuck trying to figure out pivot tables in MS Excel. @Sevenaces helped me realise my stupid mistake and I got the data I needed to plot the column charts.
Next, Akash walked me through setting up a Github account and hosting a .io repo there. I installed Github for Windows and everything was simple and intuitive. Git, incidentally, was something I had been meaning to learn for a long time, and this project gave me an opportunity to do that. The prototype visualization was up and online on Saturday night, but it was still quite a long way from finished.
There were obviously problems with the dataset. The area at 0,0 seemed awfully dense for a point in the middle of the ocean. This led me to review the dataset. I found that more than 10k rows had no coordinate data, and some had 0,0 instead. I decided to exclude these rows from the geotagging. They were bad locations and did not contribute anything. The dataset was now slightly above 32k rows. More Tilemill followed. Tilemill kept hanging every now and then and I had to close it every few minutes. Frustrating indeed. The huge dataset could be a possible reason. Figuring out the legend and tooltip design took me some more time. Finally the map was done. More hangs followed, but I was finally able to export and host the final map on Mapbox.
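For anyone who would rather script this clean-up than do it by hand, here is a minimal Python sketch. It is not what was used for this project. The column names `reclat` and `reclong` are assumptions about the CSV headers, so adjust them to whatever your file uses.

```python
import math


def has_usable_coordinates(row, lat_key="reclat", lon_key="reclong"):
    """Return True when a row has numeric coordinates that are not 0,0.

    lat_key and lon_key are assumed column names, not confirmed ones.
    """
    try:
        lat = float(row.get(lat_key) or "")
        lon = float(row.get(lon_key) or "")
    except ValueError:
        # Blank or non-numeric cells count as missing coordinates.
        return False
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return False
    # 0,0 is treated as a placeholder, not a real location.
    return not (lat == 0.0 and lon == 0.0)


def clean_rows(rows):
    """Keep only the rows that can actually be placed on a map."""
    return [row for row in rows if has_usable_coordinates(row)]
```

The rows here are plain dictionaries, which is what `csv.DictReader` from the standard library yields, so `clean_rows(list(csv.DictReader(f)))` would be one way to feed it a file.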
The next problem came from the 32k-row .csv file: the big file was throwing errors. We then split the dataset into 3 sections and Akash ran the script against Geonames via nitrous.io. He should really write a post on how he did all that. Here are the scripts and the processed data. About 40 bad locations in the dataset were removed.
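Splitting a big list of rows into a few sections is easy to script. The sketch below only illustrates the idea and is not Akash's script. The function name `split_into_chunks` is made up for this example.

```python
def split_into_chunks(rows, parts=3):
    """Split rows into `parts` consecutive chunks of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(rows), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks
```

Each chunk can then be written to its own file and processed separately, so a failure in one section does not force a rerun of the whole dataset.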
The output of the reverse geocoding was the country code, but I wanted the country names. This is how I learnt about vlookup in MS Excel. I also learnt how to fill all the blanks in a table and how to divide a column by a number. These are not as straightforward as you might think. Excel hung on me as well. A lot of times. Remember, making everything a table helps a lot when doing Excel operations. I used the country name list from here (it’s missing SS = South Sudan). Finally everything seemed ready. Now all I needed was a good scatterplot example to borrow 😉
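The same code-to-name lookup that vlookup does can be sketched in Python with a dictionary. This is only an illustration, not the Excel workflow used here. The `COUNTRY_NAMES` table below is a tiny made-up subset standing in for the full list, with the missing SS entry patched in by hand.

```python
# Illustrative subset only; a real table would hold every country code.
COUNTRY_NAMES = {
    "IN": "India",
    "US": "United States",
    "AU": "Australia",
}
# The list used in the post lacked South Sudan, so add it explicitly.
COUNTRY_NAMES["SS"] = "South Sudan"


def country_name(code, names=COUNTRY_NAMES, default="Unknown"):
    """Look up a country name by code, like an exact-match vlookup."""
    return names.get(code.strip().upper(), default)
```

Returning a default such as "Unknown" instead of failing makes missing codes easy to spot afterwards, much like scanning a vlookup column for #N/A.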
A quick search of mbostock’s d3 gallery and I located a scatterplot that I could use. It was simple to understand. I promptly hacked the example to meet my demands. I learnt a bit of d3 along the way.
With the final changes made, I was done with ‘Meteorites: Earth Impact’. I learnt so much in 3 days. It was indeed a wicked journey.