Posts Tagged ‘data visualization’
Disease incidence is usually connected to biological and lifestyle factors such as genetics, eating habits, exercise and so on. But are there socioeconomic factors that influence disease incidence as well? This TED talk from Bill Davenhall inspired us to explore socioeconomic factors that may influence disease incidence.
To explore the role of socioeconomic factors such as education levels, regional population, income level of the area where you live and air pollution (in terms of toxic levels), we developed a visualization tool called DiseaseTrends. DiseaseTrends allows the exploration of possible correlations between those socioeconomic factors and diabetes prevalence and cancer incidence rates across counties throughout the United States. A user can interactively explore these factors at a county, regional (user-defined cluster of counties), state or national level.
When a user explicitly selects a county, we display 5 similar counties based on their socioeconomic factors. The motivation behind this feature is to allow users to identify similar counties that may have varying disease incidence rates, which may in turn lead to further exploration.
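One way such a "similar counties" lookup could work is a nearest-neighbor search over standardized factors. The sketch below is only an illustration under that assumption; it is not necessarily how DiseaseTrends computes similarity, and the county names, factor names and values are all made up.

```python
import math

# Hypothetical records: names and numbers are invented for illustration only.
FACTORS = ["education", "population", "income", "toxicity"]
counties = {
    "County A": {"education": 0.82, "population": 120_000, "income": 48_000, "toxicity": 3.1},
    "County B": {"education": 0.80, "population": 110_000, "income": 47_000, "toxicity": 3.0},
    "County C": {"education": 0.60, "population": 900_000, "income": 61_000, "toxicity": 7.5},
    "County D": {"education": 0.78, "population": 130_000, "income": 45_000, "toxicity": 2.8},
    "County E": {"education": 0.55, "population": 20_000, "income": 30_000, "toxicity": 1.2},
    "County F": {"education": 0.84, "population": 150_000, "income": 52_000, "toxicity": 3.6},
    "County G": {"education": 0.70, "population": 400_000, "income": 55_000, "toxicity": 5.0},
}

def standardize(records):
    """Rescale each factor to zero mean and unit variance so no factor dominates."""
    scaled = {name: {} for name in records}
    for factor in FACTORS:
        values = [rec[factor] for rec in records.values()]
        mean = sum(values) / len(values)
        # Fall back to 1.0 if a factor is constant, to avoid dividing by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        for name, rec in records.items():
            scaled[name][factor] = (rec[factor] - mean) / std
    return scaled

def most_similar(records, target, k=5):
    """Return the k counties closest to `target` by Euclidean distance."""
    scaled = standardize(records)
    t = scaled[target]
    distances = []
    for name, vec in scaled.items():
        if name == target:
            continue
        d = math.sqrt(sum((vec[f] - t[f]) ** 2 for f in FACTORS))
        distances.append((d, name))
    distances.sort()
    return [name for _, name in distances[:k]]

print(most_similar(counties, "County A"))
```

Standardizing first matters because population and income are on very different scales from an education rate; without it, the largest-valued factor would decide the ranking almost on its own.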
As mentioned above, a user can manually specify regions that cross state boundaries. A user-defined circular cluster can be specified by holding Ctrl (on PC) or Cmd (on Mac) and then clicking and dragging. Here the user has specified four regions. The maximum prevalence (in red) and the minimum (in green) across the selected region are highlighted in the panels below.
Through this tool, we can easily see the now popular diabetes belt, as shown here
High incidence rates can also be seen in counties with Native American reservations, such as Navajo County, Sioux County, Rolette County and Big Horn County.
We would like to mention that DiseaseTrends does not imply any causation and can merely hint at possible associations. It is completely up to a researcher in the field of public policy / public health to further investigate the findings.
More details about DiseaseTrends can be found in our paper.
Bernice E. Rogowitz covered fundamentals in human perception and cognition, and discussed how they apply to visualization. She covered a huge array of topics, ranging from the pupil being partially responsible for our depth perception, all the way to color theory and how it relates directly to the biology of the human eye.
The presentation had a great flow, starting at a very high level to give everyone an idea of what questions they would be able to answer at the end. As the talk progressed, she covered the biology of the human eye in detail, and then moved on to the intersection of perceptual issues and computer science.
In the biological portion, we learned that there are five layers of cells in the retina, each responsible for different tasks. Much of the interesting stuff happens at the very beginning (photoreceptor distribution) and then further into the process at the ganglion cells. She went over how lateral inhibition is caused by the spatial distribution of the photoreceptors connected to a single ganglion cell, and how this is the reason for several of the optical illusions we perceive. She did a great job of explaining the connections between biology and perceptual issues.
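To make the lateral inhibition idea concrete, here is a toy one-dimensional sketch. It assumes a deliberately simplified model in which each cell's response is its own input minus a fraction of its two neighbors' inputs; the function name and the `inhibition` weight are illustrative choices, not something taken from the tutorial.

```python
def lateral_inhibition(luminance, inhibition=0.25):
    """Toy model: each response is the cell's input minus a share of its neighbors' inputs."""
    n = len(luminance)
    response = []
    for i in range(n):
        # At the ends of the row, reuse the edge value as the missing neighbor.
        left = luminance[max(i - 1, 0)]
        right = luminance[min(i + 1, n - 1)]
        response.append(luminance[i] - inhibition * (left + right))
    return response

# A step edge: a dark region (10) next to a bright region (20).
step = [10] * 5 + [20] * 5
print(lateral_inhibition(step))
# [5.0, 5.0, 5.0, 5.0, 2.5, 12.5, 10.0, 10.0, 10.0, 10.0]
```

Even in this crude model, the response dips just on the dark side of the edge and overshoots just on the bright side, which is the kind of exaggerated contrast at boundaries that several well-known optical illusions rely on.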
Cultural differences were also addressed. The eye movements we have are actually learned when we learn how to read. Cultures with different reading directions have substantially different eye movement patterns.
The section on the striate cortex was especially interesting. This is the first point in the visual system at which images from the two eyes are merged (the point where depth perception occurs). This area sends output to 60% of the brain! This is a huge amount, and makes the visual system incredibly important to the decision-making process.
This tutorial had a huge quantity of useful information and was really well put together! She concluded with a great summary of four things to remember:
- There are different response rates for different stimuli. How well do you want to convey magnitude information?
- Color and luminance mechanisms have different spatial sensitivities.
- Certain visual information is perceived “pre-attentively” such as color.
- How the world is perceived depends on what the user is trying to accomplish.
Data visualization is being used for detecting fraud, especially with respect to wire and credit card transactions. Work done at the Charlotte Visualization Center at UNC Charlotte provides some interesting insights into fraud detection. This work was conducted in collaboration with Bank of America. In the following paper they highlight four visualization techniques that allow for fraud detection.
Scalable and Interactive Visual Analysis of Financial Wire Transactions for Fraud Detection, Remco Chang, Alvin Lee, Mohammad Ghoniem, Robert Kosara, William Ribarsky, Jing Yang, Evan Suma, Caroline Ziemkiewicz, Daniel Kern, Agus Sudjianto, Journal of Information Visualization (IVS).
Search by example: Find accounts with transactions/activity similar to the current account being monitored.
Strings and beads: A line graph based visualization that shows critical events as ‘beads’ on the graph. The use of a log scale for the y-axis is a neat idea and probably allows for improved exploration.
Keyword graph: A graph visualization showing keyword similarity.
This paper was based on previous work by the same group titled Wirevis. I would encourage interested readers to read this paper as well as the previous one (Wirevis).
- It is a web-based solution that allows interactive exploration of data for fraud detection.
- Can read a wide variety of file formats (Excel/Access databases).
- Allows interaction with visualizations such as node-link diagrams, bar charts etc.
You can check out a 10-min video on their website at http://www.centrifugesystems.com/shadowbox/libraries/mediaplayer/Centrifuge-1.8-for-Banking-Fraud-Analysis.flv. According to the company website, it has been used to detect a fraud ring in Bulgaria known as the “Bulgarian Money Mule ring”. Seems like a step in the right direction. It would be interesting to see whether they could save and share workspaces for collaborative exploration of data. Their web-based framework would make this particularly useful for investigators at different locations, who could immediately access and interact with the current state of the visualization.
Any other companies, products, research papers that you may have heard of that I missed?
Lately, I have been collecting links to videos of talks related to Data Visualization. I found multiple talks for some people and so have categorized them accordingly. I have also tried to provide some context to the individual/group.
I think the first TED talk by Hans Rosling (@hansrosling) got a lot of media attention and made people sit up and appreciate the power of ‘narrative visualization’. He almost makes it look like a sport, with himself in the role of commentator. The title on TED’s website for the talk is “the best stats you’ve ever seen”. I am not sure about that, but it is a very entertaining talk.
It was followed up by an interesting study by information visualization researchers George Robertson, Roland Fernandez, Danyel Fisher, Bongshin Lee and John Stasko in the Infovis 2008 paper titled “Effectiveness of Animation in Trend Visualization.” Here is an interesting excerpt from the abstract of the paper:
Results indicate that trend animation can be challenging to use even for presentations; while it is the fastest technique for presentation and participants find it enjoyable and exciting, it does lead to many participant errors. Animation is the least effective form for analysis; both static depictions of trends are significantly faster than animation, and the small multiples display is more accurate.
Fernanda Viégas and Martin Wattenberg (@wattenberg) (previously at IBM Research) have brought visualization to the masses through IBM Many Eyes. They have recently started a new venture called FlowingMedia. Here are some links to their talks:
- TEDxSP 2009 – Fernanda Viégas (in Portuguese with English subtitles)
- Stanford CS Department – Democratizing Visualization (wmv)
Manuel Lima (@mslima) of visualcomplexity.com gave an interesting talk at Made by Many. His talk, titled Network Visualization in an Age of Interconnectedness, was not only excellent but also started quite a passionate debate, which led to Manuel writing a post titled Information Visualization Manifesto. I urge you to read the post and look at the interesting responses that infovis experts had to Manuel’s manifesto. Manuel gave another interesting talk at Creativity and Technology (CaT) 2009: Information Visualization.
Aaron Koblin (@aaronkoblin) has been involved with creating innovative and evocative data visualization pieces such as the New York Talk Exchange, Radiohead’s House of Cards music video (you can see Aaron in the “Making of House of Cards” video), the very entertaining ‘Bicycle built for 2000’ project and many others.
Links to a couple of Aaron’s talks are below:
Tom Wujec is a fellow at Autodesk. His talk on 3 ways the brain creates meaning provides an amazing insight into our brain. He addresses issues related to why data visualization works and how the brain visualizes data.
Jeff Heer has developed information visualization tools that can be used by developers around the world for creating interactive visualizations of their own data. He is the author of Prefuse, Flare (check out the excellent demos) and, most recently, Protovis (many great examples online). Lately, he has published an informative article in the ACM Queue titled A tour through the visualization zoo, co-authored with Michael Bostock and Vadim Ogievetsky. He also does a great job interviewing Fernanda Viégas and Martin Wattenberg in the ACM Queue. A talk by him at the Stanford HCI seminar can be found here (html link, wmv).
Nicholas Christakis presents a fascinating talk in which he uses social data visualization to explore the influence of social networks – “The hidden influence of social networks.” In his talk he says that the spread of obesity is due to your social network. Smoking and even divorce can be linked to the company you keep.
Please let me know if I have missed any interesting data visualization talks that are available online and I will be happy to update the post.
With today’s release of Tableau Public, Tableau Software has opened up infinite possibilities for researchers, corporations and enthusiasts alike to interact, explore and play with their data. More importantly, with Tableau Public one can now have ‘interactive’ visualizations online as opposed to static images. This is a step in the right direction for Data Visualization software, since increasingly one hears from domain experts who want to ‘use’ software and not have to write programs (however small or easy those programs may seem to the developer of the software). Tableau now allows researchers to explore their data and collaborate more effectively instead of having to share static ‘screenshots’ via email.
Such uses of visualization software have already been explored and shown to be hugely successful by the ManyEyes team in their CHI ’08 paper, but the capabilities and strengths of the two products are in somewhat disjoint areas. For example, Tableau focuses on the Business Intelligence community and lacks certain visualizations such as Treemaps or Text visualizations (which ManyEyes seems to do really well). Other interesting and inspiring uses of Tableau Public can be found in their Gallery at http://www.tableausoftware.com/public/gallery. Don’t forget to check out the NYC Graffiti workbook that they have online. Detailed training videos can be found at http://www.tableausoftware.com/public/training
ManyEyes – Readers of this blog already know my fondness for IBM’s Many Eyes. ManyEyes has been a pioneer in this field of online visualization software that facilitates data visualization without the need for programming. Research papers from the Many Eyes team detailing user interactions and unexpected uses of the visualization software can be found at http://bewitched.com/manyeyes.html
Verifiable is another such website that allows online visualization of data. So far the data visualizations that are possible are limited to bar charts, scatter plots and line charts but the trend is definitely promising and I hope they continue to improve the excellent service. A video can be found online at http://verifiable.com/screencast
Swivel is similar to Verifiable, where one can upload data and create online interactive visualizations. Videos for all the features in Swivel can be found at http://www.swivel.com/features. Unfortunately, they have a 15-day free trial that restricts the widespread use of their tools.
As I interact with experts and students from domains as varied as political science, biology, economics and so on, I am pleased by their awareness of effective visualization, but I am sometimes disheartened to have to tell them to learn programming in order to use some of our nifty tools. Tableau Public, IBM Many Eyes and others are exceptional in the service that they provide. I envision more research groups, corporate websites and so on posting interactive visualizations with a ‘Powered by Tableau’ icon or something similar in the bottom right corner.
Lately, we have been seeing a high number of ‘bad’ visualizations in the media. Over at Infosthetics, they even had a contest to identify the ‘Most Ugly and Useless Infographic’. It was worth a few chuckles but it definitely made one realize the importance of effective data visualization. It is unfortunate that some people have to make decisions based on such visual representations.
Beyond just pointing out bad visual representations, there seems to be an increasingly constructive trend of redesigning graphs/visualizations that become very popular in the media. I wholeheartedly support this endeavor and hope to see more. It is naturally easy to criticize other visualizations, but redesigning them to ‘put your visualization where your mouth is’ takes courage. Here are a few examples.
If you have seen any other interesting visualization critiques, please send them my way and I shall be happy to update this post.
As a new parent, I have felt guilty about driving a compact car when everyone around me keeps telling me that even though SUVs are bad for the environment, they are so much safer in case of an accident. I cringed a bit at every such discussion but thought that maybe they had a point.
But then I thought why not use data visualization to get to the bottom of this and find out what the truth is. Let me preface this by saying that this is my first attempt at visualizing the data I could find for free and any visualization suggestions or data sources that you are aware of will be greatly appreciated.
[Note: No fancy visualizations here :) Only good old bar graphs]
Step 1 – Type of the car vs Fatalities
I first wanted to find out how car crashes break down by type of car. I found that there is extensive data (see data sources below) about car crashes and fatalities. I decided to use fatalities as a measure of how ‘safe’ the car is, and so this graph shows fatalities in 2008 by type of car. I was sad to see that ‘Passenger cars’ were ranked first but happy to see that ‘Light trucks’ were pretty high up too. Minivans, Compact utility and Large utility vehicles had far fewer fatalities, and I was worried that my worst fears (SUV/Minivan = safer) were coming true.
Step 2 – Sales for each type of car
But then I thought that the number of accidents obviously is very dependent on the number of cars that get sold per year and if more passenger cars were getting sold, then more of them would be in a fatal accident thus giving it a higher number. So I found out what the car sale numbers were for 2008 (see data source below) and decided to plot that.
Step 3 – Comparing the Fatalities/Sales ratio
Then the next obvious thing to do was to compute the ratio of the number of fatal accidents for each type of car to the number of cars of that type sold in a year. On computing the ratio, I found something very interesting. Sorting the graph based on this ratio, I found that Compact utility vehicles had the highest ratio of fatal accidents to sales. If you look at the first graph, you will see that compact utility vehicles do not have a large number of fatal accidents to begin with, but when that number is divided by the total number of compact utility vehicles sold, we find an interesting insight (much to my relief and joy).
Passenger cars have a lower ratio than Compact utility vehicles, Large utility vehicles and Light trucks. :)
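For anyone who wants to repeat the calculation outside Tableau, the ratio step can be sketched in a few lines of Python. The counts below are placeholders invented for illustration and are NOT the actual FARS or WSJ figures; substitute the real numbers from the data sources listed at the end.

```python
# Placeholder counts only: these are NOT the real FARS fatality or WSJ sales figures.
fatalities = {
    "Passenger cars": 900,
    "Light trucks": 800,
    "Compact utility": 300,
    "Large utility": 200,
    "Minivans": 100,
}
sales = {
    "Passenger cars": 6000,
    "Light trucks": 4000,
    "Compact utility": 1000,
    "Large utility": 800,
    "Minivans": 1000,
}

def fatality_ratios(fatalities, sales):
    """Return (vehicle type, fatalities / sales) pairs, highest ratio first."""
    ratios = {
        kind: fatalities[kind] / sales[kind]
        for kind in fatalities
        if sales.get(kind)  # skip types with missing or zero sales
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)

for kind, ratio in fatality_ratios(fatalities, sales):
    print(f"{kind}: {ratio:.3f}")
```

Sorting by the ratio rather than by the raw fatality count is what changes the picture: a type with few fatalities can still rank first if even fewer of those vehicles were sold.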
Anyone who has used Tableau has probably already guessed that all these visualizations were created using Tableau Software and so I visualized the Ratio, Fatal Accidents, Sales all in one image. It shows clearly how compact utility vehicles have a high ratio even though trucks and passenger cars have higher fatalities and more cars of those types were sold.
My current data sources are (Please let me know if you are aware of better ones):
Fatality analysis reporting system – http://www-fars.nhtsa.dot.gov/States/StatesCrashesAndAllVictims.aspx
WSJ – Car sales for the year so far – http://online.wsj.com/mdc/public/page/2_3022-autosales.html