Data Visualization In Data Science

  • Uploaded by: Maloy Manna
  • January 2021
Data Visualization in Data Science

Maloy Manna

biguru.wordpress.com
linkedin.com/in/maloy
twitter.com/itsmaloy

Synopsis

Having data is not enough. Adding context is essential to understand the data, find patterns and engage audiences. Data visualization is a key element of data science, the interdisciplinary field that deals with finding insights from data.

• In this webinar, we explore the roles of data visualization at different stages of the data science process, and why it is essential.
• We look at how data is encoded visually with shape, size, color and other variables, and how the basic principles of visual encoding can be applied to build better visualizations.
• We cover narratives, types of bias and maps.
• Finally, we look at the tools, both open source and off-the-shelf, that are used in data science to build effective data visualizations.

Speaker profile

Maloy Manna
Project Manager - Engineering, AXA Data Innovation Lab

• Over 14 years experience building data-driven products and services
• Previous organizations: Thomson Reuters, Saama, Infosys, TCS


Contents

• Defining Data visualization
• Data science process
• Data visualization
• Visual encoding of data
• Narrative structures
• Dataviz Technology & Tools

Defining Data visualization

• Visual display of quantitative information
• Mapping data to visual elements
• Encoding data with size, shape, color...
• Storytelling / narrative elements

Defining Data Visualization

Exploratory
• Find insights
• Conversation between data and “you”

Explanatory
• Present insights

Data science project life-cycle

• Acquire data
• Prepare data
• Analysis & Modeling
• Evaluation & Interpretation
• Deployment
• Operations & Optimization

Data science process

[Figure: the data science process. Labels: EDA (Exploratory Data Analysis), Data Wrangling, Exploratory, Explanatory, Data Visualization]

Source: Computational Information Design | Ben Fry

Exploratory data visualization

Data analysis approaches:
• Classical: Problem > Data > Model > Analysis > Conclusions
• EDA (Exploratory Data Analysis): Problem > Data > Analysis > Model > Conclusions
• Bayesian: Problem > Data > Model > Prior distribution > Analysis > Conclusions

EDA is an approach, not a set of techniques.

Exploratory data visualization

Statistical approaches:

Quantitative
• Hypothesis testing
• Analysis of variance (ANOVA)
• Point estimates and confidence intervals
• Least squares regression

Graphical
• Scatter plots
• Histograms
• Probability plots
• Residual plots
• Box plots
• Block plots


Exploratory data visualization

Graphical analysis procedures:
• Testing assumptions
• Model selection
• Model validation
• Estimator selection
• Relationship identification
• Factor effect determination
• Outlier detection

Graphical analysis is a must for deriving insights from data.
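Outlier detection, the last procedure listed above, is what a box plot does visually: points beyond the whiskers are flagged. A minimal sketch of that rule follows, assuming the conventional 1.5 × IQR whisker length. The function name `tukey_outliers`, the multiplier `k` and the sample values are illustrative and not taken from the slides.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values that fall outside the box-plot whiskers.

    Whiskers are placed k * IQR beyond the first and third quartiles
    (k = 1.5 is the usual convention, assumed here).
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: nine ordinary readings and one suspicious one.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_outliers(sample))
```

Different quartile conventions move the whiskers slightly, so borderline points may be flagged by one tool and not by another.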

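Exploratory data analysis

Anscombe's quartet: four datasets that share these summary statistics yet look very different when plotted
• N = 11
• Mean of X = 9.0
• Mean of Y = 7.5
• Intercept = 3
• Slope = 0.5
• Residual standard deviation = 1.237
• Correlation = 0.816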
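The summary statistics quoted for Anscombe's quartet can be checked with a few lines of code. The sketch below uses only the standard library. The data values are the commonly published ones for the quartet and do not appear in the slides, and the names `QUARTET` and `summarize` are illustrative.

```python
from math import sqrt

# Commonly published values of Anscombe's quartet (assumed, not from the slides).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Least-squares fit of y on x plus the summary statistics from the slide."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "intercept": intercept,
        "slope": slope,
        "residual_sd": sqrt(sse / (n - 2)),  # two fitted parameters
        "correlation": sxy / sqrt(sxx * syy),
    }


for name, (x, y) in QUARTET.items():
    print(name, {key: round(value, 3) for key, value in summarize(x, y).items()})
```

All four rows print nearly the same numbers, which is the point of the quartet: the differences only show up once the data is plotted.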


Explanatory data visualization

• Design
• Engineering
• Journalism

Explanatory data visualization

Visualization is both an art and science
• Harry Beck's subway map of London

Visual encoding of data

Data Types
• Quantitative: Continuous, Discrete
• Categorical: Nominal, Ordered, Interval

Visual encoding of data

Categorical scales and graph design

Visual encoding of data

Bandwidth of our senses [Tor Norretranders]

Visual encoding of data

Data → visual display elements
• Position x
• Position y
• Retinal variables
  • Size, Orientation (ordered data)
  • Color Hue, Shape (nominal data)
• Animation

Visual encoding of data

Ranking visual display elements (framework):
1. Position along a common scale, e.g. scatter plots
2. Position on identical but non-aligned scales, e.g. multiple scatter plots
3. Length, e.g. bar chart
4. Angle & Slope, e.g. pie chart
5. Area, e.g. bubbles
6. Volume, density & color saturation, e.g. heat map
7. Color hue, e.g. highlights

Ref. Graphical Perception and Graphical Methods for Analyzing Scientific Data, William Cleveland & Robert McGill (1985)
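One way to put the ranking above to work is to treat it as an ordered list and pick the highest-ranked encoding a given chart can still offer. This is a minimal sketch. The channel labels paraphrase the seven items above, and the names `CHANNEL_RANKING` and `best_channel` are illustrative.

```python
# Encodings ordered from most to least accurately perceived,
# following the seven-step ranking listed above.
CHANNEL_RANKING = [
    "position on a common scale",
    "position on non-aligned scales",
    "length",
    "angle or slope",
    "area",
    "volume, density or color saturation",
    "color hue",
]


def best_channel(available):
    """Return the highest-ranked encoding among the ones still available."""
    for channel in CHANNEL_RANKING:
        if channel in available:
            return channel
    raise ValueError("none of the available encodings is in the ranking")


# Hypothetical case: both positions are already taken by other variables.
print(best_channel({"area", "color hue", "length"}))
```

With position already used, length (a bar) is preferred over area (a bubble) or color hue for the next quantitative variable.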

Design principles

• Choose the right type of chart
  • Trends / Change over time → Line charts
  • Distributions → Histograms
  • Summary Information → Table
  • Relationships → Scatter Plots
• Get it right in black & white (before adding color)
• Prefer 2D to 3D for statistical charts
• Use color to highlight
• Avoid rainbow palette
• Avoid chartjunk: “less is more”
• Try to have a high data-ink ratio
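The chart-selection rule of thumb above can be written down as a simple lookup. This is only a sketch. The key names and the function `suggest_chart` are illustrative, and the mapping covers just the four cases listed on the slide.

```python
# The four purpose → chart pairs listed above; keys are illustrative labels.
CHART_FOR_PURPOSE = {
    "trend": "line chart",
    "distribution": "histogram",
    "summary": "table",
    "relationship": "scatter plot",
}


def suggest_chart(purpose):
    """Suggest a chart type for a stated purpose, or fail loudly if unknown."""
    key = purpose.strip().lower()
    if key not in CHART_FOR_PURPOSE:
        raise ValueError(f"no suggestion for purpose: {purpose!r}")
    return CHART_FOR_PURPOSE[key]


print(suggest_chart("Distribution"))
```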

Design principles

• Choose the right type of chart

[Figure: chart types by purpose. Panels: Ranking, Time-series, Correlation, Nominal comparison, Deviation]

Narrative structures

Data Journalism

Traditional journalism
• Data around narrative
• Linear flow
• Physical static media

Data journalism
• Narrative around data
• Complex, often non-linear flow
• Online interactive media


Narrative structures

Bias (and ethics: Don’t lie with data)
• Bar charts must have a zero baseline
• Present data in its context
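A short worked example shows why the zero baseline matters: a bar's drawn height is its value minus the axis start, so truncating the axis inflates the apparent ratio between two bars. The figures below are hypothetical and the function name `apparent_ratio` is illustrative.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights when the value axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must lie below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical figures: two values that differ by about 4%.
low, high = 50.0, 52.0
print(apparent_ratio(high, low))                 # axis starts at zero
print(apparent_ratio(high, low, baseline=48.0))  # axis truncated at 48
```

With the axis starting at 48, the taller bar is drawn twice as high as the shorter one even though the underlying values differ by only about 4%.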

Narrative structures

Bias: Misleading with data
• Selective presentation with line charts

• Author Bias
• Data Bias
• Reader Bias

Narrative structures

Bias and Errors (statistics):
• Selection bias, e.g. in sampling
• Omitted-variable bias

Errors:
• Hypothesis testing
• Null Hypothesis = default/no-effect state

Null Hypothesis H0        | Valid                              | Invalid
Reject                    | Type I error (false positive)      | Correct inference (true positive)
Accept (fail to reject)   | Correct inference (true negative)  | Type II error (false negative)
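The four outcomes of a hypothesis test depend on two yes/no facts: whether the null hypothesis actually holds, and whether the test rejected it. A minimal sketch of that decision logic follows. The function name `classify_decision` and the label strings are illustrative.

```python
def classify_decision(null_is_true, null_rejected):
    """Name the outcome of a test given the truth and the decision taken."""
    if null_rejected:
        # Rejecting a valid null is a false alarm; rejecting an invalid one is a hit.
        if null_is_true:
            return "Type I error (false positive)"
        return "correct inference (true positive)"
    # Not rejecting: correct if the null holds, a miss if it does not.
    if null_is_true:
        return "correct inference (true negative)"
    return "Type II error (false negative)"


for truth in (True, False):
    for rejected in (True, False):
        print(f"H0 valid={truth}, rejected={rejected}: {classify_decision(truth, rejected)}")
```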

Narrative structures

Storytelling:
• Visual narratives have moved from author-driven to viewer-driven with the use of highly interactive media for data visualization

Author-driven
• Strong ordering
• Heavy messaging
• Need for clarity and speed

Viewer-driven
• Exploratory
• Ability to ask questions
• Build own story

DataViz Technologies & Tools

Off-the-shelf:
• Tableau, Qlikview

Tools:
• Predefined charts: Raw, Chartio, Plotly
• Google fusion tables, Excel, Gephi

Code & Javascript libraries:
• R: ggplot2, ggvis, rCharts + shiny (interactive apps)
• Python: matplotlib
• D3.js, Dimple.js, Leaflet, Rickshaw (use JSON data)
• Linux: gnuplot

DataViz Technologies & Tools

[Figure: Tableau data viz]

DataViz Technologies & Tools

[Figure: Chart in R ggplot2]

References

• Visual display of Quantitative Information: Edward Tufte, http://goo.gl/qb5ej
• Exploratory Data Analysis: John Tukey, http://goo.gl/tV57HP
• Data Science Life cycle: Maloy Manna, http://www.datasciencecentral.com/profiles/blogs/the-data-science-project-lifecycle
• Selecting right graph for your message: Stephen Few, www.perceptualedge.com/articles/ie/the_right_graph.pdf
• Practical rules for using color in charts: Stephen Few, www.perceptualedge.com/articles/visual.../rules_for_using_color.pdf
• OpenIntro Statistics: https://www.openintro.org/stat/
• Misleading with statistics: Eric Portelance, https://medium.com/i-data/misleading-with-statistics-c63780efa928
• Computational Information Design: Ben Fry, http://benfry.com/phd/dissertation-050312b-acrobat.pdf
