Introduction

This document introduces a basic set of functions for describing continuous data. The other two vignettes cover functions for describing categorical data and the visualization options.

Summary Statistics

The ds_summary_stats() function returns a comprehensive set of statistics, including measures of location, variation and symmetry, together with quantiles and extreme observations.

You can pass multiple variables as shown below:

ds_summary_stats(mtcarz, mpg, disp)
#> ------------------------------ Variable: mpg ------------------------------
#> 
#>                         Univariate Analysis                          
#> 
#>  N                       32.00      Variance                36.32 
#>  Missing                  0.00      Std Deviation            6.03 
#>  Mean                    20.09      Range                   23.50 
#>  Median                  19.20      Interquartile Range      7.38 
#>  Mode                    10.40      Uncorrected SS       14042.31 
#>  Trimmed Mean            19.95      Corrected SS          1126.05 
#>  Skewness                 0.67      Coeff Variation         30.00 
#>  Kurtosis                -0.02      Std Error Mean           1.07 
#> 
#>                               Quantiles                               
#> 
#>               Quantile                            Value                
#> 
#>              Max                                  33.90                
#>              99%                                  33.44                
#>              95%                                  31.30                
#>              90%                                  30.09                
#>              Q3                                   22.80                
#>              Median                               19.20                
#>              Q1                                   15.43                
#>              10%                                  14.34                
#>              5%                                   12.00                
#>              1%                                   10.40                
#>              Min                                  10.40                
#> 
#>                             Extreme Values                            
#> 
#>                 Low                                High                
#> 
#>   Obs                        Value       Obs                        Value 
#>   15                         10.4        20                         33.9  
#>   16                         10.4        18                         32.4  
#>   24                         13.3        19                         30.4  
#>    7                         14.3        28                         30.4  
#>   17                         14.7        26                         27.3  
#> 
#> 
#> 
#> ------------------------------ Variable: disp -----------------------------
#> 
#>                           Univariate Analysis                            
#> 
#>  N                         32.00      Variance               15360.80 
#>  Missing                    0.00      Std Deviation            123.94 
#>  Mean                     230.72      Range                    400.90 
#>  Median                   196.30      Interquartile Range      205.18 
#>  Mode                     275.80      Uncorrected SS       2179627.47 
#>  Trimmed Mean             228.00      Corrected SS          476184.79 
#>  Skewness                   0.42      Coeff Variation           53.72 
#>  Kurtosis                  -1.07      Std Error Mean            21.91 
#> 
#>                                 Quantiles                                 
#> 
#>                Quantile                              Value                 
#> 
#>               Max                                    472.00                
#>               99%                                    468.28                
#>               95%                                    449.00                
#>               90%                                    396.00                
#>               Q3                                     326.00                
#>               Median                                 196.30                
#>               Q1                                     120.83                
#>               10%                                    80.61                 
#>               5%                                     77.35                 
#>               1%                                     72.53                 
#>               Min                                    71.10                 
#> 
#>                               Extreme Values                              
#> 
#>                  Low                                  High                 
#> 
#>   Obs                          Value       Obs                          Value 
#>   20                           71.1        15                            472  
#>   19                           75.7        16                            460  
#>   18                           78.7        17                            440  
#>   26                            79         25                            400  
#>   28                           95.1         5                            360

If you do not specify any variables, ds_summary_stats() detects all the continuous variables in the data set and returns summary statistics for each of them.
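To make the Univariate Analysis panel easier to interpret, here is a minimal base R sketch of how several of the reported figures can be recomputed. It is not the package's implementation. It assumes that the mpg column of the built-in mtcars data holds the same values as mtcarz$mpg, and the helper name univariate_sketch is hypothetical.

```r
# Hypothetical helper: recomputes a few of the reported statistics in base R.
# Assumption: mtcars$mpg (built into R) matches the mpg column of mtcarz.
univariate_sketch <- function(x) {
  x <- x[!is.na(x)]                          # drop missing values first
  n <- length(x)
  c(
    n               = n,
    mean            = mean(x),
    median          = median(x),
    variance        = var(x),                # sample variance (n - 1 denominator)
    std_deviation   = sd(x),
    range           = max(x) - min(x),
    iqr             = IQR(x),                # Q3 - Q1 with R's default quantile type
    uncorrected_ss  = sum(x^2),              # sum of squared values
    corrected_ss    = sum((x - mean(x))^2),  # sum of squared deviations from the mean
    coeff_variation = 100 * sd(x) / mean(x), # standard deviation as a percent of the mean
    std_error_mean  = sd(x) / sqrt(n)
  )
}

stats_mpg <- univariate_sketch(mtcars$mpg)
print(round(stats_mpg, 2))
```

Under that assumption, the rounded values should agree with the mpg panel shown above. For example, the corrected sum of squares is the variance multiplied by n - 1.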

Auto Summary

If you want to view summary statistics and frequency tables for all or a subset of the variables in a data set, use ds_auto_summary_stats().

ds_auto_summary_stats(mtcarz, disp, mpg)
#> ------------------------------ Variable: disp -----------------------------
#> 
#> ---------------------------- Summary Statistics ---------------------------
#> 
#> ------------------------------ Variable: disp -----------------------------
#> 
#>                           Univariate Analysis                            
#> 
#>  N                         32.00      Variance               15360.80 
#>  Missing                    0.00      Std Deviation            123.94 
#>  Mean                     230.72      Range                    400.90 
#>  Median                   196.30      Interquartile Range      205.18 
#>  Mode                     275.80      Uncorrected SS       2179627.47 
#>  Trimmed Mean             228.00      Corrected SS          476184.79 
#>  Skewness                   0.42      Coeff Variation           53.72 
#>  Kurtosis                  -1.07      Std Error Mean            21.91 
#> 
#>                                 Quantiles                                 
#> 
#>                Quantile                              Value                 
#> 
#>               Max                                    472.00                
#>               99%                                    468.28                
#>               95%                                    449.00                
#>               90%                                    396.00                
#>               Q3                                     326.00                
#>               Median                                 196.30                
#>               Q1                                     120.83                
#>               10%                                    80.61                 
#>               5%                                     77.35                 
#>               1%                                     72.53                 
#>               Min                                    71.10                 
#> 
#>                               Extreme Values                              
#> 
#>                  Low                                  High                 
#> 
#>   Obs                          Value       Obs                          Value 
#>   20                           71.1        15                            472  
#>   19                           75.7        16                            460  
#>   18                           78.7        17                            440  
#>   26                            79         25                            400  
#>   28                           95.1         5                            360  
#> 
#> 
#> 
#> NULL
#> 
#> 
#> -------------------------- Frequency Distribution -------------------------
#> 
#>                                Variable: disp                                 
#> |---------------------------------------------------------------------------|
#> |      Bins       | Frequency | Cum Frequency |   Percent    | Cum Percent  |
#> |---------------------------------------------------------------------------|
#> |  71.1  - 151.3  |    12     |      12       |     37.5     |     37.5     |
#> |---------------------------------------------------------------------------|
#> | 151.3  - 231.5  |     5     |      17       |    15.62     |    53.12     |
#> |---------------------------------------------------------------------------|
#> | 231.5  - 311.6  |     6     |      23       |    18.75     |    71.88     |
#> |---------------------------------------------------------------------------|
#> | 311.6  - 391.8  |     5     |      28       |    15.62     |     87.5     |
#> |---------------------------------------------------------------------------|
#> | 391.8  -  472   |     4     |      32       |     12.5     |     100      |
#> |---------------------------------------------------------------------------|
#> |      Total      |    32     |       -       |    100.00    |      -       |
#> |---------------------------------------------------------------------------|
#> 
#> 
#> ------------------------------ Variable: mpg ------------------------------
#> 
#> ---------------------------- Summary Statistics ---------------------------
#> 
#> ------------------------------ Variable: mpg ------------------------------
#> 
#>                         Univariate Analysis                          
#> 
#>  N                       32.00      Variance                36.32 
#>  Missing                  0.00      Std Deviation            6.03 
#>  Mean                    20.09      Range                   23.50 
#>  Median                  19.20      Interquartile Range      7.38 
#>  Mode                    10.40      Uncorrected SS       14042.31 
#>  Trimmed Mean            19.95      Corrected SS          1126.05 
#>  Skewness                 0.67      Coeff Variation         30.00 
#>  Kurtosis                -0.02      Std Error Mean           1.07 
#> 
#>                               Quantiles                               
#> 
#>               Quantile                            Value                
#> 
#>              Max                                  33.90                
#>              99%                                  33.44                
#>              95%                                  31.30                
#>              90%                                  30.09                
#>              Q3                                   22.80                
#>              Median                               19.20                
#>              Q1                                   15.43                
#>              10%                                  14.34                
#>              5%                                   12.00                
#>              1%                                   10.40                
#>              Min                                  10.40                
#> 
#>                             Extreme Values                            
#> 
#>                 Low                                High                
#> 
#>   Obs                        Value       Obs                        Value 
#>   15                         10.4        20                         33.9  
#>   16                         10.4        18                         32.4  
#>   24                         13.3        19                         30.4  
#>    7                         14.3        28                         30.4  
#>   17                         14.7        26                         27.3  
#> 
#> 
#> 
#> NULL
#> 
#> 
#> -------------------------- Frequency Distribution -------------------------
#> 
#>                               Variable: mpg                               
#> |-----------------------------------------------------------------------|
#> |    Bins     | Frequency | Cum Frequency |   Percent    | Cum Percent  |
#> |-----------------------------------------------------------------------|
#> | 10.4 - 15.1 |     6     |       6       |    18.75     |    18.75     |
#> |-----------------------------------------------------------------------|
#> | 15.1 - 19.8 |    12     |      18       |     37.5     |    56.25     |
#> |-----------------------------------------------------------------------|
#> | 19.8 - 24.5 |     8     |      26       |      25      |    81.25     |
#> |-----------------------------------------------------------------------|
#> | 24.5 - 29.2 |     2     |      28       |     6.25     |     87.5     |
#> |-----------------------------------------------------------------------|
#> | 29.2 - 33.9 |     4     |      32       |     12.5     |     100      |
#> |-----------------------------------------------------------------------|
#> |    Total    |    32     |       -       |    100.00    |      -       |
#> |-----------------------------------------------------------------------|
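The frequency distribution for mpg above can be approximated in base R. The sketch below assumes five equal-width bins spanning the minimum to the maximum, which is consistent with the bin edges printed in the output, and it uses the built-in mtcars data as a stand-in for mtcarz. It illustrates the idea and is not the package's own code.

```r
# Assumption: five equal-width bins from min to max, as the printed bin
# edges suggest; mtcars$mpg stands in for mtcarz$mpg.
x <- mtcars$mpg
breaks <- seq(min(x), max(x), length.out = 6)   # 6 edges give 5 bins

# include.lowest = TRUE keeps the minimum value in the first bin
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

freq <- as.vector(table(bins))
freq_table <- data.frame(
  lower         = breaks[-length(breaks)],      # left edge of each bin
  upper         = breaks[-1],                   # right edge of each bin
  frequency     = freq,
  cum_frequency = cumsum(freq),
  percent       = 100 * freq / length(x),
  cum_percent   = 100 * cumsum(freq) / length(x)
)
print(freq_table)
```

The cumulative columns are running totals of the frequency and percent columns, so the last row always reaches the number of observations and 100 percent.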

Group Summary

The ds_group_summary() function returns descriptive statistics of a continuous variable for the different levels of a categorical variable.

ds_group_summary() returns a tibble which can be used for further analysis.

Box Plot

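A plot() method is defined for the object returned by ds_group_summary(). It draws box plots for comparing the distribution of the continuous variable across the groups.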

# Summarise mpg within each level of cyl, then plot the result
k <- ds_group_summary(mtcarz, cyl, mpg)
plot(k)

Multiple Variables

If you want grouped summary statistics for multiple variables in a data set, use ds_auto_group_summary().

ds_auto_group_summary(mtcarz, cyl, gear, mpg)
#>                                        mpg by cyl                                         
#> -----------------------------------------------------------------------------------------
#> |     Statistic/Levels|                    4|                    6|                    8|
#> -----------------------------------------------------------------------------------------
#> |                  Obs|                   11|                    7|                   14|
#> |              Minimum|                 21.4|                 17.8|                 10.4|
#> |              Maximum|                 33.9|                 21.4|                 19.2|
#> |                 Mean|                26.66|                19.74|                 15.1|
#> |               Median|                   26|                 19.7|                 15.2|
#> |                 Mode|                 22.8|                   21|                 10.4|
#> |       Std. Deviation|                 4.51|                 1.45|                 2.56|
#> |             Variance|                20.34|                 2.11|                 6.55|
#> |             Skewness|                 0.35|                -0.26|                -0.46|
#> |             Kurtosis|                -1.43|                -1.83|                 0.33|
#> |       Uncorrected SS|              8023.83|              2741.14|              3277.34|
#> |         Corrected SS|               203.39|                12.68|                 85.2|
#> |      Coeff Variation|                16.91|                 7.36|                16.95|
#> |      Std. Error Mean|                 1.36|                 0.55|                 0.68|
#> |                Range|                 12.5|                  3.6|                  8.8|
#> |  Interquartile Range|                  7.6|                 2.35|                 1.85|
#> -----------------------------------------------------------------------------------------
#> 
#> 
#> 
#>                                        mpg by gear                                        
#> -----------------------------------------------------------------------------------------
#> |     Statistic/Levels|                    3|                    4|                    5|
#> -----------------------------------------------------------------------------------------
#> |                  Obs|                   15|                   12|                    5|
#> |              Minimum|                 10.4|                 17.8|                   15|
#> |              Maximum|                 21.5|                 33.9|                 30.4|
#> |                 Mean|                16.11|                24.53|                21.38|
#> |               Median|                 15.5|                 22.8|                 19.7|
#> |                 Mode|                 10.4|                   21|                   15|
#> |       Std. Deviation|                 3.37|                 5.28|                 6.66|
#> |             Variance|                11.37|                27.84|                44.34|
#> |             Skewness|                -0.09|                  0.7|                 0.56|
#> |             Kurtosis|                -0.38|                -0.77|                -1.83|
#> |       Uncorrected SS|              4050.52|               7528.9|              2462.89|
#> |         Corrected SS|               159.15|               306.29|               177.37|
#> |      Coeff Variation|                20.93|                21.51|                31.15|
#> |      Std. Error Mean|                 0.87|                 1.52|                 2.98|
#> |                Range|                 11.1|                 16.1|                 15.4|
#> |  Interquartile Range|                  3.9|                 7.08|                 10.2|
#> -----------------------------------------------------------------------------------------
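To show what the rows of the mpg by cyl table above represent, the following base R sketch recomputes a few of them by splitting mpg on the grouping variable. It assumes the built-in mtcars data holds the same mpg and cyl values as mtcarz, and the helper name group_stats is hypothetical. It is not how the package computes its results.

```r
# Hypothetical helper: a few grouped statistics in base R.
# Assumption: mtcars (built into R) matches mtcarz for mpg and cyl.
group_stats <- function(x, g) {
  # split() breaks x into one vector per level of g; sapply() then builds
  # a matrix with one row per statistic and one column per level
  sapply(split(x, g), function(v) {
    c(
      Obs              = length(v),
      Minimum          = min(v),
      Maximum          = max(v),
      Mean             = mean(v),
      Median           = median(v),
      `Std. Deviation` = sd(v),
      Variance         = var(v),
      Range            = max(v) - min(v)
    )
  })
}

mpg_by_cyl <- group_stats(mtcars$mpg, mtcars$cyl)
print(round(mpg_by_cyl, 2))
```

Passing mtcars$gear as the grouping variable gives the corresponding rows of the mpg by gear table in the same way.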

Measures

If you want to view the measures of location, variation and symmetry, the percentiles and the extreme observations as tibbles, use the functions below. All of them, except for ds_extreme_obs(), work with single or multiple variables. If you do not specify the variables, they return results for all the continuous variables in the data set.