Overview

Dataset statistics

Number of variables: 16
Number of observations: 891
Missing cells: 869
Missing cells (%): 6.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 99.3 KiB
Average record size in memory: 114.1 B
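
The derived figures above follow from the raw counts by simple arithmetic. The sketch below recomputes them from numbers that appear in this report; the variable names are illustrative only, and the per-column missing counts are taken from the age, deck, embarked and embark_town sections.

```python
# Figures copied from this report; variable names are illustrative.
n_variables = 16
n_observations = 891
total_size_kib = 99.3

# Missing counts reported per variable (all other variables report 0).
missing_by_column = {"age": 177, "deck": 688, "embarked": 2, "embark_town": 2}

total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The results agree with the reported 869 missing cells, 6.1% and 114.1 B.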

Variable types

Numeric: 5
Categorical: 8
Boolean: 3

Warnings

who is highly correlated with sex and 1 other field (High correlation)
embark_town is highly correlated with embarked (High correlation)
pclass is highly correlated with class (High correlation)
embarked is highly correlated with embark_town (High correlation)
sex is highly correlated with who and 1 other field (High correlation)
alive is highly correlated with survived (High correlation)
adult_male is highly correlated with who and 1 other field (High correlation)
survived is highly correlated with alive (High correlation)
class is highly correlated with pclass (High correlation)
age has 177 (19.9%) missing values (Missing)
deck has 688 (77.2%) missing values (Missing)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
sibsp has 608 (68.2%) zeros (Zeros)
parch has 678 (76.1%) zeros (Zeros)
fare has 15 (1.7%) zeros (Zeros)

Reproduction

Analysis started: 2021-03-07 04:46:26.980366
Analysis finished: 2021-03-07 04:46:33.952795
Duration: 6.97 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 445
Minimum: 0
Maximum: 890
Zeros: 1
Zeros (%): 0.1%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 44.5
Q1: 222.5
Median: 445
Q3: 667.5
95th percentile: 845.5
Maximum: 890
Range: 890
Interquartile range (IQR): 445

Descriptive statistics

Standard deviation: 257.353842
Coefficient of variation (CV): 0.5783232405
Kurtosis: -1.2
Mean: 445
Median Absolute Deviation (MAD): 223
Skewness: 0
Sum: 396495
Variance: 66231
Monotonicity: Strictly increasing
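
Several of the df_index statistics can be reproduced with the standard library. This is a minimal sketch that assumes df_index is exactly the integers 0 through 890; the report only confirms the minimum, maximum, distinct count and strict monotonicity, so the integer sequence is an assumption. Note that the reported variance matches the sample variance (divisor n - 1).

```python
import statistics

# Assumption: df_index holds the integers 0..890 (891 unique values).
values = list(range(891))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, divisor n - 1
std = statistics.stdev(values)           # square root of the sample variance
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median of |x - median(x)|.
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, variance, round(std, 6), round(cv, 10), mad)
```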
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
890                 1      0.1%
292                 1      0.1%
303                 1      0.1%
302                 1      0.1%
301                 1      0.1%
300                 1      0.1%
299                 1      0.1%
298                 1      0.1%
297                 1      0.1%
296                 1      0.1%
Other values (881)  881    98.9%

Value  Count  Frequency (%)
0      1      0.1%
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%

Value  Count  Frequency (%)
890    1      0.1%
889    1      0.1%
888    1      0.1%
887    1      0.1%
886    1      0.1%

survived
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value  Count
0      549
1      342

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Histogram of lengths of the category

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Most occurring characters

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  891    100.0%

Most frequent character per category

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  891    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  891    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

pclass
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value  Count
3      491
1      216
2      184

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 891
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Histogram of lengths of the category

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Most occurring characters

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  891    100.0%

Most frequent character per category

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  891    100.0%

Most frequent character per script

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  891    100.0%

Most frequent character per block

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

sex
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value   Count
male    577
female  314

Length

Max length: 6
Median length: 4
Mean length: 4.704826038
Min length: 4

Characters and Unicode

Total characters: 4192
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

Histogram of lengths of the category

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

Most occurring characters

Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  4192   100.0%

Most frequent character per category

Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4192   100.0%

Most frequent character per script

Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4192   100.0%

Most frequent character per block

Value  Count  Frequency (%)
e      1205   28.7%
m      891    21.3%
a      891    21.3%
l      891    21.3%
f      314    7.5%

age
Real number (ℝ≥0)

MISSING

Distinct: 88
Distinct (%): 12.3%
Missing: 177
Missing (%): 19.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.69911765
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5th percentile: 4
Q1: 20.125
Median: 28
Q3: 38
95th percentile: 56
Maximum: 80
Range: 79.58
Interquartile range (IQR): 17.875

Descriptive statistics

Standard deviation: 14.52649733
Coefficient of variation (CV): 0.4891221855
Kurtosis: 0.1782741536
Mean: 29.69911765
Median Absolute Deviation (MAD): 9
Skewness: 0.3891077823
Sum: 21205.17
Variance: 211.0191247
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
24                 30     3.4%
22                 27     3.0%
18                 26     2.9%
28                 25     2.8%
19                 25     2.8%
30                 25     2.8%
21                 24     2.7%
25                 23     2.6%
36                 22     2.5%
29                 20     2.2%
Other values (78)  467    52.4%
(Missing)          177    19.9%

Value  Count  Frequency (%)
0.42   1      0.1%
0.67   1      0.1%
0.75   2      0.2%
0.83   2      0.2%
0.92   1      0.1%

Value  Count  Frequency (%)
80     1      0.1%
74     1      0.1%
71     2      0.2%
70.5   1      0.1%
70     2      0.2%

sibsp
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5230078563
Minimum: 0
Maximum: 8
Zeros: 608
Zeros (%): 68.2%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.102743432
Coefficient of variation (CV): 2.108464374
Kurtosis: 17.88041973
Mean: 0.5230078563
Median Absolute Deviation (MAD): 0
Skewness: 3.695351727
Sum: 466
Variance: 1.216043077
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
4      18     2.0%
3      16     1.8%
8      7      0.8%
5      5      0.6%

Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
3      16     1.8%
4      18     2.0%

Value  Count  Frequency (%)
8      7      0.8%
5      5      0.6%
4      18     2.0%
3      16     1.8%
2      28     3.1%

parch
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3815937149
Minimum: 0
Maximum: 6
Zeros: 678
Zeros (%): 76.1%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8060572211
Coefficient of variation (CV): 2.112344071
Kurtosis: 9.778125179
Mean: 0.3815937149
Median Absolute Deviation (MAD): 0
Skewness: 2.749117047
Sum: 340
Variance: 0.6497282437
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
5      5      0.6%
3      5      0.6%
4      4      0.4%
6      1      0.1%

Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
3      5      0.6%
4      4      0.4%

Value  Count  Frequency (%)
6      1      0.1%
5      5      0.6%
4      4      0.4%
3      5      0.6%
2      80     9.0%

fare
Real number (ℝ≥0)

ZEROS

Distinct: 248
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.20420797
Minimum: 0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 7.225
Q1: 7.9104
Median: 14.4542
Q3: 31
95th percentile: 112.07915
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.0896

Descriptive statistics

Standard deviation: 49.6934286
Coefficient of variation (CV): 1.543072528
Kurtosis: 33.39814088
Mean: 32.20420797
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.78731652
Sum: 28693.9493
Variance: 2469.436846
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
8.05                43     4.8%
13                  42     4.7%
7.8958              38     4.3%
7.75                34     3.8%
26                  31     3.5%
10.5                24     2.7%
7.925               18     2.0%
7.775               16     1.8%
26.55               15     1.7%
0                   15     1.7%
Other values (238)  615    69.0%

Value   Count  Frequency (%)
0       15     1.7%
4.0125  1      0.1%
5       1      0.1%
6.2375  1      0.1%
6.4375  1      0.1%

Value     Count  Frequency (%)
512.3292  3      0.3%
263       4      0.4%
262.375   2      0.2%
247.5208  2      0.2%
227.525   4      0.4%

embarked
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.1 KiB

Value  Count
S      644
C      168
Q      77

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 889
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: C
3rd row: S
4th row: S
5th row: S

Value      Count  Frequency (%)
S          644    72.3%
C          168    18.9%
Q          77     8.6%
(Missing)  2      0.2%

Histogram of lengths of the category

Value  Count  Frequency (%)
s      644    72.4%
c      168    18.9%
q      77     8.7%

Most occurring characters

Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  889    100.0%

Most frequent character per category

Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  889    100.0%

Most frequent character per script

Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  889    100.0%

Most frequent character per block

Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

class
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value   Count
Third   491
First   216
Second  184

Length

Max length: 6
Median length: 5
Mean length: 5.20650954
Min length: 5

Characters and Unicode

Total characters: 4639
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Third
2nd row: First
3rd row: Third
4th row: First
5th row: Third

Value   Count  Frequency (%)
Third   491    55.1%
First   216    24.2%
Second  184    20.7%

Histogram of lengths of the category

Value   Count  Frequency (%)
third   491    55.1%
first   216    24.2%
second  184    20.7%

Most occurring characters

Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  3748   80.8%
Uppercase Letter  891    19.2%

Most frequent character per category

Lowercase Letter

Value  Count  Frequency (%)
i      707    18.9%
r      707    18.9%
d      675    18.0%
h      491    13.1%
s      216    5.8%
t      216    5.8%
e      184    4.9%
c      184    4.9%
o      184    4.9%
n      184    4.9%

Uppercase Letter

Value  Count  Frequency (%)
T      491    55.1%
F      216    24.2%
S      184    20.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  4639   100.0%

Most frequent character per script

Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4639   100.0%

Most frequent character per block

Value             Count  Frequency (%)
i                 707    15.2%
r                 707    15.2%
d                 675    14.6%
T                 491    10.6%
h                 491    10.6%
F                 216    4.7%
s                 216    4.7%
t                 216    4.7%
S                 184    4.0%
e                 184    4.0%
Other values (3)  552    11.9%

who
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value  Count
man    537
woman  271
child  83

Length

Max length: 5
Median length: 3
Mean length: 3.794612795
Min length: 3

Characters and Unicode

Total characters: 3381
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: man
2nd row: woman
3rd row: woman
4th row: woman
5th row: man

Value  Count  Frequency (%)
man    537    60.3%
woman  271    30.4%
child  83     9.3%

Histogram of lengths of the category

Value  Count  Frequency (%)
man    537    60.3%
woman  271    30.4%
child  83     9.3%

Most occurring characters

Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  3381   100.0%

Most frequent character per category

Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

Most occurring scripts

Value  Count  Frequency (%)
Latin  3381   100.0%

Most frequent character per script

Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3381   100.0%

Most frequent character per block

Value  Count  Frequency (%)
m      808    23.9%
a      808    23.9%
n      808    23.9%
w      271    8.0%
o      271    8.0%
c      83     2.5%
h      83     2.5%
i      83     2.5%
l      83     2.5%
d      83     2.5%

adult_male
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1019.0 B

Value  Count  Frequency (%)
True   537    60.3%
False  354    39.7%

deck
Categorical

MISSING

Distinct: 7
Distinct (%): 3.4%
Missing: 688
Missing (%): 77.2%
Memory size: 7.1 KiB

Value             Count
C                 59
B                 47
D                 33
E                 32
A                 15
Other values (2)  17

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 203
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: C
3rd row: E
4th row: G
5th row: C

Value      Count  Frequency (%)
C          59     6.6%
B          47     5.3%
D          33     3.7%
E          32     3.6%
A          15     1.7%
F          13     1.5%
G          4      0.4%
(Missing)  688    77.2%

Histogram of lengths of the category

Value  Count  Frequency (%)
c      59     29.1%
b      47     23.2%
d      33     16.3%
e      32     15.8%
a      15     7.4%
f      13     6.4%
g      4      2.0%

Most occurring characters

Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  203    100.0%

Most frequent character per category

Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  203    100.0%

Most frequent character per script

Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  203    100.0%

Most frequent character per block

Value  Count  Frequency (%)
C      59     29.1%
B      47     23.2%
D      33     16.3%
E      32     15.8%
A      15     7.4%
F      13     6.4%
G      4      2.0%

embark_town
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 7.1 KiB

Value        Count
Southampton  644
Cherbourg    168
Queenstown   77

Length

Max length: 11
Median length: 11
Mean length: 10.53543307
Min length: 9

Characters and Unicode

Total characters: 9366
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Southampton
2nd row: Cherbourg
3rd row: Southampton
4th row: Southampton
5th row: Southampton

Value        Count  Frequency (%)
Southampton  644    72.3%
Cherbourg    168    18.9%
Queenstown   77     8.6%
(Missing)    2      0.2%

Histogram of lengths of the category

Value        Count  Frequency (%)
southampton  644    72.4%
cherbourg    168    18.9%
queenstown   77     8.7%

Most occurring characters

Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
S                 644    6.9%
a                 644    6.9%
m                 644    6.9%
p                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  8477   90.5%
Uppercase Letter  889    9.5%

Most frequent character per category

Lowercase Letter

Value             Count  Frequency (%)
o                 1533   18.1%
t                 1365   16.1%
u                 889    10.5%
h                 812    9.6%
n                 798    9.4%
a                 644    7.6%
m                 644    7.6%
p                 644    7.6%
r                 336    4.0%
e                 322    3.8%
Other values (4)  490    5.8%

Uppercase Letter

Value  Count  Frequency (%)
S      644    72.4%
C      168    18.9%
Q      77     8.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  9366   100.0%

Most frequent character per script

Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
S                 644    6.9%
a                 644    6.9%
m                 644    6.9%
p                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  9366   100.0%

Most frequent character per block

Value             Count  Frequency (%)
o                 1533   16.4%
t                 1365   14.6%
u                 889    9.5%
h                 812    8.7%
n                 798    8.5%
S                 644    6.9%
a                 644    6.9%
m                 644    6.9%
p                 644    6.9%
r                 336    3.6%
Other values (7)  1057   11.3%

alive
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1019.0 B

Value  Count  Frequency (%)
False  549    61.6%
True   342    38.4%

alone
Boolean

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1019.0 B

Value  Count  Frequency (%)
True   537    60.3%
False  354    39.7%

Interactions

[20 interaction plots; images not preserved]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
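
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself uses; the function name `pearson_r` is hypothetical. The 1/(n - 1) factors in the covariance and the standard deviations cancel, so they are omitted.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products; the common 1/(n - 1) factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
print(pearson_r(xs, ys))
# Shifting and rescaling one variable leaves r unchanged.
print(pearson_r(xs, [3 * y + 5 for y in ys]))
```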

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
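
A minimal sketch of that calculation: convert each variable to ranks, then apply the covariance-over-standard-deviations formula to the ranks. The function names are hypothetical, and giving tied values their average rank is an assumption; the text above does not say how ties are handled.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# A nonlinear but strictly increasing relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```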

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
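
The pair-counting definition can be sketched directly. This is the simple form as worded above (often called tau-a), in which tied pairs count as neither concordant nor discordant; whether the report uses this form or a tie-adjusted variant is not stated, and the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(zip(xs, ys), 2))
    if not pairs:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
        # sign == 0: a tie in x or y, counted as neither
    return (concordant - discordant) / len(pairs)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant, 3 pairs
```

The quadratic loop over all pairs is fine for a dataset of this size, but faster algorithms exist for large inputs.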

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation for Phik is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
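
A minimal sketch of a bias-corrected Cramér's V computed from a contingency table of observed counts. It follows the correction as commonly stated for Bergsma's estimator; it is not taken from the report's source code, and the function name and toy tables are hypothetical.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent toy table
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated toy table
```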

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
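
The nullity correlation described above can be sketched as the Pearson correlation between two presence indicators (1 where a value is present, 0 where it is missing). The function name and the toy columns are hypothetical, and this is an illustration of the idea rather than the plotting library's own code.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Correlation between presence indicators; None marks a missing value.

    Returns None when either column is fully present or fully missing,
    since the correlation is undefined for a constant indicator.
    """
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Toy columns that are missing in exactly the same rows.
port_code = ["S", None, "C", "S", None]
port_name = ["Southampton", None, "Cherbourg", "Southampton", None]
print(nullity_correlation(port_code, port_name))
```

A value near 1 means the two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.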

Sample

First rows

(row)  df_index  survived  pclass  sex     age   sibsp  parch  fare     embarked  class   who    adult_male  deck  embark_town  alive  alone
0      0         0         3       male    22.0  1      0      7.2500   S         Third   man    True        NaN   Southampton  no     False
1      1         1         1       female  38.0  1      0      71.2833  C         First   woman  False       C     Cherbourg    yes    False
2      2         1         3       female  26.0  0      0      7.9250   S         Third   woman  False       NaN   Southampton  yes    True
3      3         1         1       female  35.0  1      0      53.1000  S         First   woman  False       C     Southampton  yes    False
4      4         0         3       male    35.0  0      0      8.0500   S         Third   man    True        NaN   Southampton  no     True
5      5         0         3       male    NaN   0      0      8.4583   Q         Third   man    True        NaN   Queenstown   no     True
6      6         0         1       male    54.0  0      0      51.8625  S         First   man    True        E     Southampton  no     True
7      7         0         3       male    2.0   3      1      21.0750  S         Third   child  False       NaN   Southampton  no     False
8      8         1         3       female  27.0  0      2      11.1333  S         Third   woman  False       NaN   Southampton  yes    False
9      9         1         2       female  14.0  1      0      30.0708  C         Second  child  False       NaN   Cherbourg    yes    False

Last rows

(row)  df_index  survived  pclass  sex     age   sibsp  parch  fare     embarked  class   who    adult_male  deck  embark_town  alive  alone
881    881       0         3       male    33.0  0      0      7.8958   S         Third   man    True        NaN   Southampton  no     True
882    882       0         3       female  22.0  0      0      10.5167  S         Third   woman  False       NaN   Southampton  no     True
883    883       0         2       male    28.0  0      0      10.5000  S         Second  man    True        NaN   Southampton  no     True
884    884       0         3       male    25.0  0      0      7.0500   S         Third   man    True        NaN   Southampton  no     True
885    885       0         3       female  39.0  0      5      29.1250  Q         Third   woman  False       NaN   Queenstown   no     False
886    886       0         2       male    27.0  0      0      13.0000  S         Second  man    True        NaN   Southampton  no     True
887    887       1         1       female  19.0  0      0      30.0000  S         First   woman  False       B     Southampton  yes    True
888    888       0         3       female  NaN   1      2      23.4500  S         Third   woman  False       NaN   Southampton  no     False
889    889       1         1       male    26.0  0      0      30.0000  C         First   man    True        C     Cherbourg    yes    True
890    890       0         3       male    32.0  0      0      7.7500   Q         Third   man    True        NaN   Queenstown   no     True