Pandas Profiling Report

Dataset statistics

Number of variables	16
Number of observations	891
Missing cells	869
Missing cells (%)	6.1%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	99.3 KiB
Average record size in memory	114.1 B
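
The derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Counts copied from the "Dataset statistics" table above.
n_variables = 16
n_observations = 891
missing_cells = 869
total_size_kib = 99.3

# Assumption: the missing-cell percentage is computed over every cell.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The 869 missing cells are also the sum of the per-variable missing counts reported further down (177 for `age`, 688 for `deck`, 2 each for `embarked` and `embark_town`).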

Variable types

Numeric	5
Categorical	8
Boolean	3

Warnings

`who` is highly correlated with `sex` and 1 other fields	High correlation
`embark_town` is highly correlated with `embarked`	High correlation
`pclass` is highly correlated with `class`	High correlation
`embarked` is highly correlated with `embark_town`	High correlation
`sex` is highly correlated with `who` and 1 other fields	High correlation
`alive` is highly correlated with `survived`	High correlation
`adult_male` is highly correlated with `who` and 1 other fields	High correlation
`survived` is highly correlated with `alive`	High correlation
`class` is highly correlated with `pclass`	High correlation
`age` has 177 (19.9%) missing values	Missing
`deck` has 688 (77.2%) missing values	Missing
`df_index` is uniformly distributed	Uniform
`df_index` has unique values	Unique
`sibsp` has 608 (68.2%) zeros	Zeros
`parch` has 678 (76.1%) zeros	Zeros
`fare` has 15 (1.7%) zeros	Zeros

Reproduction

Analysis started	2021-03-07 04:46:26.980366
Analysis finished	2021-03-07 04:46:33.952795
Duration	6.97 seconds
Software version	pandas-profiling v2.11.0
Download configuration	config.yaml
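
The reported duration is the difference between the two timestamps above. A minimal check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamp layout used in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-03-07 04:46:26.980366", FMT)
finished = datetime.strptime("2021-03-07 04:46:33.952795", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```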

df_index
Real number (ℝ_≥0)

UNIFORM
UNIQUE

Distinct	891
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%

Mean	445
Minimum	0
Maximum	890
Zeros	1
Zeros (%)	0.1%
Memory size	7.1 KiB

Quantile statistics

Minimum	0
5-th percentile	44.5
Q1	222.5
median	445
Q3	667.5
95-th percentile	845.5
Maximum	890
Range	890
Interquartile range (IQR)	445

Descriptive statistics

Standard deviation	257.353842
Coefficient of variation (CV)	0.5783232405
Kurtosis	-1.2
Mean	445
Median Absolute Deviation (MAD)	223
Skewness	0
Sum	396495
Variance	66231
Monotonicity	Strictly increasing
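
Because `df_index` is simply the integers 0 to 890, most of the statistics above can be reproduced from first principles. The sketch below uses only the Python standard library and assumes the report uses the sample variance (n - 1 denominator) and linear interpolation between order statistics for quantiles; the `quantile` helper is illustrative, not the report's own code.

```python
import math
import statistics

values = list(range(891))  # df_index runs from 0 to 890


def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(v - median) for v in values)
p5, q1, q3, p95 = (quantile(values, q) for q in (0.05, 0.25, 0.75, 0.95))
iqr = q3 - q1

print(total, mean, variance, round(std, 6), round(cv, 10))
print(p5, q1, median, q3, p95, iqr, mad)
```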

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
890	1	0.1%
292	1	0.1%
303	1	0.1%
302	1	0.1%
301	1	0.1%
300	1	0.1%
299	1	0.1%
298	1	0.1%
297	1	0.1%
296	1	0.1%
Other values (881)	881	98.9%

Minimum 5 values

Value	Count	Frequency (%)
0	1	0.1%
1	1	0.1%
2	1	0.1%
3	1	0.1%
4	1	0.1%

Maximum 5 values

Value	Count	Frequency (%)
890	1	0.1%
889	1	0.1%
888	1	0.1%
887	1	0.1%
886	1	0.1%

survived
Categorical

HIGH CORRELATION

Distinct	2
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	7.1 KiB

0	549
1	342

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Characters and Unicode

Total characters	891
Distinct characters	2
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	0
2nd row	1
3rd row	1
4th row	1
5th row	0

Value	Count	Frequency (%)
0	549	61.6%
1	342	38.4%

Histogram of lengths of the category

Value	Count	Frequency (%)
0	549	61.6%
1	342	38.4%

Most occurring characters

Value	Count	Frequency (%)
0	549	61.6%
1	342	38.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	891	100.0%

Most frequent character per category

Value	Count	Frequency (%)
0	549	61.6%
1	342	38.4%

Most occurring scripts

Value	Count	Frequency (%)
Common	891	100.0%

Most frequent character per script

Value	Count	Frequency (%)
0	549	61.6%
1	342	38.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	891	100.0%

Most frequent character per block

Value	Count	Frequency (%)
0	549	61.6%
1	342	38.4%

pclass
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	0.3%
Missing	0
Missing (%)	0.0%
Memory size	7.1 KiB

3	491
1	216
2	184

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Characters and Unicode

Total characters	891
Distinct characters	3
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	3
2nd row	1
3rd row	3
4th row	1
5th row	3

Value	Count	Frequency (%)
3	491	55.1%
1	216	24.2%
2	184	20.7%

Histogram of lengths of the category

Value	Count	Frequency (%)
3	491	55.1%
1	216	24.2%
2	184	20.7%

Most occurring characters

Value	Count	Frequency (%)
3	491	55.1%
1	216	24.2%
2	184	20.7%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	891	100.0%

Most frequent character per category

Value	Count	Frequency (%)
3	491	55.1%
1	216	24.2%
2	184	20.7%

Most occurring scripts

Value	Count	Frequency (%)
Common	891	100.0%

Most frequent character per script

Value	Count	Frequency (%)
3	491	55.1%
1	216	24.2%
2	184	20.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	891	100.0%

Most frequent character per block

Value	Count	Frequency (%)
3	491	55.1%
1	216	24.2%
2	184	20.7%

sex
Categorical

HIGH CORRELATION

Distinct	2
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	7.1 KiB

male	577
female	314

Length

Max length	6
Median length	4
Mean length	4.704826038
Min length	4

Characters and Unicode

Total characters	4192
Distinct characters	5
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	male
2nd row	female
3rd row	female
4th row	female
5th row	male

Value	Count	Frequency (%)
male	577	64.8%
female	314	35.2%

Histogram of lengths of the category

Value	Count	Frequency (%)
male	577	64.8%
female	314	35.2%

Most occurring characters

Value	Count	Frequency (%)
e	1205	28.7%
m	891	21.3%
a	891	21.3%
l	891	21.3%
f	314	7.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	4192	100.0%

Most frequent character per category

Value	Count	Frequency (%)
e	1205	28.7%
m	891	21.3%
a	891	21.3%
l	891	21.3%
f	314	7.5%

Most occurring scripts

Value	Count	Frequency (%)
Latin	4192	100.0%

Most frequent character per script

Value	Count	Frequency (%)
e	1205	28.7%
m	891	21.3%
a	891	21.3%
l	891	21.3%
f	314	7.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	4192	100.0%

Most frequent character per block

Value	Count	Frequency (%)
e	1205	28.7%
m	891	21.3%
a	891	21.3%
l	891	21.3%
f	314	7.5%
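
The length and character statistics for `sex` follow directly from its two value counts. The sketch below rebuilds them with the standard library; it assumes each character is weighted by the number of rows containing its value, and uses `unicodedata.category` to obtain the Unicode general category (`Ll` is "Lowercase Letter").

```python
import statistics
import unicodedata
from collections import Counter

# Value counts copied from the `sex` variable above.
value_counts = {"male": 577, "female": 314}

# One string length per observation gives the length statistics.
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]
total_chars = sum(lengths)
mean_length = total_chars / len(lengths)
median_length = statistics.median(lengths)

# Weight every character by how many rows hold the value it appears in.
char_counts = Counter()
for value, n in value_counts.items():
    for ch in value:
        char_counts[ch] += n

# Unicode general category of each distinct character.
categories = {unicodedata.category(ch) for ch in char_counts}

print(total_chars, round(mean_length, 9), median_length)
for ch, count in char_counts.most_common():
    print(ch, count, f"{100 * count / total_chars:.1f}%")
print(categories)
```

The letter `e` is counted twice for every `female` row, which is why it outnumbers the 891 rows.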

age
Real number (ℝ_≥0)

MISSING

Distinct	88
Distinct (%)	12.3%
Missing	177
Missing (%)	19.9%
Infinite	0
Infinite (%)	0.0%

Mean	29.69911765
Minimum	0.42
Maximum	80
Zeros	0
Zeros (%)	0.0%
Memory size	7.1 KiB

Quantile statistics

Minimum	0.42
5-th percentile	4
Q1	20.125
median	28
Q3	38
95-th percentile	56
Maximum	80
Range	79.58
Interquartile range (IQR)	17.875

Descriptive statistics

Standard deviation	14.52649733
Coefficient of variation (CV)	0.4891221855
Kurtosis	0.1782741536
Mean	29.69911765
Median Absolute Deviation (MAD)	9
Skewness	0.3891077823
Sum	21205.17
Variance	211.0191247
Monotonicity	Not monotonic
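
Several of the `age` figures are derived from others in the same tables. The check below assumes that the mean and the distinct percentage are computed over the 714 non-missing values rather than all 891 rows, which is what the reported numbers imply; the variable names are illustrative.

```python
# Figures copied from the `age` tables above.
n_observations = 891
n_missing = 177
n_distinct = 88
total = 21205.17            # "Sum"
minimum, maximum = 0.42, 80
q1, q3 = 20.125, 38
std = 14.52649733

# Assumption: missing values are excluded before computing the statistics.
n_valid = n_observations - n_missing

mean = total / n_valid
distinct_pct = 100 * n_distinct / n_valid
missing_pct = 100 * n_missing / n_observations
value_range = maximum - minimum
iqr = q3 - q1
cv = std / mean             # coefficient of variation
variance = std ** 2

print(n_valid, round(mean, 8), f"{distinct_pct:.1f}%", f"{missing_pct:.1f}%")
print(round(value_range, 2), iqr, round(cv, 10), round(variance, 7))
```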

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
24	30	3.4%
22	27	3.0%
18	26	2.9%
28	25	2.8%
19	25	2.8%
30	25	2.8%
21	24	2.7%
25	23	2.6%
36	22	2.5%
29	20	2.2%
Other values (78)	467	52.4%
(Missing)	177	19.9%

Minimum 5 values

Value	Count	Frequency (%)
0.42	1	0.1%
0.67	1	0.1%
0.75	2	0.2%
0.83	2	0.2%
0.92	1	0.1%

Maximum 5 values

Value	Count	Frequency (%)
80	1	0.1%
74	1	0.1%
71	2	0.2%
70.5	1	0.1%
70	2	0.2%

sibsp
Real number (ℝ_≥0)

ZEROS

Distinct	7
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%

Mean	0.5230078563
Minimum	0
Maximum	8
Zeros	608
Zeros (%)	68.2%
Memory size	7.1 KiB

Quantile statistics

Minimum	0
5-th percentile	0
Q1	0
median	0
Q3	1
95-th percentile	3
Maximum	8
Range	8
Interquartile range (IQR)	1

Descriptive statistics

Standard deviation	1.102743432
Coefficient of variation (CV)	2.108464374
Kurtosis	17.88041973
Mean	0.5230078563
Median Absolute Deviation (MAD)	0
Skewness	3.695351727
Sum	466
Variance	1.216043077
Monotonicity	Not monotonic

Histogram with fixed size bins (bins=7)

Value	Count	Frequency (%)
0	608	68.2%
1	209	23.5%
2	28	3.1%
4	18	2.0%
3	16	1.8%
8	7	0.8%
5	5	0.6%

Minimum 5 values

Value	Count	Frequency (%)
0	608	68.2%
1	209	23.5%
2	28	3.1%
3	16	1.8%
4	18	2.0%

Maximum 5 values

Value	Count	Frequency (%)
8	7	0.8%
5	5	0.6%
4	18	2.0%
3	16	1.8%
2	28	3.1%

parch
Real number (ℝ_≥0)

ZEROS

Distinct	7
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%

Mean	0.3815937149
Minimum	0
Maximum	6
Zeros	678
Zeros (%)	76.1%
Memory size	7.1 KiB

Quantile statistics

Minimum	0
5-th percentile	0
Q1	0
median	0
Q3	0
95-th percentile	2
Maximum	6
Range	6
Interquartile range (IQR)	0

Descriptive statistics

Standard deviation	0.8060572211
Coefficient of variation (CV)	2.112344071
Kurtosis	9.778125179
Mean	0.3815937149
Median Absolute Deviation (MAD)	0
Skewness	2.749117047
Sum	340
Variance	0.6497282437
Monotonicity	Not monotonic

Histogram with fixed size bins (bins=7)

Value	Count	Frequency (%)
0	678	76.1%
1	118	13.2%
2	80	9.0%
5	5	0.6%
3	5	0.6%
4	4	0.4%
6	1	0.1%

Minimum 5 values

Value	Count	Frequency (%)
0	678	76.1%
1	118	13.2%
2	80	9.0%
3	5	0.6%
4	4	0.4%

Maximum 5 values

Value	Count	Frequency (%)
6	1	0.1%
5	5	0.6%
4	4	0.4%
3	5	0.6%
2	80	9.0%

fare
Real number (ℝ_≥0)

ZEROS

Distinct	248
Distinct (%)	27.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%

Mean	32.20420797
Minimum	0
Maximum	512.3292
Zeros	15
Zeros (%)	1.7%
Memory size	7.1 KiB

Quantile statistics

Minimum	0
5-th percentile	7.225
Q1	7.9104
median	14.4542
Q3	31
95-th percentile	112.07915
Maximum	512.3292
Range	512.3292
Interquartile range (IQR)	23.0896

Descriptive statistics

Standard deviation	49.6934286
Coefficient of variation (CV)	1.543072528
Kurtosis	33.39814088
Mean	32.20420797
Median Absolute Deviation (MAD)	6.9042
Skewness	4.78731652
Sum	28693.9493
Variance	2469.436846
Monotonicity	Not monotonic

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
8.05	43	4.8%
13	42	4.7%
7.8958	38	4.3%
7.75	34	3.8%
26	31	3.5%
10.5	24	2.7%
7.925	18	2.0%
7.775	16	1.8%
26.55	15	1.7%
0	15	1.7%
Other values (238)	615	69.0%

Minimum 5 values

Value	Count	Frequency (%)
0	15	1.7%
4.0125	1	0.1%
5	1	0.1%
6.2375	1	0.1%
6.4375	1	0.1%

Maximum 5 values

Value	Count	Frequency (%)
512.3292	3	0.3%
263	4	0.4%
262.375	2	0.2%
247.5208	2	0.2%
227.525	4	0.4%

embarked
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	0.3%
Missing	2
Missing (%)	0.2%
Memory size	7.1 KiB

S	644
C	168
Q	77

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Characters and Unicode

Total characters	889
Distinct characters	3
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	S
2nd row	C
3rd row	S
4th row	S
5th row	S

Value	Count	Frequency (%)
S	644	72.3%
C	168	18.9%
Q	77	8.6%
(Missing)	2	0.2%

Histogram of lengths of the category

Value	Count	Frequency (%)
s	644	72.4%
c	168	18.9%
q	77	8.7%

Most occurring characters

Value	Count	Frequency (%)
S	644	72.4%
C	168	18.9%
Q	77	8.7%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	889	100.0%

Most frequent character per category

Value	Count	Frequency (%)
S	644	72.4%
C	168	18.9%
Q	77	8.7%

Most occurring scripts

Value	Count	Frequency (%)
Latin	889	100.0%

Most frequent character per script

Value	Count	Frequency (%)
S	644	72.4%
C	168	18.9%
Q	77	8.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	889	100.0%

Most frequent character per block

Value	Count	Frequency (%)
S	644	72.4%
C	168	18.9%
Q	77	8.7%

class
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	0.3%
Missing	0
Missing (%)	0.0%
Memory size	7.1 KiB

Third	491
First	216
Second	184

Length

Max length	6
Median length	5
Mean length	5.20650954
Min length	5

Characters and Unicode

Total characters	4639
Distinct characters	13
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	Third
2nd row	First
3rd row	Third
4th row	First
5th row	Third

Value	Count	Frequency (%)
Third	491	55.1%
First	216	24.2%
Second	184	20.7%

Histogram of lengths of the category

Value	Count	Frequency (%)
third	491	55.1%
first	216	24.2%
second	184	20.7%

Most occurring characters

Value	Count	Frequency (%)
i	707	15.2%
r	707	15.2%
d	675	14.6%
T	491	10.6%
h	491	10.6%
F	216	4.7%
s	216	4.7%
t	216	4.7%
S	184	4.0%
e	184	4.0%
Other values (3)	552	11.9%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	3748	80.8%
Uppercase Letter	891	19.2%

Most frequent character per category

Value	Count	Frequency (%)
i	707	18.9%
r	707	18.9%
d	675	18.0%
h	491	13.1%
s	216	5.8%
t	216	5.8%
e	184	4.9%
c	184	4.9%
o	184	4.9%
n	184	4.9%

Value	Count	Frequency (%)
T	491	55.1%
F	216	24.2%
S	184	20.7%

Most occurring scripts

Value	Count	Frequency (%)
Latin	4639	100.0%

Most frequent character per script

Value	Count	Frequency (%)
i	707	15.2%
r	707	15.2%
d	675	14.6%
T	491	10.6%
h	491	10.6%
F	216	4.7%
s	216	4.7%
t	216	4.7%
S	184	4.0%
e	184	4.0%
Other values (3)	552	11.9%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	4639	100.0%

Most frequent character per block

Value	Count	Frequency (%)
i	707	15.2%
r	707	15.2%
d	675	14.6%
T	491	10.6%
h	491	10.6%
F	216	4.7%
s	216	4.7%
t	216	4.7%
S	184	4.0%
e	184	4.0%
Other values (3)	552	11.9%

who
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	0.3%
Missing	0
Missing (%)	0.0%
Memory size	7.1 KiB

man	537
woman	271
child	83

Length

Max length	5
Median length	3
Mean length	3.794612795
Min length	3

Characters and Unicode

Total characters	3381
Distinct characters	10
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	man
2nd row	woman
3rd row	woman
4th row	woman
5th row	man

Value	Count	Frequency (%)
man	537	60.3%
woman	271	30.4%
child	83	9.3%

Histogram of lengths of the category

Value	Count	Frequency (%)
man	537	60.3%
woman	271	30.4%
child	83	9.3%

Most occurring characters

Value	Count	Frequency (%)
m	808	23.9%
a	808	23.9%
n	808	23.9%
w	271	8.0%
o	271	8.0%
c	83	2.5%
h	83	2.5%
i	83	2.5%
l	83	2.5%
d	83	2.5%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	3381	100.0%

Most frequent character per category

Value	Count	Frequency (%)
m	808	23.9%
a	808	23.9%
n	808	23.9%
w	271	8.0%
o	271	8.0%
c	83	2.5%
h	83	2.5%
i	83	2.5%
l	83	2.5%
d	83	2.5%

Most occurring scripts

Value	Count	Frequency (%)
Latin	3381	100.0%

Most frequent character per script

Value	Count	Frequency (%)
m	808	23.9%
a	808	23.9%
n	808	23.9%
w	271	8.0%
o	271	8.0%
c	83	2.5%
h	83	2.5%
i	83	2.5%
l	83	2.5%
d	83	2.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	3381	100.0%

Most frequent character per block

Value	Count	Frequency (%)
m	808	23.9%
a	808	23.9%
n	808	23.9%
w	271	8.0%
o	271	8.0%
c	83	2.5%
h	83	2.5%
i	83	2.5%
l	83	2.5%
d	83	2.5%

adult_male
Boolean

HIGH CORRELATION

Distinct	2
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	1019.0 B

True	537
False	354

Common Values

Value	Count	Frequency (%)
True	537	60.3%
False	354	39.7%

deck
Categorical

MISSING

Distinct	7
Distinct (%)	3.4%
Missing	688
Missing (%)	77.2%
Memory size	7.1 KiB

C	59
B	47
D	33
E	32
A	15
Other values (2)	17

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Characters and Unicode

Total characters	203
Distinct characters	7
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	C
2nd row	C
3rd row	E
4th row	G
5th row	C

Value	Count	Frequency (%)
C	59	6.6%
B	47	5.3%
D	33	3.7%
E	32	3.6%
A	15	1.7%
F	13	1.5%
G	4	0.4%
(Missing)	688	77.2%

Histogram of lengths of the category

Value	Count	Frequency (%)
c	59	29.1%
b	47	23.2%
d	33	16.3%
e	32	15.8%
a	15	7.4%
f	13	6.4%
g	4	2.0%

Most occurring characters

Value	Count	Frequency (%)
C	59	29.1%
B	47	23.2%
D	33	16.3%
E	32	15.8%
A	15	7.4%
F	13	6.4%
G	4	2.0%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	203	100.0%

Most frequent character per category

Value	Count	Frequency (%)
C	59	29.1%
B	47	23.2%
D	33	16.3%
E	32	15.8%
A	15	7.4%
F	13	6.4%
G	4	2.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	203	100.0%

Most frequent character per script

Value	Count	Frequency (%)
C	59	29.1%
B	47	23.2%
D	33	16.3%
E	32	15.8%
A	15	7.4%
F	13	6.4%
G	4	2.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	203	100.0%

Most frequent character per block

Value	Count	Frequency (%)
C	59	29.1%
B	47	23.2%
D	33	16.3%
E	32	15.8%
A	15	7.4%
F	13	6.4%
G	4	2.0%

embark_town
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	0.3%
Missing	2
Missing (%)	0.2%
Memory size	7.1 KiB

Southampton	644
Cherbourg	168
Queenstown	77

Length

Max length	11
Median length	11
Mean length	10.53543307
Min length	9

Characters and Unicode

Total characters	9366
Distinct characters	17
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	Southampton
2nd row	Cherbourg
3rd row	Southampton
4th row	Southampton
5th row	Southampton

Value	Count	Frequency (%)
Southampton	644	72.3%
Cherbourg	168	18.9%
Queenstown	77	8.6%
(Missing)	2	0.2%

Histogram of lengths of the category

Value	Count	Frequency (%)
southampton	644	72.4%
cherbourg	168	18.9%
queenstown	77	8.7%

Most occurring characters

Value	Count	Frequency (%)
o	1533	16.4%
t	1365	14.6%
u	889	9.5%
h	812	8.7%
n	798	8.5%
S	644	6.9%
a	644	6.9%
m	644	6.9%
p	644	6.9%
r	336	3.6%
Other values (7)	1057	11.3%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	8477	90.5%
Uppercase Letter	889	9.5%

Most frequent character per category

Value	Count	Frequency (%)
o	1533	18.1%
t	1365	16.1%
u	889	10.5%
h	812	9.6%
n	798	9.4%
a	644	7.6%
m	644	7.6%
p	644	7.6%
r	336	4.0%
e	322	3.8%
Other values (4)	490	5.8%

Value	Count	Frequency (%)
S	644	72.4%
C	168	18.9%
Q	77	8.7%

Most occurring scripts

Value	Count	Frequency (%)
Latin	9366	100.0%

Most frequent character per script

Value	Count	Frequency (%)
o	1533	16.4%
t	1365	14.6%
u	889	9.5%
h	812	8.7%
n	798	8.5%
S	644	6.9%
a	644	6.9%
m	644	6.9%
p	644	6.9%
r	336	3.6%
Other values (7)	1057	11.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	9366	100.0%

Most frequent character per block

Value	Count	Frequency (%)
o	1533	16.4%
t	1365	14.6%
u	889	9.5%
h	812	8.7%
n	798	8.5%
S	644	6.9%
a	644	6.9%
m	644	6.9%
p	644	6.9%
r	336	3.6%
Other values (7)	1057	11.3%

alive
Boolean

HIGH CORRELATION

Distinct	2
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	1019.0 B

False	549
True	342

Common Values

Value	Count	Frequency (%)
False	549	61.6%
True	342	38.4%

alone
Boolean

Distinct	2
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	1019.0 B

True	537
False	354

Common Values

Value	Count	Frequency (%)
True	537	60.3%
False	354	39.7%

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
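
The calculation can be sketched in a few lines of plain Python. The inputs are the `pclass` and `fare` values of the ten first rows shown in the sample, used purely for illustration, so the result is not the coefficient the report computes on all 891 rows. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1 / (n - 1) factors of covariance and variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# pclass and fare from the ten "First rows" of the sample (illustration only).
pclass = [3, 1, 3, 1, 3, 3, 1, 3, 3, 2]
fare = [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583, 51.8625, 21.075, 11.1333, 30.0708]

r = pearson_r(pclass, fare)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(pclass, [2 * f + 3 for f in fare])
print(r, r_rescaled)
```

On these ten rows r is negative: a higher class number (a lower class) goes with a lower fare.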

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
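
A minimal sketch of that procedure follows: rank both variables, then apply Pearson's formula to the ranks. It assumes tied values receive the average of the ranks they span, which is a common convention but not stated in the report; the helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: y = x ** 3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

For this cubic relationship ρ reaches 1 while Pearson's r stays below 1, which is the sense in which ρ captures nonlinear monotonic association.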

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
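
The pair-counting rule above corresponds to the variant usually called tau-a. A minimal sketch, in which tied pairs count as neither concordant nor discordant; note that SciPy's `kendalltau` defaults to the tau-b variant, which adjusts the denominator for ties, so results can differ on data with many ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


tau_same = kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4])
tau_reversed = kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1])
# One swapped pair out of six: five concordant, one discordant.
tau_one_swap = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau_same, tau_reversed, tau_one_swap)
```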

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project's documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
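
A sketch of the bias-corrected estimator, written from Bergsma's published formula rather than taken from the report's own code. It starts from the chi-squared statistic of a contingency table and assumes no row or column total is zero. The example table assumes that `survived` and `alive` agree on every row, which matches their identical counts (549 and 342) but is an assumption here.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi squared and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Assumed survived x alive table: the two variables agree on every row.
v_perfect = cramers_v_corrected([[549, 0], [0, 342]])
# A table whose rows are proportional, i.e. independent variables.
v_independent = cramers_v_corrected([[20, 30], [40, 60]])
print(v_perfect, v_independent)
```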

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
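
Nullity correlation can be illustrated by correlating missingness indicators. The sketch below assumes the heatmap uses the Pearson correlation of 0/1 "is missing" flags, and it uses only the `age` and `deck` columns of the ten first rows shown in the sample, so the value is illustrative rather than the one plotted in the report.

```python
import math

# Values from the ten "First rows" of the sample; None stands for NaN.
age = [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0, 2.0, 27.0, 14.0]
deck = [None, "C", None, "C", None, None, "E", None, None, None]


def is_null_flags(column):
    """1 where the value is missing, 0 where it is present."""
    return [1 if value is None else 0 for value in column]


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


age_null = is_null_flags(age)
deck_null = is_null_flags(deck)
nullity_corr = pearson_r(age_null, deck_null)
print(sum(age_null), sum(deck_null), round(nullity_corr, 4))
```

A positive value means the two columns tend to be missing together; a negative value means one tends to be present when the other is missing.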

First rows

	df_index	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
5	5	0	3	male	NaN	0	0	8.4583	Q	Third	man	True	NaN	Queenstown	no	True
6	6	0	1	male	54.0	0	0	51.8625	S	First	man	True	E	Southampton	no	True
7	7	0	3	male	2.0	3	1	21.0750	S	Third	child	False	NaN	Southampton	no	False
8	8	1	3	female	27.0	0	2	11.1333	S	Third	woman	False	NaN	Southampton	yes	False
9	9	1	2	female	14.0	1	0	30.0708	C	Second	child	False	NaN	Cherbourg	yes	False

Last rows

	df_index	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
881	881	0	3	male	33.0	0	0	7.8958	S	Third	man	True	NaN	Southampton	no	True
882	882	0	3	female	22.0	0	0	10.5167	S	Third	woman	False	NaN	Southampton	no	True
883	883	0	2	male	28.0	0	0	10.5000	S	Second	man	True	NaN	Southampton	no	True
884	884	0	3	male	25.0	0	0	7.0500	S	Third	man	True	NaN	Southampton	no	True
885	885	0	3	female	39.0	0	5	29.1250	Q	Third	woman	False	NaN	Queenstown	no	False
886	886	0	2	male	27.0	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	887	1	1	female	19.0	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	888	0	3	female	NaN	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	889	1	1	male	26.0	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	890	0	3	male	32.0	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True
