
Datawhale Data Analysis, Chapter 1, Section 1: Loading Data and Initial Observation

Date: 2023-06-25 19:07:00

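Review: the main goal of this course is to walk through the data analysis process on real data and to get familiar with the basic Python operations used in data analysis. With that goal in mind, we now start the hands-on part: completing the Titanic task on Kaggle and working through the whole data analysis workflow.
Two resources are recommended:
the textbook "Python for Data Analysis", plus baidu.com and google.com (make good use of search engines).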

1 Chapter 1: Loading Data and Initial Observation

1.1 Loading the data

Dataset download: https://www.kaggle.com/c/titanic/overview

1.1.1 Task 1: import numpy and pandas

import numpy as np
import pandas as pd

Hint: if the import fails, learn how to install the numpy and pandas libraries in your Python environment.
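A small sketch for checking whether the two libraries are available before importing them. The helper name missing_packages is made up for this illustration, and the suggested pip command assumes a standard pip-based environment; conda users would install the packages differently.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["numpy", "pandas"])
if missing:
    # Assumption: pip is the package manager in use.
    print("Not installed:", ", ".join(missing))
    print("Try: python -m pip install " + " ".join(missing))
else:
    import numpy as np
    import pandas as pd
    print("numpy", np.__version__, "| pandas", pd.__version__)
```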

1.1.2 Task 2: load the data

(1) Load the data using a relative path
(2) Load the data using an absolute path

import os
os.getcwd()
'/Users/liubaoyun/Desktop/Datawhale 数据分析/Datawhale数据分析/第一单元项目集合'
abs_path = os.path.abspath('train.csv')
abs_path
'/Users/liubaoyun/Desktop/Datawhale 数据分析/Datawhale数据分析/第一单元项目集合/train.csv'
abs_train = pd.read_csv(abs_path)      # (2) absolute path
rel_train = pd.read_csv('train.csv')   # (1) relative path
train = abs_train

Hint: if loading with a relative path fails, use os.getcwd() to check the current working directory.
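A minimal sketch for diagnosing a failed relative-path load. It uses pathlib as an alternative to os (pathlib is not used in the original notebook) and assumes the file is called train.csv as in the code above.

```python
from pathlib import Path

# 'train.csv' is the file name used in this section; adjust it if yours differs.
data_path = Path("train.csv")

print("Current working directory:", Path.cwd())
print("Path that would be opened:", data_path.resolve())
print("File exists there:        ", data_path.exists())
```

If the last line prints False, either move the file into the working directory or pass the full path to pd.read_csv().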
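Now that you know how to load data, try both pd.read_csv() and pd.read_table(). How do they differ, and what would you need to do to make them produce the same result? Also look into how '.tsv' and '.csv' files differ and how to load each of them.
Summary: loading data is the first step of any analysis. You will encounter different data formats (e.g. .csv, .tsv, .xlsx), but the method and the thinking behind loading them are the same. When you run into an unfamiliar problem in later work and projects, look up more information, make good use of Google, understand the business logic, and be clear about what the inputs and outputs are.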
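One way to explore the read_csv / read_table question is sketched below. The three-column sample is made up for the illustration (it is not the Titanic data) and is read from memory through io.StringIO instead of from a file.

```python
import io
import pandas as pd

# Tiny made-up sample, once comma-separated and once tab-separated.
csv_text = "id,name,fare\n1,Ann,7.25\n2,Bob,71.28\n"
tsv_text = csv_text.replace(",", "\t")

# read_csv defaults to sep=',' while read_table defaults to sep='\t'.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_tsv = pd.read_table(io.StringIO(tsv_text))

# Passing sep explicitly lets either function read the other format.
csv_via_table = pd.read_table(io.StringIO(csv_text), sep=",")
tsv_via_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(from_csv.equals(csv_via_table))
print(from_tsv.equals(tsv_via_csv))

# Without sep, read_table sees no tabs in the CSV text and returns one wide column.
print(pd.read_table(io.StringIO(csv_text)).shape)
```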

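1.1.3 Task 3: read the data chunk by chunk, with every 1000 rows as one chunk (the code below uses chunksize=100, so the 891-row file is split into nine chunks)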

# Before trying a large file, you can adjust the pandas display settings
pd.options.display.max_rows = 10
train 
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 12 columns

chunker = pd.read_csv('train.csv',chunksize=100)
chunker

for piece in chunker:
    print('chunk_train')
    print('\n')
    print(piece)
chunk_train


    PassengerId  Survived  Pclass  \
0             1         0       3   
1             2         1       1   
2             3         1       3   
3             4         1       1   
4             5         0       3   
..          ...       ...     ...   
95           96         0       3   
96           97         0       1   
97           98         1       1   
98           99         1       2   
99          100         0       2   

                                                 Name     Sex   Age  SibSp  \
0                             Braund, Mr. Owen Harris    male  22.0      1   
1   Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                              Heikkinen, Miss. Laina  female  26.0      0   
3        Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                            Allen, Mr. William Henry    male  35.0      0   
..                                                ...     ...   ...    ...   
95                        Shorney, Mr. Charles Joseph    male   NaN      0   
96                          Goldschmidt, Mr. George B    male  71.0      0   
97                    Greenfield, Mr. William Bertram    male  23.0      0   
98               Doling, Mrs. John T (Ada Julia Bone)  female  34.0      0   
99                                  Kantor, Mr. Sinai    male  34.0      1   

    Parch            Ticket     Fare    Cabin Embarked  
0       0         A/5 21171   7.2500      NaN        S  
1       0          PC 17599  71.2833      C85        C  
2       0  STON/O2. 3101282   7.9250      NaN        S  
3       0            113803  53.1000     C123        S  
4       0            373450   8.0500      NaN        S  
..    ...               ...      ...      ...      ...  
95      0            374910   8.0500      NaN        S  
96      0          PC 17754  34.6542       A5        C  
97      1          PC 17759  63.3583  D10 D12        C  
98      1            231919  23.0000      NaN        S  
99      0            244367  26.0000      NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass                                    Name  \
100          101         0       3                 Petranec, Miss. Matilda   
101          102         0       3        Petroff, Mr. Pastcho ("Pentcho")   
102          103         0       1               White, Mr. Richard Frasar   
103          104         0       3              Johansson, Mr. Gustaf Joel   
104          105         0       3          Gustafsson, Mr. Anders Vilhelm   
..           ...       ...     ...                                     ...   
195          196         1       1                    Lurette, Miss. Elise   
196          197         0       3                     Mernagh, Mr. Robert   
197          198         0       3        Olsen, Mr. Karl Siegwart Andreas   
198          199         1       3        Madigan, Miss. Margaret "Maggie"   
199          200         0       2  Yrois, Miss. Henriette ("Mrs Harbeck")   

        Sex   Age  SibSp  Parch    Ticket      Fare Cabin Embarked  
100  female  28.0      0      0    349245    7.8958   NaN        S  
101    male   NaN      0      0    349215    7.8958   NaN        S  
102    male  21.0      0      1     35281   77.2875   D26        S  
103    male  33.0      0      0      7540    8.6542   NaN        S  
104    male  37.0      2      0   3101276    7.9250   NaN        S  
..      ...   ...    ...    ...       ...       ...   ...      ...  
195  female  58.0      0      0  PC 17569  146.5208   B80        C  
196    male   NaN      0      0    368703    7.7500   NaN        Q  
197    male  42.0      0      1      4579    8.4042   NaN        S  
198  female   NaN      0      0    370370    7.7500   NaN        Q  
199  female  24.0      0      0    248747   13.0000   NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
200          201         0       3   
201          202         0       3   
202          203         0       3   
203          204         0       3   
204          205         1       3   
..           ...       ...     ...   
295          296         0       1   
296          297         0       3   
297          298         0       1   
298          299         1       1   
299          300         1       1   

                                                Name     Sex   Age  SibSp  \
200                   Vande Walle, Mr. Nestor Cyriel    male  28.0      0   
201                              Sage, Mr. Frederick    male   NaN      8   
202                       Johanson, Mr. Jakob Alfred    male  34.0      0   
203                             Youseff, Mr. Gerious    male  45.5      0   
204                         Cohen, Mr. Gurshon "Gus"    male  18.0      0   
..                                               ...     ...   ...    ...   
295                                Lewy, Mr. Ervin G    male   NaN      0   
296                               Hanna, Mr. Mansour    male  23.5      0   
297                     Allison, Miss. Helen Loraine  female   2.0      1   
298                            Saalfeld, Mr. Adolphe    male   NaN      0   
299  Baxter, Mrs. James (Helene DeLaudeniere Chaput)  female  50.0      0   

     Parch    Ticket      Fare    Cabin Embarked  
200      0    345770    9.5000      NaN        S  
201      2  CA. 2343   69.5500      NaN        S  
202      0   3101264    6.4958      NaN        S  
203      0      2628    7.2250      NaN        C  
204      0  A/5 3540    8.0500      NaN        S  
..     ...       ...       ...      ...      ...  
295      0  PC 17612   27.7208      NaN        C  
296      0      2693    7.2292      NaN        C  
297      2    113781  151.5500  C22 C26        S  
298      0     19988   30.5000     C106        S  
299      1  PC 17558  247.5208  B58 B60        C  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass                                      Name  \
300          301         1       3  Kelly, Miss. Anna Katherine "Annie Kate"   
301          302         1       3                        McCoy, Mr. Bernard   
302          303         0       3           Johnson, Mr. William Cahoone Jr   
303          304         1       2                       Keane, Miss. Nora A   
304          305         0       3         Williams, Mr. Howard Hugh "Harry"   
..           ...       ...     ...                                       ...   
395          396         0       3                       Johansson, Mr. Erik   
396          397         0       3                       Olsson, Miss. Elina   
397          398         0       2                   McKane, Mr. Peter David   
398          399         0       2                          Pain, Dr. Alfred   
399          400         1       2          Trout, Mrs. William H (Jessie L)   

        Sex   Age  SibSp  Parch    Ticket     Fare Cabin Embarked  
300  female   NaN      0      0      9234   7.7500   NaN        Q  
301    male   NaN      2      0    367226  23.2500   NaN        Q  
302    male  19.0      0      0      LINE   0.0000   NaN        S  
303  female   NaN      0      0    226593  12.3500  E101        Q  
304    male   NaN      0      0  A/5 2466   8.0500   NaN        S  
..      ...   ...    ...    ...       ...      ...   ...      ...  
395    male  22.0      0      0    350052   7.7958   NaN        S  
396  female  31.0      0      0    350407   7.8542   NaN        S  
397    male  46.0      0      0     28403  26.0000   NaN        S  
398    male  23.0      0      0    244278  10.5000   NaN        S  
399  female  28.0      0      0    240929  12.6500   NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
400          401         1       3   
401          402         0       3   
402          403         0       3   
403          404         0       3   
404          405         0       3   
..           ...       ...     ...   
495          496         0       3   
496          497         1       1   
497          498         0       3   
498          499         0       1   
499          500         0       3   

                                                Name     Sex   Age  SibSp  \
400                               Niskanen, Mr. Juha    male  39.0      0   
401                                  Adams, Mr. John    male  26.0      0   
402                         Jussila, Miss. Mari Aina  female  21.0      1   
403                   Hakkarainen, Mr. Pekka Pietari    male  28.0      1   
404                          Oreskovic, Miss. Marija  female  20.0      0   
..                                               ...     ...   ...    ...   
495                            Yousseff, Mr. Gerious    male   NaN      0   
496                   Eustis, Miss. Elizabeth Mussey  female  54.0      1   
497                  Shellard, Mr. Frederick William    male   NaN      0   
498  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female  25.0      1   
499                               Svensson, Mr. Olof    male  24.0      0   

     Parch             Ticket      Fare    Cabin Embarked  
400      0  STON/O 2. 3101289    7.9250      NaN        S  
401      0             341826    8.0500      NaN        S  
402      0               4137    9.8250      NaN        S  
403      0   STON/O2. 3101279   15.8500      NaN        S  
404      0             315096    8.6625      NaN        S  
..     ...                ...       ...      ...      ...  
495      0               2627   14.4583      NaN        C  
496      0              36947   78.2667      D20        C  
497      0          C.A. 6212   15.1000      NaN        S  
498      2             113781  151.5500  C22 C26        S  
499      0             350035    7.7958      NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
500          501         0       3   
501          502         0       3   
502          503         0       3   
503          504         0       3   
504          505         1       1   
..           ...       ...     ...   
595          596         0       3   
596          597         1       2   
597          598         0       3   
598          599         0       3   
599          600         1       1   

                                             Name     Sex   Age  SibSp  Parch  \
500                              Calic, Mr. Petar    male  17.0      0      0   
501                           Canavan, Miss. Mary  female  21.0      0      0   
502                O'Sullivan, Miss. Bridget Mary  female   NaN      0      0   
503                Laitinen, Miss. Kristina Sofia  female  37.0      0      0   
504                         Maioni, Miss. Roberta  female  16.0      0      0   
..                                            ...     ...   ...    ...    ...   
595                   Van Impe, Mr. Jean Baptiste    male  36.0      1      1   
596                    Leitch, Miss. Jessie Wills  female   NaN      0      0   
597                           Johnson, Mr. Alfred    male  49.0      0      0   
598                             Boulos, Mr. Hanna    male   NaN      0      0   
599  Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan")    male  49.0      1      0   

       Ticket     Fare Cabin Embarked  
500    315086   8.6625   NaN        S  
501    364846   7.7500   NaN        Q  
502    330909   7.6292   NaN        Q  
503      4135   9.5875   NaN        S  
504    110152  86.5000   B79        S  
..        ...      ...   ...      ...  
595    345773  24.1500   NaN        S  
596    248727  33.0000   NaN        S  
597      LINE   0.0000   NaN        S  
598      2664   7.2250   NaN        C  
599  PC 17485  56.9292   A20        C  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
600          601         1       2   
601          602         0       3   
602          603         0       1   
603          604         0       3   
604          605         1       1   
..           ...       ...     ...   
695          696         0       2   
696          697         0       3   
697          698         1       3   
698          699         0       1   
699          700         0       3   

                                                  Name     Sex   Age  SibSp  \
600  Jacobsohn, Mrs. Sidney Samuel (Amy Frances Chr...  female  24.0      2   
601                               Slabenoff, Mr. Petco    male   NaN      0   
602                          Harrington, Mr. Charles H    male   NaN      0   
603                          Torber, Mr. Ernst William    male  44.0      0   
604                    Homer, Mr. Harry ("Mr E Haven")    male  35.0      0   
..                                                 ...     ...   ...    ...   
695                         Chapman, Mr. Charles Henry    male  52.0      0   
696                                   Kelly, Mr. James    male  44.0      0   
697                   Mullens, Miss. Katherine "Katie"  female   NaN      0   
698                           Thayer, Mr. John Borland    male  49.0      1   
699           Humblen, Mr. Adolf Mathias Nicolai Olsen    male  42.0      0   

     Parch  Ticket      Fare  Cabin Embarked  
600      1  243847   27.0000    NaN        S  
601      0  349214    7.8958    NaN        S  
602      0  113796   42.4000    NaN        S  
603      0  364511    8.0500    NaN        S  
604      0  111426   26.5500    NaN        C  
..     ...     ...       ...    ...      ...  
695      0  248731   13.5000    NaN        S  
696      0  363592    8.0500    NaN        S  
697      0   35852    7.7333    NaN        Q  
698      1   17421  110.8833    C68        C  
699      0  348121    7.6500  F G63        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
700          701         1       1   
701          702         1       1   
702          703         0       3   
703          704         0       3   
704          705         0       3   
..           ...       ...     ...   
795          796         0       2   
796          797         1       1   
797          798         1       3   
798          799         0       3   
799          800         0       3   

                                                  Name     Sex   Age  SibSp  \
700  Astor, Mrs. John Jacob (Madeleine Talmadge Force)  female  18.0      1   
701                   Silverthorne, Mr. Spencer Victor    male  35.0      0   
702                              Barbara, Miss. Saiide  female  18.0      0   
703                              Gallagher, Mr. Martin    male  25.0      0   
704                            Hansen, Mr. Henrik Juul    male  26.0      1   
..                                                 ...     ...   ...    ...   
795                                 Otter, Mr. Richard    male  39.0      0   
796                        Leader, Dr. Alice (Farnham)  female  49.0      0   
797                                   Osman, Mrs. Mara  female  31.0      0   
798                       Ibrahim Shawah, Mr. Yousseff    male  30.0      0   
799  Van Impe, Mrs. Jean Baptiste (Rosalie Paula Go...  female  30.0      1   

     Parch    Ticket      Fare    Cabin Embarked  
700      0  PC 17757  227.5250  C62 C64        C  
701      0  PC 17475   26.2875      E24        S  
702      1      2691   14.4542      NaN        C  
703      0     36864    7.7417      NaN        Q  
704      0    350025    7.8542      NaN        S  
..     ...       ...       ...      ...      ...  
795      0     28213   13.0000      NaN        S  
796      0     17465   25.9292      D17        S  
797      0    349244    8.6833      NaN        S  
798      0      2685    7.2292      NaN        C  
799      1    345773   24.1500      NaN        S  

[100 rows x 12 columns]
chunk_train


     PassengerId  Survived  Pclass  \
800          801         0       2   
801          802         1       2   
802          803         1       1   
803          804         1       3   
804          805         1       3   
..           ...       ...     ...   
886          887         0       2   
887          888         1       1   
888          889         0       3   
889          890         1       1   
890          891         0       3   

                                            Name     Sex    Age  SibSp  Parch  \
800                         Ponesell, Mr. Martin    male  34.00      0      0   
801  Collyer, Mrs. Harvey (Charlotte Annie Tate)  female  31.00      1      1   
802          Carter, Master. William Thornton II    male  11.00      1      2   
803              Thomas, Master. Assad Alexander    male   0.42      0      1   
804                      Hedman, Mr. Oskar Arvid    male  27.00      0      0   
..                                           ...     ...    ...    ...    ...   
886                        Montvila, Rev. Juozas    male  27.00      0      0   
887                 Graham, Miss. Margaret Edith  female  19.00      0      0   
888     Johnston, Miss. Catherine Helen "Carrie"  female    NaN      1      2   
889                        Behr, Mr. Karl Howell    male  26.00      0      0   
890                          Dooley, Mr. Patrick    male  32.00      0      0   

         Ticket      Fare    Cabin Embarked  
800      250647   13.0000      NaN        S  
801  C.A. 31921   26.2500      NaN        S  
802      113760  120.0000  B96 B98        S  
803        2625    8.5167      NaN        C  
804      347089    6.9750      NaN        S  
..          ...       ...      ...      ...  
886      211536   13.0000      NaN        S  
887      112053   30.0000      B42        S  
888  W./C. 6607   23.4500      NaN        S  
889      111369   30.0000     C148        C  
890      370376    7.7500      NaN        Q  

[91 rows x 12 columns]

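[Think] What is chunked reading? Why would you read a file in chunks?

[Hint] What type is chunker (the chunk reader)? What does each piece look like when you print it in a for loop?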
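A small sketch for exploring these questions. It reads a made-up 10-row sample from memory in place of train.csv and uses chunksize=4 so that the last chunk is shorter than the others, just as the final 91-row chunk is in the output above.

```python
import io
import pandas as pd

# Made-up 10-row sample standing in for train.csv.
sample = "PassengerId,Survived\n" + "".join(f"{i},{i % 2}\n" for i in range(1, 11))

chunker = pd.read_csv(io.StringIO(sample), chunksize=4)
print(type(chunker).__name__)  # a reader object, not a DataFrame

chunk_sizes = []
survived_total = 0
for piece in chunker:
    # Each piece is an ordinary DataFrame holding at most `chunksize` rows.
    chunk_sizes.append(len(piece))
    survived_total += int(piece["Survived"].sum())

print(chunk_sizes)
print(survived_total)
```

Because only one chunk is held in memory at a time, running totals like survived_total can be computed for files that are too large to load in one go.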

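1.1.4 Task 4: change the column headers to Chinese and set the index to the passenger ID [for English-language data, translating the headers can help us get familiar with the data more intuitively]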

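PassengerId => 乘客ID (passenger ID)
Survived => 是否幸存 (survived or not)
Pclass => 乘客等级(1/2/3等舱位) (passenger class: 1st/2nd/3rd)
Name => 乘客姓名 (passenger name)
Sex => 性别 (sex)
Age => 年龄 (age)
SibSp => 堂兄弟/妹个数 (number of siblings/spouses aboard; the Chinese label literally reads "number of cousins")
Parch => 父母与小孩个数 (number of parents/children aboard)
Ticket => 船票信息 (ticket information)
Fare => 票价 (fare)
Cabin => 客舱 (cabin)
Embarked => 登船港口 (port of embarkation)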

train.head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
train.columns = ['乘客ID', '是否幸存', '乘客等级(1/2/3等舱位)',
                 '乘客姓名', '性别', '年龄', '堂兄弟/妹个数',
                 '父母与小孩个数', '船票信息', '票价', '客舱', '登船港口']
train
乘客ID 是否幸存 乘客等级(1/2/3等舱位) 乘客姓名 性别 年龄 堂兄弟/妹个数 父母与小孩个数 船票信息 票价 客舱 登船港口
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

891 rows × 12 columns

[Think] One way to change the header to Chinese is to replace the English column names with Chinese ones. Are there other ways?
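Two possible alternatives are sketched below on a made-up two-row sample with only three of the twelve columns. The task heading also asks for the passenger ID as the index, which the column assignment above does not do, so both options set it here. This is one reading of the task, not necessarily the intended solution.

```python
import io
import pandas as pd

# Two made-up rows with a subset of the Titanic columns.
sample = "PassengerId,Survived,Pclass\n1,0,3\n2,1,1\n"

mapping = {
    "PassengerId": "乘客ID",
    "Survived": "是否幸存",
    "Pclass": "乘客等级(1/2/3等舱位)",
}

# Option 1: rename columns through a mapping, then make the ID the index.
renamed = (
    pd.read_csv(io.StringIO(sample))
    .rename(columns=mapping)
    .set_index("乘客ID")
)

# Option 2: supply the Chinese names while reading.
# header=0 tells pandas to discard the English header row in the file.
at_read = pd.read_csv(
    io.StringIO(sample),
    names=list(mapping.values()),
    header=0,
    index_col="乘客ID",
)

print(renamed)
print(renamed.equals(at_read))
```

A mapping has the advantage that it does not depend on column order and can rename only some of the columns.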

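1.2 Initial observation

After loading the data, you will usually want an overview of its overall structure and a few sample rows: for example, how large the data is, how many columns it has, what type each column is, and whether it contains nulls.

1.2.1 Task 1: view basic information about the data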

train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   乘客ID            891 non-null    int64  
 1   是否幸存            891 non-null    int64  
 2   乘客等级(1/2/3等舱位)  891 non-null    int64  
 3   乘客姓名            891 non-null    object 
 4   性别              891 non-null    object 
 5   年龄              714 non-null    float64
 6   堂兄弟/妹个数         891 non-null    int64  
 7   父母与小孩个数         891 non-null    int64  
 8   船票信息            891 non-null    object 
 9   票价              891 non-null    float64
 10  客舱              204 non-null    object 
 11  登船港口            889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

[Hint] Several functions can do this; try to summarize them.
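A few of the functions and attributes that give a similar overview, shown on a made-up three-row frame standing in for train. The list is a starting point for your own summary, not an exhaustive one.

```python
import pandas as pd

# Made-up frame with one missing value.
df = pd.DataFrame({"Age": [22.0, None, 26.0], "Sex": ["male", "female", "female"]})

print(df.shape)                  # (rows, columns)
print(df.dtypes)                 # dtype of each column
print(df.columns.tolist())       # column names
print(df.describe())             # summary statistics for numeric columns
print(df["Sex"].value_counts())  # frequency of each category
```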

1.2.2 Task 2: look at the first 10 rows and the last 15 rows of the table

# Write your code here
train[:10]
乘客ID 是否幸存 乘客等级(1/2/3等舱位) 乘客姓名 性别 年龄 堂兄弟/妹个数 父母与小孩个数 船票信息 票价 客舱 登船港口
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
5 6 0 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q
6 7 0 1 McCarthy, Mr. Timothy J male 54.0 0 0 17463 51.8625 E46 S
7 8 0 3 Palsson, Master. Gosta Leonard male 2.0 3 1 349909 21.0750 NaN S
8 9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 2 347742 11.1333 NaN S
9 10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 0 237736 30.0708 NaN C
train[-15:]
乘客ID 是否幸存 乘客等级(1/2/3等舱位) 乘客姓名 性别 年龄 堂兄弟/妹个数 父母与小孩个数 船票信息 票价 客舱 登船港口
876 877 0 3 Gustafsson, Mr. Alfred Ossian male 20.0 0 0 7534 9.8458 NaN S
877 878 0 3 Petroff, Mr. Nedelio male 19.0 0 0 349212 7.8958 NaN S
878 879 0 3 Laleff, Mr. Kristo male NaN 0 0 349217 7.8958 NaN S
879 880 1 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 1 11767 83.1583 C50 C
880 881 1 2 Shelley, Mrs. William (Imanita Parrish Hall) female 25.0 0 1 230433 26.0000 NaN S
... ... ... ... ... ... ... ... ... ... ... ... ...
886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S
887 888 1 1 Graham, Miss. Margaret Edith female 19.0 0 0 112053 30.0000 B42 S
888 889 0 3 Johnston, Miss. Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S
889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C
890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q

15 rows × 12 columns
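The slices train[:10] and train[-15:] can also be written with head() and tail(). The sketch below uses a made-up 30-row frame to show that both forms select the same rows. The last 15 rows above are displayed with "..." in the middle because display.max_rows was set to 10 earlier; pd.option_context lifts that limit for a single print.

```python
import pandas as pd

# Made-up frame standing in for train.
df = pd.DataFrame({"PassengerId": range(1, 31)})

first_10 = df.head(10)
last_15 = df.tail(15)

print(first_10.equals(df[:10]))
print(last_15.equals(df[-15:]))

# Temporarily remove the row limit so all 15 rows are printed.
with pd.option_context("display.max_rows", None):
    print(last_15)
```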

1.2.3 Task 3: check whether the data is null, returning True where a value is null and False elsewhere

train.isnull()
乘客ID 是否幸存 乘客等级(1/2/3等舱位) 乘客姓名 性别 年龄 堂兄弟/妹个数 父母与小孩个数 船票信息 票价 客舱 登船港口
0 False False False False False False False False False False True False
1 False False False False False False False False False False False False
2 False False False False False False False False False False True False
3 False False False False False False False False False False False False
4 False False False False False False False False False False True False
... ... ... ... ... ... ... ... ... ... ... ... ...
886 False False False False False False False False False False True False
887 False False False False False False False False False False False False
888 False False False False False True False False False False True False
889 False False False False False False False False False False False False
890 False False False False False False False False False False True False

891 rows × 12 columns
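A full True/False table is hard to scan, so a common follow-up is to count the missing values per column. The sketch below does this on a made-up three-row frame that reuses two of the Chinese column names. On the real data, the info() output above already implies the counts: 891 - 714 = 177 missing in 年龄, 891 - 204 = 687 in 客舱, and 891 - 889 = 2 in 登船港口.

```python
import pandas as pd

# Made-up frame with the same kind of gaps as the Titanic data.
df = pd.DataFrame({"年龄": [22.0, None, 26.0], "客舱": [None, "C85", None]})

# True counts as 1 when summed, so this gives missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value.
rows_with_missing = df.isnull().any(axis=1)
print(rows_with_missing)
```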

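[Summary] All of the operations above are ways of observing the data itself in data analysis.

[Think] From what other angles can a dataset be observed? Look for answers; this will help a lot in the analysis that follows.

1.3 Saving the data

1.3.1 Task 1: save the data you loaded and modified as a new file, train_chinese.csv, in the working directory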

# Write your code here
# Note: depending on the operating system, the saved file may show garbled characters. You can pass encoding='GBK' or encoding='utf-8'.
train.to_csv('train_chinese.csv',encoding = 'utf8')
c= pd.read_csv('train_chinese.csv')
c.head()
Unnamed: 0 乘客ID 是否幸存 乘客等级(1/2/3等舱位) 乘客姓名 性别 年龄 堂兄弟/妹个数 父母与小孩个数 船票信息 票价 客舱 登船港口
0 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
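The extra "Unnamed: 0" column in the output above appears because to_csv() writes the row index as a first column without a name, and read_csv() then reads it back as data. A minimal sketch of one way to avoid this, using a made-up two-row frame and a temporary directory so that no real file is overwritten:

```python
import os
import tempfile
import pandas as pd

# Made-up frame reusing two of the Chinese column names.
df = pd.DataFrame({"乘客ID": [1, 2], "是否幸存": [0, 1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_chinese.csv")

    # Default behaviour: the index is written as an unnamed first column.
    df.to_csv(path, encoding="utf-8")
    with_index = pd.read_csv(path, encoding="utf-8")

    # index=False leaves the index out of the file.
    df.to_csv(path, index=False, encoding="utf-8")
    without_index = pd.read_csv(path, encoding="utf-8")

print(with_index.columns.tolist())
print(without_index.columns.tolist())
```

Alternatively, keep the index in the file and read it back with pd.read_csv(path, index_col=0).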

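[Summary] This completes loading the data and getting started. Next we will work with operations on the data itself, focusing on how numpy and pandas are used in real work and project scenarios.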
