机器学习 - Pandas 练习, 常见功能查阅

news/2025/2/25 5:29:35

机器学习记录

Pandas

pandas__3">安装 pandas 库:

conda install pandas

数据

git clone https://github.com/KeithGalli/pandas.git

练习

python">import pandas as pd
python">data_dir = "/data_dir"
python">df = pd.read_csv(f'{data_dir}/pandas/pokemon_data.csv')
df.shape
(800, 12)
python">df.head()
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison4549496565451False
12IvysaurGrassPoison6062638080601False
23VenusaurGrassPoison808283100100801False
33VenusaurMega VenusaurGrassPoison80100123122120801False
44CharmanderFireNaN3952436050651False
python">df = pd.read_excel(f'{data_dir}/pandas/pokemon_data.xlsx')
df.shape
(800, 12)
python">df.head()
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison4549496565451False
12IvysaurGrassPoison6062638080601False
23VenusaurGrassPoison808283100100801False
33VenusaurMega VenusaurGrassPoison80100123122120801False
44CharmanderFireNaN3952436050651False
python">def read():
    global df
    df = pd.read_csv(f'{data_dir}/pandas/pokemon_data.csv')
read()
df.columns
Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk',
       'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
python">df.Name
0                  Bulbasaur
1                    Ivysaur
2                   Venusaur
3      VenusaurMega Venusaur
4                 Charmander
               ...          
795                  Diancie
796      DiancieMega Diancie
797      HoopaHoopa Confined
798       HoopaHoopa Unbound
799                Volcanion
Name: Name, Length: 800, dtype: object
python">df['Name']
0                  Bulbasaur
1                    Ivysaur
2                   Venusaur
3      VenusaurMega Venusaur
4                 Charmander
               ...          
795                  Diancie
796      DiancieMega Diancie
797      HoopaHoopa Confined
798       HoopaHoopa Unbound
799                Volcanion
Name: Name, Length: 800, dtype: object
python">df[['Name', 'HP', 'Speed']]
NameHPSpeed
0Bulbasaur4545
1Ivysaur6060
2Venusaur8080
3VenusaurMega Venusaur8080
4Charmander3965
............
795Diancie5050
796DiancieMega Diancie50110
797HoopaHoopa Confined8070
798HoopaHoopa Unbound8080
799Volcanion8070

800 rows × 3 columns

python"># 第 [n] 行的数据
df.iloc[1]
#                   2
Name          Ivysaur
Type 1          Grass
Type 2         Poison
HP                 60
Attack             62
Defense            63
Sp. Atk            80
Sp. Def            80
Speed              60
Generation          1
Legendary       False
Name: 1, dtype: object
python"># 第 [n, m) 行的数据
df.iloc[0:4]
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison4549496565451False
12IvysaurGrassPoison6062638080601False
23VenusaurGrassPoison808283100100801False
33VenusaurMega VenusaurGrassPoison80100123122120801False
python"># 第 [n] 行, [m] 列的数据
df.iloc[2, 2]
'Grass'
python"># 行遍历
# for index, row in df.iterrows():
# #     print(index, row)
#     print(index, row['Name'])
0 Bulbasaur
1 Ivysaur
2 Venusaur
3 VenusaurMega Venusaur
4 Charmander
5 Charmeleon
...
796 DiancieMega Diancie
797 HoopaHoopa Confined
798 HoopaHoopa Unbound
799 Volcanion
python"># 根据字段过滤数据
df.loc[df['Type 1'] == "Grass"]
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison4549496565451False
12IvysaurGrassPoison6062638080601False
23VenusaurGrassPoison808283100100801False
33VenusaurMega VenusaurGrassPoison80100123122120801False
4843OddishGrassPoison4550557565301False
.......................................
718650ChespinGrassNaN5661654845386False
719651QuilladinGrassNaN6178955658576False
720652ChesnaughtGrassFighting881071227475646False
740672SkiddoGrassNaN6665486257526False
741673GogoatGrassNaN123100629781686False

70 rows × 12 columns

python"># 统计, 只对"数值"类型统计
# count:非空值的数量。
# mean:平均值。
# std:标准差。
# min:最小值。
# 25%:下四分位数。
# 50%:中位数(下四分位数和上四分位数的平均值)。
# 75%:上四分位数。
# max:最大值。
df.describe()
#HPAttackDefenseSp. AtkSp. DefSpeedGeneration
count800.000000800.000000800.000000800.000000800.000000800.000000800.000000800.00000
mean362.81375069.25875079.00125073.84250072.82000071.90250068.2775003.32375
std208.34379825.53466932.45736631.18350132.72229427.82891629.0604741.66129
min1.0000001.0000005.0000005.00000010.00000020.0000005.0000001.00000
25%184.75000050.00000055.00000050.00000049.75000050.00000045.0000002.00000
50%364.50000065.00000075.00000070.00000065.00000070.00000065.0000003.00000
75%539.25000080.000000100.00000090.00000095.00000090.00000090.0000005.00000
max721.000000255.000000190.000000230.000000194.000000230.000000180.0000006.00000
python"># 排序
df.sort_values('Name')
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
510460AbomasnowGrassIce9092759285604False
511460AbomasnowMega AbomasnowGrassIce90132105132105304False
6863AbraPsychicNaN25201510555901False
392359AbsolDarkNaN65130607560753False
393359AbsolMega AbsolDarkNaN6515060115601153False
.......................................
632571ZoroarkDarkNaN6010560120601055False
631570ZoruaDarkNaN4065408040655False
4641ZubatPoisonFlying4045353040551False
695634ZweilousDarkDragon7285706570585False
794718Zygarde50% FormeDragonGround1081001218195956True

800 rows × 12 columns

python"># 排序: 倒序
df.sort_values('Name', ascending=False)
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
794718Zygarde50% FormeDragonGround1081001218195956True
695634ZweilousDarkDragon7285706570585False
4641ZubatPoisonFlying4045353040551False
631570ZoruaDarkNaN4065408040655False
632571ZoroarkDarkNaN6010560120601055False
.......................................
393359AbsolMega AbsolDarkNaN6515060115601153False
392359AbsolDarkNaN65130607560753False
6863AbraPsychicNaN25201510555901False
511460AbomasnowMega AbomasnowGrassIce90132105132105304False
510460AbomasnowGrassIce9092759285604False

800 rows × 12 columns

python"># 排序: 多字段排序
df.sort_values(['Type 1', 'HP'])
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
316292ShedinjaBugGhost190453030403False
230213ShuckleBugRock20102301023052False
462415CombeeBugFlying3030423042704False
603543VenipedeBugPoison3045593039575False
314290NincadaBugGround3145903030403False
.......................................
142131LaprasWaterIce13085808595601False
145134VaporeonWaterNaN130656011095651False
350320WailmerWaterNaN13070357035603False
655594AlomomolaWaterNaN16575804045655False
351321WailordWaterNaN17090459045603False

800 rows × 12 columns

python"># 排序: 多字段排序, 指定每个字段的排序顺序
df.sort_values(['Type 1', 'HP'], ascending=[0, 1])
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
139129MagikarpWaterNaN2010551520801False
381349FeebasWaterNaN2015201055803False
9790ShellderWaterNaN30651004525401False
10698KrabbyWaterNaN30105902525501False
125116HorseaWaterNaN3040707025601False
.......................................
232214HeracrossMega HeracrossBugFighting8018511540105752False
678617AccelgorBugNaN807040100601455False
734666VivillonBugFlying8052509050896False
698637VolcaronaBugFire8560651351051005False
520469YanmegaBugFlying86768611656954False

800 rows × 12 columns

python">df['Total'] = df['HP'] + df['Attack'] + df['Defense'] + df['Sp. Atk'] + df['Sp. Def'] + df['Speed']
df
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendaryTotal
01BulbasaurGrassPoison4549496565451False318
12IvysaurGrassPoison6062638080601False405
23VenusaurGrassPoison808283100100801False525
33VenusaurMega VenusaurGrassPoison80100123122120801False625
44CharmanderFireNaN3952436050651False309
..........................................
795719DiancieRockFairy50100150100150506True600
796719DiancieMega DiancieRockFairy501601101601101106True700
797720HoopaHoopa ConfinedPsychicGhost8011060150130706True600
798720HoopaHoopa UnboundPsychicDark8016060170130806True680
799721VolcanionFireWater8011012013090706True600

800 rows × 13 columns

python">df['Total'] = df['HP'] + df['Attack'] + df['Defense'] + df['Sp. Atk'] + df['Sp. Def'] + df['Speed']
# df.drop 不会修改 df, 需要赋值. 如果删除的 列 不存在, 则会报错
df = df.drop(columns=['Total'])

df.head()
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison4549496565451False
12IvysaurGrassPoison6062638080601False
23VenusaurGrassPoison808283100100801False
33VenusaurMega VenusaurGrassPoison80100123122120801False
44CharmanderFireNaN3952436050651False
python"># 当 axis=0 时,对每一列进行求和,返回一行结果。
# 当 axis=1 时,对每一行进行求和,返回一列结果。
df['Total'] = df.iloc[:, 4:10].sum(axis=1)

df.head()
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendaryTotal
01BulbasaurGrassPoison4549496565451False318
12IvysaurGrassPoison6062638080601False405
23VenusaurGrassPoison808283100100801False525
33VenusaurMega VenusaurGrassPoison80100123122120801False625
44CharmanderFireNaN3952436050651False309
python">read()

df['Total'] = df.iloc[:, 4:10].sum(axis=1)

# 输出指定列
df = df[['Total', 'HP', 'Defense']]

df.head(5)
TotalHPDefense
03184549
14056063
25258083
362580123
43093943
python">read()

df['Total'] = df.iloc[:, 4:10].sum(axis=1)

cols = list(df.columns)
print(cols)

# 输出指定列
df = df[cols[0:4] + [cols[-1]] + cols[4:-1]]

df.head(5)
['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary', 'Total']
#NameType 1Type 2TotalHPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison3184549496565451False
12IvysaurGrassPoison4056062638080601False
23VenusaurGrassPoison525808283100100801False
33VenusaurMega VenusaurGrassPoison62580100123122120801False
44CharmanderFireNaN3093952436050651False
python"># 写文件
# df.to_csv('/tmp/modified.csv', index=False)
# df.to_excel('/tmp/modified.xlsx', index=False)
python"># 多个字段过滤
df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')]
#NameType 1Type 2TotalHPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison3184549496565451False
12IvysaurGrassPoison4056062638080601False
23VenusaurGrassPoison525808283100100801False
33VenusaurMega VenusaurGrassPoison62580100123122120801False
4843OddishGrassPoison3204550557565301False
4944GloomGrassPoison3956065708575401False
5045VileplumeGrassPoison49075808511090501False
7569BellsproutGrassPoison3005075357030401False
7670WeepinbellGrassPoison3906590508545551False
7771VictreebelGrassPoison490801056510070701False
344315RoseliaGrassPoison40050604510080653False
451406BudewGrassPoison2804030355070554False
452407RoseradeGrassPoison515607065125105904False
651590FoongusGrassPoison2946955455555155False
652591AmoongussGrassPoison46411485708580305False
python"># 多个字段过滤
df.loc[(df['Type 1'] == 'Grass') | (df['Type 2'] == 'Poison')]
#NameType 1Type 2TotalHPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison3184549496565451False
12IvysaurGrassPoison4056062638080601False
23VenusaurGrassPoison525808283100100801False
33VenusaurMega VenusaurGrassPoison62580100123122120801False
1613WeedleBugPoison1954035302020501False
..........................................
718650ChespinGrassNaN3135661654845386False
719651QuilladinGrassNaN4056178955658576False
720652ChesnaughtGrassFighting530881071227475646False
740672SkiddoGrassNaN3506665486257526False
741673GogoatGrassNaN531123100629781686False

89 rows × 13 columns

python">new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)]

# new_df.to_csv('/tmp/filtered.csv', index=False)

# "index" 是 df 的 index. 会重新为 new_df 生成index
new_df = new_df.reset_index()

new_df
index#NameType 1Type 2TotalHPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
023VenusaurGrassPoison525808283100100801False
133VenusaurMega VenusaurGrassPoison62580100123122120801False
25045VileplumeGrassPoison49075808511090501False
37771VictreebelGrassPoison490801056510070701False
4652591AmoongussGrassPoison46411485708580305False
python">new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)]

# 会重新为 new_df 生成index, 删除 df 保留下来的 index 列
new_df = new_df.reset_index(drop=True)

new_df
#NameType 1Type 2TotalHPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
03VenusaurGrassPoison525808283100100801False
13VenusaurMega VenusaurGrassPoison62580100123122120801False
245VileplumeGrassPoison49075808511090501False
371VictreebelGrassPoison490801056510070701False
4591AmoongussGrassPoison46411485708580305False
python">df.loc[df['Name'].str.contains('Mega')]
#NameType 1Type 2TotalHPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
33VenusaurMega VenusaurGrassPoison62580100123122120801False
76CharizardMega Charizard XFireDragon63478130111130851001False
86CharizardMega Charizard YFireFlying63478104781591151001False
129BlastoiseMega BlastoiseWaterNaN63079103120135115781False
1915BeedrillMega BeedrillBugPoison495651504015801451False
2318PidgeotMega PidgeotNormalFlying579838080135801211False
7165AlakazamMega AlakazamPsychicNaN590555065175951501False
8780SlowbroMega SlowbroWaterPsychic590957518013080301False
10294GengarMega GengarGhostPoison600606580170951301False
124115KangaskhanMega KangaskhanNormalNaN590105125100601001001False
137127PinsirMega PinsirBugFlying6006515512065901051False
141130GyaradosMega GyaradosWaterDark6409515510970130811False
154142AerodactylMega AerodactylRockFlying615801358570951501False
163150MewtwoMega Mewtwo XPsychicFighting7801061901001541001301True
164150MewtwoMega Mewtwo YPsychicNaN780106150701941201401True
168154MeganiumGrassNaN525808210083100802False
196181AmpharosMega AmpharosElectricDragon6109095105165110452False
224208SteelixMega SteelixSteelGround610751252305595302False
229212ScizorMega ScizorBugSteel6007015014065100752False
232214HeracrossMega HeracrossBugFighting6008018511540105752False
248229HoundoomMega HoundoomDarkFire600759090140901152False
268248TyranitarMega TyranitarRockDark70010016415095120712False
275254SceptileMega SceptileGrassDragon6307011075145851453False
279257BlazikenMega BlazikenFireFighting6308016080130801003False
283260SwampertMega SwampertWaterGround63510015011095110703False
306282GardevoirMega GardevoirPsychicFairy6186885651651351003False
327302SableyeMega SableyeDarkGhost480508512585115203False
329303MawileMega MawileSteelFairy480501051255595503False
333306AggronMega AggronSteelNaN630701402306080503False
336308MedichamMega MedichamFightingPsychic510601008580851003False
339310ManectricMega ManectricElectricNaN575707580135801353False
349319SharpedoMega SharpedoWaterDark5607014070110651053False
354323CameruptMega CameruptFireGround56070120100145105203False
366334AltariaMega AltariaDragonFairy59075110110110105803False
387354BanetteMega BanetteGhostNaN55564165759383753False
393359AbsolMega AbsolDarkNaN5656515060115601153False
397362GlalieMega GlalieIceNaN5808012080120801003False
409373SalamenceMega SalamenceDragonFlying70095145130120901203False
413376MetagrossMega MetagrossSteelPsychic700801451501051101103False
418380LatiasMega LatiasDragonPsychic700801001201401501103True
420381LatiosMega LatiosDragonPsychic700801301001601201103True
426384RayquazaMega RayquazaDragonFlying7801051801001801001153True
476428LopunnyMega LopunnyNormalFighting580651369454961354False
494445GarchompMega GarchompDragonGround70010817011512095924False
498448LucarioMega LucarioFightingSteel6257014588140701124False
511460AbomasnowMega AbomasnowGrassIce59490132105132105304False
527475GalladeMega GalladePsychicFighting6186816595651151104False
591531AudinoMega AudinoNormalFairy5451036012680126505False
796719DiancieMega DiancieRockFairy700501601101601101106True
python">import re
df.loc[df['Type 1'].str.contains('Fire|grass', regex=True, flags=re.I)]
#NameType 1Type 2TotalHPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison3184549496565451False
12IvysaurGrassPoison4056062638080601False
23VenusaurGrassPoison525808283100100801False
33VenusaurMega VenusaurGrassPoison62580100123122120801False
44CharmanderFireNaN3093952436050651False
..........................................
735667LitleoFireNormal3696250587354726False
736668PyroarFireNormal507866872109661066False
740672SkiddoGrassNaN3506665486257526False
741673GogoatGrassNaN531123100629781686False
799721VolcanionFireWater6008011012013090706True

122 rows × 13 columns

python">import re
df.loc[df['Name'].str.contains('^pi[a-z]', regex=True, flags=re.I)]
#NameType 1Type 2TotalHPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
2016PidgeyNormalFlying2514045403535561False
2117PidgeottoNormalFlying3496360555050711False
2218PidgeotNormalFlying47983807570701011False
2318PidgeotMega PidgeotNormalFlying579838080135801211False
3025PikachuElectricNaN3203555405050901False
136127PinsirBugNaN500651251005570851False
137127PinsirMega PinsirBugFlying6006515512065901051False
186172PichuElectricNaN2052040153535602False
219204PinecoBugNaN2905065903535152False
239221PiloswineIceGround450100100806060502False
438393PiplupWaterNaN3145351536156404False
558499PigniteFireFighting4189093557055555False
578519PidoveNormalFlying2645055503630435False
python">read()
# 第一个参数满足条件的, 设置第二个参数的列为后面的值
df.loc[df['Type 1'] == 'Grass', 'Type 1'] = 'Flamer'

df
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurFlamerPoison4549496565451False
12IvysaurFlamerPoison6062638080601False
23VenusaurFlamerPoison808283100100801False
33VenusaurMega VenusaurFlamerPoison80100123122120801False
44CharmanderFireNaN3952436050651False
.......................................
795719DiancieRockFairy50100150100150506True
796719DiancieMega DiancieRockFairy501601101601101106True
797720HoopaHoopa ConfinedPsychicGhost8011060150130706True
798720HoopaHoopa UnboundPsychicDark8016060170130806True
799721VolcanionFireWater8011012013090706True

800 rows × 12 columns

python">read()
# 第一个参数满足条件的, 设置第二个参数的列为后面的值
df.loc[df['Type 1'] == 'Grass', ['Type 1', 'Lendary']] = 'TEST VALUE'

df
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendaryLendary
01BulbasaurTEST VALUEPoison4549496565451FalseTEST VALUE
12IvysaurTEST VALUEPoison6062638080601FalseTEST VALUE
23VenusaurTEST VALUEPoison808283100100801FalseTEST VALUE
33VenusaurMega VenusaurTEST VALUEPoison80100123122120801FalseTEST VALUE
44CharmanderFireNaN3952436050651FalseNaN
..........................................
795719DiancieRockFairy50100150100150506TrueNaN
796719DiancieMega DiancieRockFairy501601101601101106TrueNaN
797720HoopaHoopa ConfinedPsychicGhost8011060150130706TrueNaN
798720HoopaHoopa UnboundPsychicDark8016060170130806TrueNaN
799721VolcanionFireWater8011012013090706TrueNaN

800 rows × 13 columns

python">read()
# 第一个参数满足条件的, 设置第二个参数的列为后面的值
df.loc[df['Type 1'] == 'Grass', ['Type 1', 'Lendary']] = ['TEST 1', 'TEST 2']

df
#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendaryLendary
01BulbasaurTEST 1Poison4549496565451FalseTEST 2
12IvysaurTEST 1Poison6062638080601FalseTEST 2
23VenusaurTEST 1Poison808283100100801FalseTEST 2
33VenusaurMega VenusaurTEST 1Poison80100123122120801FalseTEST 2
44CharmanderFireNaN3952436050651FalseNaN
..........................................
795719DiancieRockFairy50100150100150506TrueNaN
796719DiancieMega DiancieRockFairy501601101601101106TrueNaN
797720HoopaHoopa ConfinedPsychicGhost8011060150130706TrueNaN
798720HoopaHoopa UnboundPsychicDark8016060170130806TrueNaN
799721VolcanionFireWater8011012013090706TrueNaN

800 rows × 13 columns

python">read()

# 分组 求平均数
df.groupby(['Type 1']).mean(numeric_only=True).sort_values('Defense', ascending=False)
#HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
Type 1
Steel442.85185265.22222292.703704126.37037067.51851980.62963055.2592593.8518520.148148
Rock392.72727365.36363692.863636100.79545563.34090975.47727355.9090913.4545450.090909
Dragon474.37500083.312500112.12500086.37500096.84375088.84375083.0312503.8750000.375000
Ground356.28125073.78125095.75000084.84375056.46875062.75000063.9062503.1562500.125000
Ghost486.50000064.43750073.78125081.18750079.34375076.46875064.3437504.1875000.062500
Water303.08928672.06250074.15178672.94642974.81250070.51785765.9642862.8571430.035714
Ice423.54166772.00000072.75000071.41666777.54166776.29166763.4583333.5416670.083333
Grass344.87142967.27142973.21428670.80000077.50000070.42857161.9285713.3571430.042857
Bug334.49275456.88405870.97101470.72463853.86956564.79710161.6811593.2173910.000000
Dark461.35483966.80645288.38709770.22580674.64516169.51612976.1612904.0322580.064516
Poison251.78571467.25000074.67857168.82142960.42857164.39285763.5714292.5357140.000000
Fire327.40384669.90384684.76923167.76923188.98076972.21153874.4423083.2115380.096154
Psychic380.80701870.63157971.45614067.68421198.40350986.28070281.4912283.3859650.245614
Electric363.50000059.79545569.09090966.29545590.02272773.70454584.5000003.2727270.090909
Flying677.75000070.75000078.75000066.25000094.25000072.500000102.5000005.5000000.500000
Fighting363.85185269.85185296.77777865.92592653.11111164.70370466.0740743.3703700.000000
Fairy449.52941274.11764761.52941265.70588278.52941284.70588248.5882354.1176470.058824
Normal319.17346977.27551073.46938859.84693955.81632763.72449071.5510203.0510200.020408
python">read()

# 分组 求和
df.groupby(['Type 1']).sum(numeric_only=True).sort_values('HP', ascending=False)
#HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
Type 1
Water339468071830581708379789873883204
Normal312797573720058655470624570122992
Grass241414709512549565425493043352353
Psychic2170640264073385856094918464519314
Bug230803925489748803717447142562220
Fire170253635440835244627375538711675
Rock172802876408644352787332124601524
Dragon1518026663588276430992843265712412
Electric159942631304029173961324337181444
Ground114012361306427151807200820451014
Dark143022071274021772314215523611252
Ghost155682062236125982539244720591342
Fighting9824188626131780143417471784910
Poison7050188320911927169218031780710
Steel119571761250334121823217714921044
Ice10165172817461714186118311523852
Fairy764212601046111713351440826701
Flying2711283315265377290410222
python">read()

# 分组 求数量
df.groupby(['Type 1']).count().sort_values('HP', ascending=False)
#NameType 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
Type 1
Water11211253112112112112112112112112
Normal9898379898989898989898
Grass7070377070707070707070
Bug6969526969696969696969
Psychic5757195757575757575757
Fire5252245252525252525252
Electric4444174444444444444444
Rock4444354444444444444444
Ghost3232223232323232323232
Ground3232193232323232323232
Dragon3232213232323232323232
Dark3131213131313131313131
Poison2828132828282828282828
Fighting272772727272727272727
Steel2727222727272727272727
Ice2424112424242424242424
Fairy171721717171717171717
Flying44244444444
python">read()

# 每行都设置 count = 1
df['count'] = 1

# 分组 求数量
df.groupby(['Type 1']).count()['count']
Type 1
Bug          69
Dark         31
Dragon       32
Electric     44
Fairy        17
Fighting     27
Fire         52
Flying        4
Ghost        32
Grass        70
Ground       32
Ice          24
Normal       98
Poison       28
Psychic      57
Rock         44
Steel        27
Water       112
Name: count, dtype: int64
python">read()

# 每行都设置 count = 1
df['count'] = 1

# 分组 求数量
df.groupby(['Type 1', 'Type 2']).count()['count']
Type 1  Type 2  
Bug     Electric     2
        Fighting     2
        Fire         2
        Flying      14
        Ghost        1
                    ..
Water   Ice          3
        Poison       3
        Psychic      5
        Rock         4
        Steel        1
Name: count, Length: 136, dtype: int64
python"># 读取大文件
# 分批读取数据, chunksize 行数
for df in pd.read_csv(f'{data_dir}/pandas/pokemon_data.csv', chunksize=5):
#     print('CHUNK DF:::')
#     print(df['Name'])
    ;

python"># 读取大文件
# 分批读取数据, chunksize 行数

new_df = pd.DataFrame()
for df in pd.read_csv(f'{data_dir}/pandas/pokemon_data.csv', chunksize=5):
    new_df = pd.concat([new_df, df])
#     results = df.groupby(['Type 1']).count()
#     new_df = pd.concat([new_df, results])

new_df

#NameType 1Type 2HPAttackDefenseSp. AtkSp. DefSpeedGenerationLegendary
01BulbasaurGrassPoison4549496565451False
12IvysaurGrassPoison6062638080601False
23VenusaurGrassPoison808283100100801False
33VenusaurMega VenusaurGrassPoison80100123122120801False
44CharmanderFireNaN3952436050651False
.......................................
795719DiancieRockFairy50100150100150506True
796719DiancieMega DiancieRockFairy501601101601101106True
797720HoopaHoopa ConfinedPsychicGhost8011060150130706True
798720HoopaHoopa UnboundPsychicDark8016060170130806True
799721VolcanionFireWater8011012013090706True

800 rows × 12 columns


http://www.niftyadmin.cn/n/260385.html

相关文章

XM 叠螺式污泥脱水机怎么样设备优势

叠螺式污泥脱水机怎么样设备优势 叠螺式污泥脱水机优势:节能减排,节能增效,是我国提倡经济发展保护环境的一种趋势,叠螺式污泥脱水机对于污水处理,对于减少废水排放,节能降耗,具有非常显著的功能…

Unity音量滑块沿弧形移动

一、音量滑块的移动 1、滑块在滑动的时候,其运动轨迹沿着大圆的弧边展开 2、滑块不能无限滑动,而是两端各有一个挡块,移动到挡块位置,则不能往下移动,但可以折回 3、鼠标悬停滑块时,给出音量值和操作提示 …

[Rust] 嵌入式 riscv64 Rust 开发环境搭建

Rust 开发环境配置 首先安装 Rust 版本管理器 rustup 和 Rust 包管理器 cargo,这里我们用官方的安装脚本来安装: curl https://sh.rustup.rs -sSf | sh如果通过官方的脚本下载失败了,可以在浏览器的地址栏中输入 https://sh.rustup.rs 来下…

element+vue小技巧和报错解决(持续更新)

目录 1-关于el-table复选框中表头和内容不对齐问题 2-日期选择器传值给后端格式不对 3-获取表格的当前行数据#default"{row}" 1-关于el-table复选框中表头和内容不对齐问题 <el-table:data"tableData"stripestyle"width: 100%"tooltip-ef…

SpringCloudAlibaba服务熔断、限流——Sentinel

Sentinel 本专栏学习内容来自尚硅谷周阳老师的视频 有兴趣的小伙伴可以点击视频地址观看 简介 Sentinel是Alibaba公司推出的一个熔断与限流工具&#xff0c;相当于我们之前学习的Hystrix&#xff0c;可以解决服务使用中的各种问题&#xff0c;例如&#xff1a;服务雪崩、服务降…

中国制造再击败一家海外企业,彻底取得垄断地位

中国制造已在13个行业取得领先优势&#xff0c;凸显出中国制造的快速崛起&#xff0c;日前中国制造又在一个行业彻底击败海外同行&#xff0c;再次证明了中国制造的实力。 一、海外企业承认失败 提前LGD宣布它位于广州的8.5代液晶面板生产线停产&#xff0c;预计该项目将出售给…

蒸发器前氟离子超标的解决方法

深度除氟工艺 1、活性氧化铝&#xff1a;需PH调整至酸性 2、碳基/羟基磷灰石&#xff1a;再生次数有限制 3、反渗透膜&#xff1a;造价成本高 4、特种除氟树脂&#xff1a;预处理需做好 氟化物选择吸附树脂 Tulsimer CH-87 是一款去除水溶液中氟离子的专用的凝胶型选择性…

TOMCAT NGINX 环境的搭建脚本

##Nginx安装 systemctl stop firewalld systemctl disable firewalld setenforce 0 yum -y install pcre-devel zlib-devel gcc gcc-c make useradd -M -s /sbin/nologin nginx cd /opt for i in ls * do if [[ $i *.tar.gz ]] then tar xf $i fi done cd /opt tar xf nginx-…