Titanic

2022. 9. 19. 00:07

https://github.com/memoming/memomingChannel

 


 

⚠️ Hypothesis: the wealthier a passenger was (i.e., the higher the fare they paid), the higher their chance of survival.

 

1. Reading train.csv

import numpy as np
import pandas as pd

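titanic_csv_filePath = "train.csv"  # path to train.csv (adjust to where the file is saved)
titanic_df = pd.read_csv(titanic_csv_filePath)

print(titanic_df)
titanic_df.info()  # info() prints its summary itself and returns None, so no print() is needed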

⇨ Cabin has only 204 non-null values (out of 891 rows), so we can decide not to use it.
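`info()` reports the non-null count per column; the missing values can also be counted directly with `isna().sum()`. A minimal sketch on a tiny hand-typed `sample_df` (three rows standing in for train.csv, not the full file):

```python
import numpy as np
import pandas as pd

# Small hand-typed sample standing in for train.csv; two rows have no Cabin recorded
sample_df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Fare": [7.25, 71.2833, 7.925],
    "Cabin": [np.nan, "C85", np.nan],
})

# Number of missing (NaN) values in each column
missing_counts = sample_df.isna().sum()
print(missing_counts)

# Fraction of rows where Cabin is missing (mean of a True/False Series)
cabin_missing_ratio = sample_df["Cabin"].isna().mean()
print(cabin_missing_ratio)
```

On the real `titanic_df`, the same two lines would show at a glance which columns are mostly empty.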

 

2. Computing the ratio of survivors to non-survivors

alive=titanic_df[ titanic_df["Survived"]==1 ]
dead=titanic_df[ titanic_df["Survived"]==0 ]

print(len(alive))
print(len(dead))

print(len(alive), "/", len(titanic_df))  # survivors out of all passengers
print(len(dead), "/", len(titanic_df))   # non-survivors out of all passengers

 

titanic_df [ titanic_df["Survived"]==1 ]

: selects the rows of the full DataFrame where Survived == 1
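The expression `titanic_df["Survived"]==1` on its own is a boolean Series (a mask), and indexing the DataFrame with it keeps only the rows marked True. Since Survived is 0/1, the survival ratio is also just the column's mean. A sketch on a small hypothetical `sample_df`, not the real data:

```python
import pandas as pd

# Hypothetical sample: 1 = survived, 0 = did not survive
sample_df = pd.DataFrame({"Survived": [0, 1, 1, 0, 0]})

# Boolean mask: True where the passenger survived
mask = sample_df["Survived"] == 1
print(mask.tolist())  # [False, True, True, False, False]

alive = sample_df[mask]   # rows where the mask is True
dead = sample_df[~mask]   # ~ inverts the mask

# For a 0/1 column, the mean equals the fraction of 1s
survival_ratio = sample_df["Survived"].mean()
print(len(alive), "/", len(sample_df), survival_ratio)  # 2 / 5 0.4
```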

 

3. Visualizing survivors vs. non-survivors

import matplotlib.pyplot as plt

plt.bar(["alive", "dead"], height=[len(alive), len(dead)])
plt.show()
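The bar heights can also come from `value_counts()`, which counts every Survived value in one call instead of building two filtered DataFrames. A sketch on a hypothetical `sample_df`; `reindex([1, 0])` puts survivors first so the order matches the labels, and the resulting `heights` list is what would be passed to `plt.bar`:

```python
import pandas as pd

# Hypothetical sample standing in for titanic_df
sample_df = pd.DataFrame({"Survived": [0, 1, 1, 0, 0]})

# Count each value; reindex orders it as survivors (1), then non-survivors (0)
counts = sample_df["Survived"].value_counts().reindex([1, 0], fill_value=0)

labels = ["alive", "dead"]
heights = counts.tolist()
print(dict(zip(labels, heights)))  # {'alive': 2, 'dead': 3}
```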

 

4. Visualizing the fare of each passenger (scatter: see where the points cluster)

scatter(column of the df for the x-axis, column of the df for the y-axis)

label: show what the x and y axes represent

plt.scatter(titanic_df["PassengerId"], titanic_df["Fare"])

plt.xlabel("Passenger ID")
plt.ylabel("Fare")
plt.show()

⇨ From this graph alone, we can't tell where more people survived and where fewer did.

⇨ We need to distinguish survivors from non-survivors.

 

5. Distinguishing survivors (green) from non-survivors (red) in the scatter plot

alive: DataFrame of passengers who survived

dead: DataFrame of passengers who did not survive

plt.scatter(alive["PassengerId"], alive["Fare"], color="green")
plt.scatter(dead["PassengerId"], dead["Fare"], color="red")

plt.xlabel("Passenger ID")
plt.ylabel("Fare")
plt.show()

⇨ Looking at fares of $50 and above, passengers who paid more seem more likely to have survived.
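The scatter plot only suggests the trend. One way to check it numerically is to bin the fares and compute the survival rate within each bin. A sketch on hypothetical data (not the real train.csv values); the bin edges 0, 10, 50 and infinity are an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: fares and survival outcomes
sample_df = pd.DataFrame({
    "Fare": [7.25, 8.05, 13.0, 26.0, 53.1, 71.28, 151.55, 263.0],
    "Survived": [0, 0, 1, 0, 1, 1, 1, 0],
})

# Assign each fare to a bin; right=False gives [0, 10), [10, 50), [50, inf)
fare_bins = pd.cut(sample_df["Fare"], bins=[0, 10, 50, np.inf], right=False)

# Survival rate (mean of the 0/1 column) within each fare bin
rate_by_bin = sample_df.groupby(fare_bins, observed=False)["Survived"].mean()
print(rate_by_bin)
```

With this sample the rates come out as 0.0, 0.5 and 0.75 for the three bins; on the real data the numbers would of course differ.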

 

6. Counting passengers on either side of $50

over_50=titanic_df[titanic_df["Fare"]>=50]
under_50=titanic_df[titanic_df["Fare"]<50]

print(len(over_50), "/", len(titanic_df), ",", str(len(over_50)/len(titanic_df)*100)[:5], "%")
print(len(under_50), "/", len(titanic_df), ",", str(len(under_50)/len(titanic_df)*100)[:5], "%")

To keep only two decimal places of a value like 18.06178:

: as a string we only need "18.06", i.e. the first 5 characters, so we slice it.

⇨ str( -formula- )[:5]
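Note that slicing the string truncates rather than rounds, and `[:5]` only gives two decimals when the number has a two-digit integer part. A sketch comparing it with Python's format specifier `:.2f`, which always rounds to two decimal places:

```python
value = 18.06178
print(str(value)[:5])   # 18.06
print(f"{value:.2f}")   # 18.06

# With a one-digit integer part, [:5] keeps three decimals instead of two
small = 5.123456
print(str(small)[:5])   # 5.123
print(f"{small:.2f}")   # 5.12

# Slicing truncates, formatting rounds
print(str(18.069)[:5])  # 18.06
print(f"{18.069:.2f}")  # 18.07
```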

 

7. Survivors within the $50-or-more and under-$50 groups

alive_over_50=over_50[ over_50["Survived"]==1 ]
alive_under_50=under_50[ under_50["Survived"]==1 ]

print(len(alive_over_50), "/", len(over_50), str(len(alive_over_50)/len(over_50)*100)[:5], "%")
print(len(alive_under_50), "/", len(under_50), str(len(alive_under_50)/len(under_50)*100)[:5], "%")
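Steps 6 and 7 can also be done with a single `groupby` on the boolean condition `Fare >= 50`: per group, `count` gives the number of passengers, `sum` the number of survivors and `mean` the survival rate. A sketch on a hypothetical `sample_df` (not the real counts):

```python
import pandas as pd

# Hypothetical sample standing in for titanic_df
sample_df = pd.DataFrame({
    "Fare": [7.25, 8.05, 13.0, 26.0, 53.1, 71.28, 151.55, 263.0],
    "Survived": [0, 0, 1, 0, 1, 1, 1, 0],
})

# True for fares of $50 or more, False otherwise
is_over_50 = sample_df["Fare"] >= 50

# One row per group (False first, then True): passengers, survivors, survival rate
summary = sample_df.groupby(is_over_50)["Survived"].agg(["count", "sum", "mean"])
print(summary)
```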

💡Conclusion: the wealthier a passenger was (the higher the fare they paid), the higher their survival rate. (The hypothesis holds.)
