๐Ÿ“ Data Analysis/๐Ÿ“‘ ์ด๋ก 

๊ฒ€์ƒ‰๊ฒฐ๊ณผ 3 ๊ฐœ
Clustering

Clustering ์ด๋ž€? "๋ฐ์ดํ„ฐ ํฌ์ธํŠธ"์˜ ๊ทธ๋ฃนํ™”์™€ ๊ด€๋ จ๋œ ๋จธ์‹ ๋Ÿฌ๋‹ ๊ธฐ์ˆ ์ด๋‹ค. (unsupervised learning) ์–ด๋– ํ•œ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ ์ง‘ํ•ฉ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ, clustering ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋ฅผ ํŠน์ • ๊ทธ๋ฃน์œผ๋กœ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋‹ค. ๋ถ„๋ฅ˜ํ•  ๋•Œ์˜ ๊ธฐ์ค€ โ‘  high intra-class: ํ•˜๋‚˜์˜ ๊ทธ๋ฃน์—๋Š” ์ตœ๋Œ€ํ•œ ๋น„์Šทํ•œ ๊ฒƒ๋ผ๋ฆฌ โ‘ก low intra-class: ์„œ๋กœ ๋‹ค๋ฅธ ๊ทธ๋ฃน๋“ค์€ ์ตœ๋Œ€ํ•œ ๋‹ค๋ฅธ ๊ฒƒ๋ผ๋ฆฌ ๊ฐ ๊ตฐ์ง‘์— ํ• ๋‹น๋œ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋“ค์˜ ํ‰๊ท ์„ ์ด์šฉํ•˜์—ฌ, ์ค‘์‹ฌ์ ์„ ๋ฐ˜๋ณต์ ์œผ๋กœ ์—…๋ฐ์ดํŠธํ•˜๋ฉฐ ๊ตฐ์ง‘์„ ํ˜•์„ฑํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค. ์žฅ์  โ‘  ์‹ค์ œ๋กœ ์ˆ˜ํ–‰ํ•˜๋Š” ์ž‘์—…์ด "ํฌ์ธํŠธ์™€ ๊ทธ๋ฃน ์ค‘์•™ ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ"์ด๋ฏ€๋กœ ๋งค์šฐ ๋น ๋ฅด๋‹ค. โ‘ก ๋ณต์žก๋„๊ฐ€ O(n) ์ด๋‹ค. ๋‹จ์  โ‘  ๋ชจ๋“  value ๊ฐ€ numeric ์ด์–ด์•ผ ..

Association Rule

Frequent Item Set ์ด๋ž€? ๋ฐ์ดํ„ฐ๋ฅผ ๊ด€์ธกํ•  ๋•Œ, ๋ฐ˜๋ณต์ ์œผ๋กœ ๋ฐœ๊ฒฌ๋˜๋Š” ํŒจํ„ด์„ ์˜๋ฏธํ•œ๋‹ค. ๋ฐ˜๋ณต์ ์œผ๋กœ ๋ฐœ๊ฒฌ๋˜์–ด์•ผ ๋ฏธ๋ž˜์— ๋Œ€ํ•œ ์˜ˆ์ธก์ด ํšจ๊ณผ์ ์ด๋ฏ€๋กœ ์œ ์˜๋ฏธํ•˜๋‹ค. ์ „์ฒด ๋ฒŒ์–ด์ง„ ๊ฒฝ์šฐ ์ค‘์— ๋ช‡ ๋ฒˆ ๊ทธ ์‚ฌ๊ฑด์ด ๋ฒŒ์–ด์กŒ๋Š”์ง€์— ๋Œ€ํ•œ ๋น„์œจ์ด "์ผ์ • ๊ธฐ์ค€ ์ด์ƒ"์ด๋ฉด frequent ํ•˜๋‹ค๊ณ  ๋ณธ๋‹ค. โ‘  absolute support: ๊ทธ๋ƒฅ ๊ฐœ์ˆ˜๋ฅผ ์„ธ๋Š” ๊ฒƒ โ‘ก relative support: absolute support ๋ฅผ ์ „์ฒด ๊ฒฝ์šฐ๋กœ ๋‚˜๋ˆˆ ๊ฒƒ โ‡จ ํŠน์ • item set ์ด "Minimum Supprot Threshold" ์ด์ƒ์ด๋ฉด frequent ํ•˜๋‹ค๊ณ  ๋ณธ๋‹ค. Association Rule ์ด๋ž€? ์–ด๋–ค ์‚ฌ๊ฑด์ด ์–ผ๋งˆ๋‚˜ ์ž์ฃผ ํ•จ๊ป˜ ๋ฐœ์ƒํ•˜๋Š”์ง€, ์„œ๋กœ ์–ผ๋งˆ๋‚˜ ์—ฐ๊ด€๋˜์–ด ์žˆ๋Š”์ง€ ํ‘œ์‹œํ•˜๋Š” ๊ทœ์น™์ด๋‹ค. Frequent Item Set ..