Association Rule

2022. 9. 30. 15:27

 

Frequent Item Set ์ด๋ž€?

๋ฐ์ดํ„ฐ๋ฅผ ๊ด€์ธกํ•  ๋•Œ, ๋ฐ˜๋ณต์ ์œผ๋กœ ๋ฐœ๊ฒฌ๋˜๋Š” ํŒจํ„ด์„ ์˜๋ฏธํ•œ๋‹ค. ๋ฐ˜๋ณต์ ์œผ๋กœ ๋ฐœ๊ฒฌ๋˜์–ด์•ผ ๋ฏธ๋ž˜์— ๋Œ€ํ•œ ์˜ˆ์ธก์ด ํšจ๊ณผ์ ์ด๋ฏ€๋กœ ์œ ์˜๋ฏธํ•˜๋‹ค.

์ „์ฒด ๋ฒŒ์–ด์ง„ ๊ฒฝ์šฐ ์ค‘์— ๋ช‡ ๋ฒˆ ๊ทธ ์‚ฌ๊ฑด์ด ๋ฒŒ์–ด์กŒ๋Š”์ง€์— ๋Œ€ํ•œ ๋น„์œจ์ด "์ผ์ • ๊ธฐ์ค€ ์ด์ƒ"์ด๋ฉด frequent ํ•˜๋‹ค๊ณ  ๋ณธ๋‹ค.

 

โ‘  absolute support: ๊ทธ๋ƒฅ ๊ฐœ์ˆ˜๋ฅผ ์„ธ๋Š” ๊ฒƒ

โ‘ก relative support: absolute support ๋ฅผ ์ „์ฒด ๊ฒฝ์šฐ๋กœ ๋‚˜๋ˆˆ ๊ฒƒ

โ‡จ ํŠน์ • item set ์ด "Minimum Supprot Threshold" ์ด์ƒ์ด๋ฉด frequent ํ•˜๋‹ค๊ณ  ๋ณธ๋‹ค.

 

 

Association Rule ์ด๋ž€?

์–ด๋–ค ์‚ฌ๊ฑด์ด ์–ผ๋งˆ๋‚˜ ์ž์ฃผ ํ•จ๊ป˜ ๋ฐœ์ƒํ•˜๋Š”์ง€, ์„œ๋กœ ์–ผ๋งˆ๋‚˜ ์—ฐ๊ด€๋˜์–ด ์žˆ๋Š”์ง€ ํ‘œ์‹œํ•˜๋Š” ๊ทœ์น™์ด๋‹ค.

Frequent Item Set ์„ ํ•œ ์ฐจ๋ก€ ๋” ๊ฐ€๊ณตํ•œ ๊ฒƒ์ด Association Rule ์ด๋‹ค.

 

์ธก์ • Parameter

โ‘  support: ์ „์ฒด ๊ฒฝ์šฐ์˜ ์ˆ˜์—์„œ ๋‘ item ์ด ๊ฐ™์ด ๋‚˜์˜ค๋Š” ๋น„์œจ

โ‘ก confidence: x ๊ฐ€ ๋‚˜์˜ค๋Š” ๊ฒฝ์šฐ์˜ ์ˆ˜์—์„œ x ์™€ y ๊ฐ€ ํ•จ๊ป˜ ๋‚˜์˜ค๋Š” ๋น„์œจ (์กฐ๊ฑด๋ถ€ ํ™•๋ฅ ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋œ๋‹ค.)

 

์œ ์˜๋ฏธํ•œ Association Rule ์ด ๋˜๊ธฐ ์œ„ํ•œ ์กฐ๊ฑด

โ‘  Supprot๊ฐ€ Supprot Minimum Threshold ์ด์ƒ์ด์–ด์•ผ ํ•œ๋‹ค. (์ฆ‰, frequent item set ์ด์–ด์•ผ ํ•œ๋‹ค.)

 

 

โ‘ก Confidence ๊ฐ€ Confidence Minimum Threshold ์ด์ƒ์ด์–ด์•ผ ํ•œ๋‹ค.

 

Association Rule ์„ ์ฐพ๋Š” ๋Œ€ํ‘œ์ ์ธ ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•

โ‘  Apriori Algorithm

โ‘ก FP-Growth Algorithm

 

 

<Apriori Algorithm>

์–ด๋– ํ•œ ๋‘ item ์˜ ์ง‘ํ•ฉ์ด frequent ํ•˜๊ฒŒ ๋ฐœ์ƒํ•˜๋Š”์ง€๋ฅผ ์•Œ๋ ค์ฃผ๋Š” ๊ทœ์น™์„ ์ƒ์„ฑํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค.

Lattice ๊ตฌ์กฐ → ์–ด๋– ํ•œ item set ์ด frequent ํ•˜์ง€ ์•Š๋‹ค๋ฉด ๊ทธ๊ฒƒ์˜ ๋ชจ๋“  superset ์€ ์ ˆ๋Œ€๋กœ frequent ํ•  ์ˆ˜ ์—†๋‹ค.

 

์•Œ๊ณ ๋ฆฌ์ฆ˜

โ‘  ๊ฐ๊ฐ์˜ item ์—์„œ Length=1 ์ธ ๊ฒƒ์˜ support ๋ฅผ ๊ฒ€์‚ฌํ•œ๋‹ค. (minimum supprot threshold ๋ณด๋‹ค ์ž‘์€ ๊ฒƒ์€ ์ œ๊ฑฐํ•œ๋‹ค.)

โ‘ก Length=2 ์ธ ์กฐํ•ฉ์„ ๊ตฌ์„ฑํ•˜๊ณ  support ๋ฅผ ๊ฒ€์‚ฌํ•œ๋‹ค.

โ‘ข ๊ธฐ์ค€์„ ์ •ํ•˜๊ณ (ex. ์•ž๋งŒ ๋ณด๊ณ  ๊ฒฐ์ •ํ•˜๊ฒ ๋‹ค.), ์ด์— ๋”ฐ๋ผ Length=3 ์ธ ์กฐํ•ฉ์„ ๊ตฌ์„ฑํ•œ๋‹ค. 

โ‘ฃ ๋งŒ๋“ค์–ด์ง„ Length=3 ์ธ ์กฐํ•ฉ์—์„œ, ๊ธฐ์ค€์—์„œ ๋ฒ—์–ด๋‚œ ๊ฒƒ์˜ supprot ๋ฅผ ๊ฒ€์‚ฌํ•œ๋‹ค.

โ‘ค ์กฐํ•ฉ์ด ๋” ์ด์ƒ ๋‚˜ํƒ€๋‚˜์ง€ ์•Š์„ ๋•Œ๊นŒ์ง€ ๋ฐ˜๋ณตํ•œ๋‹ค.

 

๋‹จ์ 

โ‘  ๋„ˆ๋น„ ์œ ์„  ํƒ์ƒ‰์ด๋ฏ€๋กœ depth ๊ฐ€ ๊ธธ์–ด์ง€๋ฉด candidate ๊ฐ€ ๋งŽ์•„์ง€๋ฉด ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ดˆ๊ณผ๋  ์ˆ˜ ์žˆ๋‹ค.

 

 

<FP-Growth Algorithm>

Apriori Algorithm ์„ ๊ฐœ์„ ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค.

 

์•Œ๊ณ ๋ฆฌ์ฆ˜

โ‘  Length=1 ์˜ ๋ชจ๋“  supprot ๋ฅผ ๊ฒ€์‚ฌํ•œ๋‹ค.

 

โ‘ก Support ๊ฐ€ Minimum Supprot ์ด์ƒ์ธ ๊ฒƒ๋“ค๋งŒ ๊ณจ๋ผ๋‚ธ ํ›„, ๋‚ด๋ฆผ์ฐจ์ˆœ์œผ๋กœ ์žฌ์ •๋ ฌํ•œ๋‹ค.

โ‘ข Root ๋ฅผ ์‹œ์ž‘์œผ๋กœ, FP Tree ๋ฅผ ๋งŒ๋“ ๋‹ค. (์˜†์— supprot ์™€ link ๋ฅผ ํ•จ๊ป˜ ์ž‘์„ฑํ•œ๋‹ค.)

โ‘ฃ ๊ฐ๊ฐ์˜ Conditional Tree ๋ฅผ ๋งŒ๋“ ๋‹ค.

โ‘ค ์กฐํ•ฉ์ด ๋” ์ด์ƒ ๋‚˜์˜ค์ง€ ์•Š์„ ๋•Œ๊นŒ์ง€ Conditional Tree ๋ฅผ ๋ฐ˜๋ณตํ•˜์—ฌ ๋งŒ๋“ ๋‹ค. 

 

์žฅ์ 

โ‘  Basket Data ์—์„œ ํ•„์š”ํ•œ ๊ฒƒ๋“ค๋งŒ ์••์ถ•ํ•ด์„œ ํ‘œํ˜„ํ•œ ์ž๋ฃŒ๊ตฌ์กฐ์ด๋‹ค.

 

โ‘ก Candidate ๋ฅผ ๋งŒ๋“ค์ง€ ์•Š๊ณ , FP Tree ๋งŒ ๋งŒ๋“ค๋ฉด ๋˜๋ฏ€๋กœ ๋น ๋ฅด๋‹ค.

โ‘ข ๊นŠ์ด ์œ ์„  ํƒ์ƒ‰์ด๋ฏ€๋กœ ๋ฉ”๋ชจ๋ฆฌ์— ๋Œ€ํ•œ ๋ถ€๋‹ด์ด ์ ๋‹ค.

'๐Ÿ“ Data Analysis > ๐Ÿ“‘ ์ด๋ก ' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

Clustering  (0) 2022.10.20
1. ๋‚˜์˜ ๋ฐ์ดํ„ฐ ์•Œ๊ธฐ  (0) 2022.09.19

BELATED ARTICLES

more