Email Spam Detection Simulation (Naive Bayes Method)

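1. Data Collection

Collect an email dataset in which each message is labeled spam (1) or ham (0). Popular datasets: Enron Spam Dataset, SpamAssassin Public Corpus.

Example (the sample emails are in Indonesian: "Dapatkan diskon 50%" means "Get a 50% discount" and "Berita baik untuk Anda" means "Good news for you"):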

    | Email                   | Label |
    |-------------------------|-------|
    | "Dapatkan diskon 50%"   | 1     |
    | "Berita baik untuk Anda"| 0     |
    

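2. Preprocessing

The steps in this stage, as the example illustrates, are lowercasing the text, removing punctuation, and splitting it into word tokens.

Example: "Saya pergi berbelanja!" (Indonesian for "I go shopping!") → ["saya", "pergi", "berbelanja"]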
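
The preprocessing step can be sketched in Python roughly as follows. This is a minimal illustration, assuming that lowercasing, punctuation removal, and whitespace tokenization are all that is needed; the function name `preprocess` is hypothetical and not part of the original material.

```python
import re

def preprocess(text: str) -> list[str]:
    """Lowercase the text, drop punctuation, and split it into word tokens."""
    lowered = text.lower()
    # Replace every character that is neither a word character nor whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", lowered)
    # split() with no arguments splits on any run of whitespace.
    return cleaned.split()

tokens = preprocess("Saya pergi berbelanja!")
print(tokens)  # ['saya', 'pergi', 'berbelanja']
```

A real pipeline might also remove stop words or apply stemming; those steps are left out here because the example above does not show them.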

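3. Feature Extraction

We count word frequencies using the Bag-of-Words (BoW) method. In the table below, the Indonesian words "diskon", "hadiah", and "beli" mean "discount", "gift", and "buy".

    | Word       | Frequency |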
    |------------|-----------|
    | diskon     | 2         |
    | hadiah     | 1         |
    | beli       | 1         |
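
A Bag-of-Words count like the one in the table could be produced with Python's `collections.Counter`. The token list below is a hypothetical input chosen only so that the counts match the table; the helper name `bag_of_words` is likewise an assumption.

```python
from collections import Counter

def bag_of_words(tokens: list[str]) -> Counter:
    """Map each distinct token to the number of times it occurs."""
    return Counter(tokens)

# Hypothetical token list, chosen to reproduce the counts in the table.
tokens = ["diskon", "hadiah", "diskon", "beli"]
counts = bag_of_words(tokens)
print(counts["diskon"], counts["hadiah"], counts["beli"])  # 2 1 1
```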
    

4. Computing Class Probabilities

We compute the prior probability of each class, spam and ham.

    P(Spam) = Total Spam / Total Email
    P(Ham) = Total Ham / Total Email
    

Example:

    If there are 3 spam and 2 ham emails:
    P(Spam) = 3 / 5 = 0.6
    P(Ham) = 2 / 5 = 0.4
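
The same prior calculation, sketched in Python. The label list mirrors the 3 spam and 2 ham emails of the example; the function name `class_priors` is hypothetical.

```python
from collections import Counter

def class_priors(labels: list[int]) -> dict[int, float]:
    """Return P(class) for each label as its share of all emails."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

# 1 = spam, 0 = ham, matching the labels used in step 1.
labels = [1, 1, 1, 0, 0]
priors = class_priors(labels)
print(priors[1], priors[0])  # 0.6 0.4
```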
    

5. Computing Word Probabilities

We compute the probability of each word given the spam class and given the ham class.

    P(Word|Spam) = Frequency of Word in Spam / Total Words in Spam
    P(Word|Ham) = Frequency of Word in Ham / Total Words in Ham
    

Example:

    If "diskon" appears 2 times among the 5 words in spam:
    P(diskon|Spam) = 2 / 5 = 0.4

    If "diskon" does not appear in ham and we use Laplace smoothing:
    P(diskon|Ham) = (0 + 1) / (Total Words in Ham + Total Features)
    = 1 / (3 + 1) = 0.25
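
The two calculations above can be reproduced with one small helper. This is a sketch under stated assumptions: the function name `word_likelihood` and the parameter `alpha` (the smoothing constant, where 1 gives Laplace smoothing and 0 gives the plain frequency ratio) are introduced here for illustration. The inputs are the numbers from the example.

```python
def word_likelihood(word_count: int, total_words: int,
                    num_features: int, alpha: float = 1.0) -> float:
    """Estimate P(word | class) with additive smoothing.

    alpha = 1 is Laplace smoothing; alpha = 0 is the unsmoothed ratio.
    """
    return (word_count + alpha) / (total_words + alpha * num_features)

# Unsmoothed spam estimate from the example: 2 occurrences among 5 words.
p_diskon_spam = word_likelihood(2, 5, num_features=1, alpha=0.0)
# Smoothed ham estimate from the example: (0 + 1) / (3 + 1).
p_diskon_ham = word_likelihood(0, 3, num_features=1)
print(p_diskon_spam, p_diskon_ham)  # 0.4 0.25
```

In practice the same `alpha` is usually applied to every class, and `num_features` is typically the number of distinct words in the vocabulary, so that no word ends up with a zero probability in any class.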
    

6. Predicting the Email Class

We use the Naive Bayes formula to predict the class of an email.

    P(Spam|Email) ∝ P(Spam) * P(Word1|Spam) * P(Word2|Spam) * ...
    P(Ham|Email) ∝ P(Ham) * P(Word1|Ham) * P(Word2|Ham) * ...
    

Example:

    If the email contains the words "diskon" and "hari" ("hari" is Indonesian for "day"):
    P(Spam|Email) ∝ 0.6 * P(diskon|Spam) * P(hari|Spam)
    P(Ham|Email) ∝ 0.4 * P(diskon|Ham) * P(hari|Ham)
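
A possible way to carry out this comparison in Python is shown below. It sums logarithms instead of multiplying probabilities, a common way to avoid numerical underflow on long emails. The priors and the "diskon" probabilities come from the earlier steps; the "hari" probabilities are made-up values for illustration only, and the names `log_score` and `predict` are hypothetical.

```python
import math

def log_score(prior: float, likelihoods: list[float]) -> float:
    """Return log(prior) + sum of log(likelihood): the log of the unnormalized posterior."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

def predict(priors: dict, word_probs: dict, words: list[str]) -> str:
    """Pick the class with the highest unnormalized posterior."""
    scores = {
        label: log_score(priors[label], [word_probs[label][w] for w in words])
        for label in priors
    }
    return max(scores, key=scores.get)

priors = {"spam": 0.6, "ham": 0.4}
word_probs = {
    # "diskon" values are from step 5; "hari" values are assumed for this sketch.
    "spam": {"diskon": 0.4, "hari": 0.2},
    "ham": {"diskon": 0.25, "hari": 0.2},
}
print(predict(priors, word_probs, ["diskon", "hari"]))  # spam
```

With these numbers the unnormalized scores are 0.6 * 0.4 * 0.2 = 0.048 for spam and 0.4 * 0.25 * 0.2 = 0.02 for ham, so the email is classified as spam.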
    

7. Model Evaluation

The metrics used for evaluation include accuracy, precision, and recall.
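
These three metrics can be computed from the true and predicted labels as sketched below, treating spam (1) as the positive class. The label lists are hypothetical test data, not results from any real model, and the function name `evaluate` is an assumption.

```python
def evaluate(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, precision, and recall with spam (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that was missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ham correctly passed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels: 2 true positives, 1 false negative, 1 true negative, 1 false positive.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

For a spam filter, precision matters because a false positive hides a legitimate email, while recall measures how much spam slips through.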

8. Model Deployment

Integrate the model into the email system to detect spam in real time.

Example: If the model detects spam, the email is flagged as spam.
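
The flagging rule could be wired into an email pipeline along these lines. This is only a sketch: `route_email` and `keyword_classifier` are hypothetical names, and the keyword check is a stand-in for a trained Naive Bayes model, used here so that the example runs on its own.

```python
from typing import Callable

def route_email(text: str, is_spam: Callable[[str], bool]) -> str:
    """Return the folder an incoming email should be delivered to."""
    return "spam" if is_spam(text) else "inbox"

def keyword_classifier(text: str) -> bool:
    """Stand-in classifier for illustration only; a trained model would replace it."""
    return "diskon" in text.lower()

print(route_email("Dapatkan diskon 50%", keyword_classifier))     # spam
print(route_email("Berita baik untuk Anda", keyword_classifier))  # inbox
```

Passing the classifier in as a function keeps the routing logic independent of how the model is trained or stored.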