Create Your Own Neural Network From Scratch (A.I - 101)


Kuchwizzy

Greetings, elders!

First, let me admit that I don't usually finish the threads I start, just like many other JF lecturers.

There are usually two main reasons: losing interest or being swamped with work, and sometimes you just don't get the sophisticated audience you expect for a given topic (for this one I expect a small audience anyway).

This thread will be different, though; I will do my best to keep it active even if it takes a whole year or ends up with only two followers.
The mathematical foundation of a neural network is not very hard; if you studied Advanced Mathematics (Form 5 & 6) you can digest 90% of these concepts.

What makes a neural network so powerful is not the math or its architecture; it is its ability to generalize from what it learns from a large number of independent random examples, in line with the LLN.

"The law of large numbers (LLN) is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists" - Wikipedia

(We will explain what I have just said in more detail in the later lectures.)
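
As a quick illustration of the LLN (my own sketch, not from the lectures or the book): the average of many independent coin flips converges to the true value, 0.5, as the number of samples grows.

```python
# A minimal sketch of the law of large numbers: the average of many
# independent coin flips (0 or 1) converges to the true mean, 0.5.
import random

random.seed(0)  # fixed seed so the run is reproducible

for n in (10, 100, 10_000, 1_000_000):
    flips = [random.randint(0, 1) for _ in range(n)]
    print(f"{n:>9} flips -> average = {sum(flips) / n:.4f}")
```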

Let's start with a word from our motivational speaker, Prof. Richard Feynman:

(photo of Richard Feynman: "What I cannot create, I do not understand.")


Prerequisites: I assume you are comfortable with Python programming and have the fundamentals of pure mathematics, especially topics like matrices, vectors, logic, functions, calculus (differentiation and partial derivatives), the basic concepts of linear algebra, and statistics and probability.

Any advanced mathematical concepts you did not study at A-level will be explained as we go.
Also make sure you have a PC, a Gmail account, a few extra I.Q. points for digesting these concepts, and a great interest in / passion for A.I.

Links to these lectures are below (we are currently at Lecture 15).

Lecture 1 - Easy For Me, Hard For You

Lecture 2 - A Simple Predicting Machine

Lecture 3 - Classifying is Not Very Different from Predicting

Lecture 4 - Sometimes One Classifier Is Not Enough

Lecture 5 - Neurons, Nature’s Computing Machines

Lecture 6 - Following Signals Through A Neural Network

Lecture 7 - Matrix Multiplication is Useful .. Honest!

Lecture 8 - A Three Layer Example with Matrix Multiplication

Lecture 9 - Learning Weights From More Than One Node

Lecture 10 - Backpropagating Errors with Matrix Multiplication

Lecture 11 - How Do We Actually Update Weights?

Lecture 12 - Gradient Descent Algorithm

Lecture 13 - Mean Squared Error ( Loss ) Function And Mathematics of Gradient Descent Algorithm

Lecture 14 - Derivative of Error with respect to any weight between input and hidden layers

Lecture 15 - How to update weights using Gradient Descent
 

Let's begin...

One of the things I crave to know whenever I learn any technology is how to replicate that technology from the very beginning (from scratch).
To quote Richard Feynman: "What I cannot create, I do not understand."

The truth is that it is very hard to create most tech (software) from scratch, for two reasons:

1. Too complex
2. Pointless

Look at software like an operating system: if we decided to code one from scratch, how long would it take us to get a non-trivial, functional operating system? Too complex.

It is also pointless: why reinvent the wheel when you can build an O.S. on top of the Linux kernel, or take any Linux distro for free?

But if you want to understand operating systems, you have no choice but to build one from scratch. It does not have to be as complex as Windows or Linux, but it should be minimal enough to work.

The big inspiration for starting this thread is the arrival of Generative AI; if you are paying attention, you will notice that soon most software will be integrated with these Gen AIs.

Behind the Generative AI revolution sits the Transformer architecture. This is the core tech behind ChatGPT, Gemini, Llama and almost all generative foundation models today.
If you look at this architecture, you will find it is just layers of neural networks arranged in a clever way.

Meaning, if you understand neural networks well, you will be able to understand not only the Transformer architecture but the A.I field as a whole, because it relies heavily on the neural network as its dominant ML algorithm.

And the only way to understand a neural network? Yes, build your own from scratch.

The difference between a neural network and an operating system is that a neural network is not hard at all to build from scratch.
And when I say from scratch, I don't mean just the code; I mean everything from the theory and the mathematical formalism to the coding, the training and the evaluation.

The mathematical concepts behind neural networks were all covered in Form 5 & 6 and are not complex: mostly differentiation, matrices, functions and vectors, plus a few concepts from biology.

So the goal of this thread is to design and code a neural network from scratch...

To be continued...

Setting up camp here.
 
Most of my references will come from the book Make Your Own Neural Network by Tariq Rashid (attachment below).

Part 01 - Easy For Me, Hard For You​

Fundamentally, a computer is no different from a calculator; it is simply very fast at doing arithmetic.
Most of the tasks a computer performs require no intelligence beyond doing simple arithmetic quickly.
It may surprise you that even streaming video or music is done using simple arithmetic that you learned in primary school.

Adding numbers quickly, thousands or millions of numbers per second, is not artificial intelligence.
It may be hard for you as a human to add large numbers, say 2,673,684,087 + 476,783,373, quickly, but this exercise requires no intelligence beyond the ability to follow basic instructions,
and that is exactly what the electronic components inside your computer do.

What if we change the exercise?

(image: a photo of a cat)


If I ask you what you recognize in that picture, it will not take you even a full second to answer "a cat".
But this is a very hard exercise for a computer, despite its enormous ability to do arithmetic at the speed of light.

The reason is that a computer depends on instructions to perform any task, and those instructions are written by a human as a computer program, code, or software.
We can write a computer program that gives a computer the ability to play the game of Tic Tac Toe,
because a human can write understandable instructions for how to play this game, as sketched below.

(code screenshot, not recoverable: explicit instructions for playing Tic Tac Toe)
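
Since the original screenshot is lost, here is a minimal sketch (my reconstruction, not the original code) of the kind of explicit, human-written rules that let a computer play Tic Tac Toe:

```python
# A hypothetical rule list for choosing a Tic Tac Toe move.
# board is a list of 9 cells: 'X', 'O' or None, indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def choose_move(board, me, opponent):
    # Rule 1: complete our own winning line. Rule 2: block the opponent's.
    for player in (me, opponent):
        for line in LINES:
            cells = [board[i] for i in line]
            if cells.count(player) == 2 and cells.count(None) == 1:
                return line[cells.index(None)]
    # Rule 3: otherwise prefer the centre, then corners, then edges.
    for i in (4, 0, 2, 6, 8, 1, 3, 5, 7):
        if board[i] is None:
            return i

print(choose_move([None] * 9, 'X', 'O'))                     # -> 4 (takes the centre)
print(choose_move(['O', 'O', None] + [None] * 6, 'X', 'O'))  # -> 2 (blocks the win)
```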


On the other hand, a human cannot write instructions telling a computer how to recognize a cat in a picture; you can try this exercise yourself to see how hard it is.

First, even a human does not know how they recognize a cat in a picture; the brain just tells you.
Second, you cannot write those steps down: you do not know where in the picture the cat will be, what colour it will be, or what pose it will be in.

So how can we enable a computer to perform this kind of task (tasks that we ourselves do not know how to perform) without changing the design or nature of the computer? This is where artificial intelligence comes in.

We have to come up with an algorithm:
a method that will enable the computer to learn, by itself, how to solve this kind of problem.

NEXT: A Simple Predicting Machine
To be continued...
 

Boss! Have you tried following physics-informed neural networks? Very difficult to implement.
If it were that easy, AI models would be everywhere. These things need huge infrastructure and a big enough team. Even China has not come up with a convincing model so far; they are biding their time while the USA remains the master.
If we have failed to build even applications like YouTube, TikTok, Twitter, Facebook, WhatsApp and so on, how will we manage AI? Can we?
Another thing: there is nothing as expensive, and nothing that will defeat us charcoal-black folks, as training AI. You need a corpus of data. Where will you get it? Do you have the money?
Imagine, ChatGPT and Gemini have managed to train those AIs on our native languages. We have never even thought of writing our native languages down in books; when you do find such books, know they were written by Europeans or Arabs.
It is true AIs may not be hard, but not for a black man. We have become so lazy and unvisionary.
So let someone like you be an example in our community and break the taboo of always being followers. I congratulate you on your enthusiasm.
 

Part 02 - A Simple Predicting Machine​


When I ask you a question, for example "what is in the picture?", you think and then give me an answer. Because a computer is just a calculator, it has no ability to think the way we do, so it takes the question as input, then processes it and gives us the answer as output.
(diagram: input → process → output)

Suppose the input is 3 × 4. We know how to process this kind of input to get the output:
process = 4 + 4 + 4
output = 12
(diagram: 3 × 4 flowing through input → process → output)




Now what about an input that we do not know how to process into an output? Assume, just for the sake of this discussion, that we do not know how to convert kilometres into miles, but we do know that the relationship between kilometres and miles is linear,
meaning that miles increase as kilometres increase, and that we have a few examples of kilometres and their corresponding miles:

Truth example 1: 0 km = 0 miles
Truth example 2: 100 km = 62.137 miles


Because we know the relationship between kilometres and miles is linear,
we can say that to get miles we multiply kilometres by some constant, call it c:

miles = kilometres × c


But we do not know the value of c. We can start with any random value, say c = 0.5.
In the example we have, kilometres is 100:

miles = 100 × 0.5 = 50


With c = 0.5, we find that 100 kilometres equals 50 miles, but our example tells us that 100 kilometres is 62.137 miles. There is an error.
The error is the difference between the truth we know and our prediction.
We can say:

error = truth − prediction

with truth = 62.137 and our prediction = 50:

error = 62.137 − 50 = 12.137


We know we are off by 12.137. Instead of giving up, we can use this error to guess another value of c that will take us closer to the truth.
Because the relationship between miles and kilometres is linear and our prediction fell short of the truth we know,
to get closer to the truth, all we have to do is increase the value of c.

Let's now say c = 0.6, so:

miles = 100 × 0.6 = 60, and error = 62.137 − 60 = 2.137


The key point here is that we use the error, obtained by comparing the truth we know (from the examples we have, the training data) with our current prediction, to improve our next (future) prediction. We repeat this process (iterate) until we are confident in our predictions.

This simple idea is the most fundamental idea in how a neural network operates.
In this second prediction we have reduced the error from 12.137 to 2.137; as the error shrinks, our prediction moves closer to being correct (it approaches the truth we know).
When A.I experts say they are "training a neural network", what they are essentially doing is reducing the error it makes on each of the examples (the training dataset) until the error is as small as possible (keep this concept in mind, we will use it a lot later on).

Now, to reduce the error further, let's guess again: c = 0.7.

miles = 100 × 0.7 = 70, and error = 62.137 − 70 = −7.863


A negative error? What does it mean?
It means we have overshot.
In other words, our prediction has gone past our goal, the truth we know (the truth / target value).
So we were more accurate with c = 0.6 than with c = 0.7 (hold on to this concept as well; it matters a lot later on).

What do we learn here? If the error is big we should increase the value of c by a lot, but if the error is small we should increase the value of c only a little.
In other words, the error should guide how much we adjust our variable, or parameter; in this example our parameter is c.

Fun fact: when experts say a certain model has so many parameters, for example that Llama 3.1 has 405 billion parameters, these are the kind of parameters being talked about. In our simple model that predicts miles given kilometres, we have only one parameter, c.

So, instead of jumping to c = 0.7, we should take small steps to avoid overshooting, because the error is now small.

Let's say c = 0.61:

miles = 100 × 0.61 = 61, and error = 62.137 − 61 = 1.137



The key (and important) point here is that we use the error we get to decide how much to adjust the value of our parameter, so as to avoid overshooting and to close in on the truth we know (the target value).
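
To tie the walkthrough together, here is a minimal sketch in Python (my own; in the text we adjusted c by hand, and here the same idea is automated using the moderated update rule that appears later in this series):

```python
# A simple predicting machine: guess c, compare against the truth we know,
# and use the error to nudge c a little in the right direction.
truth_km, truth_miles = 100.0, 62.137   # our training example

c = 0.5                                  # random first guess
for step in range(6):
    prediction = truth_km * c
    error = truth_miles - prediction     # error = truth - prediction
    print(f"step {step}: c = {c:.4f}, prediction = {prediction:.3f}, error = {error:.3f}")
    c += 0.5 * (error / truth_km)        # small, moderated adjustment of c
```

Run it and you will see the error shrink toward zero as c closes in on the true conversion factor, 0.62137.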

We can use this moment to formalize some of the concepts we have learned:

Error​

The error, often referred to as the loss or cost, measures the difference between the neural network's predictions and the actual target values.

Truth (or Ground Truth)​

The truth, or ground truth, is the actual value that the neural network is trying to predict. It is the correct output that corresponds to a given input in the training data. This value is used to compute the error during training.

Target Value​

The target value is another term for the truth or ground truth. It represents the desired output for a given input in the training dataset. The neural network aims to produce outputs that are as close as possible to these target values.

Parameter​

Parameters are the variables that the neural network learns and adjusts during training to minimize the error.

Training Data​

Training data is the dataset used to train the neural network. It consists of input-output pairs where the inputs are fed into the network, and the outputs are the corresponding target values. The network learns by adjusting its parameters to reduce the error on the training data.

Prediction​

A prediction is the output produced by the neural network given a certain input. It is the network's estimate of the target value based on its learned parameters. During training, predictions are compared to the target values to compute the error.

Next: Classifying is Not Very Different from Predicting
 

(quoting the earlier reply about physics-informed neural networks)
You have missed the key point here.
A neural network, as a concept, is easy to create from scratch; that is exactly what we are doing, and will complete, here, and it is not nuclear-physics-level tech.
Its entire mathematical foundation can be understood even by a Form 6 student with a good grasp of Advanced Mathematics.

The hard thing, which I think is what you actually mean, is the cost of training bigger models, in terms of the dataset and of the hardware for training big models on big datasets; but the underlying science of the neural network is the same, what changes is the number of parameters and the size of the dataset you use.

Even so, the cost of training these models (the money you pay cloud hosting providers for the hours you spend on their GPU clusters) keeps dropping over time, which is why Andrej Karpathy was able to reproduce GPT-2 for just 20 dollars in about an hour and a half (his full lecture is on YouTube).

ChatGPT can do complex tasks not because the neural network is complex tech but because it is simple yet impressive tech: it learns by itself. ChatGPT literally taught itself to write code and to understand what you type, and experts still do not fully understand how it managed that.

Understand: the hard part is not the science of the neural network; it is the availability of high-quality training datasets and the cost of the hardware (chips) used to train those big models, since training can take months.


You are also wrong about China. China leads the world in A.I and has very capable models; go and look at Hugging Face.

Also, people do not build their own Facebook, YouTube or WhatsApp, not because the underlying tech is hard, but because of the network effect: you cannot convince 2 billion people to start using your Facebook when Facebook already exists.
 
(quoting: "...and it is not nuclear-physics-level tech")
I am talking about AI models that simulate the physical world. The physics-informed neural network is a new branch of AI informed by the physical world.
Not physics as such, but physics computation using AI.
The guy below has many lectures you can follow:
(screenshot of the lecturer)

No Form Six leaver can build AI of this kind.
You need to be truly ripe in physics, not the childish kind. The theories of this field have been to school; you have to settle your head to get into them.
Not the physics of building rockets, cars or bombs, but their computation in AI.
 
(quoting: "You are also wrong about China. China leads the world in A.I")
ChatGPT is the leading AI bot so far; Sam Altman has made that clear. Altman has laid out 4 conditions for keeping the USA the leader and denying China breathing room on some of the fundamentals of AI.
See Altman's remarks advising the US government.
Do you really think a company like Google, which has an enormous dataset, can be left behind by China?
 
Please carry on, boss. Some of us never went past Form Six, yet we are software developers and we long to get into the A.I side one day.
 
(quoting the reply above about physics-informed neural networks)
"AI ya aina hii hakuna form six anaweza kuiunda.
Unatakiwa uwe umeiva physics kweli sio ile ya kitoto."

Labda ni kwasababu huelewi Model kwenye computer science huwa zinakua formalized vipi
Halafu naona una ile mentality ya secondary kuwa "Hii topic ngumu achana nayo", hio mindset haina msaada wowote kwa watu wenye kiu ya kujifunza ambao ndio audience ninao target hapa

Ipo hivi, kwenye CS huwa tuna deal na oversimplification ya physical world, concept kama ya Neural network ni oversimplification ya biological neuron, tunachojali zaidi ni ufanisi wa hio algorithm tunayoitaka kutoka kwenye hio concept na sio each and every detail, thus why ni computer scientists walikuja na hio solution bila kuhitaji degree yoyote ya Biology

Pia Lagrangian Mechanics sio complex kama unavyotaka kuogopesha watu, pia hata point ya kuonyesha kuwa kitu fulani ni kigumu pia hai make sense, lengo ni nini? watu wasijaribu?

Lagrangian mechanics ni simplified version ya Newtonian mechanics, badala ya ku deals na forces, una deal na total energy ya system, core idea ni kwamba nature ina tendency ya ku-take action inayo minimize total energy ya system (Principle of Least Action), kwahii assumption pekee, unaweza derive equation yoyote ile ya physical system kwa urahisi kuliko approach ya Newton, kwa maneno mengine Classical mechanics ni ngumu kuliko Lagrangian Mechanics, huku Classical mechanics watu wanaisoma tangu secondary

Chukua hii idea then i incoporate kwenye neural network ndo unapata Lagrangian NN, na hata ukisoma paper yao Math ni easy kui turn into code, so LNN ni oversimplification ya Lagrangian mechanics kama Neural Net ilivyo oversimplification ya Biology, huna haja ya "kuiva" kwenye Physics kama unavyodai

Nguvu ya Neural Network haipo kwenye ugumu wa architecture ila upo kwenye uwezo wa Network yenyewe kujifunza hizo complex patterns yenyewe kulingana na ubora na Training data

Ugumu haupo kwenye ku implement Neural network from the scratch, ugumu upo kwenye kupata high quality big dataset na hardwares za ku train model zenye parameters nyingi, hiko ndio kinachofanya kuwepo kwa hii monopoly ya Big tech kwenye A.I, cause wana data na computing power, sayansi ya Neural Network yote ipo public.


Point nzima ya kukazana kuwa kitu fulani ni kigumu kwangu hai make sense? lengo ni lipi?
 
(quoting: "ChatGPT is the leading model so far; Sam Altman has made that clear...")
ChatGPT is not a model, and Sam Altman has no scientific contribution to Generative A.I; he is just a CEO, he sits in management, so he has nothing substantive to say about the progress of A.I research in China or America.

China leads the world in A.I; this is an established fact. For one thing, they have more A.I graduates and have published more papers.
 
(quoting the reply above)
Google search tells me this:

(Google search screenshots)

Boss, I am not here to argue, but my sources tell me a US company is the leader.
In hardware it is Nvidia, and in software it is Microsoft followed by Google.
In AI models it is Gemini followed by ChatGPT-4.
On AI investment, according to Altman, China has invested 8 billion dollars while the USA has invested over 70 billion.
 


Part 03 - Classifying is Not Very Different from Predicting​

We have seen how we can build a simple model that predicts miles when given kilometres, using the truth we know, the training data.
This kind of model is known as a simple predictor, because it takes some input (kilometres) and predicts its output (miles).

Assume this time we have a different problem: instead of making a prediction, we want to do classification.
(graph: lengths and widths of the two kinds of insects)


The graph above shows the lengths and widths of two kinds of insects: caterpillars and ladybirds.
(picture of a caterpillar)

(picture of a ladybird)

Caterpillars are long and thin while ladybirds are short and wide. How can we classify these two groups of insects using the concepts we saw while learning about the simple predictor?

The basic idea behind the simple predictor was a linear relationship between input and output; we can use this same concept of linearity, as a straight line, to separate the insects into their two groups (classification).

(graph: a first random dividing line drawn over the insect data)


The goal is to find a single line that separates the two groups of insects. We can start with any random line, such as the one above.

The slope, or gradient, of this line is the parameter or variable we have to adjust (just as the constant c was in the simple predictor) to get a line that separates the two groups correctly.

We can guess another line:

(graph: a second guessed line)


Still not there; the goal is a line that separates these two groups of insects. Let's try again:

(graph: a third guessed line)


At last, we have a line that correctly separates the two groups of insects.
We can use this line as a simple classifier: just as with the simple predictor, we can use it to classify whether some unknown bug is a caterpillar or a ladybird, based on where it falls on the graph relative to this line.

(graph: an unknown bug classified by which side of the line it lies on)


Fundamentally there is no difference between predicting and classifying; the same concept of linearity used in the simple predictor can be used to classify data.
But the key question is: how did we find this line (classifier)?


Training A Simple Classifier

As we did before, we start with a training dataset:

Example 1 (ladybird): width 3.0, length 1.0
Example 2 (caterpillar): width 1.0, length 3.0

We have a ladybird with a width of 3.0 and a length of 1.0, and a caterpillar with a width of 1.0 and a length of 3.0.
If we plot these points on a graph, we get this:

(graph: the two training points)


As before, we use the linear relationship present here as a straight line.
This is the equation of a straight line:

y = Ax + B


with A and B constants.

To keep our job simple, we can restrict ourselves to lines that pass through the origin, by setting B to 0 (B = 0):

y = Ax


with A a constant.

Mathematically, A controls the slope of this line; a bigger value of A means a steeper slope. As we did at the start, we can begin with a random value of A.

Let's say A = 0.25:

y = 0.25x


Note that here the line is not converting width into length the way our predictor converted kilometres into miles; we are not trying to convert anything, our goal is to classify.
We can plot this line; for convenience we only need a few x coordinates (at x = 0, y = 0, and at x = 1, y = 0.25)
to get this:

(graph: the line y = 0.25x plotted against the two training points)


Our line is not yet able to divide the two groups of insects; there is an error.
How do we calculate the error, as we did in the case of the simple predictor?

Back to the equation of our line:

y = Ax



What this equation says for the first training example (width 3.0 and length 1.0) is that a ladybird of this width (x = 3.0) should have a length of:

y = 0.25 × 3.0 = 0.75


But we know from the training example that the value of y should be 1 (y = 1).
Since we want a line that divides the two groups of insects, instead of targeting y = 1 we can target y = 1.1.

Why? Because we do not want the line to pass through the point (3, 1); we want it to pass just above it, separating it from the other group.

So the error is:

E = 1.1 − 0.75 = 0.35


Now let's use the error we got, E, to work out by how much we should change the value of A.
To do that, we need to know the relationship between A (the slope of our classifier) and E (the error).
Back to our equation again:

y = Ax


Let our target value (which comes from the training data) be t. We know that to reach t we must increase the value of A by some small amount ΔA (Δ being the delta symbol, meaning "a small change in").

Mathematically, we can write t as:

t = (A + ΔA)x


We can visualize the graphs of these two equations as follows:

(graph: the line y = Ax and, above it, the steeper line t = (A + ΔA)x)


You will recall that the error E is the difference between the truth, or target value, t and our prediction, in our case y.
So:

E = t − y = (A + ΔA)x − Ax = (ΔA)x, which gives ΔA = E / x


Now we have the relationship between the error E and how much we should increase the value of A given that error.
Back to our numbers: the error E was 0.35 and the input x was 3.0.
So we should increase the value of A by:

ΔA = E / x = 0.35 / 3.0 = 0.1167, making the new A = 0.25 + 0.1167 = 0.3667


Now let's plug the values of the second point in our training dataset into this updated equation of our classifier.
This time x is 1.0 and the target value y is 3.0.

But as we agreed at the start, the goal is a line that separates the two groups of insects rather than passing through the points, so we can choose y = 2.9 as our target value, just below the caterpillar's length, so that the line passes below this point (just as it passed above the ladybird's) instead of through it.

If we calculate the error E this time we get:

E = 2.9 − (0.3667 × 1.0) = 2.5333


We got an error of 2.5333, while before it was 0.35. Why is this error bigger than the previous one?
The reason is that so far we have trained our classifier on only one point of the training dataset, so the line is biased toward that point.

Now let's update the value of A again, using the feedback from this error:

ΔA = E / x = 2.5333 / 1.0 = 2.5333, making the new A = 0.3667 + 2.5333 = 2.9


So the new equation of our classifier is:
y = 2.9x


Now when x is 1.0, y is 2.9 (the target value we wanted).
Using the errors we got from the training dataset, we have been able to train our classifier to predict the target value exactly.
Let's now visualize what we have done from the beginning:

(graph: the sequence of updated lines after each training example)


If you examine the final line carefully, you will notice we have not achieved the goal we set ourselves.
The goal was a line that correctly separates those points (the two groups of insects).

The cause of this problem (hold on to this concept too; it is very important for neural networks and for where we are heading) is that with every update we make to our slope A we throw away what we learned earlier (from the previous example).
The approach we are using introduces a bias toward the current example.

This approach makes the classifier give the right answer for the second example and a wrong answer for the first; likewise it would give the right answer for a third example while losing its ability to predict the target value of the second, and so on...

How do we solve this problem? The answer is to introduce the concept of moderation, or a learning rate.
Instead of stepping with both feet to wherever the current example tells us, we take just one step, leaving the other foot where the previous example told us to go.

This technique has another important side effect: if there are errors (noise) in our training dataset, which is normal in real-world data, it largely smooths out the influence of that noise.

So we can update our equation for how much A should change, given the error E and the input x:
ΔA = L × (E / x)

where L is the learning rate.
Let's now set the learning rate to 0.5, meaning we take only half a step in the direction each example tells us, instead of the full step we took before.

If we quickly redo our calculations with this new equation (you can repeat them yourself, since the steps are the same except that this time we multiply by our learning rate of 0.5),

we get a graph like this:

(graph: the final, moderated dividing line separating the two groups)


Now we have a much better line, considering we used only two examples from the training dataset, after introducing the learning rate: another parameter that guides how much we should change the slope while respecting what we learned from all the previous examples. A short sketch of the whole procedure follows below.
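
Here is a minimal Python sketch of this whole walkthrough (my own sketch, following the numbers above):

```python
# Train the slope A of the dividing line y = Ax on the two bug examples,
# moderating each update with a learning rate L.
training_data = [(3.0, 1.1),   # ladybird:    width 3.0, target just above its length 1.0
                 (1.0, 2.9)]   # caterpillar: width 1.0, target just below its length 3.0

A = 0.25   # random starting slope
L = 0.5    # learning rate (setting L = 1.0 reproduces the overshooting version)

for x, t in training_data:
    y = A * x          # prediction of the current line
    E = t - y          # error = target value - prediction
    A += L * (E / x)   # delta A = L * (E / x)
    print(f"x = {x}, target = {t}, error = {E:.4f}, new A = {A:.4f}")
```

With L = 0.5 this ends at A ≈ 1.6042, the well-placed dividing line in the final graph; with L = 1.0 it ends at A = 2.9, the line that only fits the last example.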

We can use this moment to formalize what we have learned so far:

Simple Predictor​

A simple predictor is a model that estimates a continuous output based on input features. The predictor's goal is to minimize the difference between the predicted values and the actual values.

Simple Classifier​

A simple classifier in neural networks categorizes input data into discrete classes. For example, a binary classifier predicts one of two possible classes (e.g., spam or not spam or caterpillar or ladybird). The classifier uses features from the input data to assign a class label.

Why Prediction is Not Different from Classification​

Prediction and classification in neural networks are fundamentally similar tasks with different types of outputs:

  • Prediction: Outputs a continuous value (regression). For example, predicting house prices based on features like size and location.
  • Classification: Outputs a discrete value (class label). For example, categorizing emails as spam or not spam.
In both cases, the neural network learns to map inputs to outputs by minimizing a loss or error during training.

Learning Rate or Moderation Factor​

The learning rate, also known as the moderation factor, is a hyperparameter that controls the size of the steps the neural network takes while updating its parameters

  • High Learning Rate: Leads to larger steps towards the minimum of the loss function (error). This can speed up training but risks overshooting the optimal solution, leading to unstable training.
  • Low Learning Rate: Results in smaller steps, making the training process slower but more stable and precise.

In summary:
  • Simple Predictor: Estimates continuous values.
  • Simple Classifier: Categorizes data into discrete classes.
  • Prediction vs. Classification: Both map inputs to outputs; the difference is in the output type and loss function (error).
  • Learning Rate: Controls the speed and stability of training in neural networks.
Next : Sometimes One Classifier Is Not Enough
 

Attachments

  • equation (9).png
    equation (9).png
    5.2 KB · Views: 6
  • equation (12).png
    equation (12).png
    13.9 KB · Views: 6

Part 04 - Sometimes One Classifier Is Not Enough​

The simple predictor and the simple classifier are effective in many cases: we give them an input, they do a calculation and give us an output.
But there are cases we cannot solve using a simple predictor or classifier.

Let's look at training datasets that are governed by Boolean logic.
Consider two statements common in Boolean logic: the AND statement and the OR statement.
In an AND statement, the whole statement is true (1) only if every condition is true; otherwise the statement is false (0).

In an OR statement, the whole statement is true if at least one of its conditions is true; otherwise it is false.

Input A | Input B | A AND B | A OR B
0 | 0 | 0 | 0
0 | 1 | 0 | 1
1 | 0 | 0 | 1
1 | 1 | 1 | 1


Because a Boolean function takes two inputs, our predictor / classifier will look like this:

(diagram: a classifier node with two inputs and one output)


Suppose we want a classifier to tell us which Boolean function some data obeys.
For example:

"Is there more malaria if it is raining AND the temperature is above 35 °C?" or
"Is there more malaria if it is raining OR the temperature is above 35 °C?"


If we plot the points of the AND function on a graph, the (x, y) pairs being (0,0), (0,1), (1,0) and (1,1), marking the points whose output is true (1) with a green circle and those whose output is false (0) with a red circle,

we get a graph of this kind:

(graph: the four AND points, with a straight line dividing the single true point from the three false ones)



Looking at the graph above, we have applied our earlier technique of training a classifier, using these points as training data, to obtain a classifier that tells us the type of Boolean function by dividing the points that give true outputs from those that give false outputs (you can try it yourself, using exactly the technique we learned earlier, to find this line).

The key point here is that it is possible for a simple linear classifier to learn the Boolean AND function.

We can do the same for the OR function:

(graph: the four OR points, with a straight line dividing the three true points from the single false one)


So it is also possible for a simple linear classifier to learn the OR function.
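
As a small sketch of what such a linear classifier looks like in code (my own illustration; the weights are chosen by hand rather than trained), a single threshold on a weighted sum is enough to represent both AND and OR:

```python
# A single linear classifier: one weighted sum, one threshold.
def linear_classifier(x1, x2, w1, w2, threshold):
    return 1 if (w1 * x1 + w2 * x2) >= threshold else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        out_and = linear_classifier(x1, x2, 1, 1, 2.0)  # fires only when both inputs are 1
        out_or = linear_classifier(x1, x2, 1, 1, 1.0)   # fires when at least one input is 1
        print(f"{x1} {x2}  AND: {out_and}  OR: {out_or}")
```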

But let's look at another kind of Boolean logic function, known as XOR, or Exclusive OR.
An XOR statement is true if exactly one of its conditions is true, but not both.

Input A | Input B | A XOR B
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0


Let's plot the graph of this function:
(graph: the four XOR points; the two false points sit on opposite corners of the square)



You will notice that this time we cannot use a simple linear classifier to divide the true outputs from the false ones, because the points giving false outputs (red circles) lie on both sides of any single line; we cannot separate them with one line.

In other words, a simple linear predictor / classifier cannot learn from a training dataset that obeys the XOR Boolean function.

We want a neural network to be able to learn from any kind of training dataset, especially the non-linear ones (not linearly separable), which we cannot solve by drawing a single straight line.

How do we learn from a non-linear dataset? The solution is to use multiple classifiers instead of a single simple classifier.
For example, for the XOR function above, we can divide the points using two classifiers:

(graph: two lines jointly separating the XOR points)



Another solution is to introduce non-linearity, instead of relying on a single straight line to separate the data.
This time we would use multiple classifiers that can take any shape (a curve), rather than only straight lines, to separate the data.
This is the core idea of a neural network: each neuron in a neural network is a single classifier, working together with the other neurons to predict or classify the data points in the training dataset, as the sketch below illustrates.
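
A minimal sketch of that idea (my own illustration): no single linear classifier computes XOR, but two of them combined do, since XOR is true exactly when OR fires and AND does not.

```python
# Two linear classifiers working together to compute XOR.
def linear_classifier(x1, x2, w1, w2, threshold):
    return 1 if (w1 * x1 + w2 * x2) >= threshold else 0

def xor(x1, x2):
    both = linear_classifier(x1, x2, 1, 1, 2.0)     # first classifier: AND
    either = linear_classifier(x1, x2, 1, 1, 1.0)   # second classifier: OR
    return 1 if (either == 1 and both == 0) else 0  # combine their outputs

for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"{x1} XOR {x2} = {xor(x1, x2)}")
```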

We can use this moment to formalize what we have learned so far:

A simple linear classifier cannot separate data that is not governed by a single linear process, such as data defined by the logical XOR operator. The solution to this problem is to use multiple classifiers and introduce non-linearity, allowing the use of curves or more complex shapes for data classification instead of relying on a single straight line.

Boolean Logic​

Boolean logic is a subset of algebra used for creating true/false statements. It operates on binary variables and uses logical operations such as AND, OR, and XOR to produce logical expressions.

Or, put another way: Boolean algebra is a system of logic that deals with truth values (either true or false) and operations that combine these values. It is like a mathematical system, but instead of numbers it uses logical statements and operators.

AND, OR, and XOR​

  1. AND:
    • Operation: The AND operation outputs true (1) only if both inputs are true.
    • Truth table: 0 AND 0 = 0; 0 AND 1 = 0; 1 AND 0 = 0; 1 AND 1 = 1
  2. OR:
    • Operation: The OR operation outputs true (1) if at least one of the inputs is true.
    • Truth table: 0 OR 0 = 0; 0 OR 1 = 1; 1 OR 0 = 1; 1 OR 1 = 1
  3. XOR(Exclusive OR):
    • Operation: The XOR operation outputs true (1) if exactly one of the inputs is true.
    • Truth table: 0 XOR 0 = 0; 0 XOR 1 = 1; 1 XOR 0 = 1; 1 XOR 1 = 0

Linearity and Non-Linearity in Neural Networks​

  1. Linearity:
    • Definition: A linear function in the context of neural networks refers to an operation where the output is directly proportional to the input. Mathematically, it can be represented as y = Ax + B.
    • Characteristics: Linear functions are limited in their ability to model complex relationships because they can only create straight-line boundaries in the input space.
  2. Non-Linearity:
    • Definition: Non-linear functions introduce complexity into the neural network by allowing it to capture intricate patterns and relationships in the data.
    • Characteristics: Non-linear functions enable the neural network to learn and represent more complex mappings from inputs to outputs. They allow the model to create curved or more complex decision boundaries, which are essential for solving problems where data cannot be separated by a straight line.
  3. Limitation of Linear Classifiers:
    • A simple linear classifier creates a decision boundary using a straight line.
    • It struggles with datasets that cannot be separated linearly, such as those influenced by the XOR logical operator, where no single straight line can accurately separate the classes.
  4. Solution - Multiple Classifiers and Non-Linearity:
    • Multiple Linear Classifiers: By combining several linear classifiers, it is possible to create a piecewise linear decision boundary that better fits the data. Each classifier handles a portion of the dataset, collectively improving classification accuracy.
    • Non-Linearity: Introducing non-linearity allows the model to learn more complex decision boundaries. Instead of a single straight line, the model can use curves or other shapes, enabling it to separate data that is not linearly separable.
Using these techniques, neural networks can handle more complex data distributions, leading to better performance on tasks where simple linear classifiers fall short.

Next : Neurons, Nature’s Computing Machines
 