Assignment #1: Done!
Just woke up. Slept at 4am to finish the assignment in data mining class, which also took up most of my Sunday and Friday due to the need to catch up on lessons. I should pay more attention in class so I won’t need to figure out so many pages of complicated textbook text.
Anyway, while it was partly an ordeal, it also was… er, enlightening. It tackled very interesting topics! (NOTE: while true, please allow for some hint of sarcasm in the last sentence.) Cross-validation splits data into k parts, using k-1 parts as training data and the remaining part as test data, running the error rate test k times until every part has had its turn as test data, then computing the average error rate or accuracy or whatever. Probability theory tries to find the probability that A will occur given that B has already occurred, P(A|B), or that A and B both occur, P(A,B) = P(A|B)P(B). Entropy measures how random data is; in context, how useful data will be if used for prediction.
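To make the cross-validation part concrete, here is a rough sketch of how the k rounds carve up the records. It is not from the assignment: the function name kfold_indices, the record count of 6 and k = 3 are all made up for illustration, and records are dealt into folds round-robin, which is only one of several reasonable ways to split.

```shell
#!/bin/bash
# kfold_indices n k: for each of k rounds, list which record indices
# (0..n-1) are held out as test data; the rest would be training data.
# Hypothetical helper for illustration only.
kfold_indices() {
    local n=$1 k=$2 fold i
    for ((fold = 0; fold < k; fold++)); do
        local held_out=()
        for ((i = 0; i < n; i++)); do
            # Record i belongs to fold (i mod k), so each record is
            # test data in exactly one round.
            if (( i % k == fold )); then
                held_out+=("$i")
            fi
        done
        echo "round $((fold + 1)): test=${held_out[*]}"
    done
}

kfold_indices 6 3
# round 1: test=0 3
# round 2: test=1 4
# round 3: test=2 5
```

The error rate measured in each round would then be averaged over the k rounds.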
Of all the topics, I enjoyed studying entropy the most. This is because I was able to make myself a nice Linux shell script, using bc as the primary calculation tool (and a very powerful one! I now find equation solving easy with bc), that automatically computes the combined entropy of two groups, each with counts for two possible values, weighting each group's entropy by its size.
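#!/bin/bash
# find_entropy: calculate entropy for discretization
# created by Joon Guillen for the CSIT 521 data mining class.

# Require exactly four counts: a b (first group), x y (second group).
if [ $# -ne 4 ]
then
    echo "Incomplete or no parameters!"
    echo "Syntax: find_entropy.sh a b x y"
    exit 1
fi

# The shell substitutes the four arguments into the here-document
# before bc reads it.
bc -l <<END-OF-INPUT
/* base-2 logarithm built from the natural log l() */
define log2(v) {
    return(l(v)/l(2));
}

/* entropy of the first group, with counts a and b */
define infoab(a,b) {
    c=a+b;
    ent_ab=(-1*(a/c)*log2(a/c))-((b/c)*log2(b/c))
    return(ent_ab);
}

/* entropy of the second group, with counts x and y */
define infoxy(x,y) {
    z=x+y;
    ent_xy=(-1*(x/z)*log2(x/z))-((y/z)*log2(y/z))
    return(ent_xy);
}

/* both entropies, each weighted by its group's share of all records */
define finalentropy(a,b,x,y) {
    total=a+b+x+y
    final=(((a+b)/total)*infoab(a,b))+(((x+y)/total)*infoxy(x,y));
    return(final);
}

a=$1
b=$2
x=$3
y=$4
"The value of info[a,b] is ";infoab(a,b);
"The value of info[x,y] is ";infoxy(x,y);
"The entropy is ";finalentropy(a,b,x,y);
quit
END-OF-INPUT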
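One thing the formulas above gloss over: if a count is zero, log2(0) shows up, which is not defined mathematically. The usual convention is to treat 0 × log2(0) as 0. Below is a small sketch of the same weighted-entropy calculation with that convention spelled out. To be clear, this is a variation and not the script above: it swaps bc for awk, and the names split_entropy and h are made up for illustration.

```shell
#!/bin/bash
# split_entropy a b x y: weighted entropy (in bits) of two groups with
# class counts (a, b) and (x, y). Hypothetical helper; uses awk, not bc.
split_entropy() {
    awk -v a="$1" -v b="$2" -v x="$3" -v y="$4" '
    # Entropy of one group with counts p and q, skipping zero counts
    # so that 0 * log2(0) is treated as 0.
    function h(p, q,    n, e) {
        n = p + q; e = 0
        if (n == 0) return 0
        if (p > 0) e -= (p / n) * log(p / n) / log(2)
        if (q > 0) e -= (q / n) * log(q / n) / log(2)
        return e
    }
    BEGIN {
        t = a + b + x + y
        if (t == 0) { print "0.0000"; exit }
        # Weight each group by its share of all records.
        printf "%.4f\n", ((a + b) / t) * h(a, b) + ((x + y) / t) * h(x, y)
    }'
}

split_entropy 5 5 10 0   # prints 0.5000
split_entropy 5 5 5 5    # prints 1.0000
```

The first call is a handy sanity check: an evenly mixed group has entropy 1, a pure group has entropy 0, and with both groups the same size the weighted result lands halfway at 0.5.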
If it looks like I'm showing off, it's because I am so full of myself haha. What I mean is, in the pursuit of knowledge there will be times when one will feel exhilaration at a discovery. This is one of those times for me. I am allowing myself a little time to gloat, as later and for the rest of the week I will have to go back to work on MORE assignments and projects. (In particular, I was a bit disheartened to learn from my object-oriented software engineering prof that my current project proposal does not fit the software engineering criteria. I will have to dig deeper into more advanced computer science stuff.)
Aside from being interesting to learn, these concepts strike me as actually usable in real life. You can actually make decisions with these! But of course, that also involves the tedious task of data collection.
October 9th, 2007 at 12:52 pm
Yeyyyy!!! Goodluck baby! :D
October 10th, 2007 at 9:20 am
I have no idea what your code means :D