Archive for October, 2007

Funny True

Friday, October 12th, 2007

The title of this strip is “I’m always so happy that I successfully navigated the introduction that I completely forget to pay attention to the name the other person told me.” This happens to me A LOT, especially at this time when meeting new people is a regular occurrence.

Thank you, xkcd, for being insightful and funny at the same time.

Assignment #1: Done!

Monday, October 8th, 2007

Just woke up. Slept at 4am to finish the assignment in data mining class, which also took up most of my Sunday and Friday due to the need to catch up on lessons. I should pay more attention in class so I won’t need to figure out so many pages of complicated textbook text.

Anyway, while it was partly an ordeal, it also was… er, enlightening. It tackled very interesting topics! (NOTE: while true, please allow for some hint of sarcasm in the last sentence.) Cross-validation splits the data into k parts, uses k-1 parts as training data and the remaining one part as test data, and repeats the error rate test k times until every part has had its turn as test data; the average error rate (or accuracy, or whatever) is then computed over all k runs. Probability theory tries to find the probability that A will occur given that B has already occurred, P(A|B), or that A and B both occur, P(A,B) = P(A|B)P(B). Entropy measures how random data is: in context, how useful the data will be if used for prediction (the lower the entropy, the more useful).
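
The rotation in cross-validation is easier to see written out. Here is a minimal sketch in plain shell, not anything from the class material: `kfold_plan` and `mean_error` are hypothetical helper names, the folds are just numbered 1 to k, and the three error rates are made-up numbers.

```shell
#!/bin/sh
# kfold_plan and mean_error are hypothetical helper names for illustration.

# Print, for each of the k rounds, which fold is held out for testing
# and which folds are used for training.
kfold_plan() {
    k=$1
    t=1
    while [ "$t" -le "$k" ]; do
        train=""
        f=1
        while [ "$f" -le "$k" ]; do
            # every fold except the current test fold goes into training
            if [ "$f" -ne "$t" ]; then
                train="${train:+$train }$f"
            fi
            f=$((f + 1))
        done
        echo "round $t: test=$t train=$train"
        t=$((t + 1))
    done
}

# Average the per-round error rates passed as arguments.
mean_error() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.4f\n", sum / NR }'
}

kfold_plan 3
mean_error 0.10 0.20 0.30   # made-up error rates, one per round
```

With k = 3 this prints three rounds, each holding out a different fold, followed by the averaged error rate 0.2000.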

Of all the topics, I most enjoyed studying entropy. This is because I was able to make myself a nice Linux shell script, using bc as the primary calculation tool (and very powerful! I now find equation solving easy with bc), that automatically computes the combined entropy of two partitions, each described by the counts of its two possible values. The result is the average of the two partition entropies, weighted by partition size.

#!/bin/bash

# find_entropy: calculate entropy for discretization
# created by Joon Guillen for the CSIT 521 data mining class.

# exactly four counts are required: a b (first partition), x y (second partition)
if [ $# -ne 4 ]
then
    echo "Incomplete or no parameters!" >&2
    echo "Syntax: find_entropy.sh a b x y" >&2
    exit 1
fi

bc -l << END-OF-INPUT

define log2(v) {
return(l(v)/l(2));
}

define infoab(a,b) {
c=a+b;
ent_ab=(-1*(a/c)*log2(a/c))-((b/c)*log2(b/c))
return(ent_ab);
}

define infoxy(x,y) {
z=x+y;
ent_xy=(-1*(x/z)*log2(x/z))-((y/z)*log2(y/z))
return(ent_xy);
}

define finalentropy(a,b,x,y) {
total=a+b+x+y
final=(((a+b)/total)*infoab(a,b))+(((x+y)/total)*infoxy(x,y));
return(final);
}

a=$1
b=$2
x=$3
y=$4

"The value of int[a,b] is ";infoab(a,b);
"The value of int[x,y] is ";infoxy(x,y);
"The entropy is ";finalentropy(a,b,x,y);

quit
END-OF-INPUT
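
One way to sanity-check the script is to redo the same weighted-entropy formula with a different tool. The sketch below uses awk instead of bc. `split_entropy`, `lg2` and `info` are hypothetical names, the counts are made-up, and all four counts are assumed to be greater than zero (a zero count would mean taking the log of 0).

```shell
#!/bin/sh
# split_entropy: hypothetical awk cross-check of the weighted two-way entropy.
# Arguments: a b x y (value counts in the two partitions), all assumed > 0.
split_entropy() {
    awk -v a="$1" -v b="$2" -v x="$3" -v y="$4" '
        function lg2(v) { return log(v) / log(2) }
        # entropy of one partition holding p items of one value and q of the other
        function info(p, q,    n) {
            n = p + q
            return -(p / n) * lg2(p / n) - (q / n) * lg2(q / n)
        }
        BEGIN {
            total = a + b + x + y
            # weight each partition entropy by its share of all items
            printf "%.4f\n", ((a + b) / total) * info(a, b) + ((x + y) / total) * info(x, y)
        }'
}

split_entropy 1 1 1 3
```

For the counts 1 1 1 3 this prints 0.8742: the first partition is an even split (entropy 1), the second has entropy of about 0.8113, and the weights are 2/6 and 4/6.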

If it looks like I'm showing off, it's because I am so full of myself haha. What I mean is, in the pursuit of knowledge there will be times when one will feel exhilaration at a discovery. This is one of those times for me. I am allowing myself a little time to gloat, as later and for the rest of the week I will have to go back to work on MORE assignments and projects. (In particular, I was a bit disheartened to learn from my object-oriented software engineering prof that my current project proposal does not fit the software engineering criteria. I will have to dig deeper into more advanced computer science stuff.)

Aside from the interestingness of what has been learned, I imagine these concepts to be actually useable in real life. You can actually make decisions with these! But of course, it also involves the tedious task of data collection.

National Day

Tuesday, October 2nd, 2007

Just got back from Tsim Sha Tsui. Today (Oct. 1) is National Day, and three classmates and I switched to tourist mode and watched the nice 30-minute fireworks display from the harbour. After that we headed to CPK for late dinner and conversation (but I missed coffee). That was a fun night, and it was more productive than if I had just stayed home.

Earlier than that, Cheryl and I went to the Harbour Mall (also in TST), mostly to window-shop. We ended up eating at a neat burger place, with me finding a good deal on a Sigma 17-70mm f2.8-4.5 and Cheryl getting tempted to buy that nice LCD TV (I hope she does, haha). I also went to Page One to look for a recommended Java book, “Head First Java”, but didn’t find it. I did find “Core Java 2”, which I hear is good too. Maybe I’ll buy that soon.

Speaking of Java, I have been learning it these past few days – or at least trying to. I can say that the concept of object-oriented programming is really great, but I find that I cannot yet get myself to think in those terms. I need some practice, much more for conceptualizing programs than for the syntax. But I do find the experience enjoyable.