First of all -- to anyone who cared or noticed, sorry about missing last week! My domains expired at the end of last week, and I took the poor man's approach and ignored the problem until after Thursday. That being said, despite the extra time, this is going to be a short post because I'm tired and uninspired[1]. I'm still getting into the swing of the whole job, school, wahey thing. Enough inanity: today I'm just going to run down the project strategy. I'm hoping that by next week I'll have something at least marginally deeper, but lately my thoughts have been pretty mundane.

So, what am I doing over the next three and a half months to at least somewhat tackle the problem I outlined in my previous post?

The specifics of the system actually turned out to be pretty simple once I sat down to look at the tools available. The as-yet-unnamed system will take in two words, which will then be tagged and tokenized and presumably have some level of meaning drawn out of them. So, something like 'black sky' or 'corresponding lump'[2] would be understood and then sent to WordNet to come up with a number of ways the words might be interpreted and to help with the contextualization. From there the understood pair will be run against a large corpus (presumably a Wikipedia dump), and a graph will be generated from the words that most frequently occur around our specific pair. I hope the system will be able to learn in some ways from being run iteratively, but when I suggest it my professor reminds me of my last project.
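To make the corpus step a bit more concrete, here is a minimal sketch of counting the words that show up around a pair. Nothing here is the actual system: the function names, the regex tokenizer, the `window` parameter and the sample sentence are all assumptions of mine for illustration, and the tagging and WordNet steps are left out entirely.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude stand-in for a real tokenizer: lowercase runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def neighbors_of_pair(text, pair, window=5):
    """Count words within `window` tokens of each adjacent occurrence of `pair`.

    Hypothetical helper: assumes the two words appear next to each other,
    in order, and does not filter out stopwords.
    """
    tokens = tokenize(text)
    first, second = pair
    counts = Counter()
    for i in range(len(tokens) - 1):
        if tokens[i] == first and tokens[i + 1] == second:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 2:i + 2 + window]
            counts.update(left + right)
    return counts


# Toy text standing in for the corpus (made up, not from a Wikipedia dump).
sample = "The black sky at night was calm. A black sky often means rain at night."
counts = neighbors_of_pair(sample, ("black", "sky"), window=3)
```

On a real dump this would want streaming over articles and some stopword filtering, since words like "the" and "was" will otherwise crowd out the interesting neighbors.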

Regardless, the graph will be presented, er, graphically, hopefully with the probability of each association illustrated -- black sky being frequently associated with night, and sometimes with rain. I'm going to use mostly available tools with a healthy dose of Python to tie everything together.
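As a rough idea of where those probabilities could come from, one simple option is to turn raw neighbor counts into relative frequencies. This is only a sketch under that assumption; the function name is hypothetical and the counts below are invented toy numbers, not results.

```python
from collections import Counter


def association_probabilities(counts):
    """Turn raw neighbor counts into relative frequencies, most common first."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.most_common()}


# Invented counts for 'black sky', purely for illustration.
counts = Counter({"night": 6, "rain": 3, "storm": 1})
probs = association_probabilities(counts)

for word, p in probs.items():
    print(f"black sky -> {word}: {p:.2f}")
```

Each entry could then become the weight on an edge from the pair to its neighbor when the graph is drawn.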

Not a particularly interesting or complex post, but hopefully by next week my creative reserves will be less dry.


  1. And apparently rhyming like a pro ↩︎

  2. THIS SITE, LADIES AND GENTLEMEN ↩︎