Hello again! I just wanted to share with you some of the work that I've been doing on a mapping of cards to content vectors, which, as we've discussed, could prove to be very useful if we want to do art generation or more precise analytics on card dumps.

For this, I am using Google's word2vec algorithm, specifically their implementation of the continuous-bag-of-words (CBOW) model. I mentioned it briefly in an earlier post, in a very vague way. The idea is that we train a feedforward neural network to map the words in a text to vectors such that words that show up in similar contexts have similar vectors. The vectors represent collections of anonymous features that describe syntactic and semantic relationships between words.

With this model, word-level semantics are at least weakly preserved through the linear combination of vectors. That is, when you add two vectors together, you get a new vector that combines the meaning of the original two. For example, if you combine the vectors for "french" and "river", the closest matches you get are words like "loire", "garrone", and "scheldt", all names of rivers that flow through France. Impressive! However, while this power is amazing, it gets weaker as you add more words. So if you were to take a book about George Washington, sample the text, vectorize the samples, and then try to decode the meaning from those vectors, you'd find it harder to do as the sample size grows:

V(a single sentence): George Washington was an American president.

V(a few paragraphs): A famous/infamous person/dog/inanimate object was a leader/general/CEO in a war/conflict/corporate merger.

V(the whole book): This is a book about important things.

Fortunately for us, our sampling is done over single cards, so we shouldn't run into that kind of trouble.

So, first, I take an encoded version of the original corpus of Magic cards, and I flatten and clean the input so as to make it easier to train a vector model on the words. Once the model is trained, I can query it for a word's nearest neighbors:

    Enter word or sentence (EXIT to break): land

[screenshot of the word2vec distance tool's output: the words most similar to "land"]

All of these words have vectors similar to "land" because they occur in similar contexts. Note that these are very fuzzy associations; the word2vec approach is usually applied to massive corpora, like the entirety of Wikipedia, and you tend to get much better results in those cases.
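To make the vector-addition idea concrete, here is a toy sketch of how summing vectors and ranking by cosine similarity recovers a combined meaning. The vectors and vocabulary below are invented three-dimensional stand-ins for illustration; real word2vec embeddings have hundreds of dimensions and come out of training, not a hand-written table:

```python
import numpy as np

# Invented toy stand-ins for learned word2vec embeddings.
vectors = {
    "french": np.array([1.0, 0.0, 0.0]),
    "river":  np.array([0.0, 1.0, 0.0]),
    "loire":  np.array([0.9, 0.9, 0.1]),
    "london": np.array([0.0, 0.2, 1.0]),
    "cheese": np.array([0.8, 0.0, 0.3]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query, exclude=()):
    """Return the vocabulary word whose vector is closest to `query`."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))

# Adding "french" + "river" yields a vector whose nearest neighbor
# (excluding the inputs themselves) is a French river.
combined = vectors["french"] + vectors["river"]
print(most_similar(combined, exclude={"french", "river"}))  # loire
```

The same lookup is what word2vec's bundled distance tool does interactively, just over a real trained vocabulary.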
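The flatten-and-clean step could be sketched as follows. This is a minimal sketch under my own assumptions, not the actual pipeline: I assume each card's rules text arrives as one string, and the helper name and normalization rules (lowercasing, dropping everything but letters and apostrophes) are invented for the example:

```python
import re

def clean_card_text(card_text):
    """Lowercase a card's text and keep only simple word tokens,
    so the trainer sees one flat, uniform stream of words."""
    tokens = re.findall(r"[a-z']+", card_text.lower())
    return " ".join(tokens)

cards = [
    "Whenever a creature enters the battlefield, you gain 1 life.",
    "{T}: Add one mana of any color.",
]
# One cleaned line per card, ready to concatenate into a training corpus.
corpus = [clean_card_text(c) for c in cards]
print(corpus[0])  # whenever a creature enters the battlefield you gain life
```

Because each cleaned line corresponds to a single card, sampling over lines is exactly the card-level sampling described above.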