I checked the combine2vec code and found a bug: in negative-sampling mode, the input word vector was not updated when the sample was positive. I fixed the bug, but the two objective functions still do not converge together.
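For reference, a minimal sketch of the corrected update, assuming word2vec-style names (`syn0` for input vectors, `syn1neg` for output vectors, `neu1e` for the accumulated input gradient). This is an illustration of the fix, not the actual combine2vec code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sample_update(syn0, syn1neg, center, target, table, k, lr, rng):
    """One skip-gram negative-sampling step for the pair (center, target)."""
    neu1e = np.zeros_like(syn0[center])            # gradient for the input vector
    for i in range(k + 1):
        if i == 0:
            sample, label = target, 1.0            # positive sample
        else:
            sample = table[rng.integers(len(table))]   # draw a negative sample
            if sample == target:
                continue
            label = 0.0
        f = sigmoid(np.dot(syn0[center], syn1neg[sample]))
        g = (label - f) * lr
        neu1e += g * syn1neg[sample]               # accumulate for ALL samples,
                                                   # including the positive one (the bug)
        syn1neg[sample] += g * syn0[center]        # update the output vector
    syn0[center] += neu1e                          # update the input word vector
```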
I adapted the word2vec training strategy in the sentence branch. There may be bugs, because too many biword count values are zero. I need to check the cooccurrence file, the bigram code, and the sentence-branch combine2vec code.
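A quick sanity check I could run on the cooccurrence file; this assumes a hypothetical plain-text format of "word_a word_b count" per line and a hypothetical file name, which may not match the real data.

```python
def count_zero_biwords(path):
    """Report how many biword entries in the cooccurrence file have a zero count."""
    total = zero = 0
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue
            total += 1
            if int(parts[2]) == 0:
                zero += 1
    print(f"{zero}/{total} biword counts are zero")

count_zero_biwords("cooccurrence.txt")  # hypothetical file name
```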
I collected gradient statistics yesterday. Some interesting phenomena:
1. If I exchange "word" and "last_word" in the skip-gram model in the word2vec code, the training loss becomes much smaller and training is almost twice as fast. This is weird (see the sketch after this list).
2. The gradient variation trends differ between the two approaches above.
3. Polysemes may not converge well, so their gradients are large. In practice, rarely occurring words also produce large gradients.
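To make phenomenon 1 concrete, here is a sketch of the two pair orders, assuming the word2vec.c conventions (`word` is the center word, `last_word` is a context word, and `syn0[input]` is the vector that gets the `neu1e` update). The `swap` flag is my own hypothetical name; each yielded pair would then be fed to a negative-sampling step like the one sketched above.

```python
def skipgram_pairs(sentence, window, swap=False):
    """Yield (input_word, target_word) pairs for skip-gram training.

    swap=False: original word2vec.c order -> input=last_word, target=word
    swap=True : exchanged order           -> input=word, target=last_word
    """
    for pos, word in enumerate(sentence):
        lo = max(0, pos - window)
        hi = min(len(sentence), pos + window + 1)
        for c in range(lo, hi):
            if c == pos:
                continue
            last_word = sentence[c]
            if swap:
                yield word, last_word   # exchanged: center word as the input vector
            else:
                yield last_word, word   # original: context word as the input vector

# Example on a toy sentence of word ids:
print(list(skipgram_pairs([0, 1, 2, 3], window=1, swap=False)))
print(list(skipgram_pairs([0, 1, 2, 3], window=1, swap=True)))
```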
Read Sequence to sequence learning.
Configured PyCharm.
Checked the GroundHog code. It contains encoder-decoder machine translation code, but it is more than 3000 lines, so I will implement the sequence-to-sequence learning code first.
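Since I will implement the sequence-to-sequence code myself, here is a minimal sketch of the encoder-decoder structure, using a plain numpy RNN with greedy decoding instead of the LSTMs in the paper. All names and sizes are illustrative (not GroundHog's API), and encoder and decoder share one set of weights for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, E = 1000, 64, 32                       # vocab, hidden, embedding sizes
Emb = rng.normal(0, 0.1, (V, E))             # shared embedding table
W_xh = rng.normal(0, 0.1, (E, H))
W_hh = rng.normal(0, 0.1, (H, H))
W_hy = rng.normal(0, 0.1, (H, V))            # output projection

def rnn_step(x_id, h):
    return np.tanh(Emb[x_id] @ W_xh + h @ W_hh)

def encode(src_ids):
    h = np.zeros(H)
    for x in src_ids:                        # read the source sequence
        h = rnn_step(x, h)
    return h                                 # final state summarizes the source

def decode(h, bos_id, eos_id, max_len=20):
    out, x = [], bos_id
    for _ in range(max_len):                 # generate target tokens greedily
        h = rnn_step(x, h)
        x = int(np.argmax(h @ W_hy))         # next-token logits -> argmax
        if x == eos_id:
            break
        out.append(x)
    return out

print(decode(encode([5, 7, 9]), bos_id=1, eos_id=2))
```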
Today I still wasted a lot of time.
5:00-6:00 Browsed websites.
7:30-8:30 Went out and ate a little pizza. Honestly, I did not need the supper.
8:30-10:00 Wasted a lot of time wandering. Checked the GroundHog code and configured PyCharm, but never faced the real problem. Just implement the sequence-to-sequence learning first.