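Well, I have to say it's quite wise to turn the polynomial terms into multiple features. Say you have y = 2 + 3x + 4x^2 + ..., and you choose to write it as y = 2 + 3x1 + 4x2 + ..., where x1 = x and x2 = x^2. You really have only one feature, but you would rather work with several features than with squared or cubed terms. Also, you have to use feature scaling carefully, because these are not really independent features. You need to keep the relationship between them fixed: build the powers from the raw x first, then scale each one.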
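Here is a rough sketch of that idea in plain Python, not taken from the video. The helper names (poly_features, fit_scaler, scale) and the sample values are made up for illustration, and the scaling used is mean normalization, (value - mean) / range, which is just one common choice.

```python
def poly_features(x, degree=3):
    """Turn one raw value x into the list [x, x^2, ..., x^degree]."""
    return [x ** p for p in range(1, degree + 1)]


def fit_scaler(rows):
    """Compute per-column mean and range (max - min) from the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 so a constant column does not cause division by zero.
    ranges = [(max(c) - min(c)) or 1.0 for c in cols]
    return means, ranges


def scale(row, means, ranges):
    """Mean normalization, column by column: (value - mean) / range."""
    return [(v - m) / r for v, m, r in zip(row, means, ranges)]


# Hypothetical training inputs: a single raw feature x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Step 1: build the powers from the raw x, so x2 = x1^2 and x3 = x1^3 hold.
raw = [poly_features(x) for x in xs]

# Step 2: scale each column with parameters learned from the training data.
means, ranges = fit_scaler(raw)
scaled = [scale(r, means, ranges) for r in raw]

# A new input must go through the same two steps, in the same order,
# reusing the same means and ranges.
new_row = scale(poly_features(2.5), means, ranges)
```

The point of the ordering is that squaring an already scaled x would give a different feature than scaling x^2, so the powers are always taken from the raw value.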
Let's talk about the learning rate. We know gradient descent can work when the rate is fixed, so its value is crucial. Too big and it will not converge. Too small and it will be slow. Though the video didn't point it out, I think feature scaling is one of the factors that affects which learning rate is reasonable. For example, if your features are scaled to -1~1, you don't want 500 as your learning rate. Obviously that's too big. The plots for badly chosen rates were shown in the video.
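To see this for yourself, here is a small sketch in plain Python (my own toy example, not from the video). It fits y = 2 + 3x on a feature already scaled to -1~1 and tries three fixed learning rates. The data, the function names and the three rates are all assumptions picked for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha, steps):
    """Batch gradient descent with a fixed learning rate alpha.

    Returns the cost after each step so the trend can be inspected.
    """
    theta0 = theta1 = 0.0
    m = len(xs)
    history = []
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        history.append(cost(theta0, theta1, xs, ys))
    return history


# Toy data: one feature already scaled to the range -1..1.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2 + 3 * x for x in xs]

for alpha in (0.001, 0.1, 500.0):
    history = gradient_descent(xs, ys, alpha, steps=20)
    trend = "going down" if history[-1] < history[0] else "blowing up"
    print(alpha, trend, history[-1])
```

With the tiny rate the cost goes down but barely moves in 20 steps, with the middle one it drops quickly, and with 500 it grows at every step instead of shrinking.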
I do advise you not to use the Chinese version of the video. If you can't follow by listening alone, then turn on the English subtitles. It will help you with the test.
And as I've warned you before, he asked about feature scaling. I'm sure you can handle that.
Another thing is reducing the number of features. For example, you can replace width and length with a single feature, the area (width times length).
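A tiny sketch of that replacement in plain Python. The function name to_area_feature and the sample sizes are made up for illustration.

```python
def to_area_feature(plots):
    """Replace each (width, length) pair with a single area feature."""
    return [[width * length] for width, length in plots]


# Hypothetical plots given as (width, length).
plots = [(10.0, 20.0), (8.0, 12.5), (15.0, 30.0)]

# Two features per example become one.
features = to_area_feature(plots)
```

This only makes sense when the target really depends on the product of the two, which is the case the width and length example assumes.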