Notice that the parameters `(W, U, V)` are shared across all time steps. The output at each time step can be passed through a **softmax**, so you can use **cross-entropy** loss as the error function and apply an optimization method (e.g. gradient descent) to find the optimal parameters `(W, U, V)`.
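To make the parameter sharing concrete, here is a minimal pure-Python sketch of a forward pass that sums the cross-entropy loss over time steps. It assumes a common convention that the text above does not spell out: `U` maps input to hidden state, `W` maps the previous hidden state to the current one, and `V` maps hidden state to output, with a `tanh` activation. The helper names (`matvec`, `softmax`, `rnn_loss`) and the toy numbers are hypothetical, chosen only for illustration.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_loss(xs, targets, U, W, V):
    """Run the RNN over a sequence and return (total loss, per-step outputs).

    Assumed roles: U input->hidden, W hidden->hidden, V hidden->output.
    """
    s = [0.0] * len(W)  # initial hidden state, all zeros
    loss = 0.0
    outputs = []
    for x, y in zip(xs, targets):
        # The same U, W, V are reused at every time step.
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(p) for p in pre]
        o = softmax(matvec(V, s))
        outputs.append(o)
        loss += -math.log(o[y])  # cross-entropy against the target class index
    return loss, outputs

# Toy example: 2-dimensional inputs, 2 hidden units, 2 output classes.
U = [[0.1, -0.2], [0.3, 0.4]]
W = [[0.0, 0.1], [-0.1, 0.0]]
V = [[0.5, -0.5], [-0.5, 0.5]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
targets = [1, 0, 1]

loss, outputs = rnn_loss(xs, targets, U, W, V)
print(loss)
```

Training would then adjust `U`, `W`, and `V` to reduce this loss, for example with gradient descent; the gradient computation is not shown in this sketch.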
Let's recap the equations of our RNN: