n-Step Q-sigma Off-Policy Reinforcement Learning for Balancing a Pendulum

GitHub link

To be written

‘Learning Output’