A few months ago, I played with a Julia differentiable programming framework, and I thought what if I make a differentiable virtual machine and use unsorted and sorted numbers as training data. will it learn a sorting algorithm, something similar to deep Turing machine. My conclusion was I can't ....
The million dollar question is if it's possible to construct a theory of computation where the input language itself is automatically differentiable, and where the execution semantics are also so. Perhaps a continuous spatial automata that has been proven to be Turing complete.
"We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples." https://arxiv.org/pdf/1410.5401.pdf