One can think of today's successful machine learning frameworks (such as TensorFlow and Torch) as the offspring of two larger realms: scientific computing and software engineering.
The latter helps realize machine learning applications by pushing them towards deployment, whereas the former deals
with the science of solving mathematical problems on computers. The common mediator between the two is the Directed Acyclic Graph (DAG).
From a software engineering point of view, DAGs provide a language-independent representation of your code.
You can build a DAG in Python, save it in some intermediate representation, and restore it as a C++ program for efficient performance at run time.
You can also execute the same code on a CPU, a GPU, or an embedded device, so a DAG gives hardware portability as well.
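The build-serialize-restore workflow can be sketched in miniature. The graph format and helper names below are hypothetical (real frameworks use richer formats such as protocol buffers), but they show the core idea: the program is plain data, so any runtime that understands the operations can execute it.

```python
import json

def make_node(op, inputs):
    """A graph node: an operation name plus the ids of its input nodes."""
    return {"op": op, "inputs": inputs}

# Build the DAG for y = (a + b) * c as plain data, not executable code.
graph = {
    "a": make_node("placeholder", []),
    "b": make_node("placeholder", []),
    "c": make_node("placeholder", []),
    "add": make_node("add", ["a", "b"]),
    "y": make_node("mul", ["add", "c"]),
}

# Serialize: this string could just as well be loaded by a C++ runtime.
serialized = json.dumps(graph)

def run(graph_json, outputs, feeds):
    """A tiny interpreter standing in for a backend-specific runtime."""
    g = json.loads(graph_json)
    ops = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}

    def evaluate(name):
        node = g[name]
        if node["op"] == "placeholder":
            return feeds[name]
        args = [evaluate(i) for i in node["inputs"]]
        return ops[node["op"]](*args)

    return [evaluate(o) for o in outputs]

print(run(serialized, ["y"], {"a": 2.0, "b": 3.0, "c": 4.0}))  # [20.0]
```

Because the serialized form names only abstract operations, swapping the `ops` table for GPU kernels (or a C++ implementation) changes the backend without touching the graph itself.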
From a scientific computing perspective, a chain of mathematical operations performed on a computer can be expressed as a DAG. Doing so makes it possible to compute
derivatives quickly, in a manner suitable for real-time applications. Computing derivatives over a DAG forms the basis of automatic differentiation.
In addition, the last decade saw regular improvements in the speed of mathematical operations like dense matrix multiplication and tensor transposition, which
form the building blocks of these computations.
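To see why a handful of kernels suffices, consider that a dense neural-network layer reduces to one matrix multiplication plus a couple of elementwise operations. The snippet below uses NumPy as an assumed stand-in for the tuned kernels a framework would dispatch to; the shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # layer weights
x = rng.standard_normal(3)        # input vector
b = np.zeros(4)                   # bias

# y = max(0, W @ x + b): one matmul plus two elementwise ops (add, relu)
y = np.maximum(0.0, W @ x + b)
print(y.shape)  # (4,)
```

A framework that makes the matmul fast has, in effect, made the whole layer fast; the same reasoning applies to convolutions and attention, which are likewise expressed in terms of these primitives.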
Machine learning frameworks are being developed by leveraging all of the advances mentioned above.