https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf

Via an O’Reilly talk with Steve Yegge. Approachable and short essay by Richard Sutton. Even I could read and understand it.

From Wikipedia the principle says:

in the long run, approaches that scale with available computational power (such as brute-force search or statistical learning from large datasets) tend to outperform ones based on domain-specific understanding because they are better at taking advantage of Moore’s law.