AI Chronicles: Deep Dives, Research, and Real-World Implementations of Artificial Intelligence

June 18, 2023

Authors

Kanjun Qiu

Josh Albrecht

Below are some highlights from our conversation as well as links to the papers, people, and groups referenced in the episode.

Some highlights from our conversation

“I do think that the deeper idea of reverse engineering kernels is powerful and probably holds across architectures. The central message isn’t really like: here’s the particular theory on fully-connected networks. The central message is: let’s think about the inductive bias of architectures in kernel space directly and see if we can do our design work in kernel space instead of in parameter space.”

“At first glance, the idea of an infinite-width neural network as a useful object of study sounds insane. Why should this be a reasonable limit to take? Like, why, if we want to understand a neural network, which obviously has to be finite to do anything useful, could we hope to learn anything by just making something infinite? That is especially baffling from the viewpoint of classical statistics, where you hope to find a parsimonious model; you wanna wield Occam’s razor like a sword. So it seems baffling at first that this should be useful, but it turns out that a number of breakthrough results, especially around the early part of my PhD, found that some really non-trivial, insightful behavior emerges when you take this infinite-width limit.”

“In the case of infinite width: if the neural tangent kernel only has trivial alignment, like just chance alignment, with the target function of the data, it won’t generalize on it. But in practice, we see very good alignment between this kernel object and the target function.”
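To make "alignment" concrete, here is a minimal sketch of one common way to measure it, kernel-target alignment. The conversation does not say which measure is meant, so the choice of this metric is an assumption, and the toy labels and kernels below are hypothetical. A score of 1 means the kernel matches the label structure exactly; a score of 0 means the kernel carries no information about the labels.

```python
import math


def kernel_target_alignment(K, y):
    """Kernel-target alignment: <K, y y^T>_F / (||K||_F * ||y y^T||_F)."""
    n = len(y)
    # The Frobenius inner product <K, y y^T>_F equals y^T K y.
    numerator = sum(y[i] * K[i][j] * y[j] for i in range(n) for j in range(n))
    frobenius_K = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    # ||y y^T||_F equals ||y||^2.
    norm_y_squared = sum(v * v for v in y)
    return numerator / (frobenius_K * norm_y_squared)


def outer(z):
    """Rank-one kernel z z^T as a list of lists."""
    return [[a * b for b in z] for a in z]


# Hypothetical toy labels for four data points.
y = [1.0, 1.0, -1.0, -1.0]

# A kernel built from the labels themselves is perfectly aligned.
aligned = outer(y)
# A kernel built from a direction orthogonal to the labels is not aligned at all.
orthogonal = outer([1.0, -1.0, 1.0, -1.0])

print(kernel_target_alignment(aligned, y))     # 1.0
print(kernel_target_alignment(orthogonal, y))  # 0.0
```

In this picture, a kernel with only chance-level alignment sits near the second case, which is the situation the quote describes as failing to generalize.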

“A question you could ask is, why do convolutional networks do better than fully connected networks on image data? Well, it turns out their kernels have better alignment with image data.”

“Although, people have shown, interestingly, that if you take the neural tangent kernel of a network after training, then the real neural network after training looks a lot as if it had always had its final neural tangent kernel. So you don’t have to worry about the evolution over time so much as where it ended up.”
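For readers who want to see what "the neural tangent kernel of a network" at a particular set of parameters looks like, here is a minimal sketch. The empirical neural tangent kernel between two inputs is the dot product of the network's parameter gradients at those inputs, so evaluating it at the trained parameters gives the "final" kernel. The tiny one-hidden-layer tanh network, its parameter values, and the inputs are all hypothetical and chosen only for illustration.

```python
import math


def param_gradient(x, a, w):
    """Gradient of f(x) = (1/sqrt(m)) * sum_i a[i] * tanh(w[i] * x)
    with respect to all parameters, ordered as (a, then w)."""
    m = len(a)
    scale = 1.0 / math.sqrt(m)
    hidden = [math.tanh(wi * x) for wi in w]
    # df/da_i = tanh(w_i x) / sqrt(m)
    grad_a = [scale * h for h in hidden]
    # df/dw_i = a_i * (1 - tanh(w_i x)^2) * x / sqrt(m)
    grad_w = [scale * ai * (1.0 - h * h) * x for ai, h in zip(a, hidden)]
    return grad_a + grad_w


def empirical_ntk(xs, a, w):
    """Gram matrix of parameter gradients: K[i][j] = grad f(x_i) . grad f(x_j)."""
    grads = [param_gradient(x, a, w) for x in xs]
    return [[sum(p * q for p, q in zip(gi, gj)) for gj in grads] for gi in grads]


# Hypothetical parameters, standing in for a network's weights at some point in training.
a = [0.5, -1.0, 0.8]
w = [1.2, -0.7, 0.3]
xs = [-1.0, 0.5, 2.0]

K = empirical_ntk(xs, a, w)
for row in K:
    print(["%.4f" % value for value in row])
```

Calling `empirical_ntk` once with the initial parameters and once with the trained ones would give the two kernels being compared in the quote.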

Referenced in this podcast

Thanks to Tessa Hall for editing the podcast.