Unlocking TPU performance: Deep kernel profiling with XProf (blog.google)