In a CNN architecture, we have a feature map of shape 128x128x786 (height=128, width=128, channels=786) at some intermediate layer. We pass it to a convolution layer with zero-padding of size 2 (2 additional values on each side), 786 kernels of size 5x3, and a stride of 2 (in both the x and y directions). What is the shape of the resulting feature map?
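
One way to check the arithmetic is the standard output-size formula for a strided convolution, out = floor((in + 2*padding - kernel) / stride) + 1, applied to each spatial dimension separately. The sketch below is a minimal illustration, not a reference implementation: the function names are hypothetical, and it assumes the kernel size 5x3 is given as height x width, in the same order as the feature map shape. The question does not state this explicitly; if the kernel is instead 5 wide and 3 high, the two spatial sizes swap. The number of output channels equals the number of kernels.

```python
def conv_output_size(size: int, kernel: int, padding: int, stride: int) -> int:
    """Output length along one spatial dimension of a strided convolution."""
    # Padding is added on both sides, hence 2 * padding.
    return (size + 2 * padding - kernel) // stride + 1


def conv_output_shape(height, width, kernel_h, kernel_w, padding, stride, num_kernels):
    """Return (out_height, out_width, out_channels) for a 2D convolution."""
    out_h = conv_output_size(height, kernel_h, padding, stride)
    out_w = conv_output_size(width, kernel_w, padding, stride)
    # Each kernel produces one output channel.
    return (out_h, out_w, num_kernels)


# Assumption: kernel 5x3 means kernel height 5, kernel width 3.
shape = conv_output_shape(128, 128, kernel_h=5, kernel_w=3,
                          padding=2, stride=2, num_kernels=786)
print(shape)  # (64, 65, 786)
```

Under that assumption, the height is floor((128 + 4 - 5) / 2) + 1 = 64 and the width is floor((128 + 4 - 3) / 2) + 1 = 65, so the resulting feature map would be 64x65x786.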