Let r be the partial with respect to x, and let s be the partial with respect to y, evaluated at the origin. Since the partials are continuous, for any ε > 0 there is a neighborhood of the origin in which the partials stay within ε of r and s.
Let a,b represent an arbitrary unit vector. Travel a distance of t units in this direction, from (0,0) to (a×t,b×t), so that the elevation changes from f(0,0) to f(a×t,b×t). Of course t will be small, so that we remain inside our restricted neighborhood. We will move along the x axis first, going up at a slope of nearly r, and show that f does not curve away very much. Then, at this new location, the slope in the y direction is pretty close to s, and as we move parallel to the y axis, to our final destination, going up at a slope of nearly s, the surface doesn't curve away very much. Well that's the idea.
As you move parallel to the x axis, from x=0 to x=a×t, the change f(a×t,0) − f(0,0) cannot be more than ε×|a|×t away from r×a×t. If it were, the mean value theorem would imply a partial that is more than ε away from r. Now move along the surface in the y direction, through a distance of |b|×t. Again, the change in elevation cannot be more than ε×|b|×t away from s×b×t, else the mean value theorem implies a partial that is more than ε away from s. We estimated a change in f of rta+stb, and the actual change is within (|a|+|b|)tε of our estimate. Divide by t, and the error ÷ distance is at most sqrt(2)×ε, since |a|+|b| is at most sqrt(2) for a unit vector. Since this holds for arbitrarily small ε, the surface approaches its tangent plane, and f is differentiable.
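The two steps can be written out as a short chain of estimates. This is only a sketch of the argument above: c and d are names introduced here for the intermediate points supplied by the mean value theorem, f_x and f_y denote the partials, and absolute values are taken in case a or b is negative. The last inequality uses a² + b² = 1.

```latex
\begin{align*}
f(at,0) - f(0,0) &= f_x(c,0)\,at, & |f_x(c,0) - r| &\le \varepsilon,\\
f(at,bt) - f(at,0) &= f_y(at,d)\,bt, & |f_y(at,d) - s| &\le \varepsilon,\\
\bigl|f(at,bt) - f(0,0) - (rat + sbt)\bigr|
  &\le \varepsilon\,(|a| + |b|)\,t \le \sqrt{2}\,\varepsilon\,t.
\end{align*}
```

Adding the first two lines and subtracting rat + sbt gives the third, since each partial differs from r or s by at most ε.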
If f is defined by a formula - what you usually think of as a formula - its partials are also given by formulas, and these are continuous wherever they are defined, hence f is differentiable there.
If the partials are continuous over an entire region, f is differentiable throughout this region. Furthermore, the derivative is continuous. Since the derivative is the gradient, whose components are the partials, and the partials are continuous, the derivative is the direct product of continuous functions, and is continuous.
If ∇f = 0 throughout a connected open region, f is constant on that region. If two points have different values, step from one to the other, coordinate by coordinate, along a path that stays inside the region, until the elevation changes. On that step the mean value theorem forces a nonzero partial, hence a nonzero gradient, which is a contradiction.
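The step where the elevation changes can be spelled out as follows. As a sketch, suppose p and q (names introduced here) are consecutive points on the stepping path, differing only in the coordinate x_i, with f(p) different from f(q). The mean value theorem, applied along the segment from p to q, gives a point c on that segment with

```latex
\[
f(q) - f(p) = \frac{\partial f}{\partial x_i}(c)\,(q_i - p_i) \neq 0
\quad\Longrightarrow\quad
\frac{\partial f}{\partial x_i}(c) \neq 0
\quad\Longrightarrow\quad
\nabla f(c) \neq 0 .
\]
```

Since c lies in the region, this contradicts the assumption that the gradient is 0 throughout.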