I have never used external methods to resolve this. For my models and purposes, I have always gotten satisfactory results by refining the mesh locally, i.e. reducing the cell size in the problem area (as the FEMA manual itself suggests).
To me, finding the optimal cell size seems to require an iterative process that balances computational time against the level of detail you need in the results.
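A rough sketch of that iterative process, assuming you pick one output to watch (here a peak water-surface elevation) and a tolerance you consider "good enough". `toy_model` is a hypothetical stand-in for an actual model run, not a call into HEC-RAS or any other package, and the tolerance and time budget are arbitrary example values:

```python
import time


def toy_model(cell_size_m: float) -> float:
    """Hypothetical stand-in for a real model run.

    Returns a made-up peak water-surface elevation (m) that approaches
    100.0 as the cell size shrinks. Replace with your own model call.
    """
    return 100.0 + 0.5 * cell_size_m ** 0.5


def refine_until_converged(run, start_size, min_size, tol, time_budget_s):
    """Halve the cell size until successive results differ by less than tol,
    the minimum cell size is reached, or cumulative run time exceeds the budget.

    Returns a list of (cell_size, result) pairs, one per run.
    """
    history = []
    size = start_size
    previous = None
    elapsed = 0.0
    while size >= min_size:
        t0 = time.perf_counter()
        result = run(size)
        elapsed += time.perf_counter() - t0
        history.append((size, result))
        # Stop once refining no longer changes the watched result meaningfully.
        if previous is not None and abs(result - previous) < tol:
            break
        # Stop if the runs are getting too expensive.
        if elapsed > time_budget_s:
            break
        previous = result
        # In 2D, halving the cell size roughly quadruples the cell count
        # (and the run time grows accordingly), so budget for that.
        size /= 2
    return history


history = refine_until_converged(toy_model, start_size=100.0, min_size=1.0,
                                 tol=0.5, time_budget_s=60.0)
for size, peak in history:
    print(f"cell size {size:8.3f} m -> peak WSE {peak:.3f} m")
```

In practice you would apply the refinement only locally (around the area of interest) rather than to the whole domain, which keeps each iteration affordable on a large watershed.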
In some specific cases, you can also run auxiliary analyses with a 1D model.
In your case, since it is a large watershed, I think that refining the mesh locally can partially help, depending on the degree of cupping you are experiencing.