Thanks, Yingfanduan!
In general, the more users and traffic you have, the smaller the difference between groups you should expect to see.
With low traffic, even 0.01 can trigger false alarms for SRM, so 0.0005 would definitely not be suitable for low-traffic cases.
However, if the traffic is large enough, 0.0005 might be fine. I assume that Microsoft has far more traffic at hand, which lets them check at a much finer level of detail.
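To make the traffic point concrete, here is a minimal sketch of an SRM check using a chi-square goodness-of-fit test, which is one common way to do it and not necessarily what any particular platform runs. The function name `srm_p_value`, the 50/50 expected split, and the visitor counts are all made up for illustration. It feeds the same 2% relative gap through the check at two traffic levels:

```python
import math


def srm_p_value(control: int, treatment: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-group split.

    Hypothetical helper: assumes exactly two groups and a known
    intended split (50/50 by default).
    """
    total = control + treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (treatment - expected_treatment) ** 2 / expected_treatment)
    # With two groups there is 1 degree of freedom, and the chi-square
    # survival function reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Illustrative counts only: the same 2% relative gap at two traffic levels.
small = srm_p_value(500, 510)          # about 1,000 users in total
large = srm_p_value(500_000, 510_000)  # about 1,000,000 users in total

print(f"small traffic p-value: {small:.3f}")
print(f"large traffic p-value: {large:.3g}")
```

With the small counts the gap is well within random noise, while with the large counts the same relative gap gives a p-value far below any threshold you are likely to pick. That is why a difference that looks alarming early on usually stops mattering, or becomes a real signal, as traffic accumulates.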
Overall, I think there is a tendency to overthink things. Personally, I like to check for SRM early, when traffic is still small and the difference between groups can sometimes be large.
If I get an SRM alarm that I suspect is a false alarm, I keep monitoring and expect to see the difference decrease.
I shared a chart in my follow-up article that you might find useful: https://miro.medium.com/v2/resize:fit:720/format:webp/1*lau08tM0PzJ2aPlOqj1wKw.png
Full article: