That's true as far as it goes. But as the population increases, it becomes more obvious which data points are unduly influencing the overall results. Eliminating these may, or may not, be appropriate. For example, when looking at aviation incidents in California ...
- Salton Sea (Death Valley) turns out to be the least safe airport when incidents are matched to operations. But there's been all of one incident in the last 20 years.
- Conversely, San Francisco International turns out to be the safest because of the immense number of operations compared to incidents. But it has next to no general aviation traffic, and general aviation pilots are the ones who seem to have the death wish.
Are those two examples outliers or part of the distribution? As a statistician by training, I would eliminate Salton Sea because of its small dataset of incidents and operations, and bitch about the excessive influence San Francisco (and LAX) have on the overall distribution model.
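
To put rough numbers on the small-sample problem, here is a minimal Python sketch. It compares the incident rate per 100,000 operations at two airports and attaches an approximate 95% Wilson score interval to each. The counts are invented for illustration and are not real figures for any airport. The names `small_field`, `major_hub`, `wilson_interval` and `summarize` are hypothetical. The Wilson interval is just one reasonable choice, not necessarily what a real safety analysis would use.

```python
import math


def wilson_interval(incidents, operations, z=1.96):
    """Approximate 95% Wilson score interval for the per-operation incident rate."""
    if operations <= 0:
        raise ValueError("operations must be positive")
    p = incidents / operations
    denom = 1 + z**2 / operations
    center = (p + z**2 / (2 * operations)) / denom
    half = z * math.sqrt(
        p * (1 - p) / operations + z**2 / (4 * operations**2)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


# ASSUMPTION: hypothetical counts (incidents, operations), made up for this sketch.
airports = {
    "small_field": (1, 2_000),
    "major_hub": (20, 8_000_000),
}


def summarize(airports, per=100_000):
    """Return {name: (rate, lower, upper)}, all scaled to `per` operations."""
    rows = {}
    for name, (incidents, operations) in airports.items():
        lo, hi = wilson_interval(incidents, operations)
        rows[name] = (incidents / operations * per, lo * per, hi * per)
    return rows


summary = summarize(airports)
for name, (rate, lo, hi) in summary.items():
    print(f"{name}: {rate:.2f} per 100k ops (approx. 95% interval {lo:.2f} to {hi:.2f})")
```

With counts like these, the small field's point estimate is far higher, but its interval spans more than an order of magnitude, which is the argument for not trusting a ranking built on a single incident. The hub's interval is tight, so its weight in any pooled model comes from sheer volume rather than from being typical.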