When my students and colleagues ask me whether a particular statistical method is appropriate, I invariably tell them to state their research question and the answer will be clear. Applying the same approach to regression models reveals the real barrier to using automatic model fitting procedures to answer the question, "Which variables are important?"
Let's back up. It is well known that when testing whether the mean change produced by a treatment is different for two groups, it is not appropriate to evaluate the mean change for each group separately. That is, it is not appropriate to say the groups are different if the mean change in one group is statistically significant while the other is not. It may be that the mean changes are nearly identical, with the P value for one group being slightly less than 0.05 and the other slightly more than 0.05. To determine whether the mean changes for the two groups differ, the changes have to be compared directly, perhaps by using Student's t test for independent samples applied to the changes for the two groups.
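The difference between testing each group separately and comparing the changes directly can be seen in a small sketch. The change scores below are invented purely for illustration, and the names (changes_a, changes_b, one_sample_t, two_sample_t) are hypothetical. The critical values are the usual two-sided 5% points from a t table for 9 and 18 degrees of freedom.

```python
import math
import statistics

# Hypothetical change scores (post minus pre), invented for illustration only.
changes_a = [4, -1, 6, 2, 7, -3, 5, 1, 8, 1]
changes_b = [3, -2, 6, 1, 7, -4, 5, 0, 8, 1]

# Two-sided 5% critical values from a standard t table.
T_CRIT_DF9 = 2.262   # within-group tests, n - 1 = 9
T_CRIT_DF18 = 2.101  # between-group test, n_a + n_b - 2 = 18


def one_sample_t(changes):
    """t statistic for H0: the mean change within one group is zero."""
    se = statistics.stdev(changes) / math.sqrt(len(changes))
    return statistics.mean(changes) / se


def two_sample_t(a, b):
    """Student's t statistic (pooled variance) for H0: equal mean changes."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


t_a = one_sample_t(changes_a)
t_b = one_sample_t(changes_b)
t_diff = two_sample_t(changes_a, changes_b)

# Separate tests: each group's mean change against zero.
print(f"group A: t = {t_a:.2f}, significant: {abs(t_a) > T_CRIT_DF9}")
print(f"group B: t = {t_b:.2f}, significant: {abs(t_b) > T_CRIT_DF9}")
# Direct comparison: the two sets of changes against each other.
print(f"A vs B:  t = {t_diff:.2f}, significant: {abs(t_diff) > T_CRIT_DF18}")
```

With these invented numbers, the mean change in group A crosses the 5% threshold and the mean change in group B does not, yet the direct comparison of the two sets of changes comes nowhere near significance. The separate tests say nothing about whether the groups differ.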
There's a similar problem with simplifying multiple regression models. The automatic techniques find a model that fits the data. However, the question is not merely which model fits the data, but which model is demonstrably better than all other models in terms of fit or relevance. To answer it, automatic procedures would have to compare models to each other directly, but they don't! At least the stepwise procedures don't.
The "all possible models" approach may suffer from trying to summarize a model in a single number, and it certainly overestimates the utility of the models it identifies as best. However, unlike the automatic stepwise procedures, which generate a single sequence of models, it gives the analyst a feel for competing models: it forces the analyst to come to grips with the fact that there may be many models that look quite different from each other but fit the data almost equally well. Because the technique overstates the value of the models it identifies as best, it is still necessary to evaluate those models in another dataset.
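A minimal sketch of the "all possible models" idea follows, under several assumptions that are not part of the discussion above: the data are simulated, three of the four predictors are built to be noisy copies of the same underlying signal, R-squared is used as the single-number summary, and the helper names (solve, r_squared) are hypothetical. It uses only the Python standard library, so the least squares fit is done by hand through the normal equations.

```python
import itertools
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of a least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Simulated data (an assumption for illustration): x1, x2, x3 are noisy
# copies of one signal z, x4 is unrelated noise, and y depends on z.
rng = random.Random(0)
n_obs = 60
z = [rng.gauss(0, 1) for _ in range(n_obs)]
predictors = {
    "x1": [zi + rng.gauss(0, 0.3) for zi in z],
    "x2": [zi + rng.gauss(0, 0.3) for zi in z],
    "x3": [zi + rng.gauss(0, 0.3) for zi in z],
    "x4": [rng.gauss(0, 1) for _ in range(n_obs)],
}
y = [zi + rng.gauss(0, 1) for zi in z]

# Fit every non-empty subset of predictors and record its R-squared.
names = list(predictors)
results = []
for k in range(1, len(names) + 1):
    for subset in itertools.combinations(names, k):
        r2 = r_squared([predictors[name] for name in subset], y)
        results.append((r2, subset))

# List the models from best to worst fit so near-ties are easy to spot.
results.sort(reverse=True)
for r2, subset in results:
    print(f"{r2:.3f}  {' + '.join(subset)}")
```

Because x1, x2, and x3 carry largely the same information by construction, one would expect several quite different subsets to sit close together near the top of the list. The ranking here is also computed on the same data used for fitting, which is exactly why the apparent best models still need to be checked in another dataset.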