Abstract: Remote sensing image scene classification consists of assigning images of the Earth's surface to semantic scene categories based on the ground objects they contain and the spatial arrangement of those objects. Finding the objects within a scene is not trivial, because they can appear at different sizes and in different relative positions. An open issue in scene classification with CNNs is understanding whether the network's prediction relies on the cues that human Earth Observation experts consider. A suitable approach for investigating the inference process of neural models relies on Class Activation Maps (CAMs), which highlight the areas of an image that contribute the most to the classification. This work evaluates CAMs for different CNN methods in terms of their capacity to identify the objects that determine the classification of scenes in illegal landfill detection. Quantitative and qualitative analyses show that ECA-Net performs consistently across all metrics, making it the most promising approach for obtaining CNNs that focus on the most relevant regions with higher Intersection over Union (IoU). The illustrated analysis is a step towards the computer-aided study of the variations in the positioning and spatial relations of scene elements that hint at the presence of illegal waste dumps. It also opens the way to the application of weakly supervised techniques for training detectors of illegal landfills in large-scale remote sensing image repositories.
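
As an illustration of the two quantities the abstract refers to, the following is a minimal sketch of how a CAM can be computed and compared with an annotated object mask through IoU. It is not the pipeline used in this work: it assumes the classic CAM formulation (a network ending in global average pooling followed by a linear layer), and the function names, the min-max normalization, the 0.5 threshold, and the toy arrays are all hypothetical choices made for the example. Upsampling the map to the input image size is omitted, so the mask is assumed to be given at the feature-map resolution.

```python
import numpy as np


def class_activation_map(features, class_weights):
    """Classic CAM: weighted sum of the last convolutional feature maps.

    features:      array of shape (C, H, W), last conv layer activations.
    class_weights: array of shape (C,), final linear-layer weights of the
                   target class (assumes global average pooling before it).
    Returns an (H, W) map rescaled to [0, 1] (min-max, an assumption).
    """
    cam = np.tensordot(class_weights, features, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def cam_iou(cam, mask, threshold=0.5):
    """IoU between the thresholded CAM and a binary ground-truth mask."""
    pred = cam >= threshold
    mask = mask.astype(bool)
    union = np.logical_or(pred, mask).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, mask).sum() / union)


# Toy example: two 4x4 feature maps, only the first one matters for the class.
features = np.zeros((2, 4, 4))
features[0, :2, :2] = 1.0   # channel 0 fires on the top-left block
features[1, 2:, 2:] = 1.0   # channel 1 fires on the bottom-right block
weights = np.array([1.0, 0.0])

mask = np.zeros((4, 4), dtype=bool)
mask[:2, :] = True          # hypothetical annotated object: top two rows

cam = class_activation_map(features, weights)
score = cam_iou(cam, mask)
```

In this toy case the thresholded CAM covers four cells, all inside the eight-cell mask, so the overlap is half of the union. A higher IoU means the network's most influential regions coincide more closely with the annotated objects.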