Autonomous vehicles hold the promise of revolutionizing travel by saving time, relieving drivers of monotonous driving tasks, and reducing accidents. Nonetheless, complete autonomy remains elusive due to the challenges of handling complex dynamic environments and unforeseen edge cases. Generalization is often hindered by the perception module, which still struggles to perceive, interpret, and forecast the environment. In this PhD project, we will investigate how to exploit maps to improve downstream tasks such as object detection, tracking, and prediction. Maps provide semantic and geometric information that is useful for these tasks; in addition, they provide information beyond the sensor range and for temporarily occluded regions. We will explore different types of map representations, including human-interpretable maps (e.g., HD maps) and learned latent maps, which can be obtained via self-supervised learning or by distillation from a foundation model. Furthermore, we will conduct research on performing all tasks, including map estimation, jointly in a unified manner.
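To make the map-exploitation idea concrete, a minimal sketch of one possible fusion scheme is shown below: a rasterized HD-map layer is concatenated channel-wise with bird's-eye-view (BEV) sensor features before being passed to a downstream head. All names, shapes, and map layers here are illustrative assumptions, not part of the project itself; a learned latent map would simply replace the hand-crafted raster with encoder outputs.

```python
import numpy as np

# Hypothetical minimal sketch: fuse a rasterized map with BEV sensor
# features via channel-wise concatenation. Shapes are illustrative.
H, W = 64, 64                               # BEV grid size (assumed)
sensor_feats = np.random.rand(32, H, W)     # 32 channels from a sensor encoder
map_raster = np.zeros((2, H, W))            # e.g., drivable area + lane centerlines
map_raster[0, 20:44, :] = 1.0               # toy drivable-area mask
map_raster[1, :, 31:33] = 1.0               # toy lane-centerline mask

# The fused tensor carries both sensor evidence and map priors, including
# regions the sensors cannot currently observe (e.g., occluded lanes).
fused = np.concatenate([sensor_feats, map_raster], axis=0)
print(fused.shape)  # (34, 64, 64)
```

Concatenation is only the simplest option; attention-based or query-based fusion would be natural alternatives to study within the project.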