The University of Texas at Austin has announced the development of an artificial intelligence model capable of generating images of city streets from audio recordings. The system was trained on audio-visual clips containing street sounds and images from different countries: static images paired with ambient sound, collected from YouTube videos and covering both urban and rural streets in North America, Asia, and Europe. During training, the AI learned to associate particular sounds with particular objects in the images.

In the evaluation experiment, a photo was presented together with two generated images of other streets while a soundtrack corresponding to the photo was played. The task was to determine which of the three images matched the sound being played; the correct photo was identified with 80% accuracy. According to the project's authors, the study opens new perspectives for forensics, for studying how the perception of sounds and images affects mental health, and for the development of urban design methods in populated areas.
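
The article does not describe the model's internals, but the three-alternative matching experiment above can be sketched in code. The sketch below assumes a common cross-modal setup in which audio and images are projected into a shared embedding space and compared by cosine similarity; the embedding functions, trial format, and similarity measure are illustrative assumptions, not details from the study.

```python
import numpy as np

# Hedged sketch of the 3-image matching evaluation described in the article.
# Embeddings are assumed to come from some audio encoder and image encoder
# (not specified by the source); each trial pairs one audio clip with three
# candidate images, only one of which corresponds to the audio.

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_matching_image(audio_emb: np.ndarray, image_embs: list[np.ndarray]) -> int:
    """Return the index of the candidate image most similar to the audio."""
    scores = [cosine_similarity(audio_emb, img) for img in image_embs]
    return int(np.argmax(scores))

def evaluate(trials: list[tuple[np.ndarray, list[np.ndarray], int]]) -> float:
    """Each trial is (audio_embedding, [img0, img1, img2], correct_index).

    Returns the fraction of trials where the matching image was identified,
    i.e. the accuracy figure reported in the experiment (80% in the article).
    """
    correct = sum(
        pick_matching_image(audio, images) == target
        for audio, images, target in trials
    )
    return correct / len(trials)
```

Comparing an audio embedding against a small set of candidate images is the standard way such cross-modal matching tasks are scored; whether the study used embedding similarity or human judges to perform the identification is not stated in the article.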