The perception of emotional expressions allows animals to evaluate the social intentions and motivations of each other. This usually takes place within species; however, in the case of domestic dogs, it might be advantageous to recognize the emotions of humans as well as other dogs. In this sense, the combination of visual and auditory cues to categorize others’ emotions facilitates the information processing and indicates high-level cognitive representations. Using a cross-modal preferential looking paradigm, we presented dogs with either human or dog faces with different emotional valences (happy/playful versus angry/aggressive) paired with a single vocalization from the same individual with either a positive or negative valence or Brownian noise. Dogs looked significantly longer at the face whose expression was congruent to the valence of vocalization, for both conspecifics and heterospecifics, an ability previously known only in humans. These results demonstrate that dogs can extract and integrate bimodal sensory emotional information, and discriminate between positive and negative emotions from both humans and dogs.