Multimodal Models: Fusing Multiple Inputs for Advanced AI