Artificial Intelligence (AI) systems that address visual problems rely on the representation of, and reasoning over, the underlying spatial and temporal information. Such representations often suffer from information loss because the relational structure of the scene is not represented explicitly. The work in this thesis explores the explicit, direct mapping of objects and relations through diagrams for spatial problem analysis. A diagram preserves the physical structure of a visual scenario. Information perception and visualization over it are performed through combined Qualitative Spatial and Temporal Reasoning (QSTR) and Diagrammatic Reasoning (DR) techniques. The hybrid QSTR-DR framework facilitates the interpretation of a smaller subset of unique relational information among the objects of interest in visual scenarios. Hybrid QSTR-DR methodologies are developed for video data representation and analysis, abstracting relations among the objects of interest for activity recognition. Relations abstracted over successive time steps form the basis for describing short-duration activities, Short-Term Activities (STAs). Sequences of STAs are then used as features for classifying the associated long-duration activities, Long-Term Activities (LTAs), in videos. LTA representation aggregates all possible patterns of associated short-term activities using Mealy machine prototypes (LTAMMPs). Hard-coded LTAMMPs serve as long-term activity recognizers. Further, LTAMMPs are learned inductively from the temporal patterns of associated short-term activities in known scenarios; the learned LTAMMPs then perform long-term activity recognition in unknown scenarios.
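
The STA-to-LTA aggregation described above can be sketched as a Mealy machine whose transitions consume STA labels and emit outputs, with recognition decided by the final output. This is a minimal illustrative sketch only; the states, STA labels, and the "pick up object" activity below are assumptions for exposition, not the actual prototypes from the thesis.

```python
class LTAMMP:
    """Mealy-machine prototype: transitions map (state, STA) -> (next state, output)."""

    def __init__(self, start, transitions, accept_output):
        self.start = start
        self.transitions = transitions      # {(state, sta_label): (next_state, output)}
        self.accept_output = accept_output  # output token signalling LTA recognition

    def recognize(self, sta_sequence):
        """Run the STA sequence; recognize the LTA if the last emitted output matches."""
        state, last_output = self.start, None
        for sta in sta_sequence:
            if (state, sta) not in self.transitions:
                return False                # undefined transition: reject the sequence
            state, last_output = self.transitions[(state, sta)]
        return last_output == self.accept_output


# Hypothetical "pick up object" LTA built from STA labels "approach",
# "touch", and "hold" (illustrative only).
pickup = LTAMMP(
    start="q0",
    transitions={
        ("q0", "approach"): ("q1", "near"),
        ("q1", "approach"): ("q1", "near"),
        ("q1", "touch"):    ("q2", "contact"),
        ("q2", "hold"):     ("q2", "pickup"),
    },
    accept_output="pickup",
)

print(pickup.recognize(["approach", "touch", "hold"]))      # True
print(pickup.recognize(["approach", "approach", "touch"]))  # False
```

A hard-coded LTAMMP like this acts as a recognizer; inductive learning would instead infer the transition table from STA sequences observed in known scenarios.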