The invention discloses an information retrieval method based on multi-source semantic analysis.
The method comprises the following steps: documents are acquired and preprocessed; the documents are modeled with an LDA (latent Dirichlet allocation) model, and an
inverted index is established; the user's initial query is obtained and preprocessed;
multi-dimensional analysis is performed according to whether each query term belongs to the specialized medical vocabulary,
term weighting and query expansion are performed based on
WordNet and the UMLS Metathesaurus; the similarity between the expanded query term set and the documents, as represented after
LDA dimensionality reduction, is calculated;
the documents are ranked in descending order of similarity, and those whose similarity is not lower than a preset threshold are extracted and returned to the user. The multi-source semantic analysis based
information retrieval method integrates the characteristics of
WordNet and the UMLS Metathesaurus and performs multi-dimensional analysis, weighting and expansion on the initial query, so that the user's query intention is understood more accurately. It also uses the LDA model for document modeling, which analyzes
how well terms represent documents at the latent topic level and thereby improves
document retrieval performance for the user.
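
The query-side steps described above (deciding whether a term is medical, weighting it, expanding it, then ranking documents and applying a threshold) can be illustrated with a minimal sketch. It is not the claimed implementation. The two small dictionaries are hypothetical stand-ins for UMLS Metathesaurus and WordNet lookups, the weights and the expansion discount are assumed values, and the LDA topic representation is deliberately replaced by plain term-count vectors so that the example stays self-contained.

```python
import math
from collections import Counter

# Hypothetical stand-ins for UMLS Metathesaurus and WordNet lookups.
MEDICAL_SYNONYMS = {"myocardial": ["cardiac"], "infarction": ["infarct"]}
GENERAL_SYNONYMS = {"treatment": ["therapy"]}

# Assumed values; the abstract does not specify any weights.
MEDICAL_WEIGHT = 2.0
GENERAL_WEIGHT = 1.0
EXPANSION_DISCOUNT = 0.5


def tokenize(text):
    """Lowercase and split on whitespace (a placeholder for real preprocessing)."""
    return text.lower().split()


def expand_query(query):
    """Return {term: weight} for the query terms and their expansion terms."""
    weights = {}
    for term in tokenize(query):
        # A term counts as medical here if the medical dictionary knows it.
        if term in MEDICAL_SYNONYMS:
            base, synonyms = MEDICAL_WEIGHT, MEDICAL_SYNONYMS[term]
        else:
            base, synonyms = GENERAL_WEIGHT, GENERAL_SYNONYMS.get(term, [])
        weights[term] = max(weights.get(term, 0.0), base)
        # Expansion terms inherit a discounted share of the original weight.
        for synonym in synonyms:
            discounted = base * EXPANSION_DISCOUNT
            weights[synonym] = max(weights.get(synonym, 0.0), discounted)
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, threshold):
    """Rank documents by similarity (descending) and keep those >= threshold."""
    query_vector = expand_query(query)
    scored = [(cosine(query_vector, Counter(tokenize(d))), d) for d in documents]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "cardiac infarct therapy outcomes",
        "stock market report",
        "myocardial infarction treatment",
    ]
    for score, doc in retrieve("myocardial infarction treatment", docs, 0.1):
        print(f"{score:.3f}  {doc}")
```

In this sketch the document that matches only through expansion terms is still retrieved, but it ranks below the document containing the original query terms, and the unrelated document falls under the threshold. In the disclosed method the document vectors would instead come from the LDA topic model.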