The invention discloses a scene text recognition method based on man-machine cooperation. The method comprises the following steps: S1, preprocessing an existing scene text data set and selecting a pre-training data set, a training set and a test set from it; S2, training an SEE network on the pre-training data set to obtain a pre-trained model; S3, predicting labels for the unlabeled training set with the pre-trained model, and dividing the unlabeled training set into Hard samples and Easy samples according to the confidence of the prediction label that the model generates for each sample; manually annotating the Hard samples, pseudo-labeling the Easy samples with the model's predictions, and then fine-tuning the scene text recognition model on the annotated samples; and S4, repeating step S3 until the performance of the model meets the expected requirements. The method can reduce the annotation cost of scene text data sets and improve the performance of the text recognition model.
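
Steps S3 and S4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how confidence is computed, what threshold separates Hard from Easy samples, or whether the same unlabeled pool is re-predicted in every round. The sketch assumes that low-confidence samples are the Hard ones, uses a hypothetical threshold of 0.9, and re-predicts the same pool each round. The names `Prediction`, `split_by_confidence`, `cooperative_round`, `train_until_good_enough`, and the callbacks `predict`, `ask_human`, `fine_tune` and `evaluate` are all hypothetical stand-ins for the recognition model, the human annotator and the test-set evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Prediction:
    sample_id: str     # identifier of an unlabeled scene text image
    label: str         # text string predicted by the model
    confidence: float  # model confidence in [0, 1] (how it is computed is assumed)


def split_by_confidence(
    predictions: Sequence[Prediction], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Divide predictions into (hard, easy) using an assumed confidence threshold."""
    hard: List[Prediction] = []
    easy: List[Prediction] = []
    for p in predictions:
        # Assumption: confident predictions are Easy, the rest are Hard.
        (easy if p.confidence >= threshold else hard).append(p)
    return hard, easy


def cooperative_round(
    unlabeled_ids: Sequence[str],
    predict: Callable[[str], Prediction],
    ask_human: Callable[[str], str],
    fine_tune: Callable[[List[Tuple[str, str]]], None],
    threshold: float = 0.9,
) -> Tuple[List[Tuple[str, str]], int]:
    """One pass of step S3: predict, split, annotate, fine-tune."""
    predictions = [predict(sample_id) for sample_id in unlabeled_ids]
    hard, easy = split_by_confidence(predictions, threshold)
    # Hard samples receive manual annotations.
    labeled = [(p.sample_id, ask_human(p.sample_id)) for p in hard]
    # Easy samples keep the model's own prediction as a pseudo-label.
    labeled += [(p.sample_id, p.label) for p in easy]
    fine_tune(labeled)
    return labeled, len(hard)


def train_until_good_enough(
    unlabeled_ids: Sequence[str],
    predict: Callable[[str], Prediction],
    ask_human: Callable[[str], str],
    fine_tune: Callable[[List[Tuple[str, str]]], None],
    evaluate: Callable[[], float],
    target_accuracy: float,
    max_rounds: int = 10,
    threshold: float = 0.9,
) -> int:
    """Step S4: repeat S3 until the test metric reaches the target (or a round cap)."""
    rounds = 0
    while rounds < max_rounds and evaluate() < target_accuracy:
        cooperative_round(unlabeled_ids, predict, ask_human, fine_tune, threshold)
        rounds += 1
    return rounds
```

In this sketch the number of Hard samples returned by `cooperative_round` is the number of manual annotations requested in that round, so a falling count over successive rounds would correspond to the reduction in annotation cost that the abstract claims. The `max_rounds` cap is an added safeguard and is not part of the described method.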