Operations Research and Management Science, 2021, Vol. 30, Issue (11): 168-175. DOI: 10.12005/orms.2021.0366

• Applied Research

Box Office Prediction Model Based on Web Search Data and Machine Learning

LI Pei-zhi1, DONG Qing-li2

  1. School of Finance, Dongbei University of Finance and Economics, Dalian 116025, China;
  2. School of Economics and Management, Dalian University of Technology, Dalian 116024, China
  • Received: 2019-12-18 Online: 2021-11-25
  • About the authors: LI Pei-zhi (1992-), male, born in Handan, Hebei; lecturer, PhD; research interests: applied statistics and machine learning. DONG Qing-li (1991-), male, born in Kaifeng, Henan; lecturer, PhD; research interests: model optimization and forecasting.
  • Funding:
    National Social Science Fund of China (21CTJ012, 21CTJ011); Liaoning Provincial Social Science Planning Fund (L19CTJ001, L20CTI001); Basic Scientific Research Project of Higher Education Institutions, Liaoning Provincial Department of Education (LJKZ1037); Fundamental Research Funds for the Central Universities (DUT19RC(3)042)

Abstract: Movie box office prediction has long been an important and complex task for industry management departments. Current research is constrained mainly by two factors: box-office-related variables are complex, changeable, and hard to select objectively, and the data are difficult to acquire. In contrast, web search data are structured data released by Internet companies that record the search behavior of Internet users; they have a clear meaning, are easy to access, and reflect trends objectively and in a timely manner. Based on web search data, this study builds a hybrid prediction model that combines a data selection method with a machine learning algorithm. First, an optimal training set (OTS) is constructed by selecting the training data most similar to the test set. Second, the Imperialist Competitive Algorithm (ICA) is applied to select the best combination of parameters for a Least Squares Support Vector Machine (LSSVM). Finally, the optimized model is used for prediction. To test the model's stability and applicability, an empirical study is carried out using box office data of films released in mainland China in 2017; the results show that the hybrid model achieves higher prediction accuracy. The model is suitable for box office prediction in the Chinese film industry and can provide a decision-making reference for industry management departments.

Key words: web search data, machine learning, box office prediction
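The three steps in the abstract (OTS construction, ICA parameter search, LSSVM prediction) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, the ICA settings, or the tuning objective, so the sketch assumes Euclidean distance to the nearest test sample for OTS, an RBF kernel for LSSVM with (γ, σ) searched on a log10 scale, and holdout mean squared error as the ICA cost. The ICA is simplified: colonies are assigned round-robin, assimilation uses a per-dimension random step without the deviation angle, and empires are never eliminated. All function names (`optimal_training_set`, `lssvm_fit`, `lssvm_predict`, `ica_minimize`, `hybrid_predict`) are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    # LSSVM regression dual: [[0, 1^T], [1, K + I/gamma]] @ [b; alpha] = [0; y]
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha


def lssvm_predict(X_fit, b, alpha, sigma, X_new):
    # f(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(X_new, X_fit, sigma) @ alpha + b


def optimal_training_set(X_train, y_train, X_test, k):
    # Hypothetical OTS rule (assumption): keep the k training rows whose
    # Euclidean distance to the nearest test row is smallest
    d = np.sqrt(((X_train[:, None, :] - X_test[None, :, :]) ** 2).sum(axis=2))
    idx = np.argsort(d.min(axis=1))[:k]
    return X_train[idx], y_train[idx]


def ica_minimize(cost, lo, hi, n_countries=20, n_imp=4, n_iter=30,
                 beta=2.0, p_rev=0.1, xi=0.1, seed=0):
    # Simplified Imperialist Competitive Algorithm minimizing cost over the box [lo, hi]
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    pos = rng.uniform(lo, hi, size=(n_countries, lo.size))
    c = np.array([cost(p) for p in pos])
    order = np.argsort(c)
    pos, c = pos[order], c[order]  # rows 0..n_imp-1 are the imperialists
    owner = np.arange(n_countries - n_imp) % n_imp  # empire of each colony (round-robin)
    for _ in range(n_iter):
        for j in range(n_countries - n_imp):
            i, k = owner[j], n_imp + j
            if rng.random() < p_rev:  # revolution: relocate the colony at random
                pos[k] = rng.uniform(lo, hi)
            else:  # assimilation: move toward (and possibly past) the imperialist
                pos[k] = pos[k] + beta * rng.random(lo.size) * (pos[i] - pos[k])
            pos[k] = np.clip(pos[k], lo, hi)
            c[k] = cost(pos[k])
            if c[k] < c[i]:  # the colony replaces its imperialist
                pos[[i, k]] = pos[[k, i]]
                c[[i, k]] = c[[k, i]]
        # total empire cost = imperialist cost + xi * mean colony cost
        col = c[n_imp:]
        total = np.array([c[e] + (xi * col[owner == e].mean() if np.any(owner == e) else 0.0)
                          for e in range(n_imp)])
        members = np.flatnonzero(owner == np.argmax(total))
        if members.size:  # competition: the weakest empire loses its worst colony
            power = total.max() - total
            p = power / power.sum() if power.sum() > 0 else np.full(n_imp, 1.0 / n_imp)
            owner[members[np.argmax(col[members])]] = rng.choice(n_imp, p=p)
    best = int(np.argmin(c))
    return pos[best], c[best]


def hybrid_predict(X_train, y_train, X_test, k, seed=0):
    # 1) build the OTS, 2) tune (log10 gamma, log10 sigma) with ICA using an
    # assumed holdout-MSE fitness, 3) refit LSSVM on the whole OTS and predict
    Xo, yo = optimal_training_set(X_train, y_train, X_test, k)
    perm = np.random.default_rng(seed).permutation(len(yo))
    n_val = max(1, len(yo) // 5)
    val, fit = perm[:n_val], perm[n_val:]

    def holdout_mse(p):
        g, s = 10.0 ** p
        b, a = lssvm_fit(Xo[fit], yo[fit], g, s)
        return float(np.mean((lssvm_predict(Xo[fit], b, a, s, Xo[val]) - yo[val]) ** 2))

    best, _ = ica_minimize(holdout_mse, lo=[-1.0, -1.0], hi=[3.0, 1.0], seed=seed)
    g, s = 10.0 ** best
    b, a = lssvm_fit(Xo, yo, g, s)
    return lssvm_predict(Xo, b, a, s, X_test)
```

With one row of web search indices per film and box office as the target, `hybrid_predict(X_train, y_train, X_test, k)` returns predictions for `X_test`; the OTS size `k` is another assumed hyperparameter. Fitting the LSSVM reduces to a single linear system, which is what distinguishes it from a standard SVM that requires solving a quadratic program, and it keeps each ICA fitness evaluation cheap.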
