Using the decisions of the Random Forest algorithm for the purposes of forecasting pink salmon runs on north-eastern Kamchatka
https://doi.org/10.15853/2072-8212.2020.59.76-96
Abstract
Pink salmon runs in Kamchatka are forecast with Random Forest, a modern and powerful machine learning method built on an ensemble of decision trees. Monthly values of climate indices serve as predictors. The most important factors are selected iteratively, and the best model is the one with the lowest error on test data. The algorithm is implemented in R.
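The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the author's R implementation: it is written in Python, uses only the standard library, and makes several assumptions that the abstract does not confirm. "Iterative selection" is read here as backward elimination (drop the least important predictor each round), importance is measured by permutation on the training data, and the data are synthetic stand-ins for monthly climate indices and run sizes. All function names and parameters are hypothetical.

```python
import random
import statistics


def sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(X, y, feats, rng, depth=0, max_depth=3, min_leaf=3):
    """Grow a regression tree; each split tries a random subset of predictors."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return statistics.fmean(y)
    best = None
    for f in rng.sample(feats, max(1, len(feats) // 3)):
        for thr in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:  # no admissible split: make a leaf
        return statistics.fmean(y)
    _, f, thr, left, right = best
    kids = [grow_tree([X[i] for i in idx], [y[i] for i in idx], feats, rng,
                      depth + 1, max_depth, min_leaf) for idx in (left, right)]
    return (f, thr, kids[0], kids[1])


def tree_predict(node, row):
    """Walk from the root to a leaf; leaves are plain floats."""
    while isinstance(node, tuple):
        f, thr, lo, hi = node
        node = lo if row[f] <= thr else hi
    return node


def fit_forest(X, y, feats, rng, n_trees=20):
    """Fit each tree on a bootstrap sample of the training rows."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], feats, rng))
    return forest


def mse(forest, X, y):
    """Mean squared error of the averaged tree predictions."""
    preds = [statistics.fmean(tree_predict(t, row) for t in forest) for row in X]
    return statistics.fmean((p - t) ** 2 for p, t in zip(preds, y))


def permutation_importance(forest, X, y, feats, rng):
    """Increase in MSE when one predictor's column is shuffled."""
    base = mse(forest, X, y)
    scores = {}
    for f in feats:
        col = [row[f] for row in X]
        rng.shuffle(col)
        shuffled = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, col)]
        scores[f] = mse(forest, shuffled, y) - base
    return scores


def select_predictors(X_tr, y_tr, X_te, y_te, seed=0):
    """Assumed scheme: drop the least important predictor each round,
    then keep the subset whose forest had the lowest test MSE."""
    rng = random.Random(seed)
    feats = list(range(len(X_tr[0])))
    history = []
    while feats:
        forest = fit_forest(X_tr, y_tr, feats, rng)
        history.append((mse(forest, X_te, y_te), list(feats)))
        imp = permutation_importance(forest, X_tr, y_tr, feats, rng)
        feats.remove(min(feats, key=imp.get))
    return min(history, key=lambda h: h[0]), history


# Synthetic data (hypothetical): 5 predictors, only the first two carry signal.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2 * r[0] - r[1] + data_rng.gauss(0, 0.3) for r in X]
(best_err, best_feats), history = select_predictors(X[:40], y[:40], X[40:], y[40:])
print("best predictor subset:", best_feats, "test MSE:", round(best_err, 3))
```

The first 40 rows are used for training and the last 20 for testing, mimicking a chronological split; a real forecast would replace the synthetic rows with observed index values and run sizes, and would probably rely on an established Random Forest library rather than this toy forest.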
About the Author
M. H. Feldman
Russian Federation
Leading Scientist
683000 Petropavlovsk-Kamchatsky, Naberezhnaya Str., 18
Tel.: +7 (4152) 41-27-01
For citations:
Feldman M.H. Using the decisions of the Random Forest algorithm for the purposes of forecasting pink salmon runs on north-eastern Kamchatka. The researches of the aquatic biological resources of Kamchatka and the North-West Part of the Pacific Ocean. 2020;(59):76-96. (In Russ.) https://doi.org/10.15853/2072-8212.2020.59.76-96