#CatBoost

russianseo

Yandex has released a new machine learning library as open source

Yandex has developed a new machine learning method called CatBoost. It lets you efficiently train models on heterogeneous data, such as user location, operation history and device type. The CatBoost machine learning library has been released publicly and is free for anyone to use.
To work with CatBoost, you only need to install it on your computer. The library supports Linux, Windows and macOS and is available for the Python and R programming languages. Yandex has also developed CatBoost Viewer, a visualization tool that lets you monitor the training process on charts. Both CatBoost and CatBoost Viewer can be downloaded from GitHub.

CatBoost is the successor to Matrixnet, the machine learning method used in almost all Yandex services. Like Matrixnet, CatBoost uses gradient boosting, which is well suited to heterogeneous data. But whereas Matrixnet trains models on numerical data, CatBoost also takes non-numerical data into account, for example cloud types or building types. Previously, such data had to be converted into numbers, which could distort its meaning and affect the accuracy of the model. Now it can be used in its original form. Thanks to this, CatBoost achieves higher training quality than similar methods for working with heterogeneous data. It can be used in a variety of areas, from banking to industry.
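
To see why converting categories into numbers can distort them, here is a small pure-Python sketch of two common manual encodings. It is only an illustration of the problem, not a description of how CatBoost handles categories internally. The cloud names and the helper functions `label_encode` and `one_hot_encode` are made up for the example.

```python
# Illustrative only: two manual ways of turning categories into numbers.
cloud_types = ["cirrus", "cumulus", "stratus", "cumulus"]


def label_encode(values):
    """Map each distinct category to an arbitrary integer (alphabetical order)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def one_hot_encode(values):
    """Map each value to a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


encoded, codes = label_encode(cloud_types)
one_hot, categories = one_hot_encode(cloud_types)

# Integer codes suggest an order and a distance between cloud types
# (stratus - cirrus == 2) that do not exist in the real data.
print(encoded)
# One-hot vectors avoid the fake order but add one column per category.
print(one_hot)
```

With integer codes, a model can treat "stratus" as somehow larger than "cirrus". One-hot encoding avoids that, but the table gets wider with every new category.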

Mikhail Bilenko, head of the Yandex machine intelligence and research department:

Yandex has been working on machine learning for many years, and CatBoost was created by the best specialists in this field. By releasing the CatBoost library as open source, we want to contribute to the development of machine learning. I must say that CatBoost is the first Russian machine learning method to become available as open source. We hope that the community of experts will appreciate it and help make it even better.

The new method has already been tested on Yandex services. As part of the experiment, it was used to improve search results, to rank the Yandex.Zen recommendation feed and to calculate weather forecasts in the Meteum technology, and in all cases it performed better than Matrixnet. In the future, CatBoost will be used in other services as well. It is also used by the Yandex Data Factory team in its industrial solutions, in particular for optimizing raw material consumption and predicting defects.

In addition, CatBoost has been adopted by the European Organization for Nuclear Research (CERN), which uses it to combine data obtained from different parts of the LHCb detector.