TensorFlow Declares Decision Forests "Production Ready," Boasts of Rapid Training And High Accuracy

Designed to work best with tabular data, decision forests can significantly boost your machine learning projects.

ghalfacree
almost 3 years ago AI & Machine Learning

Google's TensorFlow team has announced that its decision forests functionality is now "production ready" — promising fast training and improved prediction performance on tabular datasets.

"Two years ago, we open sourced the experimental version of TensorFlow Decision Forests and Yggdrasil Decision Forests, a pair of libraries to train and use decision forest models such as Random Forests and Gradient Boosted Trees in TensorFlow," say Mathieu Guillame-Bert, Richard Stotz, and Luiz Gustavo Martins of their team's work. "Since then, we've added a lot of new features and improvements. Today, we are happy to announce that TensorFlow Decision Forests is production ready."

TensorFlow now includes "production ready" support for decision trees, slashing training time and boosting performance. (📷: TensorFlow)

Designed for tabular data, decision forests are best visualized as a collection of decision trees — branching flow charts which split out at each decision to cover a broad range of possibilities. For suitable datasets, decision forests promise a reduction in training time and an improvement in prediction performance — and for TensorFlow users, the functionality is now considered stable enough for production use.
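The "collection of trees" idea can be sketched in a few lines of plain Python. This is a toy illustration only, not TF-DF's implementation: each hypothetical tree is a hard-coded chain of if/else splits over made-up `age` and `income` fields, and the forest combines their votes by simple majority.

```python
# Toy decision forest: each "tree" is a chain of if/else splits,
# and the forest's prediction is a majority vote over the trees.
# (Illustrative only; real libraries learn the splits from data.)

def tree_1(row):
    # First split on age, then on income.
    if row["age"] > 40:
        return 1
    return 1 if row["income"] > 50_000 else 0

def tree_2(row):
    # A single split on income.
    return 1 if row["income"] > 60_000 else 0

def tree_3(row):
    # A combined condition on both features.
    return 1 if row["age"] > 35 and row["income"] > 40_000 else 0

def forest_predict(row, trees=(tree_1, tree_2, tree_3)):
    # Each tree votes; the forest returns the majority class.
    votes = [tree(row) for tree in trees]
    return 1 if sum(votes) > len(votes) / 2 else 0

print(forest_predict({"age": 50, "income": 70_000}))  # all trees vote 1 -> 1
```

Averaging many such trees, each trained slightly differently, is what gives Random Forests and Gradient Boosted Trees their robustness on tabular data.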

"To maximize the quality of your model you need to tune the hyper-parameters. However, this operation takes time. If you don't have the time to tune your hyper-parameters, we have a new solution for you: hyper-parameter templates," the trio explain of the new release. "Hyper-parameter templates are a set of hyper-parameters that have been discovered by testing hundreds of datasets. To use them, you simply need to set the hyperparameter_template argument. The results are almost as good as with manual hyper-parameter tuning."

The new functionality can be used in just a few lines of code, its creators say. (📷: TensorFlow)

As for performance, the TensorFlow team claims that training on datasets below a million samples is "almost instantaneous," though admits that things can take longer once you're above that limit — which is why the production-ready version comes with distributed training support, able to spread the load across any number of machines.

More information on TensorFlow Decision Forests is available on the TensorFlow blog, along with links to tutorials for those looking to experiment with the newly stable feature.


Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.
