DeWolff Consulting in data science

Development of applications and models to solve intelligence problems using state-of-the-art knowledge in the field of artificial intelligence and machine learning. Our solutions help business decision-making by extracting key information from data and predicting the likelihood of future events.

Services

Models and applications

We specialize in creating solutions for clients who need to extract key information from their processes. With large quantities of data available in databases, we can draw conclusions, predict events, condense information, impute missing data, find trends, and more, in order to make smart decisions and save human resources.

We use various objectives (regression, classification, clustering, etc.) and models (neural networks, Gaussian processes, autoencoders, etc.) to answer questions such as:

When will my system fail, so that I can anticipate and prevent the failure?
How can I minimize risks when giving out loans and credits to clients?
Where is the highest likelihood of finding a mineral for mining?
Where to focus efforts to mitigate forest fires and deforestation?
How to optimize the production of agricultural products with respect to climate?

Big Data

We define big data by the three Vs: volume, velocity, and variety. Managing a high-volume data flow has driven the development of new techniques in recent years, each solving part of the problem. Data is recorded by sensors or other data sources, from which it is extracted, transformed, and loaded (ETL) into a data lake or warehouse (e.g. a SQL database, Snowflake, Databricks). The data is then processed on clusters in which separate nodes work jointly and concurrently to feed a model. This can be done with software such as Hadoop, Spark, Kubernetes, Azure, and many others.
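The extract, transform, and load steps described above can be sketched on a very small scale as follows. This is only an illustration under stated assumptions: the sensor records, the Fahrenheit-to-Celsius conversion, the `readings` table, and all function names are hypothetical, and an in-memory SQLite database stands in for a real data lake or warehouse.

```python
import sqlite3

# Hypothetical raw sensor readings: (sensor id, ISO timestamp, temperature in
# degrees Fahrenheit as text). The empty string marks a missing measurement.
RAW_READINGS = [
    ("s1", "2024-01-01T00:00:00", "68.0"),
    ("s1", "2024-01-01T00:01:00", ""),
    ("s2", "2024-01-01T00:00:00", "212.0"),
]


def extract():
    """Extract: yield raw records from the source (here an in-memory list)."""
    yield from RAW_READINGS


def transform(records):
    """Transform: drop records without a value, convert Fahrenheit to Celsius."""
    for sensor_id, timestamp, raw_value in records:
        if not raw_value:
            continue
        celsius = (float(raw_value) - 32.0) * 5.0 / 9.0
        yield sensor_id, timestamp, round(celsius, 2)


def load(rows, connection):
    """Load: write the cleaned rows into a table of the target database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(sensor_id TEXT, timestamp TEXT, celsius REAL)"
    )
    connection.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    connection.commit()


def run_etl(connection):
    """Run the three steps in order and return what ended up in the table."""
    load(transform(extract()), connection)
    query = "SELECT sensor_id, celsius FROM readings ORDER BY sensor_id"
    return connection.execute(query).fetchall()


connection = sqlite3.connect(":memory:")  # stand-in for a warehouse
result = run_etl(connection)
print(result)
```

Because each step is a generator, records stream through the pipeline one at a time. In a cluster setting the same three stages would be distributed over many nodes rather than run in a single process.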

Computer Vision

Deep neural networks have made great progress in gaining a higher-level understanding of images and video. The domain includes advances in image classification, object detection, and object tracking, allowing new models to interpret e.g. medical images, satellite imagery, and road scenes for self-driving cars, or to be used for facial recognition, reading text in images, and so on. It is a fast-moving field with recent deep convolutional neural network (CNN) models including You Only Look Once (YOLO) and EfficientNet with a Feature Pyramid Network (FPN), implemented in TensorFlow or PyTorch.

Prediction

Classification and regression are classic tasks in machine learning, usually applied for interpolation (imputation) or extrapolation (prediction) of data. Widely used toolkits include scikit-learn, a complete Python toolkit with algorithms for classification, clustering, regression, and dimensionality reduction, and general numerical computing tools such as NumPy and SciPy. Data is loaded and handled using pandas, and the analysis can be run in Jupyter Notebooks.
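As a minimal sketch of the difference between interpolation (imputation) and extrapolation (prediction), the example below fits a straight line with NumPy and evaluates it inside and outside the observed range. The data are hypothetical and noise-free, and a real project would more likely use a scikit-learn estimator with proper validation.

```python
import numpy as np

# Hypothetical measurements of y = 2x + 1; the value at x = 3 is missing.
x_observed = np.array([0.0, 1.0, 2.0, 4.0, 5.0])
y_observed = 2.0 * x_observed + 1.0

# Least-squares fit of a degree-1 polynomial (a straight line).
slope, intercept = np.polyfit(x_observed, y_observed, deg=1)


def predict(x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


imputed = predict(3.0)   # interpolation: inside the observed range
forecast = predict(7.0)  # extrapolation: beyond the observed range
print(imputed, forecast)
```

Extrapolation relies entirely on the model assumptions holding outside the data, so forecasts generally deserve wider uncertainty than imputed values.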

We have done extensive work on a Gaussian process regression toolkit widely used by industry and academia.

Projects

AI4Manatees conservation

Analyzing and discovering manatee vocalizations in marine audio recordings, in collaboration with C Minds, AI4Climate, ECOSUR, and the University of Chile, with funding from Google. Using Audio Spectrogram Transformers (ASTs), a state-of-the-art machine learning model for audio, it is possible to detect manatee vocalizations and estimate their group size, allowing researchers and conservation practitioners to better understand the behaviour and communication patterns of manatees.

FairTrade deforestation detection

Detection of deforestation in Latin America and the Caribbean using remote sensing, in collaboration with Inria, ECLAC, and FairTrade. Using a deep learning model for segmentation (such as U-Net or Feature Pyramid Networks), it is possible to learn the characteristics of deforestation from the visible and non-visible bands of satellite imagery (Sentinel-1 and Sentinel-2 from the Copernicus programme).

VZOR Brain

Prediction of system failures from alerts and logs of servers and applications. Using alerts on critical usage or failure of CPU, memory, connections, etc., a classifier learns to predict whether the system will break down soon and which components are responsible. We use an XGBoost classifier together with latent Dirichlet allocation (LDA) to extract topics from the alert messages, SMOTE to balance the data, and LIME to interpret the results.
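Failures are typically much rarer than normal operation, which is why the data is balanced before training. The core idea of SMOTE can be sketched in plain NumPy as below: new minority samples are placed on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the implementation used in the project. The function name `smote_sketch`, its parameters, and the feature values are all hypothetical.

```python
import numpy as np


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)

    # Pairwise Euclidean distances between all minority samples.
    diffs = minority[:, None, :] - minority[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(distances, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(distances, axis=1)[:, :k]

    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(minority))
        neighbour = neighbours[base, rng.integers(k)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[i] = minority[base] + gap * (minority[neighbour] - minority[base])
    return synthetic


# Hypothetical alert features: 3 failure samples versus 8 normal samples.
failures = np.array([[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]])
normal = np.zeros((8, 2))

extra = smote_sketch(failures, n_new=len(normal) - len(failures))
balanced_failures = np.vstack([failures, extra])
print(balanced_failures.shape)
```

Oversampling of this kind should be applied to the training split only, so that the evaluation data keeps the true class proportions.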

BancoEstado course

Together with the Center for Mathematical Modelling of the University of Chile, we have designed a machine learning course covering: introduction to artificial intelligence, basic usage of scientific Python (NumPy, SciPy, pandas, scikit-learn, PyTorch), regression, classification, optimization, clustering, support vector machines, K-nearest neighbours, cross-validation, neural networks, random forests, XGBoost, Bayesian networks and graphs. The course concluded with an implementation of a cost-sale model on real bank credit data.

Multi-output Gaussian process toolkit

Development of a Python and PyTorch toolkit for regression and classification using multi-output Gaussian processes. The library covers everything from loading and manipulating the data, through the initialization and training of hyperparameters, to the visualization and interpretation of the model. The toolkit implements a great variety of models, likelihoods, and kernels, and allows high-performance training on the GPU.

GitHub repository: https://github.com/GAMES-UChile/mogptk
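For readers unfamiliar with Gaussian processes, the underlying regression step can be sketched in a few lines of NumPy. This is a single-output illustration with a fixed squared-exponential kernel. It does not use or reproduce the toolkit's API, and the function names, hyperparameter values, and data are assumptions chosen for the example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)

    # The Cholesky factorisation avoids forming the inverse of K explicitly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha

    v = np.linalg.solve(L, K_s)
    variance = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, variance


# Hypothetical training data sampled from a sine function.
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)

# One test point between observations, one far away from all of them.
x_test = np.array([0.5, 5.0])
mean, variance = gp_posterior(x_train, y_train, x_test)
print(mean, variance)
```

The predictive variance is small near the observations and grows back towards the prior variance far from them, which is what makes Gaussian processes useful when uncertainty matters. In practice the kernel hyperparameters are trained rather than fixed.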

Partners and clients

Contact

Taco de Wolff
Founder, Data Scientist
Master in Physics (University of Groningen)