In this blog entry, let us take a look into our toolbox together. The topic provides material for more than one entry, and we will come back to it regularly in this blog.
A consultant's work is always easier when the client already has an extensive data science infrastructure. But how do I practice data science if my client does not have an established software environment with statistical tools, databases and visualization tools? For this purpose, we use our “Data Science Survival Kit”. This is a compilation of software tools designed to let us start work quickly when little is available on the client side. It consists of tools that ideally require minimal installation effort and raise no licensing issues, yet are efficient and work well together.
Of course, there are the usual suspects such as Python, which we will also write about. However, I want to introduce the survival kit from a very different starting point: geovisualization. Map views are extremely valuable for making results easy to grasp. In addition, modern geographic information systems are not only visualization tools but also powerful processing tools. For this purpose, our toolbox contains the geographic information system QGIS (see screenshot). QGIS is open source, quick to install and very powerful.
The excellent database integration and the option of extending the tool ourselves with Python are particularly important to us. Simple visualizations such as postcode-based map views can be created quickly, but complex and unusual map views are feasible as well.
Beyond pure geovisualization, QGIS is well suited to preprocessing geodata before further analysis outside of QGIS. Various geo-algorithms are available, ranging from shortest-distance calculations and the detection of geometric relations (for example, allocating geographical points to territorial areas such as postcode areas) to various trend calculations. Because QGIS also offers access to the algorithms of other open source projects such as SAGA, challenging data science tasks such as the classification of satellite images can be implemented as well.
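To make the idea of allocating points to territorial areas more concrete, here is a minimal sketch in plain Python. It does not use QGIS or its Python API; it only illustrates the underlying point-in-polygon test with the common ray-casting approach. The function names, the area names and the coordinates are all invented for illustration, and the polygons are assumed to be simple (no holes, no self-intersections) with planar coordinates.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the last vertex connects back to the first


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count how often a ray going right from the point crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def allocate_points(
    points: Dict[str, Point], areas: Dict[str, Polygon]
) -> Dict[str, Optional[str]]:
    """Assign each point to the first area that contains it, or None if no area does."""
    return {
        name: next(
            (area for area, poly in areas.items() if point_in_polygon(pt, poly)),
            None,
        )
        for name, pt in points.items()
    }


# Hypothetical example data: two square "postcode areas" and three locations.
areas = {
    "area_A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "area_B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
points = {"shop_1": (1.0, 1.0), "shop_2": (3.0, 0.5), "shop_3": (5.0, 5.0)}

allocation = allocate_points(points, areas)
print(allocation)
```

In a real project, a geographic information system would handle this at scale, including spatial indexing, coordinate reference systems and polygons with holes, none of which this sketch covers.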
This is just an appetizer: in future blog entries, we will report on specific examples of how QGIS can be integrated into projects.