In SPSS, categorical variables can be stored either as their original text values, which costs a substantial amount of performance on large datasets, or as numeric codes with value labels. The second way is not only considerably faster but also the right one: although it makes SPSS syntax harder to read, it also makes the syntax immune to changes in how the categories are spelled or labelled.
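If the categories are still stored as text, SPSS's AUTORECODE command can convert them into numeric codes with value labels. A minimal sketch, assuming a hypothetical string variable marke_text. AUTORECODE assigns codes in ascending order of the original values by default, so the code used in the filter below is purely illustrative:

```spss
* Hypothetical string variable marke_text holds the brand names as text.
* AUTORECODE creates the numeric variable marke with value labels.
AUTORECODE VARIABLES=marke_text /INTO marke /PRINT.
* Filters now refer to codes rather than spellings (code 2 is hypothetical).
TEMPORARY.
SELECT IF (marke = 2).
FREQUENCIES marke.
```

Because the filter compares against a number, renaming or re-spelling the label of code 2 later does not break it.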
SPSS settings determine whether the numeric codes, the labels, or both are shown in output tables, e.g. those produced by the FREQUENCIES command. Each option has its pros and cons:
Labels alone are best if the output is embedded into a document as a complete table.
Codes together with labels make exploratory data analysis and syntax development easier: on the one hand, one can read off the codes directly, e.g. for filter conditions; on the other hand, one immediately sees their meaning next to them. However, if one copies the result into Excel, for example, for further work, both are combined in one cell and can only be separated manually using formulas.
Codes alone are therefore best for further processing, but otherwise this format is hardly suitable for anything.
Workaround in 'Options'
One can switch between the different formats in the options. Under Edit -> Options -> Output, the 'structure labelling' section on the left offers pull-down menus for variable names and for variable values, each allowing a choice between labels, names/values, or both.
Best Practice Using Syntax
Opening this dialog every time the setting needs to change is rather laborious. It is easier to use SET commands directly in the syntax:
*** For values.
*** Switch to "codes only".
SET TNUMBERS VALUES.
*** Switch to "labels only".
SET TNUMBERS LABELS.
*** Switch to both.
SET TNUMBERS BOTH.
*** For variables.
*** Switch to "names only".
SET TVARS NAMES.
*** Switch to "labels only".
SET TVARS LABELS.
*** Switch to both.
SET TVARS BOTH.
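The value and variable settings are independent of each other. As far as we know, both can be changed in a single SET command. The sketch below also uses SHOW to print the active settings to the Viewer, assuming your SPSS version's SHOW command accepts the TNUMBERS and TVARS keywords:

```spss
* Show codes and labels for values, names and labels for variables.
SET TNUMBERS=BOTH TVARS=BOTH.
* Display the currently active labelling settings.
SHOW TNUMBERS TVARS.
```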
This makes it easy to switch notation for one single output within a running syntax:
FREQ spalteA spalteB spalteC.
SET TNUMBERS BOTH.
FREQ spalte_special.
SET TNUMBERS VALUES.
FREQ spalteD spalteE spalteF.
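Switching back to a fixed setting assumes you know which one was active before. A sketch of an alternative uses SPSS's PRESERVE and RESTORE commands, which save and later reinstate the current SET specifications. The variable names are the same placeholders as above:

```spss
FREQ spalteA spalteB spalteC.
* Save the current SET specifications.
PRESERVE.
SET TNUMBERS BOTH.
FREQ spalte_special.
* Reinstate whatever was active before PRESERVE.
RESTORE.
FREQ spalteD spalteE spalteF.
```

The advantage is that the block can be pasted into any syntax without overwriting the user's preferred default.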
Example
Here is a concrete example with automobile brands. The variable marke (German for 'brand') in the dataset contains the brands as numeric codes with value labels.
SET TNUMBERS VALUES.
FREQ marke.
SET TNUMBERS BOTH.
FREQ marke.
SET TNUMBERS LABELS.
FREQ marke.
The code above outputs the frequency table for marke three times: with codes only, with codes and labels, and with labels only.