Categorical variables can be stored in SPSS either as the original text strings or as numerical codes with labels. Storing the text directly costs substantial performance with large amounts of data. Numerical codes are not only drastically more performant but also the more robust choice: although they make the SPSS syntax harder to read, they make it immune to changes in the notation of the labels.
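A text variable can be converted into numerical codes with labels in a single step. The following is a minimal sketch, assuming a hypothetical string variable marke_text that holds the brand names; the target name marke_code is likewise only illustrative.

```spss
* Assumption: marke_text is a hypothetical string variable with brand names.
* AUTORECODE creates a numeric variable with consecutive codes.
* The original strings are carried over as value labels.
AUTORECODE VARIABLES=marke_text
  /INTO marke_code.
* Check the result.
FREQUENCIES VARIABLES=marke_code.
```

Filters and recodes written against marke_code keep working after this step even if the spelling of a label is corrected later.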
SPSS has a setting that determines whether result outputs, e.g. from the FREQUENCIES command, show the numerical codes, the labels or both. Each option has pros and cons.
Labels alone are best if the output is embedded into a document as a complete table.
Codes together with labels make explorative data analysis and syntax development easier: the codes can be read off directly, e.g. for filter conditions, and their meaning appears right next to them. However, if the result is copied, e.g. into Excel, for further work steps, code and label end up in a single cell and can only be separated manually using formulas.
Codes alone are therefore best for further processing, but this format is hardly suitable for anything else.
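The filter-condition use case mentioned above could look like the following sketch. The code 2 and the variable names marke and spalteA are assumptions for illustration; in practice the code would be read off an output that shows codes and labels side by side.

```spss
* Assumption: code 2 was read off a frequency table of marke.
* TEMPORARY limits the selection to the next procedure only.
TEMPORARY.
SELECT IF (marke = 2).
FREQUENCIES VARIABLES=spalteA.
```

Because the condition refers to the code and not to the label text, it does not break when a label is renamed.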
Workaround in 'Options'
The format can be changed in the options. Under Edit -> Options -> Output, the field 'structure labelling' on the left offers one pull-down each for the variable names and the variable values, with the choices labels, values/names and both.
Best Practice Using Syntax
Calling up this menu item each time the settings need to change is rather laborious. It is easier to set the options directly in the syntax.
*** For values.
*** Switch to codes only.
SET TNUMBERS VALUES.
*** Switch to labels only.
SET TNUMBERS LABELS.
*** Switch to both.
SET TNUMBERS BOTH.
*** For variables.
*** Switch to variable name only.
SET TVARS NAMES.
*** Switch to labels only.
SET TVARS LABELS.
*** Switch to both.
SET TVARS BOTH.
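To check which setting is currently active before changing it, the SHOW command can be used. This is a small sketch, assuming the installed SPSS version lists both keywords for SHOW.

```spss
* Display the current labelling settings for values and variables.
SHOW TNUMBERS TVARS.
```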
With these SET commands, a different notation can be used for one single output within a running syntax:
FREQ spalteA spalteB spalteC.
SET TNUMBERS BOTH.
FREQ spalte_special.
SET TNUMBERS VALUES.
FREQ spalteD spalteE spalteF.
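If the setting that was active beforehand is not known, switching back by hand can be avoided. The sketch below uses PRESERVE and RESTORE, which save and reinstate the current SET specifications; it assumes that the labelling settings are among those saved in the SPSS version in use.

```spss
* Save the current SET settings.
PRESERVE.
SET TNUMBERS BOTH.
FREQ spalte_special.
* Return to the settings that were active before PRESERVE.
RESTORE.
```

This keeps the temporary change local to the one output, regardless of what the surrounding syntax or the options dialog had configured.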
Example
Here is a specific example for automobile brands. The variable marke (brand) in the dataset contains automobile brands as numerical codes with labels.
SET TNUMBERS VALUES.
FREQ marke.
SET TNUMBERS BOTH.
FREQ marke.
SET TNUMBERS LABELS.
FREQ marke.
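The code above produces three frequency tables of the same variable in the three alternative output formats: first with codes only, then with codes and labels, and finally with labels only.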