Measured by the number of sessions conducted, observer training is among the most popular of all training formats. In the run-up to almost every Assessment or Development Center, training sessions are held to teach observers how to approach their role more competently. But do the psychological contents frequently conveyed in these sessions ("perception and assessment errors," "the separation of observation and evaluation") really help to improve assessment competence? This article explains why the quality of the final evaluation is often driven by completely different issues, and not by observation errors.
Observer training sessions in the run-up to ACs and DCs usually follow a classic structure. First, the observers are introduced to the organizational aspects of the process: the exercise modules are explained, the use of the observation sheets is demonstrated, relevant process questions are clarified, and certain standards are shared. All of these aspects are extremely important and indispensable. But perception and assessment errors also play a major role and belong to the common wisdom of observer briefings. Observers are to be sensitized to how fragile and error-prone our perceptual system is, and so they learn about "halo effects" (a particularly dominant characteristic "overshadows" the other characteristics), mood and environmental effects, leniency and strictness bias, and even subjective personality theories ("people who wear glasses are more intelligent") and stereotypes ("Northern Germans are aloof while Bavarians are homey").
Conveying these observation errors is often quite entertaining. However, it is difficult to derive clear messages from them about what actually constitutes good evaluation and how the evaluation process should properly work. The good intention is usually to encourage observers to be "humbler," to make them take their role very seriously, and not to trust every first impression. Yet the psychological perception effects are characterized precisely by the fact that they describe how our perception itself functions. It is therefore hard to explain how we could override them through sheer willpower and persuade our perceptual system to be less "error-prone."

Leniency and strictness bias are probably the most relevant effects; however, they are not observation errors in the strict sense, but simply the evaluation standards that managers have formed through their general experience and that they naturally bring with them to a Development Center. After all, some managers are tougher while others are more generous. Some perhaps become successful through their emphatic demands, others by granting freedom and establishing an error-friendly culture. But which approach is actually better, and which of the observers is now making an "observation error"? If Fritz evaluates Marie and tells us about her, we learn at least as much about Fritz as we do about Marie!

As a "solution," observers are usually taught that observing and evaluating should be two separate processes. But here, too, we must ask critically: Is this really possible? "Objective" observation is only possible in cases of pure data acquisition ("Mr. Meier coughed seven times"). As soon as an observation is processed cognitively, it is already interpreted. During the observation we are already pushed toward interpretation and already have a feeling about whether we find what we see positive, negative, or neutral. After all, we are not objective evaluation robots with technical sensors. Incidentally, the observers' subjectivity, their benchmarks, experience, and intuition are precisely the reason why you want them there. As humans, we cannot objectively observe anything in the strict sense, because our individual and subjective brains have been processing data from the very beginning!
If we want to sensitize observers to what they actually do in an evaluation process, there are other contents from which they can probably benefit more, for example a psychological understanding of the concepts of competence and potential. In any case, evaluations become more accurate when we understand how certain psychological characteristics actually "work." The principle is the same as for wine connoisseurs: for those who know little about wine, every wine tastes either sweet or dry. The more you know about wine, the more nuances you taste and the finer your distinctions become. Likewise, the more you know about leadership, the better you are able to assess leadership quality. Assessment quality therefore has much more to do with a content-related understanding of the constructs to be assessed than with perception errors (see Paschen & Fritz: "Die Psychologie von Potenzial und Kompetenz" [The Psychology of Potential and Competence]). If we look at the process of evaluation in more detail, we realize that many of the observers' evaluation uncertainties and inaccuracies have nothing to do with a faulty or fragile perceptual system, but are caused by completely different aspects of the evaluation process.
How do psychological diagnostics work?
In psychology, the term "diagnostics" is used much more frequently than the term "evaluation." Diagnostics has its origins in medicine, where it was initially used to detect symptoms. When you go to the doctor, you will first point the doctor to obvious symptoms that you perceive as unpleasant, such as a headache or a ringing in your ears. The physician will then look for related symptoms, because in medicine groups of symptoms are often combined into syndromes. Often not all symptoms need to be present for an illness to be diagnosed; your doctor could, for example, consult a diagnostic manual and find that, of the ten symptoms belonging to a particular syndrome, at least seven need to be present, and definitely symptoms 1 to 3, in order to make the diagnosis. Your physician might then diagnose a "flu-like infection," although you can never see a flu-like infection itself but only its symptoms. Illnesses are the defined category and, in a certain sense, constructs in the first place: in diagnostics, we only ever see certain symptoms. These symptoms are combined into specific groups, which are then given the heading of a very specific construct (in a company, this construct is usually called a "competence").
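Purely to make the logic of such a decision rule tangible, here is a minimal sketch in Python. The "at least seven of ten, and definitely symptoms 1 to 3" threshold is taken from the example above; the symptom numbering and everything else are illustrative, not drawn from any real diagnostic manual:

```python
# Minimal sketch of the diagnostic rule from the example above:
# a syndrome has ten symptoms; the diagnosis requires at least seven
# of them to be present, and symptoms 1 to 3 must always be among them.
# All names and thresholds are purely illustrative.

SYNDROME_SYMPTOMS = set(range(1, 11))  # the syndrome's symptoms 1..10
MANDATORY = {1, 2, 3}                  # symptoms that must always be present
MIN_PRESENT = 7                        # minimum number of present symptoms

def diagnose(observed: set[int]) -> bool:
    """Return True if the observed symptoms justify the diagnosis."""
    present = observed & SYNDROME_SYMPTOMS
    return MANDATORY <= present and len(present) >= MIN_PRESENT

print(diagnose({1, 2, 3, 4, 5, 6, 7}))     # True: 7 present, incl. 1-3
print(diagnose({2, 3, 4, 5, 6, 7, 8, 9}))  # False: symptom 1 is missing
```

The point of the sketch is simply that the diagnosis is a construct derived from a rule over observable symptoms, never something observed directly.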
Competencies and their “symptoms”
Unlike in medicine, psychological diagnostics (at least in the business context) has not agreed on a uniform diagnostic system. Instead, companies create their own diagnostic systems (which often do not differ greatly). Frequently, the symptoms are simply structured differently, regrouped, and given a political message, creating a company-specific competence model. If, for example, the "ability to work in a team" has to be evaluated, the assessor is faced with a very specific list of symptoms (for example: enjoys working with others, constructively familiarizes himself with a group, helps others, actively integrates others into a common group process, etc.). The assessor is then asked to evaluate the presence of these symptoms, and he may come to the conclusion that four of the six symptoms listed are strongly pronounced (usually measured in relation to others) and that the candidate's ability to work in a team is therefore above average.
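Expressed as a sketch, the aggregation looks exactly like the diagnostic rule above. The first four indicators are the ones named in the text; the last two, and the 4-of-6 counting logic as code, are illustrative assumptions:

```python
# Minimal sketch of the aggregation described above: the assessor judges
# whether each "symptom" of the competence is strongly pronounced, and the
# competence counts as above average if at least four of the six indicators
# are. The first four indicators come from the text; the last two are
# invented here only to complete the list of six.

TEAMWORK_INDICATORS = [
    "enjoys working with others",
    "constructively familiarizes himself with a group",
    "helps others",
    "actively integrates others into a common group process",
    "shares relevant information proactively",   # illustrative addition
    "supports decisions made by the group",      # illustrative addition
]

def rate_teamwork(strongly_pronounced: dict[str, bool]) -> str:
    """Map the assessor's per-symptom judgments to an overall verdict."""
    strong = sum(strongly_pronounced.get(i, False) for i in TEAMWORK_INDICATORS)
    return "above average" if strong >= 4 else "not above average"

ratings = {i: True for i in TEAMWORK_INDICATORS[:4]}  # 4 of 6 strong
print(rate_teamwork(ratings))  # above average
```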
Unfortunately, there are no competence models that do not overlap
Companies often look for "non-overlapping" competencies. But this search is doomed from the outset; even in medicine such a clean separation is hardly possible. A headache can be a sign of countless illnesses, and only in the context of other symptoms does this one symptom become interpretable. Likewise, the symptom "listens to others attentively" can be read as an indicator of empathy, of communication skills, or of team orientation. In medicine, too, the combination of certain symptoms into a syndrome is constantly changing and is not a self-contained, ultimately coherent system of categories. Health professionals often argue about whether certain groups of symptoms warrant their own disease diagnosis or are merely a subcategory of another illness. Where this leads to different treatments, the question is highly relevant; otherwise it is primarily a matter of the coherence and precision of a category system.
What makes observations inaccurate?
The observation forms used in Assessment Centers are the "diagnostic manual" that the observers have to work with. They are given a list of competencies and must determine whether the symptoms are present, and conclude from this whether the competence exists (analogous to an illness). We could now claim, as many observer training sessions imply (see above), that the fragility of our perceptual system is one of the main reasons why observers become uncertain or inaccurate in their evaluations. Upon closer inspection, however, the actual reasons have to do with problems inherent in the evaluation process and the underlying category system, and nothing to do with a faulty perceptual system on the observers' part.

There are, for example, very one-dimensional competencies with little internal diversity of aspects ("apple basket competencies," such as analytical thinking), but also multidimensional ones ("fruit basket competencies" with very many aspects, such as an entrepreneurial mindset and approach). There are very "low-cost" competencies (e.g., the ability to convince others, which becomes visible in practically any AC exercise without special design effort) and complex competencies that must be made visible very deliberately (e.g., change management competence). Some competencies come with very unfavorable symptom lists or are built around specially coined terms that are difficult to interpret. Some can be made visible in an AC through "point measurement," while others, strictly speaking, can only be measured longitudinally (e.g., learning ability). Then there are competencies with a more attitude-based focus (e.g., performance orientation) and others with a more skills-oriented focus (e.g., project management competence). Only someone who really understands what is to be measured, and which particularities must be taken into account when assessing a specific competence, can make a precise evaluation.
What can observer training sessions accomplish?
A well-designed observer training course that prepares observers for an Assessment or Development Center must ensure that they understand the assessment process and develop a good grasp of the constructs they need to evaluate, as well as of the symptoms that can or cannot be made visible in particular components. Only those who have really understood a competence can evaluate it, deal flexibly with individual symptoms or groups of symptoms, and, where necessary, detach themselves from operationalizations that simply did not occur in a particular AC exercise in order to draw on other, more appropriate indicators.
Ultimately, the main conclusion is that evaluation inaccuracies in Assessment or Development Centers are in most cases caused not so much by the psychology of the observers as by unfortunate designs: exercises that cannot really evoke the desired competencies; symptom lists containing aspects that are not readily visible; category systems of symptoms so poorly formulated that no distinct impressions can form; or too many "fruit basket competencies," whose facets end up being "averaged" although they would need the status of a separate competence to remain precise. If Assessment or Development Center exercises are designed so that they reliably evoke a clear, well-structured competence that the observers have really understood, then observation and assessment errors play only a minor role.