Basic principles of testing. Theoretical foundations of testing

The applications, goals, and objectives of software testing vary widely, so testing is evaluated and explained in different ways. Sometimes even testers themselves find it difficult to explain what software testing is “as such.” Confusion ensues.

To untangle this confusion, Alexey Barantsev (a practitioner, trainer, and consultant in software testing, who comes from the Institute for System Programming of the Russian Academy of Sciences) opens his testing courses with an introductory video about the fundamentals of testing.

It seems to me that in this talk the lecturer managed to explain “what testing is” most accurately and even-handedly, from the point of view of a scientist and a programmer. It is strange that this text has not yet appeared on Habré.

Here I give a condensed retelling of that talk. At the end of the text there are links to the full version, as well as to the video mentioned above.

Testing Basics

Dear colleagues,

First, let's try to understand what testing is NOT.

Testing is not development,

Even if testers know how to program, including writing tests (test automation = programming), they may develop some auxiliary programs (for their own use).

However, testing is not a software development activity.

Testing is not analysis,

Nor is it the activity of collecting and analyzing requirements.

Although during testing you sometimes have to clarify requirements, and sometimes to analyze them, this is not the main activity; it is done simply out of necessity.

Testing is not management,

Despite the fact that many organizations have a role called “test manager.” Of course, testers need to be managed. But testing in itself is not management.

Testing is not technical writing,

However, testers have to document their tests and their work.

Testing cannot be reduced to any of these activities simply because, when developing programs (or analyzing requirements, or writing documentation for their tests), testers do all this work for themselves, and not for someone else.

An activity is significant only when it is in demand; that is, testers must produce something “for export.” What do they produce “for export”?

Defects, defect descriptions, or test reports? This is partly true.

But this is not the whole truth.

The main activity of testers

is that they provide participants in a software development project with negative feedback about the quality of the software product.

“Negative feedback” carries no negative connotation here; it does not mean that testers are doing something bad, or that they are doing it badly. It is just a technical term that means a fairly simple thing.

But this thing is very significant, probably the single most significant component of testers’ work.

There is a science - “systems theory”. It defines the concept of “feedback”.

“Feedback” is data, or some part of the data, that returns from the output back to the input. This feedback can be positive or negative.

Both varieties of feedback are equally important.

In the development of software systems, positive feedback is, of course, information we receive from end users: requests for new functionality, growth in sales (if we release a quality product).

Negative feedback can also come from end users, in the form of negative reviews. Or it can come from testers.

The sooner negative feedback is provided, the less energy is needed to modify the signal. That is why testing needs to start as early as possible, at the earliest stages of the project, and provide this feedback at the design stage and, perhaps, even earlier, at the stage of collecting and analyzing requirements.

By the way, this is where the understanding grows that testers are not responsible for quality. They help those who are responsible for it.

Synonyms for the term "testing"

From the point of view that testing is the provision of negative feedback, the world-famous abbreviation QA (Quality Assurance) is definitely NOT synonymous with the term “testing”.

Merely providing negative feedback cannot be considered quality assurance, because assurance implies positive measures: it is understood that in this case we ensure quality, taking timely measures so that the quality of software development improves.

But “quality control” (Quality Control) can, in a broad sense, be considered synonymous with the term “testing,” because quality control is the provision of feedback in its most varied forms, at the most varied stages of a software project.

Sometimes testing is understood as just one particular form of quality control.

The confusion comes from the history of how testing developed. At different times, the term “testing” denoted various activities, which can be divided into two large classes: external and internal.

External definitions

The definitions that Myers, Beizer, and Kaner gave at different times describe testing precisely from the point of view of its EXTERNAL significance. That is, from their point of view, testing is an activity that is intended FOR something, rather than one that consists of something. All three of these definitions can be summarized as providing negative feedback.

Internal definitions

These are the definitions contained in standards for software engineering terminology, such as the de facto standard called SWEBOK.

Such definitions constructively explain WHAT the testing activity is, but give not the slightest idea of WHY testing is needed, or how all the results obtained from checking the correspondence between the program’s actual behavior and its expected behavior will then be used.

Testing is:

  • checking the program’s compliance with requirements,
  • carried out by observing its operation
  • in special, artificially created situations, chosen in a certain way.
From here on, we will consider this to be the working definition of “testing.”

The general testing scheme is approximately as follows:

  1. The tester receives the program and/or the requirements as input.
  2. He does something with them: he observes the operation of the program in certain situations he has artificially created.
  3. As output, he receives information about matches and mismatches.
  4. This information is then used to improve the existing program, or to change the requirements for a program that is still being developed.
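In code, this scheme might look as follows (a minimal sketch in Python; the median function and the requirement it checks are invented purely for illustration):

```python
# Hypothetical program under test, with the requirement:
# "median() must return the middle element of an odd-length list".
def median(values):
    return sorted(values)[len(values) // 2]

# Step 2: an artificially created situation - specially chosen input data.
situation = [3, 1, 2]

# Steps 2-3: observe the program's operation and record match / mismatch.
observed = median(situation)
expected = 2
print("match" if observed == expected else f"mismatch: {observed} != {expected}")
```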

What is a test

  • This is a special, artificially created situation, chosen in a certain way,
  • and a description of what observations to make about the program's operation
  • to check whether it meets some requirement.
There is no need to assume that the situation is something momentary. A test can be quite long: in performance testing, for example, the artificially created situation can be a load on the system that continues for quite a long time, and the observations to be made are a set of various graphs or metrics that we measure while the test is running.
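A sketch of such a prolonged test (Python; the handle_request function, the two-second duration, and the 200 ms threshold are all invented assumptions):

```python
import time

def handle_request(payload):          # hypothetical system under load
    return sum(payload)

# Artificially created situation: a sustained stream of requests.
latencies = []
deadline = time.perf_counter() + 2.0  # keep the load up for two seconds
while time.perf_counter() < deadline:
    start = time.perf_counter()
    handle_request(list(range(1000)))
    latencies.append(time.perf_counter() - start)

# Observations: metrics collected while the test was running.
latencies.sort()
p95 = latencies[int(len(latencies) * 0.95)]
print(f"requests: {len(latencies)}, p95 latency: {p95 * 1000:.2f} ms")
assert p95 < 0.2                      # hypothetical responsiveness requirement
```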

The test developer is engaged in selecting a limited set from a huge, potentially infinite set of tests.

Thus, we can conclude that during testing the tester does two things.

1. First, he controls the execution of the program and creates those artificial situations in which we are going to check the program’s behavior.

2. Second, he observes the behavior of the program and compares what he sees with what is expected.

If a tester automates tests, then he does not observe the program’s behavior himself; he delegates this task to a special tool or to a special program that he wrote himself. It is this tool that observes, compares the observed behavior with the expected one, and gives the tester only the final verdict: whether the observed behavior matches the expected one or not.

Any program is a mechanism for processing information: the input is information in one form, the output is information in some other form. A program can have many inputs and outputs; that is, a program can have several different interfaces, and these interfaces can be of different types:

  • User Interface (UI)
  • Application Programming Interface (API)
  • Network protocol
  • File system
  • Environment state
  • Events
The most common are user interfaces:
  • graphical,
  • text,
  • console,
  • and speech.
Using all these interfaces, the tester:
  • somehow creates artificial situations,
  • and checks how the program behaves in these situations.

This is testing.
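For illustration, here is a minimal sketch (Python; the word_count logic and its two interfaces are invented) of exercising the same program through two different interfaces, an API call and the file system:

```python
import pathlib
import tempfile

# Hypothetical program logic exposed through two interfaces.
def word_count(text: str) -> int:          # API interface
    return len(text.split())

def word_count_file(path: str) -> int:     # file-system interface
    return word_count(pathlib.Path(path).read_text())

# Artificial situation via the API: call the function directly.
assert word_count("one two three") == 3

# Artificial situation via the file system: prepare a file, observe the result.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("one two three four")
assert word_count_file(f.name) == 4
print("the program behaves as expected through both interfaces")
```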

Other classifications of testing types

The most commonly used is the division into three levels:
  1. unit testing,
  2. integration testing,
  3. system testing.
Unit testing usually means testing at a fairly low level, that is, testing individual operations, methods, and functions.

System testing refers to testing at the user interface level.

Some other terms are sometimes used, such as “component testing,” but I prefer to single out these three levels, since the technological division between unit and system testing does not make much sense: the same tools and the same techniques can be used at different levels. The division is conditional.

Practice shows that tools that are positioned by the manufacturer as unit testing tools can be used with equal success at the level of testing the entire application as a whole.

And tools that test the entire application at the user interface level sometimes want to look, for example, into the database or call some separate stored procedure there.

That is, from a technical point of view, the division into system and unit testing is, generally speaking, purely conditional.

The same tools are used, and this is normal; the same techniques are used, and at each level we can talk about testing of a different kind.
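A sketch of this idea (Python’s unittest; the parse_line helper and the report entry point are invented assumptions): the same tooling drives both a unit-level check of one function and a system-level check of the whole program:

```python
import unittest

# Hypothetical unit and the "whole application" built on top of it.
def parse_line(line):
    name, score = line.split(",")
    return name.strip(), int(score)

def report(lines):                           # top-level entry point
    best = max((parse_line(l) for l in lines), key=lambda p: p[1])
    return f"best: {best[0]} ({best[1]})"

class UnitLevel(unittest.TestCase):          # unit testing: a single function
    def test_parse_line(self):
        self.assertEqual(parse_line("ivanov, 42"), ("ivanov", 42))

class SystemLevel(unittest.TestCase):        # system testing: the whole program
    def test_report(self):
        self.assertEqual(report(["a, 1", "b, 3"]), "best: b (3)")

if __name__ == "__main__":
    unittest.main()
```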

We combine:

That is, we can talk about unit testing of functionality.

We can talk about system testing of functionality.

We can talk about unit testing of, for example, efficiency.

We can talk about system testing of efficiency.

Either we consider the efficiency of a single algorithm, or we consider the efficiency of the entire system as a whole. That is, the technological division into unit and system testing does not make much sense, because the same tools and the same techniques can be used at different levels.

Finally, during integration testing we check whether the modules within a system interact with each other correctly. That is, we actually perform the same tests as during system testing, only we additionally pay attention to exactly how the modules interact with each other. We perform some additional checks. That is the only difference.
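A minimal sketch of such an additional check (Python; OrderService and the payment gateway are invented for illustration), where beyond the end result we also verify how the modules interacted:

```python
from unittest.mock import Mock

# Hypothetical modules: OrderService delegates charging to a payment gateway.
class OrderService:
    def __init__(self, gateway):
        self._gateway = gateway

    def place_order(self, amount):
        self._gateway.charge(amount)
        return "order placed"

gateway = Mock()                  # stand-in for the real neighboring module
service = OrderService(gateway)

# The same check as in system testing: the end result is correct...
assert service.place_order(100) == "order placed"

# ...plus the additional integration check: the modules interacted as expected.
gateway.charge.assert_called_once_with(100)
print("interaction verified")
```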

Let us once again try to understand the difference between system and unit testing. Since this division occurs quite often, this difference should exist.

And this difference manifests itself when we perform not a technological classification, but a classification by the purpose of testing.

Classification by goals can conveniently be done using the “magic square,” which was originally invented by Brian Marick and then improved by Ari Tennen.

In this magic square, all types of testing are located in four quadrants, depending on what the tests pay more attention to.

Vertically: the higher a testing type is located, the more attention is paid to the external manifestations of the program’s behavior; the lower it is, the more attention we pay to the program’s internal technological structure.

Horizontally: the further to the left our tests are, the more attention we pay to programming them; the further to the right they are, the more attention we pay to manual testing and human exploration of the program.

In particular, terms such as acceptance testing (Acceptance Testing) and unit testing can easily be placed in this square in the sense in which they are most often used in the literature. Unit testing is low-level testing with a large, overwhelming share of programming: all tests are programmed, run completely automatically, and attention is paid primarily to the internal structure of the program, precisely to its technological features.

In the upper right corner we will have manual tests aimed at some external behavior of the program, in particular, usability testing, and in the lower right corner we will most likely have tests of various non-functional properties: performance, security, and so on.

So, based on the classification by purpose, unit testing is in the lower left quadrant, and all other quadrants are system testing.

Thank you for your attention.

REPORT

by student I. Ivanova, group 137

on testing the effectiveness of training methods
using methods of mathematical statistics

Sections of the report are drawn up in accordance with the samples given in this manual at the end of each stage of the game. The completed reports are stored at the Department of Biomechanics until consultation before the exam. Students who have not reported for the work done and have not submitted a notebook with a report to the teacher are not allowed to take the sports metrology exam.


Stage I of the business game
Control and measurement in sports

Objectives:

1. Familiarize yourself with the theoretical foundations of control and measurement in sports and physical education.

2. Acquire skills in measuring speed performance indicators in athletes.

1. Control in physical education and sports

Physical education and sports training is not a spontaneous but a controlled process. At each moment of time, a person is in a certain physical state, which is determined mainly by health (the compliance of vital signs with the norm, the body’s degree of resistance to sudden adverse influences), physique, and the state of physical functions.

A person’s physical state can be managed expediently by changing it in the desired direction. This management is carried out by means of physical education and sports, which include, in particular, physical exercises.

It only seems that the teacher (or coach) controls the physical state by influencing the athlete’s behavior, i.e., by prescribing certain physical exercises and monitoring the correctness of their execution and the results obtained. In reality, the athlete’s behavior is controlled not by the coach but by the athlete himself; during sports training, it is the self-governing system (the human body) that is acted upon. Because of individual differences in athletes’ states, there is no guarantee that the same influence will cause the same response. This makes the question of feedback relevant: information about the athlete’s state that the coach receives while monitoring the training process.

Control in physical education and sports is based on measuring indicators, selecting the most significant ones, and processing them mathematically.

Management of the educational and training process includes three stages:

1) collection of information;

2) its analysis;

3) decision making (planning).

Information collection is usually carried out during comprehensive control, the objects of which are:

1) competitive activity;

2) training loads;

3) the athlete’s condition.



There are (according to V.A. Zaporozhanov) three types of athlete’s states, depending on the duration of the interval required for the transition from one state to another.

1. Staged (permanent) state. It persists for a relatively long time – weeks or months. The complex characteristic of an athlete’s staged state that reflects his ability to demonstrate sporting achievements is called preparedness, and the state of optimal (best for the given training cycle) preparedness is called sports form. Obviously, sports form cannot be achieved or lost within one or a few days.

2. Current state. It changes under the influence of one or several training sessions. The consequences of participating in competitions, or of the training work performed in one of the sessions, often last for several days. In this case the athlete usually notes both unfavorable phenomena (for example, muscle pain) and positive ones (for example, a state of increased performance). Such changes are called the delayed training effect.

The athlete’s current state determines the nature of the next training sessions and the magnitude of the loads in them. A special case of the current state, characterized by readiness to perform a competitive exercise in the coming days with a result close to the maximum, is called current readiness.

3. Operational state. It changes under the influence of a single execution of physical exercises and is transient (for example, fatigue caused by running a distance once, or a temporary increase in performance after a warm-up). The athlete’s operational state changes in the course of a training session and should be taken into account when planning rest intervals between approaches and repeated runs, and when deciding whether an additional warm-up is advisable. A special case of the operational state, characterized by immediate readiness to perform a competitive exercise with a result close to the maximum, is called operational readiness.

In accordance with the above classification, there are three main types of monitoring the athlete’s condition:

1) stage-by-stage control. Its purpose is to assess the athlete’s staged state (preparedness);

2) current control. Its main task is to determine everyday (current) fluctuations in the athlete’s condition;

3) operational control. Its purpose is a rapid assessment of the athlete’s state at the given moment.

A measurement or trial performed to determine an athlete’s state or ability is called a test. The procedure of measurement or trial is called testing.

Any test involves measurement, but not every measurement serves as a test. Only those measurements that satisfy the following metrological requirements can be used as tests:

1) standardization;

2) the presence of a rating system;

3) reliability and informativeness (quality factor) of the tests;

4) type of control (stage-by-stage, current, or operational).

A test based on motor tasks is called a motor test. There are three groups of motor tests:

1. Control exercises, in which the athlete is tasked to show maximum results. The test result is a motor achievement. For example, the time it takes an athlete to run a distance of 100 m.

2. Standard functional tests, in which the task, identical for everyone, is dosed either by the amount of work performed or by the magnitude of physiological changes. The test result is the physiological or biochemical indicators at standard work, or the motor achievement at a standard magnitude of physiological changes. For example, the percentage increase in heart rate after 20 squats, or the speed at which an athlete runs at a fixed heart rate of 160 beats per minute (see the worked example after this list).

3. Maximum functional tests, during which the athlete must show maximum results. The test result is physiological or biochemical indicators at maximum work. For example, maximum oxygen consumption or maximum oxygen debt.
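For illustration (with hypothetical numbers): if an athlete’s heart rate is 70 beats/min at rest and rises to 105 beats/min after the 20 squats, the percentage increase is (105 − 70) / 70 × 100% = 50%.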

High quality testing requires knowledge of measurement theory.

Fundamentals of test theory
1. Basic concepts of test theory
2. Test reliability and ways to determine it

Test questions
1. What is called a test?
2. What requirements must a test meet?
3. Which tests are called authentic?
4. What is the reliability of a test?
5. List the causes of variation in results during repeated testing.
6. How does intraclass variation differ from interclass variation?
7. How can the reliability of a test be determined in practice?
8. What is the difference between test consistency and test stability?
9. What is the equivalence of tests?
10. What is a homogeneous set of tests?
11. What is a heterogeneous set of tests?
12. Name ways to improve the reliability of tests.

A test is a measurement or trial carried out to determine a person’s state or ability. Not all measurements can be used as tests, but only those that meet special requirements. These include:
1. standardization (the procedure and conditions of testing must be the same in all cases when the test is used);
2. reliability;
3. informativeness;
4. the presence of a rating system.

Test requirements:
  • Informativeness – the degree of accuracy with which the test measures the property (quality, ability, characteristic) it is used to evaluate.
  • Reliability – the degree to which results agree when the same people are tested repeatedly under the same conditions.
  • Consistency – different people, but the same devices and the same conditions.
  • Standardization of conditions – the same conditions for repeated measurements.
  • The presence of a rating system – translation of results into a grading scale, as with school marks 5, 4, 3, and so on.

Tests that meet the requirements of reliability and informativeness are called sound, or authentic (from the Greek authentiko – in a reliable manner).

The procedure of the trial is called testing; the numerical value obtained as a result of the measurement is the test result. For example, the 100 m run is a test, the procedure for conducting the races and timing them is testing, and the time of the run is the test result.

Tests based on motor tasks are called motor tests. Their results can be either motor achievements (time to cover a distance, number of repetitions, distance traveled, etc.) or physiological and biochemical indicators.

Sometimes not one, but several tests are used that have a single final goal (for example, assessing the athlete’s condition during the competitive training period). Such a group of tests is called a set or battery of tests.

The same test, applied to the same subjects, should give identical results under the same conditions (unless the subjects themselves have changed). However, even with the most rigorous standardization and precise equipment, test results always vary somewhat. For example, a subject who has just shown a result of 215 kg in a deadlift dynamometry test shows only 190 kg when the test is repeated.

Reliability of tests and ways to determine it
The reliability of a test is the degree of agreement between results when the same people (or other objects) are tested repeatedly under the same conditions.

Variation in test-retest results is called within-individual, or within-group, or within-class. Four main causes give rise to this variation:
1. Changes in the state of the subjects (fatigue, training effects, “learning,” changes in motivation, concentration, etc.).
2. Uncontrolled changes in external conditions and equipment (temperature, wind, humidity, mains voltage, the presence of unauthorized persons, etc.), i.e., everything covered by the term “random measurement error.”

3. A change in the state of the person administering or scoring the test (and, of course, the replacement of one experimenter or judge by another).
4. Imperfection of the test (some tests are inherently unreliable; for example, if the subjects make free throws into a basketball hoop, even a player with a high hit percentage may accidentally miss the first throws).

The concept of a true test result is an abstraction: it cannot be measured experimentally, so indirect methods have to be used. Most preferable for assessing reliability is analysis of variance followed by calculation of intraclass correlation coefficients. Analysis of variance allows us to decompose the experimentally recorded variation in test results into components due to the influence of individual factors.

If you record the subjects’ results in some test, repeating the test on different days, making several attempts each day, and periodically changing experimenters, then variation will occur:
a) from subject to subject;
b) from day to day;
c) from experimenter to experimenter;
d) from attempt to attempt.
Analysis of variance makes it possible to isolate and evaluate these variations.

Thus, in order to assess the practical reliability of a test, it is necessary, first, to perform an analysis of variance and, second, to calculate the intraclass correlation coefficient (the reliability coefficient).
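A minimal sketch of such a calculation (Python with NumPy; the one-way ICC(1,1) formula built from the between- and within-subject mean squares is standard, while the sprint-time data are invented for illustration):

```python
import numpy as np

def icc_1(scores):
    # One-way random-effects intraclass correlation, ICC(1,1).
    # scores: 2-D array, rows = subjects, columns = repeated trials.
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape                    # n subjects, k trials each
    grand_mean = scores.mean()
    subject_means = scores.mean(axis=1)

    # Decompose the variation: between-subjects vs. within-subject.
    ss_between = k * ((subject_means - grand_mean) ** 2).sum()
    ss_within = ((scores - subject_means[:, None]) ** 2).sum()
    ms_between = ss_between / (n - 1)      # between-subjects mean square
    ms_within = ss_within / (n * (k - 1))  # within-subject mean square

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical data: 100 m times of five athletes, three trials each.
times = [[12.1, 12.3, 12.2],
         [11.5, 11.6, 11.4],
         [13.0, 12.8, 13.1],
         [12.6, 12.7, 12.5],
         [11.9, 12.0, 11.8]]
print(f"ICC(1,1) = {icc_1(times):.3f}")    # values near 1 = a reliable test
```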

Speaking about the reliability of tests, it is necessary to distinguish between their stability (reproducibility), consistency, and equivalence. Test stability refers to the reproducibility of results when the test is repeated after a certain time under the same conditions; repeated testing is usually called a retest. Test consistency is characterized by the independence of the test results from the personal qualities of the person administering or evaluating the test.

If all the tests included in a test battery are highly equivalent, the battery is called homogeneous: the whole set measures a single property of human motor ability (for example, a battery consisting of the standing long jump, the vertical jump, and the standing triple jump assesses the level of development of speed-strength qualities). If the battery contains no equivalent tests, i.e., the tests included in it measure different properties, it is called heterogeneous (for example, a battery consisting of deadlift dynamometry, the Abalakov jump, and the 100 m run).

Test reliability can be improved to a certain extent by:
a) more rigorous standardization of testing;
b) increasing the number of attempts;
c) increasing the number of evaluators (judges, experimenters) and improving the consistency of their opinions;
d) increasing the number of equivalent tests;
e) better motivation of the subjects.

What is testing

In accordance with IEEE Std 829-1983, testing is the process of analyzing software, aimed at identifying the differences between its actually existing and its required properties (defects) and at evaluating the properties of the software.

According to GOST R ISO/IEC 12207-99, the software life cycle includes, among others, the supporting processes of verification, certification, joint review, and audit. The verification process is the process of determining that software products function in full accordance with the requirements or conditions implemented in previous work; this process may include analysis, checking, and testing. The certification process is the process of determining how completely the created system or software product complies with the established requirements and with its functional purpose. The joint review process is the process of assessing the states and, if necessary, the results of the work (products) of a project. The audit process is the process of determining compliance with requirements, plans, and the terms of the contract. Together, these processes make up what is usually called testing.

Testing is based on test procedures with specific input data, initial conditions, and an expected result, designed for a specific purpose, such as checking a particular program or verifying conformance to a specific requirement. Test procedures can check various aspects of a program’s functioning, from the correct operation of a single function to the adequate fulfillment of business requirements.

When carrying out a project, it is necessary to consider which standards and requirements the product will be tested against, and which tools (if any) will be used to find and document the defects discovered. If testing is kept in mind from the very beginning of the project, testing the product under development will bring no unpleasant surprises, which means the quality of the product will most likely be quite high.

Product life cycle and testing

Iterative software development processes are used increasingly often nowadays, in particular the RUP (Rational Unified Process) technology (Fig. 1). With this approach, testing ceases to be an “off-the-cuff” process that occurs after programmers have written all the necessary code. Work on tests begins at the very first stage, when the requirements for the future product are identified, and is closely integrated with current tasks. This places new demands on testers: their role is not limited to identifying errors as fully and as early as possible; they must participate in the overall process of identifying and eliminating the most significant project risks. To this end, a testing goal and the methods for achieving it are defined for each iteration, and at the end of each iteration it is determined to what extent this goal has been achieved, whether additional tests are needed, and whether the principles and tools for conducting the tests need to be changed. In turn, each detected defect must go through its own life cycle.

Fig. 1. Product life cycle according to RUP

Testing is usually carried out in cycles, each of which has a specific list of tasks and goals. A testing cycle may coincide with an iteration or correspond to a specific part of it. Typically, a testing cycle is carried out for a specific build of the system.

The life cycle of a software product consists of a series of relatively short iterations (Fig. 2). An iteration is a complete development cycle leading to the release of a final product, or of some shortened version of it, which expands from iteration to iteration to eventually become a complete system.

Each iteration usually includes the tasks of work planning, analysis, design, implementation, testing, and evaluation of the achieved results. However, the relationship between these tasks can change significantly, and according to the relative weight of the various tasks, iterations are grouped into phases. The first phase, Inception, focuses on analysis tasks. Iterations of the second phase, Elaboration, focus on the design and testing of key design decisions. In the third phase, Construction, development and testing tasks have the largest share. And in the last phase, Transition, the tasks of testing and handing the system over to the customer are solved to the greatest extent.

Fig. 2. Iterations of the software product life cycle

Each phase has its own specific goals in the product life cycle and is considered complete when those goals are achieved. All iterations, except perhaps those of the Inception phase, end with the creation of a functioning version of the system under development.

Test categories

Tests vary significantly in the problems they solve and the technology they use.

Category: Current testing
Description: a set of tests performed to check the operability of newly added system features.
Types of testing:
  • load testing;
  • business cycle testing;
  • stress testing.

Category: Regression testing
Description: the purpose of regression testing is to verify that additions to the system do not reduce its capabilities, i.e., testing is carried out against requirements that were already satisfied before the new features were added.
Types of testing:
  • load testing;
  • business cycle testing;
  • stress testing.

Testing subcategories

Subcategory: Load testing
Description: used to test all application functions without exception. In this case, the order in which the functions are tested does not matter.
Subtypes of testing:
  • functional testing;
  • interface testing;
  • database testing.

Subcategory: Business cycle testing
Description: used to test application functions in the sequence in which they are called by the user; for example, simulating all the actions of an accountant for the first quarter.
Subtypes of testing:
  • unit testing;
  • functional testing;
  • interface testing;
  • database testing.

Subcategory: Stress testing
Description: used to test application performance. The purpose of this testing is to determine the range of stable operation of the application. During this testing, all available functions are called.
Subtypes of testing:
  • unit testing;
  • functional testing;
  • interface testing;
  • database testing.

Types of testing

Unit testing – this type involves testing individual application modules. To obtain maximum results, testing is carried out simultaneously with the development of the modules.

Functional testing – the purpose of this testing is to make sure that the test object functions correctly. Correct navigation through the object is tested, as well as data input, processing, and output.

Database testing – checking the operation of the database during normal application operation, at moments of overload, and in multi-user mode.

Unit testing

In OOP, the usual way to organize unit testing is to test the methods of each class, then each class of each package, and so on, gradually moving on to testing the entire project; the earlier tests are then rerun as regression tests.

The output documentation of these tests includes the test procedures, the input data, the code executing the test, and the output data. A sketch of what such a test might look like is given below.
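A minimal sketch in Python’s unittest style (the Stack class and its methods are invented for illustration), where the test code records the test procedure, the input data, and the expected output:

```python
import unittest

# Hypothetical module under test: a minimal stack implementation.
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

class StackTest(unittest.TestCase):
    def test_push_then_pop_returns_last_item(self):
        stack = Stack()                    # initial condition: empty stack
        stack.push(1)                      # input data
        stack.push(2)
        self.assertEqual(stack.pop(), 2)   # expected output

    def test_pop_from_empty_stack_raises(self):
        # Deliberately erroneous situation: popping from an empty stack.
        with self.assertRaises(IndexError):
            Stack().pop()

if __name__ == "__main__":
    unittest.main()
```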

Functional testing

Functional testing of the test object is planned and conducted based on the testing requirements specified during the requirements definition stage. The requirements include business rules, use-case diagrams, business functions, and, if available, activity diagrams. The purpose of functional tests is to verify that the developed graphical components meet the specified requirements.

This type of testing cannot be fully automated. Therefore, it is divided into:

  • automated testing (used where the output information can be checked).

Purpose: to test data input, processing and output;

  • manual testing (in other cases).

Purpose: to check that user requirements are fulfilled correctly.

It is necessary to execute (play through) each of the use cases, using both correct values and deliberately erroneous ones, to confirm correct functioning, according to the following criteria:

  • the product responds adequately to all input data (expected results are output in response to correctly entered data);
  • the product responds adequately to incorrectly entered data (corresponding error messages appear).
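A minimal sketch of such a use-case check (Python; the register_user function and its validation rules are invented for illustration), exercising both correct and deliberately erroneous input:

```python
# Hypothetical function under test: validates a registration form.
def register_user(login: str, age: int) -> str:
    if not login:
        raise ValueError("login must not be empty")
    if age < 18:
        raise ValueError("age must be at least 18")
    return f"user '{login}' registered"

# Correct input: the expected result is produced.
assert register_user("ivanov", 25) == "user 'ivanov' registered"

# Deliberately erroneous input: a corresponding error message appears.
for bad_login, bad_age in [("", 25), ("ivanov", 15)]:
    try:
        register_user(bad_login, bad_age)
    except ValueError as error:
        print(f"rejected ({bad_login!r}, {bad_age}): {error}")
    else:
        raise AssertionError("invalid input was accepted")
```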

Database testing

The purpose of this testing is to make sure that the database access methods are reliable and execute correctly, without violating data integrity.

It is necessary to use as many database calls as possible in sequence. An approach is used in which the test is designed to “load” the database with a sequence of both correct values and deliberately erroneous ones. The database’s reaction to the data input is determined, and the time intervals for processing the data are estimated.
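A minimal sketch of this approach (Python with the standard sqlite3 module; the table, the row count, and the erroneous values are invented for illustration):

```python
import sqlite3
import time

# In-memory database with integrity constraints to exercise.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athletes (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

rows = [(i, f"athlete-{i}") for i in range(1, 10001)]

# A sequence of correct calls: bulk insert, with processing time measured.
start = time.perf_counter()
conn.executemany("INSERT INTO athletes VALUES (?, ?)", rows)
conn.commit()
print(f"inserted {len(rows)} rows in {time.perf_counter() - start:.3f} s")

# Deliberately erroneous calls: the database must reject them
# without violating the integrity of the data already stored.
for bad_row in [(1, "duplicate id"), (10001, None)]:
    try:
        conn.execute("INSERT INTO athletes VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError as error:
        print(f"rejected {bad_row}: {error}")

count = conn.execute("SELECT COUNT(*) FROM athletes").fetchone()[0]
assert count == len(rows)   # integrity preserved
```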