• Does HDR-VDP-2 work with any pair of HDR images?
No. The images passed to HDR-VDP-2 must be (approximately) calibrated in absolute luminance units. Most HDR images in public databases are stored in relative units (relative trichromatic RGB color values). This means that the luminance computed from those values ($Y = 0.2126 R + 0.7152 G + 0.0722 B$) is NOT absolute luminance in cd/m^2, but relative luminance. To transform images to absolute units, the RGB values must be multiplied by an appropriate constant. That constant depends on the application and on the brightness at which the scenes or images are going to be seen.
If an HDR image is to be shown on an HDR display, it needs to be tone-mapped for that display. Such tone-mapping mostly involves adjusting brightness and clipping some very bright and very dark pixels that cannot be shown on the display. If you have pfstools installed, you can use the display-adaptive tone-mapping operator to tone-map automatically for an HDR display:
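The calibration step above can be sketched in Python with NumPy. This is a minimal illustration, not part of HDR-VDP-2: it assumes linear (not gamma-encoded) RGB with Rec. 709 primaries, and the `scale_for_peak` helper is a hypothetical heuristic that picks the constant so a high percentile of relative luminance lands on a chosen target luminance. The right target is application-dependent, as noted above.

```python
import numpy as np

# Rec. 709 weights for converting linear RGB to luminance Y.
REC709_WEIGHTS = np.array([0.2126, 0.7152, 0.0722])


def luminance(rgb):
    """Luminance of a linear RGB image of shape (H, W, 3); returns shape (H, W)."""
    return np.asarray(rgb, dtype=np.float64) @ REC709_WEIGHTS


def to_absolute(rgb_relative, scale):
    """Multiply relative linear RGB by a constant so that Y is in cd/m^2."""
    return np.asarray(rgb_relative, dtype=np.float64) * scale


def scale_for_peak(rgb_relative, target_peak_cd_m2, percentile=99.9):
    """Hypothetical heuristic: choose the scale that maps a high percentile of
    relative luminance to target_peak_cd_m2. Using a percentile rather than the
    maximum keeps a few outlier pixels from dominating the choice."""
    ref = np.percentile(luminance(rgb_relative), percentile)
    if ref <= 0:
        raise ValueError("image has no positive luminance")
    return target_peak_cd_m2 / ref
```

For example, `to_absolute(img, scale_for_peak(img, 4000.0))` would place the brightest 0.1% of pixels at about 4000 cd/m^2. Whether that is a sensible target depends entirely on how the images will be viewed.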

pfsin relative.exr | pfstmo_mantiuk08 -e 1 -d l=4000:b=0.01:a=0 | pfsdisplayfunction -d l=4000:b=0.01:a=0 -l | pfsout absolute.exr

pfstmo_mantiuk08 tone-maps the image so that it fits within the display's dynamic range from 0.01 cd/m^2 (the black level, b=0.01) to 4000 cd/m^2 (the peak luminance, l=4000), and pfsdisplayfunction converts the resulting "pixel values" into proper absolute colorimetric values.
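You can think of the final conversion from pixel values to absolute values as evaluating a display model. A common choice is the gamma-offset-gain (GOG) model. The sketch below assumes that form, with a gamma of 2.2 and an optional term for ambient light reflected from the screen. The gamma, the reflectivity and the parameter names are assumptions, not a transcription of pfsdisplayfunction, whose exact model and defaults may differ.

```python
import math


def gog_luminance(v, peak=4000.0, black=0.01, gamma=2.2,
                  ambient_lux=0.0, reflectivity=0.01):
    """Map a display-referred pixel value v in [0, 1] to luminance in cd/m^2.

    Assumed gamma-offset-gain model: emitted light rises from `black` to
    `peak` following a power law, plus ambient illuminance reflected by a
    Lambertian screen (hence the division by pi). With ambient_lux=0, as in
    the a=0 setting of the command above, the reflected term vanishes.
    """
    if not 0.0 <= v <= 1.0:
        raise ValueError("pixel value must be in [0, 1]")
    emitted = (peak - black) * v ** gamma + black
    reflected = reflectivity * ambient_lux / math.pi
    return emitted + reflected
```

With these settings, a pixel value of 1 maps to the 4000 cd/m^2 peak and 0 maps to the 0.01 cd/m^2 black level, matching the range passed to `-d l=4000:b=0.01:a=0`.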
• Why do HDR images need to be calibrated in absolute units before they can be used with HDR-VDP-2?
Images shown on a bright display reveal more distortions than the same images shown on a dark display. HDR-VDP-2 models this effect and predicts lower visibility at low absolute luminance levels.
If an image is uncalibrated and all its values are below 1 (which is not uncommon for HDR images), HDR-VDP-2 will assume that the image is very dark (luminance below 1 cd/m^2) and is seen mostly with rod vision. Because rod vision has lower contrast sensitivity than cone vision, fewer distortions will be detected.
• Does HDR-VDP-2 report differences in color?
No. However, color information is used to correctly compute photopic (daylight) and scotopic (night vision) luminance, which accounts for the Purkinje shift.
• Does HDR-VDP-2 work with ordinary (LDR) images?
Yes. Specify 'sRGB-display' as the color_encoding parameter and pass an RGB image with values in the range 0 to 1. This color encoding assumes that the peak luminance of the display is 100 $cd/m^2$. Note that the MATLAB function imread returns a matrix of uint8 or uint16, which must be converted to a normalized floating-point matrix: single(img)/(2^8-1) or single(img)/(2^16-1), so that the largest code value maps exactly to 1.
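For readers preparing images outside MATLAB, here is a NumPy sketch of the same normalization step. It covers only the conversion to the 0 to 1 range and does not call HDR-VDP-2.

```python
import numpy as np


def normalize_integer_image(img):
    """Convert an integer image (e.g. uint8 or uint16) to float32 in [0, 1].

    Dividing by the dtype's maximum (255 or 65535) maps the brightest code
    value exactly to 1.0; dividing by 2**8 or 2**16 would leave it slightly
    below 1.
    """
    img = np.asarray(img)
    if not np.issubdtype(img.dtype, np.integer):
        raise TypeError("expected an integer image")
    return img.astype(np.float32) / np.iinfo(img.dtype).max
```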

• How to interpret the visibility predictions P_det and P_map?
P_det is the probability of detecting a difference in the entire image, assuming that each part of the image is equally attended. P_map is a map (2D matrix) of per-pixel probabilities. Each value should be interpreted as the probability of detection when an observer is focusing on that particular pixel.
In the current version of HDR-VDP-2, P_det = max(P_map). This choice was found to give the best match to psychophysical data, and it is also the most conservative estimate. Intuitively, once a pattern becomes noticeable, it is detected in the part of the image where it is most visible (hence the maximum function), regardless of whether that part covers 90% or 10% of the display. In fact, the probability-of-detection map already accounts for the higher visibility of larger areas (spatial integration).
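The max-pooling rule can be expressed in a few lines. This is an illustrative sketch of the rule stated above, not HDR-VDP-2's own code. The usage below shows that a single highly visible pixel and a uniformly distorted image with the same peak probability yield the same P_det.

```python
import numpy as np


def pool_detection_probability(p_map):
    """Pool a per-pixel detection-probability map into one image-level P_det.

    Follows the P_det = max(P_map) rule: the image-level probability is the
    probability at the location where the difference is most visible.
    """
    p_map = np.asarray(p_map, dtype=np.float64)
    if p_map.size == 0:
        raise ValueError("empty probability map")
    if np.any((p_map < 0) | (p_map > 1)):
        raise ValueError("probabilities must lie in [0, 1]")
    return float(p_map.max())


# A small hot spot and a large uniform region with the same peak value
# give the same pooled probability; area effects are already in P_map.
small = np.zeros((10, 10))
small[0, 0] = 0.8
large = np.full((10, 10), 0.8)
```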
P_det is NOT the fraction of the population that will notice an artefact, but rather the probability that an average observer will notice an artefact. This is an important distinction, because intra-observer variance (within one observer) is much smaller than inter-observer variance (across the population). As a result, P_det reaches the extreme values 0 and 1 much sooner than one might expect. For example, HDR-VDP-2 may predict certain detection (P_det=1) while you can hardly see the artefact. This means that the average observer is more sensitive than your actual observer. In such a case, you should adjust the 'peak_sensitivity' parameter.
The probabilities P_det and P_map are NOT the probabilities of a two-alternative forced-choice (2AFC) experiment, in which the probability is affected by the chance of guessing the right answer. $P_{det}$ should be interpreted as the probability that an average observer correctly detects the difference when his or her chance of guessing is zero. For example, imagine an observer who is presented with a just-noticeable pattern among a very large (in fact infinite) number of flat luminance images, and who has to choose the one that contains the pattern. $P_{det}$ can vary from 0 to 1, whereas the probability in a 2AFC experiment normally lies between 0.5 and 1, because the observer always has at least a 0.5 chance of guessing right. The following paragraphs explain in detail the difference between $P_{det}$ and the probabilities found in forced-choice experiments.
The probability of a positive (correct) answer in a forced-choice experiment, $P_p$, equals the probability of detection, $P_d$, plus the probability of a correct guess, $P_c$, in the cases where the pattern is not detected:
$P_p=P_d+(1-P_d){\cdot}P_c$.
From the above equation we get:
$P_d=\frac{P_p-P_c}{1-P_c}$.
In a two-alternative forced-choice experiment there is a 50% chance of guessing the right answer ($P_c=0.5$). Therefore, to find the threshold for 50% probability of detection ($P_d=0.5$), the psychometric procedure is set to converge at a 75% probability of a positive answer, since $P_p = 0.5 + (1-0.5)\cdot 0.5 = 0.75$. Thus the probability of detection $P_d$, which is equivalent to $P_{det}$ in HDR-VDP-2, differs from the probability of giving a correct answer in a 2AFC experiment.
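The two equations above translate directly into code. The sketch below assumes an n-alternative task in which all alternatives are equally likely, so the guessing rate is $P_c = 1/n$. The function names are illustrative only.

```python
def detection_from_forced_choice(p_correct, n_alternatives=2):
    """Correct a forced-choice hit rate for guessing: P_d = (P_p - P_c) / (1 - P_c).

    Assumes a guessing rate P_c = 1 / n_alternatives. Hit rates below
    chance are rejected rather than mapped to negative probabilities.
    """
    if n_alternatives < 2:
        raise ValueError("need at least two alternatives")
    p_chance = 1.0 / n_alternatives
    if not p_chance <= p_correct <= 1.0:
        raise ValueError("hit rate must lie between chance and 1")
    return (p_correct - p_chance) / (1.0 - p_chance)


def forced_choice_from_detection(p_detect, n_alternatives=2):
    """Expected forced-choice hit rate: P_p = P_d + (1 - P_d) * P_c."""
    if n_alternatives < 2:
        raise ValueError("need at least two alternatives")
    p_chance = 1.0 / n_alternatives
    return p_detect + (1.0 - p_detect) * p_chance
```

With two alternatives, a 75% hit rate corresponds to $P_d = 0.5$. In a four-alternative task ($P_c = 0.25$), the same detection threshold would instead correspond to a hit rate of 62.5%.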
• Why are the HDR-VDP-2 visibility predictions so conservative?
You may be surprised that HDR-VDP-2 reports differences in regions where you can barely see any difference. This is because the metric assumes that the observer knows exactly where the distortion is located, which is not the case in typical situations. Also, even if a difference is visible, it may not be objectionable. In that case, the quality predictors $Q$ and $Q_{MOS}$ may be more suitable.
Another reason for the high sensitivity is that HDR-VDP-2 was calibrated for distortions localized within a Gaussian window of about 1 to 2 visual degrees in diameter. If a distortion spreads over larger areas, its visibility may be overpredicted. You can reduce the sensitivity of the metric with the "peak_sensitivity" option, but the amount of adjustment should be validated by a calibration experiment on a subset of the images you need to process.
If the distortions are not localized, a better prediction can be obtained by disabling the spatial pooling component of the metric. To disable it, pass { 'do_spatial_pooling', false, 'peak_sensitivity', 2.355708 } in the option list. The second option, 'peak_sensitivity', adjusts the overall sensitivity to compensate for the lack of spatial integration; its value was found by fitting to the 'complex images' dataset. You may still want to adjust this parameter further for your particular application.
• How to improve HDR-VDP-2 predictions?
HDR-VDP-2 does not model all possible factors that may affect detection and discrimination performance. For example, sensitivity can vary greatly if the task is to detect a temporal difference when flicking between two images, compared with side-by-side presentation of the test and reference images. The default calibration values, which were determined for the side-by-side task, may not be optimal for other tasks.
Most of these factors can be accounted for by adjusting the "peak_sensitivity" parameter. The best way to fine-tune the "peak_sensitivity" value is to run an experiment for one particular stimulus and determine its threshold. Then adjust the "peak_sensitivity" parameter until the HDR-VDP-2 predictions are consistent with the experimentally determined threshold. Once HDR-VDP-2 is calibrated this way, it should produce better predictions for the rest of the stimuli.
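The adjustment loop can be automated with a bisection search. In this sketch, `predict_p_det` is a hypothetical callback that you would write yourself: it runs the metric on the stimulus measured at threshold with a given "peak_sensitivity" and returns P_det. The search targets P_det = 0.5 and assumes that P_det changes monotonically with the parameter over the bracket [lo, hi]. The direction of that change is inferred from the endpoints, so no assumption is made about whether larger values mean higher or lower sensitivity.

```python
from typing import Callable


def calibrate_peak_sensitivity(predict_p_det: Callable[[float], float],
                               lo: float, hi: float, target: float = 0.5,
                               tol: float = 1e-4, max_iter: int = 100) -> float:
    """Bisection for the parameter value at which predict_p_det equals target.

    predict_p_det is a hypothetical user-supplied function; it must be
    monotonic on [lo, hi], and target must be bracketed by its values there.
    """
    f_lo = predict_p_det(lo) - target
    f_hi = predict_p_det(hi) - target
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("target P_det is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = predict_p_det(mid) - target
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep the half-interval in which the sign change still occurs.
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Each evaluation runs the full metric, so it may be worth choosing a tight initial bracket and a tolerance no finer than the precision of the experimental threshold.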
• Why does HDR-VDP-2 report a visible difference for two images that are identical?
The problem comes from limited numerical precision and the way spatial integration is implemented. It will be addressed in the next revision of the metric. If this causes problems, the best workaround is to disable spatial pooling, as explained in the answer to "Why are the HDR-VDP-2 visibility predictions so conservative?".
• The differences between two images are restricted to a small region (e.g. a single object), but HDR-VDP-2 reports distortions outside that region, where there are no differences. Why is that?
HDR-VDP-2 analyzes the full range of spatial frequencies when predicting visibility. When the differences affect lower frequencies, the distortion cannot be precisely localized and the visible differences tend to 'spill over' the boundary of the affected region. This is a current limitation of HDR-VDP-2, which will hopefully be addressed in future revisions.
The older HDR-VDP-1.7 contained a 'hack' that restricted the reported differences to areas that actually differed. The feature was enabled by default and could be switched off with the --no-abs-fix option. It worked by testing the luminance difference at the pixel level and masking the distortion for pixels that did not differ. It was removed from HDR-VDP-2 because it was inelegant and undermined the purpose of the visual model.
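The idea behind that masking step can be illustrated as follows. This is not HDR-VDP-1.7's code. The relative-difference test and the default tolerance are assumptions chosen for the sketch. It zeroes the detection probability wherever the test and reference luminances are essentially equal, which is exactly the post-hoc override of the visual model described above.

```python
import numpy as np


def mask_unchanged_pixels(p_map, test_lum, ref_lum, rel_tol=1e-3):
    """Zero the detection probability where test and reference luminance
    are (nearly) identical.

    Illustrative only: the relative tolerance is an assumption, not the
    criterion used by HDR-VDP-1.7.
    """
    p_map = np.asarray(p_map, dtype=np.float64)
    test_lum = np.asarray(test_lum, dtype=np.float64)
    ref_lum = np.asarray(ref_lum, dtype=np.float64)
    if not (p_map.shape == test_lum.shape == ref_lum.shape):
        raise ValueError("all inputs must have the same shape")
    # A pixel "differs" if its luminance change exceeds rel_tol of the reference.
    differs = np.abs(test_lum - ref_lum) > rel_tol * np.maximum(np.abs(ref_lum), 1e-12)
    return np.where(differs, p_map, 0.0)
```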
• Can HDR-VDP-2 predict the effect of viewing distance?
The effect of viewing distance (or spatial resolution in pixels per visual degree) is predicted for the visibility predictors P_map, P_det and C_map, but it is currently NOT correctly predicted for the quality predictor Q. The quality prediction assumes a typical viewing distance of 2 to 3 screen heights.
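To relate a viewing distance to pixels per visual degree, you can divide the number of pixel rows by the visual angle subtended by the screen height. The sketch below uses that average over the screen height, which is a simplifying assumption. Definitions based on the angle of a single pixel at the screen centre give slightly different numbers.

```python
import math


def pixels_per_degree(vertical_resolution_px, distance_in_screen_heights):
    """Average angular resolution (pixels per visual degree) over the screen height.

    The screen height subtends 2 * atan(0.5 / d) radians when viewed from
    d screen heights away; dividing the pixel rows by that angle in
    degrees gives the average number of pixels per degree.
    """
    if vertical_resolution_px <= 0 or distance_in_screen_heights <= 0:
        raise ValueError("resolution and distance must be positive")
    half_angle_deg = math.degrees(math.atan(0.5 / distance_in_screen_heights))
    return vertical_resolution_px / (2.0 * half_angle_deg)
```

For a 1080-line display, this gives roughly 38 pixels per degree at 2 screen heights and roughly 57 at 3 screen heights, the range of distances the quality prediction assumes.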