The dataset includes nine major subclasses, i.e., Baby Stuffs (e.g., *Baby Diapers* a
Electrical Appliances (e.g., *Microwave Oven* and *Socket*), Storage Appliances (e.g., *Trash* and *Stool*), Kitchen Utensils (e.g., *Forks* and *Food Box*), and Stationery and Sporting Goods (e.g., *Skate* and *Notebook*).
Various factors challenge the performance of algorithms, such as scale changes, illumination variations, occlusion, similar appearances, cluttered backgrounds, blurring, and deformation.

## Evaluation protocol
To fairly compare algorithms on the *Locount* task, we design a new evaluation protocol, which penalizes algorithms for missing object instances,
multiple stages for localization and counting, i.e., S_1, ..., S_N are cascaded to localize objects and count the
instances enclosed in the bounding box, where N is the total number of stages. For more detailed definitions, please refer to the [paper](http://arxiv.org/abs/2003.08230).
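As an illustrative sketch only (not the paper's exact metric; the thresholds `iou_thr` and `count_tol` are hypothetical), a counting-aware evaluation can require that a detection both overlaps a ground-truth group and predicts its instance count accurately, so that missed instances are penalized:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def is_true_positive(pred_box, pred_count, gt_box, gt_count,
                     iou_thr=0.5, count_tol=0.3):
    """A prediction matches a ground-truth group only if the boxes
    overlap enough AND the relative counting error is within
    count_tol. Both thresholds are illustrative assumptions."""
    count_ok = abs(pred_count - gt_count) / gt_count <= count_tol
    return iou(pred_box, gt_box) >= iou_thr and count_ok
```

Under such a criterion, a detector that localizes a shelf of products perfectly but reports too few instances is still counted as a failure, which is the spirit of the Locount protocol.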
The counting-accuracy threshold used for positive/negative sample generation is determined by the architecture design of CLCNet, described as follows.

We use the same architecture and configuration as Cascade R-CNN for the box-regression and box-classification layers. For the instance counting layer,
a direct strategy is to use an FC layer to regress a floating-point number indicating the number of instances, called the *count-regression strategy*.
However, the numbers of instances enclosed in the bounding boxes are integers, which makes it challenging for the network to regress them accurately.
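A minimal sketch of the count-regression strategy (plain NumPy with toy, hand-picked weights, not the paper's implementation): a single FC layer regresses one float per RoI, which must be rounded to an integer at inference. The example also shows why this is brittle, since a tiny change in the input features near a half-integer flips the predicted count:

```python
import numpy as np

def count_regression_head(roi_feat, W, b):
    """Count-regression strategy (sketch): one FC layer regresses a
    floating-point count; at inference it is rounded to an integer."""
    raw = float(roi_feat @ W + b)
    return raw, int(np.rint(raw))

# Toy 4-d RoI features and hand-picked weights (illustrative, not trained).
W = np.array([1.0, 0.5, 0.25, 0.1])
b = 0.0

feat_a = np.array([2.0, 0.9, 0.1, 0.0])  # raw count 2.475 -> rounds to 2
feat_b = np.array([2.0, 1.0, 0.1, 0.0])  # raw count 2.525 -> rounds to 3
```

Here a small perturbation of one feature moves the regressed value across the 2.5 boundary and changes the integer count, illustrating why integer-valued targets make direct regression hard and motivating the alternative design described next.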