Anton Milan
Anton Milan (né Andriyenko)
I am a senior researcher at the Australian Centre for Visual Technologies (ACVT) at the University of Adelaide. My main research interests are computer vision and machine learning and I am particularly interested in object detection and tracking. You can find more information about me here.
  • 27/02/2017 - Two CVPR papers accepted
  • 15/01/2017 - Our APC paper got accepted to ICRA 2017.
  • 21/12/2016 - We will hold a joint BMTT-PETS Workshop at CVPR 2017 in Hawaii.
  • 29/11/2016 - New arXiv paper on set learning.
  • 12/11/2016 - Two AAAI papers accepted.
  • 04/07/2016 - Our team achieved 2nd and 3rd places at the Amazon Picking Challenge.
  • 01/03/2016 - MOT16: a new benchmark published

Google Scholar stats:
number of citations: 1081
h-index: 14
i10-index: 14


RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

G. Lin, A. Milan, C. Shen, I. Reid. In CVPR 2017
bibtex | abstract | paper | code

        Author = {Lin, G. and Milan, A. and Shen, C. and Reid, I.},
	Title = {Refine{N}et: {M}ulti-Path Refinement Networks for High-Resolution Semantic Segmentation},
	Booktitle = {CVPR},
	Year = {2017}	
Recently, very deep convolutional neural networks (CNNs) have shown 
outstanding performance in object recognition and have also been the 
first choice for dense classification problems such as semantic 
segmentation. However, repeated subsampling operations like pooling or 
convolution striding in deep CNNs lead to a significant decrease in the 
initial image resolution. Here, we present RefineNet, a generic 
multi-path refinement network that explicitly exploits all the 
information available along the down-sampling process to enable 
high-resolution prediction using long-range residual connections. In 
this way, the deeper layers that capture high-level semantic features 
can be directly refined using fine-grained features from earlier 
convolutions. The individual components of RefineNet employ residual 
connections following the identity mapping mindset, which allows for 
effective end-to-end training. Further, we introduce chained residual 
pooling, which captures rich background context in an efficient manner. 
We carry out comprehensive experiments and set new state-of-the-art 
results on seven public datasets. In particular, we achieve an 
intersection-over-union score of 83.4 on the challenging PASCAL VOC 2012 
dataset, which is the best reported result to date.
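
For readers unfamiliar with the intersection-over-union (IoU) score quoted above, the following is a minimal sketch of how a mean IoU can be computed for a single pair of label maps. The function name `mean_iou` and the toy labels are illustrative assumptions. Benchmarks such as PASCAL VOC typically accumulate the counts over a whole dataset and report the score as a percentage, which this sketch does not attempt to reproduce.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union over all classes that occur in pred or truth.

    pred and truth are flat sequences of integer class labels, one per pixel.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened 2x3 label maps with two classes (0 = background, 1 = object).
truth = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, truth, num_classes=2)
```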

Pose-Track: Joint Multi-Person Pose Estimation and Tracking

U. Iqbal, A. Milan, J. Gall. In CVPR 2017
bibtex | abstract | paper | project

        Author = {Iqbal, U. and Milan, A. and Gall, J.},
	Title = {Pose-{T}rack: {J}oint Multi-Person Pose Estimation and Tracking},
	Booktitle = {CVPR},
	Year = {2017}	
In this work, we introduce the challenging problem of joint multi-person pose 
estimation and tracking of an unknown number of persons in unconstrained videos. 
Existing methods for multi-person pose estimation in images cannot be applied 
directly to this problem, since it also requires solving the problem of person 
association over time in addition to the pose estimation for each person. We 
therefore propose a novel method that jointly models multi-person pose 
estimation and tracking in a single formulation. To this end, we represent body 
joint detections in a video by a spatio-temporal graph and solve an integer 
linear program to partition the graph into sub-graphs that correspond to 
plausible body pose trajectories for each person. The proposed approach 
implicitly handles occlusions and truncations of persons. Since the problem has 
not been addressed quantitatively in the literature, we introduce a challenging 
"Multi-Person Pose-Track" dataset, and also propose a completely unconstrained 
evaluation protocol that does not make any assumptions on the scale, size, 
location or the number of persons. Finally, we evaluate the proposed approach 
and several baseline methods on our new dataset.

NimbRo Picking: Versatile Part Handling for Warehouse Automation

M. Schwarz, A. Milan, C. Lenz, A. Muñoz, A. S. Periyasamy, M. Schreiber, S. Schüller, S. Behnke.
In ICRA 2017
bibtex | abstract | paper

	title = {Nimb{R}o {P}icking: {V}ersatile Part Handling for Warehouse Automation},
	booktitle = {ICRA},
	author = {Schwarz, M. and Milan, A. and Lenz, C. and Mu{\~n}oz, A. and Periyasamy, A. S. and Schreiber, M. and Sch{\"u}ller, S. and Behnke, S.},
	month = {June},
	year = {2017}
Part handling in warehouse automation is challenging if a large variety of items 
must be accommodated and items are stored in unordered piles. To foster research 
in this domain, Amazon holds picking challenges. We present our system which 
achieved second and third place in the Amazon Picking Challenge 2016 tasks. The 
challenge required participants to pick a list of items from a shelf or to stow 
items into the shelf. Using two deep-learning approaches for object detection 
and semantic segmentation and one item model registration method, our system 
localizes the requested item. Manipulation occurs using suction on points 
determined heuristically or from 6D item model registration. Parametrized motion 
primitives are chained to generate motions. We present a full-system evaluation 
during the APC 2016 and component level evaluations of the perception system on 
an annotated dataset.

Data-Driven Approximations to NP-Hard Problems

A. Milan, S. H. Rezatofighi, R. Garg, A. Dick, I. Reid. In AAAI 2017
bibtex | abstract | paper | slides | code

	title = {Data-driven approximations to {NP}-hard problems},
	booktitle = {AAAI},
	author = {Milan, A. and Rezatofighi, S. H. and Garg, R. and Dick, A. and Reid, I.},
	month = {February},
	year = {2017}
There exist a number of problem classes, for which obtaining
the exact solution becomes exponentially expensive with
increasing problem size.  The quadratic assignment problem (QAP)
or the travelling salesman problem (TSP) are just two examples of 
such NP-hard problems. In practice, approximate algorithms
are employed to obtain a suboptimal solution, where one must
face a trade-off between computational complexity and 
solution quality.
In this paper, we propose to learn to solve these problems from
approximate examples, using recurrent neural networks (RNNs). 
Surprisingly, such architectures are capable of producing 
highly accurate solutions at minimal computational cost. 
Moreover, we introduce a simple,
yet effective technique for improving the initial (weak) training
set by incorporating the objective cost into the training procedure.
We demonstrate the functionality of our approach on three 
exemplar applications: marginal distributions of a joint
matching space, feature point matching and the travelling
salesman problem. We show encouraging results on synthetic and real data
in all three cases.

Online Multi-Target Tracking Using Recurrent Neural Networks

A. Milan, S. H. Rezatofighi, A. Dick, I. Reid, K. Schindler
In AAAI 2017
bibtex | abstract | paper | poster | code

	title = {Online Multi-Target Tracking using Recurrent Neural Networks},
	booktitle = {AAAI},
	author = {Milan, A. and Rezatofighi, S. H. and Dick, A. and Reid, I. and Schindler, K.},
	month = {February},
	year = {2017}
We present a novel approach to online multi-target tracking based on 
recurrent neural networks (RNNs). Tracking multiple objects in 
real-world scenes involves many challenges, including a) an a-priori 
unknown and time-varying number of targets, b) a continuous state 
estimation of all present targets, and c) a discrete combinatorial 
problem of data association. Most previous methods involve complex 
models that require tedious tuning of parameters. Here, we propose for 
the first time, a full end-to-end learning approach for online 
multi-target tracking based on deep learning. Existing deep learning 
methods are not designed for the above challenges and cannot be 
trivially applied to the task. Our solution addresses all of the above 
points in a principled way. Experiments on both synthetic and real data 
show competitive results obtained at 300 Hz on a standard CPU, and pave 
the way towards future research in this direction. 


DeepSetNet: Predicting Sets with Deep Neural Networks

S. H. Rezatofighi, V. Kumar, A. Milan, E. Abbasnejad, A. Dick, I. Reid. In arXiv:1611.08998
bibtex | abstract | paper

	title = {Deep{S}et{N}et: {P}redicting Sets with Deep Neural Networks},
	shorttitle = {DeepSetNet},
	url = {},
	journal = {arXiv:1611.08998 [cs]},
	author = {Rezatofighi, S. H. and Kumar BG, V. and Milan, A. and Abbasnejad, E. and Dick, A. and Reid, I.},
	month = nov,
	year = {2016},
	note = {arXiv: 1611.08998},
	keywords = {Computer Science - Computer Vision and Pattern Recognition}
This paper addresses the task of set prediction using deep learning. This is 
important because the outputs of many computer vision tasks, including image 
tagging and object detection, are naturally expressed as sets of entities rather 
than vectors. As opposed to a vector, the size of a set is not fixed in advance, 
and it is invariant to the ordering of entities within it. We define a 
likelihood for a set distribution and learn its parameters using a deep neural 
network. We also derive a loss for predicting a discrete distribution 
corresponding to set cardinality. Set prediction is demonstrated on the problems 
of multi-class image classification and pedestrian detection. Our approach 
yields state-of-the-art results in both cases on standard datasets.

Joint Probabilistic Matching Using m-Best Solutions

S. H. Rezatofighi, A. Milan, Z. Zhang, Q. Shi, A. Dick, I. Reid
CVPR 2016 (oral presentation)
bibtex | abstract | paper | supplemental | slides | poster | video | project

	Author = {Rezatofighi, S. H. and Milan, A. and Zhang, Z. and Shi, Q. and Dick, A. and Reid, I.},
	Booktitle = {CVPR},
	Title = {Joint Probabilistic Matching Using m-Best Solutions},
	Year = {2016}
Matching between two sets of objects is typically approached by finding the 
object pairs that collectively maximize the joint matching score. In this paper, 
we argue that this single solution does not necessarily lead to the optimal 
matching accuracy and that general one-to-one assignment problems can be 
improved by considering multiple hypotheses before computing the final 
similarity measure. To that end, we propose to utilize the marginal 
distributions for each entity. Previously, this idea has been neglected mainly 
because exact marginalization is intractable due to a combinatorial number of 
all possible matching permutations. Here, we propose a generic approach to 
efficiently approximate the marginal distributions by exploiting the m-best 
solutions of the original problem. This approach not only improves the matching 
solution, but also provides more accurate ranking of the results, because of 
the extra information included in the marginal distribution. We validate our 
claim on two distinct objectives: (i) person re-identification and temporal 
matching modeled as an integer linear program, and (ii) feature point matching 
using a quadratic cost function. Our experiments confirm that marginalization 
indeed leads to superior performance compared to the single (nearly) optimal 
solution, yielding state-of-the-art results in both applications on standard 
benchmarks.
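
The idea of approximating marginals from the m-best solutions can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: brute-force enumeration of permutations stands in for a proper m-best solver, each solution is weighted by exp(-cost), and the names `m_best_marginals` and `cost` are hypothetical. It is only practical for very small problems.

```python
import itertools
import math


def m_best_marginals(cost, m):
    """Approximate one-to-one matching marginals from the m cheapest assignments.

    cost[i][j] is the cost of matching item i to item j (square matrix).
    Returns a matrix whose entry [i][j] approximates P(i matched to j).
    """
    n = len(cost)
    # Brute force: score every permutation (assumed stand-in for an m-best solver).
    solutions = []
    for perm in itertools.permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        solutions.append((total, perm))
    solutions.sort()
    best = solutions[:m]

    # Weight each kept solution by exp(-cost) and normalize (assumed weighting).
    weights = [math.exp(-total) for total, _ in best]
    z = sum(weights)

    marginals = [[0.0] * n for _ in range(n)]
    for w, (_, perm) in zip(weights, best):
        for i, j in enumerate(perm):
            marginals[i][j] += w / z
    return marginals


cost = [[1.0, 2.0, 3.0],
        [2.0, 1.0, 3.0],
        [3.0, 3.0, 1.0]]
single_best = m_best_marginals(cost, m=1)  # reduces to the single best assignment
three_best = m_best_marginals(cost, m=3)   # spreads probability over alternatives
```

With m = 1 the result is the permutation matrix of the single best assignment, while larger m spreads probability mass over competing matches, which is the extra information the abstract refers to.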


MOT16: A Benchmark for Multi-Object Tracking

A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler
bibtex | abstract | paper | project page

	title = {{MOT}16: {A} Benchmark for Multi-Object Tracking},
	shorttitle = {MOT16},
	url = {},
	journal = {arXiv:1603.00831 [cs]},
	author = {Milan, A. and Leal-Taix\'{e}, L. and Reid, I. and Roth, S. and Schindler, K.},
	month = mar,
	year = {2016},
	note = {arXiv: 1603.00831},
	keywords = {Computer Science - Computer Vision and Pattern Recognition}
Standardized benchmarks are crucial for the majority of computer vision 
applications. Although leaderboards and ranking tables should not be 
over-claimed, benchmarks often provide the most objective measure of 
performance and are therefore important guides for research. Recently, a 
new benchmark for Multiple Object Tracking, MOTChallenge, was launched 
with the goal of collecting existing and new data and creating a 
framework for the standardized evaluation of multiple object tracking 
methods. The first release of the benchmark focuses on multiple people 
tracking, since pedestrians are by far the most studied object in the 
tracking community. This paper accompanies a new release of the 
MOTChallenge benchmark. Unlike the initial release, all videos of MOT16 
have been carefully annotated following a consistent protocol. Moreover, 
it not only offers a significant increase in the number of labeled 
boxes, but also provides multiple object classes besides pedestrians and 
the level of visibility for every single object of interest.


Multi-Target Tracking by Discrete-Continuous Energy Minimization

A. Milan, K. Schindler and S. Roth
PAMI 38(10), 2016
bibtex | abstract | paper | video | project page

	author = {Milan, A. and Schindler, K. and Roth, S.},	
	title = {Multi-Target Tracking by Discrete-Continuous Energy Minimization},
	volume = {38},
	number = {10},
	journal = {IEEE TPAMI},
	year = {2016}
The task of tracking multiple targets is often addressed with the 
so-called tracking-by-detection paradigm, where the first step is to 
obtain a set of target hypotheses for each frame independently. Tracking 
can then be regarded as solving two separate, but tightly coupled 
problems. The first is to carry out data association, i.e. to determine 
the origin of each of the available observations. The second problem is 
to reconstruct the actual trajectories that describe the spatio-temporal 
motion pattern of each individual target. The former is inherently a 
discrete problem, while the latter should intuitively be modeled in 
continuous space. Having to deal with an unknown number of targets, 
complex dependencies, and physical constraints, both are challenging 
tasks on their own and thus most previous work focuses on one of these 
subproblems. Here, we present a multi-target tracking approach that 
explicitly models both tasks as minimization of a unified 
discrete-continuous energy function. Trajectory properties are captured 
through global label costs, a recent concept from multi-model fitting, 
which we introduce to tracking. Specifically, label costs describe 
physical properties of individual tracks, e.g. linear and angular 
dynamics, or entry and exit points. We further introduce pairwise label 
costs to describe mutual interactions between targets in order to avoid 
collisions. By choosing appropriate forms for the individual energy 
components, powerful discrete optimization techniques can be leveraged 
to address data association, while the shapes of individual trajectories 
are updated by gradient-based continuous energy minimization. The 
proposed method achieves state-of-the-art results on diverse benchmark 


Joint Probabilistic Data Association Revisited

S. H. Rezatofighi, A. Milan, Z. Zhang, Q. Shi, A. Dick, I. Reid
ICCV 2015
bibtex | abstract | paper | code | video 1 | video 2

	Author = {Rezatofighi, S. H. and Milan, A. and Zhang, Z. and Shi, Q. and Dick, A. and Reid, I.},
	Booktitle = {ICCV},
	Title = {Joint Probabilistic Data Association Revisited},
	Year = {2015}
In this paper, we revisit the joint probabilistic data association (JPDA) technique and propose a novel solution based on recent developments in finding the m-best solutions to an integer linear program. The key advantage of this approach is that it makes JPDA computationally tractable in applications with high target and/or clutter density, such as spot tracking in fluorescence microscopy sequences and pedestrian tracking in surveillance footage. We also show that our JPDA algorithm embedded in a simple tracking framework is surprisingly competitive with state-of-the-art global tracking methods in these two applications, while needing considerably less processing time.

MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking

L. Leal-Taixé, A. Milan, I. Reid, S. Roth, and K. Schindler
bibtex | abstract | paper | project page

	title = {MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking},
	shorttitle = {MOTChallenge 2015},
	url = {},
	journal = {arXiv:1504.01942 [cs]},
	author = {Leal-Taix\'{e}, Laura and Milan, Anton and Reid, Ian and Roth, Stefan and Schindler, Konrad},
	month = apr,
	year = {2015},
	note = {arXiv: 1504.01942},
	keywords = {Computer Science - Computer Vision and Pattern Recognition}
In the recent past, the computer vision community has developed 
centralized benchmarks for the performance evaluation of a variety of 
tasks, including generic object and pedestrian detection, 3D 
reconstruction, optical flow, single-object short-term tracking, and 
stereo estimation. Despite potential pitfalls of such benchmarks, they 
have proved to be extremely helpful to advance the state of the art in 
the respective area. Interestingly, there has been rather limited work 
on the standardization of quantitative benchmarks for multiple target 
tracking. One of the few exceptions is the well-known PETS dataset, 
targeted primarily at surveillance applications. Despite being widely 
used, it is often applied inconsistently, for example by using 
different subsets of the available data, different ways of training the 
models, or differing evaluation scripts. This paper describes our work 
toward a novel multiple object tracking benchmark aimed to address such 
issues. We discuss the challenges of creating such a framework, 
collecting existing and new data, gathering state-of-the-art methods to 
be tested on the datasets, and finally creating a unified evaluation 
system. With MOTChallenge we aim to pave the way toward a unified 
evaluation framework for a more meaningful quantification of 
multi-target tracking. 


Joint Tracking and Segmentation of Multiple Targets

A. Milan, L. Leal-Taixé, K. Schindler and I. Reid
CVPR 2015
bibtex | abstract | ext. abstract | paper | video | project page

	Author = {Anton Milan and Laura Leal-Taixé and Konrad Schindler and Ian Reid},
	Booktitle = {CVPR},
	Title = {Joint Tracking and Segmentation of Multiple Targets},
	Year = {2015}
Tracking-by-detection has proven to be the most successful strategy to 
address the task of tracking multiple targets in unconstrained 
scenarios. Traditionally, a set of sparse detections, generated in a 
preprocessing step, serves as input to a high-level tracker whose goal 
is to correctly associate these “dots” over time. An obvious shortcoming 
of this approach is that most information available in image sequences 
is simply ignored by thresholding weak detection responses and applying 
non-maximum suppression. We propose a multi-target tracker that exploits 
low level image information and associates every (super)-pixel to a 
specific target or classifies it as background. As a result, we obtain a 
video segmentation in addition to the classical bounding-box 
representation in unconstrained, real-world videos. Our method shows 
encouraging results on many standard benchmark sequences and 
significantly outperforms state-of-the-art tracking-by-detection 
approaches in crowded scenes with long-term partial occlusions.


Privacy Preserving Multi-target Tracking

A. Milan, S. Roth, K. Schindler and M. Kudo
Workshop on Human Identification for Surveillance (HIS)
bibtex | abstract | paper | video | slides | project page

  author = {A. Milan and S. Roth and K. Schindler and M. Kudo},
  title = {Privacy Preserving Multi-target Tracking},
  booktitle = {Workshop on Human Identification for Surveillance},
  year = {2014}
Automated people tracking is important for a wide range of applications. 
However, typical surveillance cameras are controversial in their use, mainly 
due to the harsh intrusion of the tracked individuals’ privacy. In this paper, 
we explore a privacy-preserving alternative for multi-target tracking. A 
network of infrared sensors attached to the ceiling acts as a low-resolution, 
monochromatic camera in an indoor environment. Using only this low-level 
information about the presence of a target, we are able to reconstruct entire 
trajectories of several people. Inspired by the recent success of offline 
approaches to multi-target tracking [1–3], we apply an energy minimization 
technique to the novel setting of infrared motion sensors. To cope with the 
very weak data term from the infrared sensor network we track in a continuous 
state space with soft, implicit data association. Our experimental evaluation 
on both synthetic and real-world data shows that our principled method clearly 
outperforms previous techniques.

Improving Global Multi-target Tracking with Local Updates

A. Milan, R. Gade, A. Dick, T. B. Moeslund, I. Reid
Workshop on Visual Surveillance and Re-Identification
bibtex | abstract | paper | video | slides

  author = {A. Milan and R. Gade and A. Dick and T. B. Moeslund and I. Reid},
  title = {Improving Global Multi-target Tracking with Local Updates},
  booktitle = {Workshop on Visual Surveillance and Re-Identification},
  year = {2014}
We propose a scheme to explicitly detect and resolve ambiguous 
situations in multiple target tracking. During periods of uncertainty,
our method applies multiple local single target trackers to hypothesise
short term tracks. These tracks are combined with the tracks obtained
by a global multi-target tracker, if they result in a reduction in the global
cost function. Since tracking failures typically arise when targets become
occluded, we propose a local data association scheme to maintain the
target identities in these situations. We demonstrate a reduction of up
to 50% in the global cost function, which in turn leads to superior 
performance on several challenging benchmark sequences. Additionally, we
show tracking results in sports videos where poor video quality and 
frequent and severe occlusion

Energy Minimization for Multiple Object Tracking

A. Milan
PhD Thesis
bibtex | thesis | slides

	address = {Darmstadt},
	type = {{PhD}},
	title = {Energy Minimization for Multiple Object Tracking},
	url = {},
	school = {{TU} Darmstadt},
	author = {Milan, Anton},
	year = {2014},

Continuous Energy Minimization for Multitarget Tracking

A. Milan, S. Roth and K. Schindler
PAMI 36(1), 2014
bibtex | abstract | paper | video | slides | project page

	author = {Milan, A. and Roth, S. and Schindler, K.},	
	title = {Continuous Energy Minimization for Multitarget Tracking},
	volume = {36},
	issn = {0162-8828},
	doi = {10.1109/TPAMI.2013.103},
	number = {1},
	journal = {IEEE TPAMI},
	year = {2014},
	pages = {58--72}
Many recent advances in multiple target tracking aim at finding a 
(nearly) optimal set of trajectories within a temporal window. To handle 
the large space of possible trajectory hypotheses, it is typically 
reduced to a finite set by some form of data-driven or regular 
discretization. In this work we propose an alternative formulation of 
multi-target tracking as minimization of a continuous energy. Contrary 
to recent approaches, we focus on designing an energy that corresponds 
to a more complete representation of the problem, rather than one that 
is amenable to global optimization. Besides the image evidence, the 
energy function takes into account physical constraints, such as target 
dynamics, mutual exclusion, and track persistence. In addition, partial 
image evidence is handled with explicit occlusion reasoning, and 
different targets are disambiguated with an appearance model. To 
nevertheless find strong local minima of the proposed non-convex energy 
we construct a suitable optimization scheme that alternates between 
continuous conjugate gradient descent and discrete trans-dimensional 
jump moves. These moves, which are executed such that they always reduce 
the energy, allow the search to escape weak minima and explore a much 
larger portion of the search space of varying dimensionality. We 
demonstrate the validity of our approach with an extensive quantitative 
evaluation on several public datasets. 


Learning People Detectors for Tracking in Crowded Scenes

S. Tang, M. Andriluka, A. Milan, K. Schindler, S. Roth and B. Schiele
ICCV 2013
bibtex | abstract | paper | poster | video | project page

  author = {S. Tang and M. Andriluka and A. Milan and K. Schindler and S. Roth and B. Schiele},
  title = {Learning People Detectors for Tracking in Crowded Scenes},
  booktitle = {ICCV},
  year = {2013}
People tracking in crowded real-world scenes is challenging due to 
frequent and long-term occlusions. Recent tracking methods obtain the 
image evidence from object (people) detectors, but typically use 
off-the-shelf detectors and treat them as black box components. In this 
paper we argue that for best performance one should explicitly train 
people detectors on failure cases of the overall tracker instead. To 
that end, we first propose a novel joint people detector that combines a 
state-of-the-art single person detector with a detector for pairs of 
people, which explicitly exploits common patterns of person-person 
occlusions across multiple viewpoints that are a common failure case for 
tracking in crowded scenes. To explicitly address remaining failure 
cases of the tracker we explore two methods. First, we analyze typical 
failure cases of trackers and train a detector explicitly on those 
failure cases. And second, we train the detector with the people tracker 
in the loop, focusing on the most common tracker failures. We show that 
our joint multi-person detector significantly improves both detection 
accuracy as well as tracker performance, improving the state-of-the-art 
on standard benchmarks. 

Challenges of Ground Truth Evaluation of Multi-Target Tracking

A. Milan, K. Schindler and S. Roth
CVPR Workshop on Ground Truth
bibtex | abstract | paper | poster

	Author = {Anton Milan and Konrad Schindler and Stefan Roth},
	Booktitle = {Proc. of the CVPR 2013 Workshop on Ground Truth - What is a good dataset?},
	Title = {Challenges of Ground Truth Evaluation of Multi-Target Tracking},
	Year = {2013}
Evaluating multi-target tracking based on ground truth data is a 
surprisingly challenging task. Erroneous or ambiguous ground truth 
annotations, numerous evaluation protocols, and the lack of standardized 
benchmarks make a direct quantitative comparison of different tracking 
approaches rather difficult. The goal of this paper is to raise 
awareness of common pitfalls related to objective ground truth 
evaluation. We investigate the influence of different annotations, 
evaluation software, and training procedures using several publicly 
available resources, and point out the limitations of current 
definitions of evaluation metrics. Finally, we argue that the 
development of an extensive standardized benchmark for multi-target 
tracking is an essential step toward more objective comparison of 
tracking approaches. 

Detection- and Trajectory-Level Exclusion in Multiple Object Tracking

A. Milan, K. Schindler and S. Roth
CVPR 2013
bibtex | abstract | paper | poster | video | slides | data | project page

	Author = {Anton Milan and Konrad Schindler and Stefan Roth},
	Booktitle = {CVPR},
	Title = {Detection- and Trajectory-Level Exclusion in Multiple Object Tracking},
	Year = {2013}
When tracking multiple targets in crowded scenarios, modeling mutual 
exclusion between distinct targets becomes important at two levels: (1) 
in data association, each target observation should support at most one 
trajectory and each trajectory should be assigned at most one 
observation per frame; (2) in trajectory estimation, two trajectories 
should remain spatially separated at all times to avoid collisions. Yet, 
existing trackers often sidestep these important constraints. We address 
this using a mixed discrete-continuous conditional random field (CRF) 
that explicitly models both types of constraints: Exclusion between 
conflicting observations with supermodular pairwise terms, and exclusion 
between trajectories by generalizing global label costs to suppress the 
co-occurrence of incompatible labels (trajectories). We develop an 
expansion move-based MAP estimation scheme that handles both 
non-submodular constraints and pairwise global label costs. Furthermore, 
we perform a statistical analysis of ground-truth trajectories to derive 
appropriate CRF potentials for modeling data fidelity, target dynamics, 
and inter-target occlusion. 


Discrete-Continuous Optimization for Multi-Target Tracking

A. Andriyenko, K. Schindler and S. Roth
CVPR 2012
bibtex | abstract | paper | poster | video | data | project page

	Author = {Anton Andriyenko and Konrad Schindler and Stefan Roth},
	Booktitle = {CVPR},
	Title = {Discrete-Continuous Optimization for Multi-Target Tracking},
	Year = {2012}
The problem of multi-target tracking is comprised of two distinct, but 
tightly coupled challenges: (i) the naturally discrete problem of data 
association, i.e. assigning image observations to the appropriate 
target; (ii) the naturally continuous problem of trajectory estimation, 
i.e. recovering the trajectories of all targets. To go beyond simple 
greedy solutions for data association, recent approaches often perform 
multi-target tracking using discrete optimization. This has the 
disadvantage that trajectories need to be pre-computed or represented 
discretely, thus limiting accuracy. In this paper we instead formulate 
multi-target tracking as a discrete-continuous optimization problem that 
handles each aspect in its natural domain and allows leveraging powerful 
methods for multi-model fitting. Data association is performed using 
discrete optimization with label costs, yielding near optimality. 
Trajectory estimation is posed as a continuous fitting problem with a 
simple closed-form solution, which is used in turn to update the label 
costs. We demonstrate the accuracy and robustness of our approach with 
state-of-the-art performance on several standard datasets. 


An Analytical Formulation of Global Occlusion Reasoning for Multi-Target Tracking

A. Andriyenko, S. Roth and K. Schindler
ICCV Workshop on Visual Surveillance
bibtex | abstract | paper | poster | video | slides | project page

	Author = {Anton Andriyenko and Stefan Roth and Konrad Schindler},
	Booktitle = {Proc. of the 11th International IEEE Workshop on Visual Surveillance},
	Title = {An Analytical Formulation of Global Occlusion Reasoning for Multi-Target Tracking},
	Year = {2011}
We present a principled model for occlusion reasoning in complex 
scenarios with frequent inter-object occlusions, and its application to 
multi-target tracking. To compute the putative overlap between pairs of 
targets, we represent each target with a Gaussian. Conveniently, this 
leads to an analytical form for the relative overlap – another Gaussian 
– which is combined with a sigmoidal term for modeling depth relations. 
Our global occlusion model bears several advantages: Global target 
visibility can be computed efficiently in closed-form, and varying 
degrees of partial occlusion can be naturally accounted for. Moreover, 
the dependence of the occlusion on the target locations – i.e. the 
gradient of the overlap – can also be computed in closed form, which 
makes it possible to efficiently include the proposed occlusion model in 
a continuous energy minimization framework. Experimental results on 
seven datasets confirm that the proposed formulation consistently 
reduces missed targets and lost trajectories, especially in challenging 
scenarios with crowds and severe inter-object occlusions. 
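
The statement that the overlap of two Gaussians is itself a Gaussian can be checked numerically. The sketch below uses a one-dimensional simplification and the standard identity that the integral of the product of N(x; m1, v1) and N(x; m2, v2) equals a Gaussian in m1 with mean m2 and variance v1 + v2. The function names are hypothetical, and the paper's actual formulation (its normalization and the sigmoidal depth term) is not reproduced here.

```python
import math


def gauss(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def overlap_closed_form(m1, v1, m2, v2):
    """Integral of the product of two Gaussians: another Gaussian in the means."""
    return gauss(m1, m2, v1 + v2)


def overlap_numeric(m1, v1, m2, v2, lo=-20.0, hi=20.0, steps=20000):
    """Midpoint-rule approximation of the same integral, for comparison."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += gauss(x, m1, v1) * gauss(x, m2, v2)
    return total * dx


closed = overlap_closed_form(0.0, 1.0, 1.5, 2.0)
numeric = overlap_numeric(0.0, 1.0, 1.5, 2.0)
```

Because the closed form is a smooth function of the two means, its gradient with respect to the target locations is also available analytically, which is what makes such a term convenient inside a gradient-based energy minimization.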

Multi-target Tracking by Continuous Energy Minimization

A. Andriyenko and K. Schindler
CVPR 2011
bibtex | abstract | paper | poster | video | slides | project page

	Author = {Anton Andriyenko and Konrad Schindler},
	Booktitle = {CVPR},
	Title = {Multi-target Tracking by Continuous Energy Minimization},
	Year = {2011}
We propose to formulate multi-target tracking as minimization of a 
continuous energy function. Unlike a number of recent approaches, we 
focus on designing an energy function that represents the problem as 
faithfully as possible, rather than one that is amenable to elegant 
optimization. We then go on to construct a suitable optimization scheme 
to find strong local minima of the proposed energy. The scheme extends 
the conjugate gradient method with periodic trans-dimensional jumps. 
These moves allow the search to escape weak minima and explore a much 
larger portion of the variable-dimensional search space, while still 
always reducing the energy. To demonstrate the validity of this approach 
we present an extensive quantitative evaluation both on synthetic data 
and on six different real video sequences. In both cases we achieve a 
significant performance improvement over an extended Kalman filter 
baseline as well as an ILP-based state-of-the-art tracker. 


Globally Optimal Multi-target Tracking on a Hexagonal Lattice

A. Andriyenko and K. Schindler
ECCV 2010
bibtex | abstract | poster | video

	Author = {Anton Andriyenko and Konrad Schindler},
	Booktitle = {ECCV},
	Title = {Globally Optimal Multi-target Tracking on a Hexagonal Lattice},
	Year = {2010}
We propose a global optimisation approach to multi-target tracking. The 
method extends recent work which casts tracking as an integer linear 
program, by discretising the space of target locations. Our main 
contribution is to show how dynamic models can be integrated in such an 
approach. The dynamic model, which encodes prior expectations about 
object motion, has been an important component of tracking systems for a 
long time, but has recently been dropped to achieve globally optimisable 
objective functions. We re-introduce it by formulating the optimisation 
problem such that deviations from the prior can be measured 
independently for each variable. Furthermore, we propose to sample the 
location space on a hexagonal lattice to achieve smoother, more accurate 
trajectories in spite of the discrete setting. Finally, we argue that 
non-maxima suppression in the measured evidence should be performed 
during tracking, when the temporal context and the motion prior are 
available, rather than as a preprocessing step on a per-frame basis. 
Experiments on five different recent benchmark sequences demonstrate the 
validity of our approach.
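
As an illustration of sampling a location space on a hexagonal lattice, the sketch below generates lattice points inside a rectangle. The function name `hex_lattice`, the rectangular region, and the parameter names are assumptions made for this example and are not taken from the paper.

```python
import math


def hex_lattice(width, height, spacing):
    """Return (x, y) points of a hexagonal lattice covering [0, width] x [0, height].

    Every second row is shifted by half the spacing, and rows are
    spacing * sqrt(3) / 2 apart, so each interior point has six
    equidistant neighbours.
    """
    points = []
    row_step = spacing * math.sqrt(3.0) / 2.0
    row = 0
    y = 0.0
    while y <= height:
        x = spacing / 2.0 if row % 2 else 0.0  # offset odd rows
        while x <= width:
            points.append((x, y))
            x += spacing
        row += 1
        y = row * row_step
    return points


grid = hex_lattice(4.0, 4.0, 1.0)
```

Compared with a square grid of the same spacing, all six nearest neighbours of an interior point lie at the same distance, so steps between neighbouring cells are less direction-dependent.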


A Practical Approach for Photometric Acquisition of Hair Color

A. Zinke, M. Rump, T. Lay, A. Weber, A. Andriyenko and R. Klein
bibtex | abstract

	Author = {Zinke, Arno and Rump, Martin and Lay, Tom\'{a}s and Weber, Andreas and Andriyenko, Anton and Klein, Reinhard},
	Booktitle = {ACM SIGGRAPH Asia 2009 papers},
	Title = {A Practical Approach for Photometric Acquisition of Hair Color},
	Year = {2009}
In this work a practical approach to photometric acquisition of hair 
color is presented. Based on a single input photograph of a simple setup 
we are able to extract physically plausible optical properties of hair 
and to render virtual hair closely matching the original. Our approach 
does not require any costly special hardware but a standard consumer 
camera only. 


Our code is available for the following papers


Annotations, detections, and other data for selected datasets are available here.


ACVT, University of Adelaide
Level 5, Ingkarni Wardli Building
Adelaide SA 5005, Australia
Phone: +61 (08) 8313-6168
Fax: +61 (08) 8313-4366
E-Mail: anton.milan@adelaide (replace adelaide with
You're welcome to encrypt your message using this PGP key