cardinality(unique count)를 샘플링해서 알아내는데 사용하는 확률자료구조 세익스피어 전 작품에서의 영어단어 세기 Philippe Flajolet 등이 2007년에 발표 HyperLogLog는 상대오차범위 조절 가능 상대오차는

cardinality가 N이라면 레지스터 크기는 log_2log_2N + O(1) Philippe Flajolet and Éric Fusy and Olivier Gandouet and Frédéric Meunier, “HyperLoglog: the analysis of a near-optimal cardinality estimation algorithm”, Discrete Mathematics and Theoretical Computer Science (DMTCS) Proceedings, vol AH, (2007) pp. 127–146. Matt Abrams, “Big Data Counting: How To Count A Billion Distinct Objects Using Only 1.5KB Of Memory”, High Scalability, Ilya Katsov, “Probabilistic Data Structures for Web Analytics and Data Mining”, Highly Scalable Blog, “stream-lib”, “Iterated logarithm”, Wikipedia,