OK, there's a billion questions:
- SQL Server
- stackoverflow.com/questions/485409/generating-a-histogram-from-column-values-in-a-database OP did not know the difference between count and histogram :-) But it's the number one Google result.
- stackoverflow.com/questions/19103991/create-range-bins-from-sql-server-table-for-histograms has a minor extra group by twist, but otherwise fine
- stackoverflow.com/questions/16268441/generate-histogram-in-sql-server
- SQLite
- stackoverflow.com/questions/67514208/how-to-optimise-creating-histogram-bins-in-sqlite perf only, benchmarking would be needed. SQLite.
- stackoverflow.com/questions/32155449/create-a-histogram-with-a-dynamic-number-of-partitions-in-sqlite variable bin size, same number of entries per bin
- stackoverflow.com/questions/60348109/histogram-for-time-periods-using-sqlite-regular-buckets-1h-wide time
- MySQL: stackoverflow.com/questions/1764881/getting-data-for-histogram-plot MySQL appears to extend
ROUND
to also round by integers:ROUND(numeric_value, -2)
, but this is not widely portable which is a shame - stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql specifically asks about empty bins, which is amazing. Amazon Redshift dialect unfortunately, but answer provided works widely, and Redshift was forked from PostgreSQL, so there's hope. Those newb open source server focused projects that don't use AGPL!
Let's try it on SQLite 3.40.1, Ubuntu 23.04. Data setup:
sqlite3 tmp.sqlite 'create table t(x integer)'
sqlite3 tmp.sqlite <<EOF
insert into t values (
0,
2,
2,
3,
5,
6,
6,
8,
9,
17,
)
EOF
sqlite3 tmp.sqlite 'create index tx on t(x)'
For a bin size of 5 ignoring empty ranges we can:which produces the desired:
sqlite3 tmp.sqlite <<EOF
select floor(x/5)*5 as x,
count(*) as cnt
from t
group by 1
order by 1
EOF
0|4
5|5
15|1
And to consider empty ranges we can use SQL which outputs the desired:
genenerate_series
+ as per stackoverflow.com/questions/72367652/populating-empty-bins-in-a-histogram-generated-using-sql:sqlite3 tmp.sqlite <<EOF
select x, sum(cnt) from (
select floor(x/5)*5 as x,
count(*) as cnt
from t
group by 1
union
select *, 0 as cnt from generate_series(0, 15, 5)
)
group by x
EOF
0|4
5|5
10|0
15|1