Distribution Table
The trader product shows a distribution table for prices. This page explains how that table is created.
The table can be built for a specific crop variety across all mandis, or for a single mandi-crop-variety combination.
Show the price distribution based on percentiles.
If there are extremely high or extremely low prices, isolate them in separate bins.
Quartiles (Q1, Q2, Q3): These divide the data into four equal parts. Q1 is the median of the lower half of the data, Q2 is the median of all data, and Q3 is the median of the upper half.
Interquartile Range (IQR): This is the difference between Q3 and Q1 and represents the middle 50% of the data.
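Whiskers: These extend from Q1 down to the lowest data point that is no more than 1.5 times the IQR below Q1, and from Q3 up to the highest data point that is no more than 1.5 times the IQR above Q3. Prices beyond the whiskers are treated as outliers.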
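The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the product's actual code: the function names quartiles and whiskers are hypothetical, the sample prices are made up, and leaving the median out of both halves when the number of prices is odd is an assumption that the page does not state. It needs at least two prices.

```python
from statistics import median

def quartiles(prices):
    """Return (Q1, Q2, Q3) using the median-of-halves definition."""
    data = sorted(prices)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Assumption: for an odd number of prices the median itself is left out of both halves.
    upper_half = data[half + (n % 2):]
    return median(lower_half), median(data), median(upper_half)

def whiskers(prices):
    """Return the lowest and highest prices within 1.5 * IQR of Q1 and Q3."""
    q1, _, q3 = quartiles(prices)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [p for p in prices if low_fence <= p <= high_fence]
    return min(inside), max(inside)

# Illustrative prices only; 1000 and 5000 fall outside the whiskers.
prices = [1000, 1800, 1900, 2000, 2100, 2200, 2300, 5000]
q1, q2, q3 = quartiles(prices)
lower_whisker, upper_whisker = whiskers(prices)
```

With these sample prices the two extreme values lie outside the whiskers, which is the situation the goal of isolating extreme prices in separate bins is meant to handle.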
Bins are defined intervals or ranges of crop prices.
Counting Logic: The counting logic involves determining how many data points (crop prices) fall within each bin's range. The key here is to ensure that each price is counted exactly once, respecting the inclusive or exclusive nature of bin boundaries.
No Overlap: By ensuring that the upper limit of one bin is the starting point of another and treating the upper limit as exclusive (except for the last bin), each price is categorized uniquely. This method avoids the possibility of a price being counted in two bins.
Complete Coverage: Inclusive lower bounds and an inclusive upper bound for the last bin ensure that all data points are counted, including the extremes.
Loops and Conditions: The for loop iterates over each bin, and the if condition inside the loop applies the correct counting logic based on the bin's position (general or last).
Summation and Comparison: The sum(1 for price in prices if condition) pattern is a Pythonic way to count items satisfying a condition, translating directly from the mathematical notion of conditional counting.
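The no-overlap and complete-coverage properties can be checked directly with the same sum(1 for ... if ...) pattern. The sketch below is illustrative only: the helper names matches and bins_containing, the edge values, and the prices are all hypothetical. It counts how many bins contain each price and confirms the answer is always exactly one.

```python
def matches(price, lower, upper, is_last):
    """True when price is in [lower, upper), or in [lower, upper] for the last bin."""
    if is_last:
        return lower <= price <= upper
    return lower <= price < upper

def bins_containing(price, edges):
    """Count how many bins built from consecutive edges contain the price."""
    last = len(edges) - 2  # index of the last bin
    return sum(
        1
        for i in range(len(edges) - 1)
        if matches(price, edges[i], edges[i + 1], i == last)
    )

edges = [1000, 1800, 1850, 2050, 2250, 2300, 5000]  # illustrative bin edges
prices = [1000, 1800, 1900, 2000, 2100, 2200, 2300, 5000]

# Every price, including boundary values and the maximum, lands in exactly one bin.
exactly_once = all(bins_containing(p, edges) == 1 for p in prices)

# For contrast: if both ends of every bin were inclusive, a price sitting on a
# shared boundary (here 1800) would be counted in two bins.
double_counted = sum(
    1 for i in range(len(edges) - 1) if edges[i] <= 1800 <= edges[i + 1]
)
```

The contrast case at the end shows why the upper limit has to be exclusive for every bin except the last.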
Bins are initially set up to capture the range of prices from extreme values (min and max prices) through the distribution (quartiles and whiskers).
The bins are created to ensure no overlap, where the end of one bin is the start of another. This is critical for avoiding double-counting.
The lower boundary of bins is inclusive, meaning if a price is equal to the lower boundary, it's counted in the bin.
The upper boundary of bins (except the last one) is exclusive, meaning prices equal to the upper boundary are not included in the bin but rather in the next bin up.
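One way to build such bins is to collect the boundary values, sort them, and pair each edge with the next. The sketch below is an assumption-laden illustration rather than the product's code: build_bins is a hypothetical name, the (upper, lower) tuple order is inferred from the bin_bounds[0] and bin_bounds[1] indexing used on this page, and dropping duplicate edges (for example when there are no outliers and the minimum equals the lower whisker) is an assumption.

```python
def build_bins(prices, q1, q2, q3, lower_whisker, upper_whisker):
    """Return contiguous bins as (upper, lower) tuples, ordered from lowest to highest."""
    # Assumption: duplicate edges are dropped so that no zero-width bin is produced.
    edges = sorted({
        min(prices), lower_whisker, q1, q2, q3, upper_whisker, max(prices)
    })
    # The upper limit of each bin is the lower limit of the next one.
    return [(edges[i + 1], edges[i]) for i in range(len(edges) - 1)]

# Illustrative values only.
prices = [1000, 1800, 1900, 2000, 2100, 2200, 2300, 5000]
bins = build_bins(prices, q1=1850, q2=2050, q3=2250,
                  lower_whisker=1800, upper_whisker=2300)
```

Here the first bin (1000 up to 1800) and the last bin (2300 up to 5000) hold the extreme prices, and the four bins in between cover the whisker-to-whisker range.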
The process iterates over each bin, applying a condition to count prices within the bin's boundaries.
For most bins, prices are counted if they are greater than or equal to the lower boundary and less than the upper boundary (bin_bounds[1] <= price < bin_bounds[0]). Note that in these expressions bin_bounds[1] is the lower limit and bin_bounds[0] is the upper limit.
For the last bin, which includes the max price, the upper boundary is inclusive (bin_bounds[1] <= price <= bin_bounds[0]). This ensures the max price is always included.
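The two conditions above can be put together as follows. This is a sketch under stated assumptions: count_prices is a hypothetical name, the bins and prices are illustrative, and each bin is assumed to be an (upper, lower) tuple, which is what the bin_bounds[0] and bin_bounds[1] indexing in the conditions suggests.

```python
def count_prices(prices, bins):
    """Count prices per bin; bins are (upper, lower) tuples ordered low to high."""
    counts = []
    last_index = len(bins) - 1
    for i, bin_bounds in enumerate(bins):
        if i == last_index:
            # Last bin: the upper limit is inclusive so the max price is counted.
            count = sum(1 for price in prices
                        if bin_bounds[1] <= price <= bin_bounds[0])
        else:
            # General bin: lower limit inclusive, upper limit exclusive.
            count = sum(1 for price in prices
                        if bin_bounds[1] <= price < bin_bounds[0])
        counts.append(count)
    return counts

# Illustrative bins and prices only.
bins = [(1800, 1000), (1850, 1800), (2050, 1850),
        (2250, 2050), (2300, 2250), (5000, 2300)]
prices = [1000, 1800, 1900, 2000, 2100, 2200, 2300, 5000]
counts = count_prices(prices, bins)
```

The counts add up to the number of prices, which is a quick sanity check that nothing was dropped or counted twice.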
For each bin, the lower limit, upper limit, and count of prices within those limits are recorded.
Limits are rounded to integers (floor for the lower limit, ceil for the upper limit) so that each bin is shown with whole-number price limits.
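A possible shape for the recorded rows is sketched below. The function name to_rows and the field names lower_limit, upper_limit, and count are hypothetical, the bins are again assumed to be (upper, lower) tuples, and the sketch assumes rounding is applied only when a row is recorded, after counting has been done on the unrounded limits.

```python
import math

def to_rows(bins, counts):
    """Pair each (upper, lower) bin with its count and round the limits outward."""
    rows = []
    for (upper, lower), count in zip(bins, counts):
        rows.append({
            "lower_limit": math.floor(lower),  # round the lower limit down
            "upper_limit": math.ceil(upper),   # round the upper limit up
            "count": count,
        })
    return rows

# Illustrative fractional limits and counts only.
rows = to_rows([(1837.5, 1000.0), (2062.5, 1837.5)], [3, 4])
```

Because the lower limit is rounded down and the upper limit is rounded up, the displayed limits of adjacent bins can overlap by one unit when a boundary is fractional (1838 and 1837 in this example), even though the counts themselves do not overlap.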