Find a best practice for Business Intelligence data processing
Question
I'm working on a system that manages human resources. It has a BI (Business Intelligence) part that collects and processes data from the main system, then visualizes the processed data as charts, tables, and so on.
For example, we want to see the relation between a person's age [in the range 18-38] (on axis 1) and their monthly salary (on axis 2) [over the full salary range]. The aggregated value is a count of people. There is also an additional step called Filter, which restricts the result to organization A.
The expected result is like this:
                    Age_18<28   Age_28<38   Age_38<48
Salary_<1000               12          25          45
Salary_1000<5000           12          10           2
Salary_>5000                1           1           2
The current processing steps are as follows:
- Search for axis 1: find all people in the age range [18-38] in organization A
- Search for axis 2: find all people in organization A
- Merge the results for axis 1 and axis 2
- Count the people for each condition; for example, the number of people with Age_18<28 AND Salary_<1000 is 12, and so on
- Convert to a JSON response
Because there are a lot of cases to handle, the logic has become complicated to maintain. All steps are handled manually, as above.
So I wonder whether this is a common problem with a common way to handle it: for example a design pattern, an algorithm, a (Java) library, or a specific concept for such things that I am not yet aware of.
Goals:
- make the code simpler, more readable, and more maintainable
- make it easy to extend, i.e. to add new cases
What I'm about to try:
- Apply the chain of responsibility and strategy patterns
- I also wonder whether Apache Kafka would be the right tool
Note: the above is just a very simple case; there might be multiple items on one axis, with some additional conditions.
Answer 1
Score: 0
One way to think of this is that you are accumulating counts in a 3 x 3 frequency table.
- Write a simple method to map the salary to an index as follows:
  < 1000 => 0
  1000 to < 5000 => 1
  >= 5000 => 2
  There are various ways to code this method.
- Write a simple method to map the age to an index as follows:
  18 to < 28 => 0
  28 to < 38 => 1
  38 to < 48 => 2
- Put it together like this:
  int[][] counts = new int[3][3];
  for each person p in ...
      counts[ageIndex(p.age)][salaryIndex(p.salary)] += 1;
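The three steps above can be put together as a minimal, self-contained sketch. The `Person` and `FrequencyTable` classes, and the choice to return -1 for (and then skip) ages outside the listed buckets, are assumptions made for illustration; only the bucket boundaries come from the answer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal person type; a real system would use its own entity.
final class Person {
    final int age;
    final int salary;

    Person(int age, int salary) {
        this.age = age;
        this.salary = salary;
    }
}

final class FrequencyTable {
    // Maps a salary to an index: < 1000 => 0, 1000 to < 5000 => 1, >= 5000 => 2.
    static int salaryIndex(int salary) {
        if (salary < 1000) return 0;
        if (salary < 5000) return 1;
        return 2;
    }

    // Maps an age to an index: 18 to < 28 => 0, 28 to < 38 => 1, 38 to < 48 => 2.
    // Returns -1 for ages outside [18, 48); this convention is an assumption.
    static int ageIndex(int age) {
        if (age < 18 || age >= 48) return -1;
        return (age - 18) / 10;
    }

    // Accumulates counts[ageIndex][salaryIndex], skipping out-of-range ages.
    static int[][] count(List<Person> people) {
        int[][] counts = new int[3][3];
        for (Person p : people) {
            int a = ageIndex(p.age);
            if (a < 0) continue;
            counts[a][salaryIndex(p.salary)] += 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(20, 900), new Person(25, 950),
                new Person(30, 4000), new Person(60, 7000));
        System.out.println(Arrays.deepToString(count(people)));
    }
}
```

Note that `counts` is indexed as [age][salary] here, following the answer, so it is the transpose of the table in the question, which has salary rows and age columns.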
You could easily implement that in Java, and probably in SQL or in your BI system's query language as well, if it has one.
You can generalize this to M x M, and to more dimensions. If you put a bit of effort into it, you can actually implement the mappings as a data-driven function; e.g.
public int mapToIndex(int value, int[] ranges) { ... }
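One possible body for `mapToIndex`, as a sketch only: it treats `ranges` as an ascending array of bucket boundaries, so that bucket i covers `ranges[i] <= value < ranges[i + 1]`, and returns -1 when the value falls outside them. That reading of `ranges`, the -1 convention, the `static` modifier, and the `RangeMapper` class name are all assumptions; the answer gives only the signature.

```java
final class RangeMapper {
    // ranges holds ascending boundaries; bucket i covers ranges[i] <= value < ranges[i + 1].
    // Returns -1 if value is below the first boundary or at/above the last one.
    static int mapToIndex(int value, int[] ranges) {
        if (ranges.length < 2 || value < ranges[0]) return -1;
        for (int i = 0; i + 1 < ranges.length; i++) {
            if (value < ranges[i + 1]) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] ageRanges = {18, 28, 38, 48};
        // Integer.MAX_VALUE stands in for "no upper limit" (an assumption).
        int[] salaryRanges = {0, 1000, 5000, Integer.MAX_VALUE};
        System.out.println(mapToIndex(30, ageRanges));
        System.out.println(mapToIndex(1000, salaryRanges));
    }
}
```

With the boundaries held in data, adding or changing a bucket means editing an array (or a configuration entry) rather than the mapping code.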
Note that there is a flaw in what you are doing: employees could be younger than 18 or older than 48.
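One way to address that flaw, sketched under assumptions, is to add open-ended buckets at both ends so that every employee lands somewhere. The `OpenEndedBuckets` class and `bucketOf` method are hypothetical names; whether out-of-range employees should be counted, reported separately, or rejected is a decision for the real system.

```java
import java.util.Arrays;

final class OpenEndedBuckets {
    // boundaries are ascending cut points; n boundaries produce n + 1 buckets:
    // bucket 0 is value < boundaries[0], bucket n is value >= boundaries[n - 1].
    static int bucketOf(int value, int[] boundaries) {
        int i = 0;
        while (i < boundaries.length && value >= boundaries[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] ageBoundaries = {18, 28, 38, 48};
        // One extra slot: "< 18" at index 0 and ">= 48" at the last index.
        int[] counts = new int[ageBoundaries.length + 1];
        for (int age : new int[] {16, 18, 27, 40, 52, 65}) {
            counts[bucketOf(age, ageBoundaries)]++;
        }
        System.out.println(Arrays.toString(counts));
    }
}
```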