Finding a best practice for Business Intelligence data processing

Question

I'm working on a system that manages human resources. It has a BI (Business Intelligence) part that collects and processes data from the main system, then visualizes the processed data as charts, tables, and so on.

For example, we want to see the relation between a person's age (axis 1, restricted to the range 18-38) and their monthly salary (axis 2, over the full salary range). The aggregated value is a count of people. There is also an additional step called Filter, which restricts the result to organization A.

The expected result is like this:

                   Age_18<28   Age_28<38   Age_38<48
Salary_<1000          12          25          45
Salary_1000<5000      12          10          2
Salary_>5000          1           1           2

The current processing steps are as below:

  1. Search for axis 1: find all people in organization A whose age is in the range [18-38]
  2. Search for axis 2: find all people in organization A
  3. Merge the results for axis 1 and axis 2
  4. Count the people matching each condition; for example, the number of people with Age_18<28 AND Salary_<1000 is 12, and so on
  5. Convert the result to a JSON response

Because there are a lot of cases to handle, the logic has become complicated and hard to maintain. Every step is coded by hand, as described above.

So I wonder whether this is a common problem with a common way to handle it: for example a design pattern, an algorithm, a (Java) library, or a specific concept that I'm not aware of.

Goals:

  • make the code simpler, more readable and easier to maintain
  • make it easy to extend, i.e. to add new cases

What I'm about to try:

  • Apply chain of responsibility + strategy patterns
  • I also wonder whether Apache Kafka would be the right tool for this

Note: the above is just a very simple case. A real case may contain multiple items on one axis, with some additional conditions.

Answer 1

Score: 0

One way to think of this is that you are accumulating counts in a 3 x 3 frequency table.

  1. Write a simple method to map the salary to an index as follows:

    < 1000         => 0
    1000 to < 5000 => 1
    >= 5000        => 2

    There are various ways to code this method.

  2. Write a simple method to map the age to an index as follows:

    18 to < 28     => 0
    28 to < 38     => 1
    38 to < 48     => 2

  3. Put it together like this:

    int[][] counts = new int[3][3];
    for each person p in ...
        counts[ageIndex(p.age)][salaryIndex(p.salary)] += 1;
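
As a rough illustration, the three steps above could look like this in plain Java. This is only a sketch: the `Person` class, the `FrequencyTable` class and the `count` method are hypothetical names made up for the example, and skipping people whose age falls outside the three buckets is an assumption, not something the question specifies.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout: FrequencyTable, Person, count.
final class FrequencyTable {

    // Minimal stand-in for an HR record; the real entity will differ.
    static final class Person {
        final int age;
        final int salary;

        Person(int age, int salary) {
            this.age = age;
            this.salary = salary;
        }
    }

    // Age buckets: [18,28) -> 0, [28,38) -> 1, [38,48) -> 2, anything else -> -1.
    static int ageIndex(int age) {
        if (age < 18 || age >= 48) {
            return -1;
        }
        return (age - 18) / 10;
    }

    // Salary buckets: < 1000 -> 0, [1000,5000) -> 1, >= 5000 -> 2.
    static int salaryIndex(int salary) {
        if (salary < 1000) {
            return 0;
        }
        if (salary < 5000) {
            return 1;
        }
        return 2;
    }

    // Accumulates counts[ageIndex][salaryIndex] over all people.
    static int[][] count(List<Person> people) {
        int[][] counts = new int[3][3];
        for (Person p : people) {
            int a = ageIndex(p.age);
            if (a < 0) {
                continue; // assumption: ages outside the buckets are ignored
            }
            counts[a][salaryIndex(p.salary)] += 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(20, 800), new Person(25, 1200),
                new Person(30, 900), new Person(40, 6000),
                new Person(60, 7000)); // age 60 is outside the buckets and is skipped
        for (int[] row : count(people)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each row of the printed array is one age bucket and each column one salary bucket, so the table in the question is the transpose of `counts`.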
    

You could easily implement that in Java, and probably also in SQL or in your BI system's query language, if it has one.

You can generalize this to M x N (the two axes need not have the same number of buckets), and to more dimensions. With a bit of effort, you can implement the mappings as a single data-driven function, e.g.

 public int mapToIndex(int value, int[] ranges) { ... }
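
One possible body for that method, as a sketch only. It assumes `ranges` holds ascending bucket boundaries, so that bucket `i` covers `[ranges[i], ranges[i+1])`, and that `-1` signals "no bucket"; neither convention comes from the signature above. The method is made `static` here for convenience, and the `RangeMapper` class name is made up.

```java
// Hypothetical class name; the boundary convention is an assumption.
final class RangeMapper {

    // ranges holds ascending boundaries; bucket i covers [ranges[i], ranges[i + 1]).
    // Returns the bucket index, or -1 when value falls outside every bucket.
    public static int mapToIndex(int value, int[] ranges) {
        for (int i = 0; i + 1 < ranges.length; i++) {
            if (value >= ranges[i] && value < ranges[i + 1]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The buckets are now plain data and could be loaded from configuration.
        int[] ageRanges = {18, 28, 38, 48};
        int[] salaryRanges = {Integer.MIN_VALUE, 1000, 5000, Integer.MAX_VALUE};

        System.out.println(mapToIndex(30, ageRanges));      // bucket for age 30
        System.out.println(mapToIndex(5000, salaryRanges)); // bucket for salary 5000
        System.out.println(mapToIndex(50, ageRanges));      // outside all age buckets
    }
}
```

With this shape, adding a new axis or changing a bucket means changing an array rather than writing a new method.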

Note that there is a flaw in what you are doing: employees could be younger than 18 or older than 48.
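
One way to avoid losing those employees is to add open-ended buckets at both ends. The sketch below assumes that is acceptable for the report; `OpenEndedBuckets` and `bucketOf` are hypothetical names, not part of any library.

```java
// Hypothetical helper: n ascending boundaries produce n + 1 buckets.
final class OpenEndedBuckets {

    // Index 0 means "below boundaries[0]";
    // index boundaries.length means "at or above the last boundary".
    static int bucketOf(int value, int[] boundaries) {
        int index = 0;
        while (index < boundaries.length && value >= boundaries[index]) {
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        int[] ageBoundaries = {18, 28, 38, 48};
        // Five buckets: <18, 18 to <28, 28 to <38, 38 to <48, >=48.
        int[] counts = new int[ageBoundaries.length + 1];
        for (int age : new int[] {16, 18, 27, 40, 48, 63}) {
            counts[bucketOf(age, ageBoundaries)]++;
        }
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

Every value now lands in exactly one bucket, so the counts always add up to the number of people processed.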

huangapple
  • Posted on May 29, 2020 at 10:48:30
  • If you repost this article, please keep the link: https://java.coder-hub.com/62077873.html