2020-05-29 10:48:30

Find a best practice for Business Intelligence data processing

Question

I'm working on a system that manages human resources. It has a BI (Business Intelligence) part that collects and processes data from the main system, then visualizes the processed data as charts, tables, and so on.

For example, we want to see the relationship between a person's age [in the range 18-38] (on axis 1) and their monthly salary (on axis 2) [over the full salary range]. The aggregated value is a count of people. There is also an additional step called Filter, which restricts the result to organization A.

The expected result is like this:

                     Age_18<28   Age_28<38   Age_38<48
Salary_<1000         12          25          45
Salary_1000<5000     12          10          2
Salary_>5000         1           1           2

The current processing steps are as follows:

Search for axis 1: search for all people in the age range [18-38] in organization A
Search for axis 2: search for all people in organization A
Merge the results for axis 1 and axis 2
Count the people for each condition; for example, the number of people with Age_18<28 AND Salary_<1000 is 12, and so on
Convert the result to a JSON response

Because there are many cases to handle, the logic has become complicated and hard to maintain. Every step is hand-coded as described above.

So I wonder whether this is a common problem with a common way to handle it: for example a design pattern, an algorithm, a (Java) library, or a specific concept that I am not aware of.

Goals:

Make the code simpler, more readable, and more maintainable
Make it easy to extend, i.e. to add new cases

What I'm about to try:

Apply the chain of responsibility and strategy patterns
I also wonder whether Apache Kafka would be the right approach

Note: the above is a very simple case. An axis might contain multiple items, with some additional conditions.

Answer 1

Score: 0

One way to think of this is that you are accumulating counts in a 3 x 3 frequency table.

Write a simple method to map the salary to an index as follows:
```
< 1000         => 0
1000 to < 5000 => 1
>= 5000        => 2
```
There are various ways to code this method.

Write a simple method to map the age to an index as follows:

18 to < 28     => 0
28 to < 38     => 1
38 to < 48     => 2

Put it together like this:

int[][] counts = new int[3][3];
// 'people' stands for whatever collection of persons you are processing
for (Person p : people) {
    counts[ageIndex(p.age)][salaryIndex(p.salary)] += 1;
}
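
The loop above can be turned into a small runnable program. This is only a sketch: `FrequencyTable`, `Person`, and `count` are hypothetical names that are not part of the question's system, and `ageIndex` here rejects ages outside 18 to 47 rather than deciding how to count them.

```java
import java.util.Arrays;
import java.util.List;

class FrequencyTable {
    // Hypothetical data holder; the real system's person type will differ.
    static final class Person {
        final int age;
        final int salary;

        Person(int age, int salary) {
            this.age = age;
            this.salary = salary;
        }
    }

    // Age bands: 18 to <28 => 0, 28 to <38 => 1, 38 to <48 => 2.
    static int ageIndex(int age) {
        if (age < 18 || age >= 48) {
            throw new IllegalArgumentException("age out of range: " + age);
        }
        if (age < 28) return 0;
        if (age < 38) return 1;
        return 2;
    }

    // Salary bands: <1000 => 0, 1000 to <5000 => 1, >=5000 => 2.
    static int salaryIndex(int salary) {
        if (salary < 1000) return 0;
        if (salary < 5000) return 1;
        return 2;
    }

    // Accumulates one count per person into counts[ageBand][salaryBand].
    static int[][] count(List<Person> people) {
        int[][] counts = new int[3][3];
        for (Person p : people) {
            counts[ageIndex(p.age)][salaryIndex(p.salary)] += 1;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person(25, 800), new Person(26, 900),
                new Person(30, 1200), new Person(45, 7000));
        // prints [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
        System.out.println(Arrays.deepToString(count(people)));
    }
}
```

The first index is the age band and the second is the salary band, matching the order used in the loop. To print the table with salary as rows, as in the question, iterate over the second index in the outer loop.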

You could easily implement that in Java, and probably in SQL or in your BI system's query language as well, if it has one.

You can generalize this to M x N, and to more dimensions. With a bit of effort, you can actually implement the mappings as a data-driven function; e.g.

 public int mapToIndex(int value, int[] ranges) { ... }
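
One possible body for such a function is sketched below. It is an assumption about what the signature could mean, not the answer's own code: `ranges` is read as a list of ascending boundaries, bucket `i` covers `ranges[i] <= value < ranges[i + 1]`, and `-1` signals a value outside every bucket. The class name `RangeMapper` is hypothetical.

```java
class RangeMapper {
    // Returns the index i such that ranges[i] <= value < ranges[i + 1],
    // or -1 when the value is below the first or at/above the last boundary.
    static int mapToIndex(int value, int[] ranges) {
        for (int i = 0; i + 1 < ranges.length; i++) {
            if (value >= ranges[i] && value < ranges[i + 1]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] ageRanges = {18, 28, 38, 48};
        // Integer.MAX_VALUE stands in for "no upper limit" on salary.
        int[] salaryRanges = {0, 1000, 5000, Integer.MAX_VALUE};

        System.out.println(mapToIndex(30, ageRanges));      // prints 1
        System.out.println(mapToIndex(5000, salaryRanges)); // prints 2
        System.out.println(mapToIndex(50, ageRanges));      // prints -1
    }
}
```

With this shape, adding or changing a band means editing an array (or loading it from configuration) instead of editing branching logic.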

Note that there is a flaw in what you are doing: employees could be younger than 18 or older than 48.
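
One way to avoid losing those employees is to add catch-all buckets at both ends. The sketch below is an illustration under assumptions, not something the question's system is known to do: `OpenEndedBands` and `bucketOf` are hypothetical names, bucket 0 means "below the first boundary", and the last bucket means "at or above the last boundary".

```java
import java.util.Arrays;

class OpenEndedBands {
    // Counts how many boundaries are <= value. With boundaries {18, 28, 38, 48}
    // this gives 0 for <18, 1 for 18..27, 2 for 28..37, 3 for 38..47, 4 for >=48.
    static int bucketOf(int value, int[] bounds) {
        int bucket = 0;
        while (bucket < bounds.length && value >= bounds[bucket]) {
            bucket++;
        }
        return bucket;
    }

    public static void main(String[] args) {
        int[] ageBounds = {18, 28, 38, 48};
        int[] ages = {16, 18, 27, 40, 48, 63};

        // One more bucket than boundaries: underflow, three bands, overflow.
        int[] counts = new int[ageBounds.length + 1];
        for (int age : ages) {
            counts[bucketOf(age, ageBounds)]++;
        }
        System.out.println(Arrays.toString(counts)); // prints [1, 2, 0, 1, 2]
    }
}
```

Whether out-of-range people should be shown, hidden, or reported as an error is a reporting decision; the extra buckets only make sure the choice is explicit.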
