I have a table containing call records, and I built a mining model from that table alone. The model uses the Association Rules algorithm and has 3 columns: calling_number, called_number, and target_operator. The key is calling_number, the input is the operator column (target_operator), and the predicted column is called_number.
The result shows no rules, but there are itemsets of size 1 (one column) and 2 (two columns). In the top record of the result, SQL Server reports a support of 1891 for called_number = 1891 and operator = 'INDOSAT'.
I then ran this query against the table:
SELECT DISTINCT calling_number
FROM call_records
WHERE called_number = '07786000815'
AND target_operator = 'INDOSAT';
It returns 2162 records instead of 1891. If I remove the DISTINCT qualifier, SQL Server returns 2159 records. Why do these numbers differ from the mining result?
Thank you,
Bernaridho
|||
Sorry, the called_number was '07786000815' as in the query, not 1891 as I said before.
This is the correct statement.
The result shows no rules, but there are itemsets of size 1 (one column) and 2 (two columns). In the top record of the result, SQL Server reports a support of 1891 for called_number = '07786000815' and operator = 'INDOSAT'.
Thank you,
Bernaridho
|||
Please check the following:
1) Algorithm parameters: in BI Dev Studio, click Mining Models, right-click Microsoft_Association_Rules, and select Set Algorithm Parameters. For your case, you need to watch the following parameters:
1.a) Minimum_Itemset_Size should be 1 if you want to include all itemsets
1.b) Maximum_Itemset_Size should be 0 if you want to include all itemsets (default is 3)
Please try these settings only on a small data set; they may take a very long time on a large one. You may also tune Minimum_Support and Maximum_Support to generate some rules.
BTW, to verify that the number is right, you might want to use the following SQL query:
SELECT count(*)
FROM call_records
WHERE called_number = '07786000815'
AND target_operator = 'INDOSAT';
Thanks,
|||This could be because you have duplicate keys in your data.
Try these two queries:
SELECT Count(calling_number) FROM call_records
and
SELECT Count(DISTINCT calling_number) FROM call_records
If these numbers are not the same, this is likely your problem. Each key represents a distinct case for the algorithm. If you have duplicate keys, the behavior is undefined and the infrastructure will simply pick one of the cases to mine. For example, suppose I had these rows:
calling_number  called_number  target_operator
222222222       111111111      A
222222222       888888888      B
333333333       888888888      A
444444444       111111111      B
Then you would be in a situation where either 111111111 or 888888888 will have support = 2, but not both. One of the rows with key 222222222 would be ignored.
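To make the effect concrete, here is a minimal Python sketch of the four rows above. It is only an illustration of "one row per key survives", not the actual Analysis Services behavior, which is undefined for duplicate keys. The function name and the `keep` argument are hypothetical, introduced just to show both possible outcomes.

```python
from collections import Counter

# The four example rows: (calling_number, called_number, target_operator)
rows = [
    ("222222222", "111111111", "A"),
    ("222222222", "888888888", "B"),
    ("333333333", "888888888", "A"),
    ("444444444", "111111111", "B"),
]

def support_by_called_number(rows, keep="first"):
    """Count support per called_number when only one row per key is mined.

    `keep` is a hypothetical knob: "first" keeps the first row seen for a
    duplicated calling_number, anything else keeps the last one.
    """
    cases = {}
    for calling, called, _operator in rows:
        if keep == "first" and calling in cases:
            continue  # a row for this key already exists; ignore this one
        cases[calling] = called  # otherwise store (or overwrite) the case
    return Counter(cases.values())

first = support_by_called_number(rows, keep="first")
last = support_by_called_number(rows, keep="last")
print(first["111111111"], first["888888888"])  # 2 1
print(last["111111111"], last["888888888"])    # 1 2
```

Either way only three cases are mined instead of four, so the support reported by the model can be lower than what COUNT(*) returns on the table.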
|||Hi Yimin,
I've tried your suggested parameter settings, including the values for Maximum_Itemset_Size. They don't solve the problem.
I ran this query:
SELECT count(*)
FROM call_records
WHERE called_number = '07786000815'
AND target_operator = 'INDOSAT';
It returns 2259.
Visual Studio still returns 1891 as the support for called_number = '07786000815' AND target_operator = 'INDOSAT' when Maximum_Itemset_Size is 0. Visual Studio returns 1891 as the support for the called_number '07786000815'.
BTW, I don't think setting Maximum_Itemset_Size to 0 is sensible, since I see that the default and minimum value of Minimum_Itemset_Size is 1. If the minimum value of Minimum_Itemset_Size is 1, the minimum value of Maximum_Itemset_Size should not be less than 1.
Considering Lennan's post, I will try adding another column as a (unique) record identity. Essentially, I'm trying to find which of the competitor's subscribers are worth taking over.
I'll get back to you as soon as I finish trying the unique record id as the key.
Thank you,
Bernaridho
|||Correction to my own post:
Visual Studio returns 1891 as the support for the called_number '07786000815'.
should read
Visual Studio returns 1891 as the support for the called_number '07786000815' if the value of Maximum_Itemset_Size is 1. There is no result combining it with the value of the operator column, which is sensible since the value of Maximum_Itemset_Size is 1.
Thanks
Bernaridho
|||Hi Jamie,
In my case the first query returns 47664 and the second query returns 28762. Yes, they are different. I altered the table, adding a unique record-id. Then I rebuilt the mining structure and mining model, with the record-id as the key column in the model.
This time, the answer is correct. The result viewer displays a support of 2259 for the top associated called_number and operator.
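For anyone who wants to run the same check before rebuilding a model, here is a rough sketch using Python's built-in sqlite3 module and an in-memory table. The data is the small illustrative example from earlier in the thread, not my real call records, and the helper name `key_is_unique` is made up for this sketch. SQLite's AUTOINCREMENT column only stands in for whatever unique record-id column you add in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# record_id is a surrogate key: one distinct value per row.
conn.execute(
    "CREATE TABLE call_records ("
    " record_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " calling_number TEXT, called_number TEXT, target_operator TEXT)"
)
conn.executemany(
    "INSERT INTO call_records (calling_number, called_number, target_operator)"
    " VALUES (?, ?, ?)",
    [
        ("222222222", "111111111", "A"),
        ("222222222", "888888888", "B"),
        ("333333333", "888888888", "A"),
        ("444444444", "111111111", "B"),
    ],
)

def key_is_unique(conn, column):
    """Return True when `column` has exactly one row per distinct value."""
    # `column` is interpolated directly, so pass trusted column names only.
    total, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {column}) FROM call_records"
    ).fetchone()
    return total == distinct

print(key_is_unique(conn, "calling_number"))  # False
print(key_is_unique(conn, "record_id"))       # True
```

A column that fails this check is not safe to use as the case key, because several rows collapse into one case.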
Thank you,
Bernaridho
|||
Glad I could help
Cheers