Wednesday, December 07, 2005

After meeting with my supervisor

Still surviving!!!
We discussed the possible combinations of the 4 results from the 4 methods.
He said what I wrote should be correct. Hooray!
So that means 15 combinations as I posted before.
I am still not quite sure of the reasoning behind this answer, but I got some ideas from a guy on Pantip (HotChoc). Here is what he said:

"You can view the problem as putting letters (a,b,c,d) into slots (1,2,3,4).
The letter you pick first is not important, but the position is.

There are 4 'big' cases:
  1. all 4 slots are occupied by the letter you pick. In this case, the number of results is 4C4 = 4!/((4-4)!4!) = 1. 4C4 means from 4 slots, pick 4 slots to be occupied by the letter.
  2. 3 slots are occupied by the letter picked. Number of results is 4C3 (it's a combination because order is not important; the letter going into slots 1,2,3 is the same as 2,3,1). 4C3 = 4!/((4-3)!3!) = 4
  3. 2 slots are occupied by the letter picked. Number of results is 4C2 = 6. the remaining 2 slots can have 2 more cases...both slots occupied by the same letter: 2C2 = 1 and 1 slot occupied by one letter: 2C1 = 2. so the total is 6 * (2+1) = 18.
    But that is wrong, because the case where 1 of the remaining two slots is occupied by 1 letter is duplicated: aabc is the same as aacb. so instead of 2C1 = 2, we must somehow deduct the duplicate and make it = 1.
  4. Another problem is the case where the remaining 2 slots are occupied by 1 letter...
    this is in fact the mirror image of the big case so there are more duplicates.
    axay and xaya will be the same if x = y.

sorry i cannot think of an elegant solution right now. hopefully someone else can help or you can get some idea to continue it :)

p.s. another option is to look at the problem as comparison like my post #7. you have 2^6 results. but of those results...there are some that will never happen (such as a=b, b=c, a!=c). you have to deduct those out from 2^6."

Thanks for his comment!
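
The p.s. idea can be checked by brute force. Here is a minimal sketch in Python. It is not something from the thread, and the names `pairs` and `is_consistent` are made up for illustration. It tries every same/different answer for the 6 pairs among a, b, c, d, and keeps only the answers that do not contradict themselves (if a=b and b=c, then a=c must also hold).

```python
from itertools import combinations, product

items = "abcd"
pairs = list(combinations(items, 2))  # the 6 unordered pairs: ab, ac, ad, bc, bd, cd


def is_consistent(equal):
    """equal maps each pair to True (same) or False (different)."""
    def same(x, y):
        # Every item equals itself; otherwise look up the unordered pair.
        return x == y or equal[tuple(sorted((x, y)))]

    # Transitivity: whenever x = y and y = z, x = z must hold as well.
    return all(
        same(x, z)
        for x, y, z in product(items, repeat=3)
        if same(x, y) and same(y, z)
    )


# All 2^6 ways to answer "same or different?" for the 6 pairs.
assignments = [
    dict(zip(pairs, flags))
    for flags in product([True, False], repeat=len(pairs))
]
consistent = [a for a in assignments if is_consistent(a)]

print(len(assignments), len(consistent))  # 64 15
```

Under these assumptions, 49 of the 64 answers contradict themselves and 15 remain, which agrees with the 15 combinations.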

Thursday, December 01, 2005

Future work within the next few months

From the time planner in my mini-thesis,
here is the list of what has been done so far:

CART (Jul-Aug 05)
Done
  • install program and run with CMUDICT using 3 left and 3 right context letters
  • result for 10-fold cross validation = 58.48% words correct, 90.78% phonemes correct
  • result for leave-one-out = 59.40% words correct, 91.02% phonemes correct

Must do

  • Fine tune to find out the best result using 10-fold cross validation

Determine the effect of the different lexicon types (Sep-Oct 05)
Done

  • run table lookup I and II with 3 types of lexicon: common word, proper name, mix

Must do

  • run CART and Multi-strategy PbA with BEEP and BEEP+CMUDICT lexicon

Multi-strategy PbA
Done

  • implement and run with CMUDICT
  • result of the best combination for CMUDICT is 101010
  • run with CMUDICT to get the pronunciations for subjective evaluation

Permutation/Combination Problem

I am thinking about the possible combinations of the results from 3 methods plus 1 lexicon.
So there will be 4 pronunciations per name.
How many possible combinations of these 4 pronunciations are there?
At first, I thought it was a piece-of-cake problem.
Then I realized that it is not easy to write a formula for the solution,
even though I can list all the possibilities manually.
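
One way to write such a formula, offered as a sketch and not as the derivation used in this post: if the question is read as "in how many ways can 4 results be split into groups with identical pronunciation", it is the number of set partitions of a 4-element set, known as the Bell number B_4. The symbol S(n,k) below is the Stirling number of the second kind, the number of ways to split n items into exactly k non-empty groups. Both names are standard terminology brought in here for illustration.

```latex
B_4 = \sum_{k=1}^{4} S(4,k) = S(4,1) + S(4,2) + S(4,3) + S(4,4) = 1 + 7 + 6 + 1 = 15

B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k, \qquad B_0 = 1
\;\Rightarrow\; B_1 = 1,\; B_2 = 2,\; B_3 = 5,\; B_4 = 1 + 3 + 6 + 5 = 15
```

The 7 two-group cases are 4C1 = 4 (one result differs from the other three) plus 4C2 / 2 = 3 (two pairs), and the 6 three-group cases are 4C2 = 6 (choose the one pair that agrees).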

To make sure that I listed all combinations correctly,
I also listed 4 to the power of 4 (= 4 x 4 x 4 x 4) = 256 combinations,
then grouped them into categories.
Finally, I ended up with the following 15 patterns:
  1. {abcd} - all methods have the same pronunciation
  2. {a},{bcd} - only TLII different
  3. {b},{acd} - only CART different
  4. {c},{abd} - only PbA different
  5. {d},{abc} - only CMUDICT different
  6. {ab},{cd} - TLII same as CART, PbA same as CMUDICT
  7. {ac},{bd} - TLII same as PbA, CART same as CMUDICT
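  8. {ad},{bc} - TLII same as CMUDICT, CART same as PbA
  9. {ab},{c},{d} - only TLII same as CART
  10. {ac},{b},{d} - only TLII same as PbA
  11. {ad},{b},{c} - only TLII same as CMUDICT
  12. {bc},{a},{d} - only CART same as PbA
  13. {bd},{a},{c} - only CART same as CMUDICT
  14. {cd},{a},{b} - only PbA same as CMUDICT
  15. {a},{b},{c},{d} - all methods have different pronunciations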

Note:

  • a, b, c, d are the results of Table lookup II (TLII), CART, PbA and CMUDICT, respectively.
  • results in the same set {} have the same pronunciation.
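
The manual check described above (list all 256 combinations, then group them into categories) can be sketched in Python as follows. This is only an illustration under stated assumptions: the outputs are stand-in numbers 0 to 3 in place of real pronunciations, and the names `METHODS` and `pattern` are hypothetical.

```python
from collections import Counter
from itertools import product

METHODS = "abcd"  # a = Table lookup II, b = CART, c = PbA, d = CMUDICT


def pattern(outputs):
    """Group the method letters whose outputs are equal.

    Example: (0, 0, 1, 2) -> '{ab},{c},{d}'
    """
    groups = {}
    for method, out in zip(METHODS, outputs):
        groups.setdefault(out, []).append(method)
    return ",".join("{" + "".join(g) + "}" for g in sorted(groups.values()))


# Each of the 4 methods gives one of 4 stand-in outputs: 4^4 = 256 cases.
all_outputs = list(product(range(4), repeat=4))

# Count how many of the 256 cases fall into each pattern.
patterns = Counter(pattern(o) for o in all_outputs)

# Count the distinct patterns by their number of sets.
by_groups = Counter(p.count("{") for p in patterns)

print(len(all_outputs))           # 256
print(len(patterns))              # 15
print(sorted(by_groups.items()))  # [(1, 1), (2, 7), (3, 6), (4, 1)]
```

The last line says there is 1 pattern with one set, 7 with two sets, 6 with three sets and 1 with four sets, which matches the list above (pattern 1, patterns 2-8, patterns 9-14, pattern 15).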