#1 |
|
www.chandan.in
Join Date: Oct 2008
Posts: 7
Thanks: 7
Thanked 0 Times in 0 Posts
|
How to do word splitting
If I give buynow, it should return "buy now". If I give worldtraveltour, it should return "world travel tour", or even every word it contains (world, travel, rave, our, tour) and such combinations. If I give domainsitea, it should return "domain site a", and so on. Are any dictionary tools or class files available for this task? Thanks.
|
|
|
|
|
|
|
|
|
#2 |
|
Business Pro
War Room Member
Join Date: Jul 2009
Location: Scarborough
Posts: 99
Thanks: 1
Thanked 8 Times in 7 Posts
|
|
|
Need a Wordpress shopping cart plugin with themes and Amzon feeds??
★★★ SPECIAL OFFER ★★★ Turn $40 into $4,000 with shopperpress Free Wordpress Plugins - Wordpress Shopping Cart |
|
|
|
|
| The Following User Says Thank You to markfail For This Useful Post: |
|
|
#3 |
|
www.chandan.in
Join Date: Oct 2008
Posts: 7
Thanks: 7
Thanked 0 Times in 0 Posts
|
Thanks.
Actually, the input is random and can be anything, so the explode function doesn't fit. I only gave buynow and worldtraveltour as examples. It could just as well be a junk name like ksadas, which should also be split into the words "sad" and "das".
|
|
|
|
|
|
|
|
#4 |
|
Business Pro
War Room Member
Join Date: Jul 2009
Location: Scarborough
Posts: 99
Thanks: 1
Thanked 8 Times in 7 Posts
|
How would it know which words to split?
You can add the words you want to an array and then just check the array. If a word is found, split it.
|
|
|
|
|
|
|
#5 |
|
Advanced Warrior
War Room Member
Join Date: Apr 2006
Location: Tucson, AZ, USA.
Posts: 972
Thanks: 106
Thanked 143 Times in 103 Posts
|
No, explode isn't going to help because you don't know where the words are divided. That's the whole point. The only way to do this is with a dictionary lookup, as the OP implied.
I don't know of any existing classes that do this. It wouldn't be too hard to write one if you had a good dictionary, but the tricky part would be making it quick and efficient. (Obviously, Google is very good at it.)
Steve
|
Executive I.T. consulting for small/medium business
Website development | PHP - MySQL - JavaScript expert programming Software requirements analysis | Specification writing Project management | Vendor relationship management |
|
|
|
|
| The Following User Says Thank You to Steve Diamond For This Useful Post: |
|
|
#6 |
|
Business Pro
War Room Member
Join Date: Jul 2009
Location: Scarborough
Posts: 99
Thanks: 1
Thanked 8 Times in 7 Posts
|
Steve, you misunderstand.
If you already have an array of words, you can use it to check whether a word exists within the string and then extract it using explode. Either that or you can do it manually, but I know which one I would prefer...
|
|
|
|
|
|
|
#7 | |
|
www.chandan.in
Join Date: Oct 2008
Posts: 7
Thanks: 7
Thanked 0 Times in 0 Posts
|
Quote:
Because it would take too long to put all the dictionary words in an array.
| |
|
|
||
|
|
|
|
|
#8 | |
|
Advanced Warrior
War Room Member
Join Date: Apr 2006
Location: Tucson, AZ, USA.
Posts: 972
Thanks: 106
Thanked 143 Times in 103 Posts
|
Quote:
If you have a dedicated server with plenty of RAM, you could possibly write a C application taking this approach. Or you could virtualize the array. Or you could pre-load only a subset of the most common words in the dictionary, then do a database lookup as a last resort. As I indicated in my first post, the tricky part is to do it quickly and efficiently.
Steve
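A minimal sketch of the "common words in memory, database as a last resort" idea, written in Python rather than PHP and using an in-memory SQLite table as a stand-in for a MySQL dictionary table. The table name `words`, the `COMMON_WORDS` set, and the `make_lookup` helper are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical "hot" subset of the dictionary that is kept in memory.
COMMON_WORDS = {"buy", "now", "world", "travel", "tour"}

def make_lookup(conn, common=COMMON_WORDS):
    """Return is_word(w): checks the in-memory set first, the database last."""
    cache = set(common)

    def is_word(w):
        if w in cache:
            return True
        row = conn.execute(
            "SELECT 1 FROM words WHERE word = ? LIMIT 1", (w,)
        ).fetchone()
        if row is not None:
            cache.add(w)  # remember the hit so the next lookup skips the database
            return True
        return False

    return is_word

# Stand-in for the full dictionary table (assumed schema: one word per row).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO words VALUES (?)", [("domain",), ("site",)])
is_word = make_lookup(conn)
```

Here `is_word("buy")` is answered from memory, while `is_word("domain")` falls through to the database once and is cached afterwards.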
|
||
|
|
|
|
|
#9 |
|
Lisa Dozois
War Room Member
Join Date: Jan 2006
Location: Florida, USA.
Posts: 316
Thanks: 39
Thanked 108 Times in 44 Posts
|
Sometimes we programmers are guilty of trying to provide a solution to a problem we don't fully understand.
Chandan, you told us WHAT you want to do, but not WHY you want to do it. If we understand why you are trying to do this, maybe a clear solution will pop up. |
|
-- Lisa G
>> Best Under $5 Christmas Treat For Kids http://hohohomail.com #1 Rated Copywriter On RentACoder www.lisa.myfreelanceportfolio.com Check out my WSO: http://tinyurl.com/lvkjq9 |
|
|
|
|
| The Following User Says Thank You to lisag For This Useful Post: |
|
|
#10 | |
|
Business Pro
War Room Member
Join Date: Jul 2009
Location: Scarborough
Posts: 99
Thanks: 1
Thanked 8 Times in 7 Posts
|
Quote:
| |
|
||
|
|
|
|
|
#11 | |
|
www.chandan.in
Join Date: Oct 2008
Posts: 7
Thanks: 7
Thanked 0 Times in 0 Posts
|
Quote:
For example, when a user is searching the whois of a domain, or doing a simple name search.
| |
|
|
||
|
|
|
|
|
#12 |
|
Lisa Dozois
War Room Member
Join Date: Jan 2006
Location: Florida, USA.
Posts: 316
Thanks: 39
Thanked 108 Times in 44 Posts
|
I would start here:
Eight word lists to help you creating the perfect word game : Emanuele Feronato

Grab those word lists and build a MySQL table. Since you aren't looking for anagrams (that is, you don't want to find characters in random order, just linear order), you need to iterate through the string one character at a time, concatenating the next character as you go. So you take the string and look up the first character. If a word is found, you push it onto an array.

Here's a matrix for the 11-character string isthisright:

Character Position
1
1,2
1,2,3
1,2,3,4
1,2,3,4,5
1,2,3,4,5,6
1,2,3,4,5,6,7
1,2,3,4,5,6,7,8
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10,11
2
2,3
2,3,4
2,3,4,5
2,3,4,5,6
2,3,4,5,6,7
2,3,4,5,6,7,8
2,3,4,5,6,7,8,9
2,3,4,5,6,7,8,9,10
2,3,4,5,6,7,8,9,10,11
3
3,4
3,4,5
3,4,5,6
3,4,5,6,7
3,4,5,6,7,8
3,4,5,6,7,8,9
3,4,5,6,7,8,9,10
3,4,5,6,7,8,9,10,11
...

Continue through every starting position until you have tested every contiguous run of characters against your word list. I think this is the correct progression order, but someone feel free to chime in if I got it wrong.

Let's test isthisright (* = found word):

1 = I*
1,2 = IS*
1,2,3 = IST
1,2,3,4 = ISTH
1,2,3,4,5 = ISTHI
1,2,3,4,5,6 = ISTHIS (ISTHIS is NOT a word. You already found IS; the word THIS will come later in the progression.)
1,2,3,4,5,6,7 = ISTHISR
1,2,3,4,5,6,7,8 = ISTHISRI
1,2,3,4,5,6,7,8,9 = ISTHISRIG
1,2,3,4,5,6,7,8,9,10 = ISTHISRIGH
1,2,3,4,5,6,7,8,9,10,11 = ISTHISRIGHT
2 = S
2,3 = ST
2,3,4 = STH
...

Continue through the matrix and you'll eventually find all the words.
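The progression above can be sketched in a few lines. This is an illustrative Python version, not PHP, and it uses a small in-memory set in place of the MySQL table. The `find_words` function and the tiny `WORDS` set are assumptions made for the example.

```python
def find_words(text, dictionary):
    """Return every dictionary word found as a contiguous run of characters.

    Results follow the progression order: by start position, then by length.
    Each hit is (1-based start position, word), matching the matrix numbering.
    """
    text = text.lower()
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            candidate = text[start:end]
            if candidate in dictionary:
                found.append((start + 1, candidate))
    return found

# Hypothetical stand-in for a real word list.
WORDS = {"i", "is", "this", "his", "hi", "right", "rig"}
hits = find_words("isthisright", WORDS)
```

This finds every word hidden in the string, overlapping ones included, but it does not yet decide which combination of words covers the whole string. For an n-character string it tests n(n+1)/2 candidates, so 66 lookups for isthisright.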
|
|
|
|
|
| The Following User Says Thank You to lisag For This Useful Post: |
|
|
#13 |
|
HyperActive Warrior
War Room Member
Join Date: Oct 2002
Location: Portugal
Posts: 284
Thanks: 12
Thanked 16 Times in 12 Posts
|
Whatever solution you use, be careful with situations like:
wordsexpress
wordsexchange
Doing it on a character-by-character basis to find dictionary words might give you some unexpected or undesired results. Even Google makes mistakes when analyzing and splitting such strings into words, and that was (I don't know if it still is) one of the reasons many domains were flagged as adult domains.
Carlos
|
|
|
| The Following User Says Thank You to CMartin For This Useful Post: |
|
|
#14 | |
|
Lisa Dozois
War Room Member
Join Date: Jan 2006
Location: Florida, USA.
Posts: 316
Thanks: 39
Thanked 108 Times in 44 Posts
|
Quote:
| |
|
||
|
|
|
|
|
#15 | |
|
Lisa Dozois
War Room Member
Join Date: Jan 2006
Location: Florida, USA.
Posts: 316
Thanks: 39
Thanked 108 Times in 44 Posts
|
Quote:
** WARNING ** This link leads to a dirty word list that you may find offensive. Its intended use is to build a dirty word filter and not to cater to anyone's prurient interests. If dirty words offend you, don't click. http://drupal.org/files/issues/dirtywords.txt
|
||
|
|
|
| The Following User Says Thank You to lisag For This Useful Post: |
|
|
#16 | |
|
HyperActive Warrior
War Room Member
Join Date: Oct 2002
Location: Portugal
Posts: 284
Thanks: 12
Thanked 16 Times in 12 Posts
|
Quote:
- wordsexpress should be split as: words express
- wordsexchange should be split as: words exchange
Hmmm... but then who can guarantee to me or anyone else that the way they are split is in fact the correct way? Maybe the domain owner really registered "word sex press" or "word sex change".
In other words, there will always be domain strings that can be split in several ways with very different meanings. Developing an algorithm to deal with these (and many other) situations can be very complex if it needs to be somewhat "perfect" when splitting domain strings into words, not to mention if it also needs to be optimized for speed.
Carlos
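To make the ambiguity concrete, here is a minimal Python sketch that lists every way a string can be split completely into dictionary words. The `all_splits` function and the small `WORDS` set are assumptions for illustration. Picking the "right" split among the results would need extra information (word frequencies, for instance) that this sketch does not attempt.

```python
from functools import lru_cache

def all_splits(text, dictionary):
    """Return every way to split text into dictionary words, using all characters."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def splits_from(start):
        # Memoized: each start position is solved only once.
        if start == len(text):
            return [[]]  # exactly one way to split the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in splits_from(end):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in splits_from(0)]

# Hypothetical stand-in for a real word list.
WORDS = frozenset({"word", "words", "sex", "express", "press", "exchange", "change"})
print(all_splits("wordsexpress", WORDS))
```

With this word list, both strings in question come back with two candidate splits each, which is exactly the problem: the dictionary alone cannot say which one the domain owner meant.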
|
|
|
|
|
#17 |
|
HyperActive Warrior
War Room Member
Join Date: Jun 2009
Location: Chesterton, IN
Posts: 289
Thanks: 6
Thanked 40 Times in 36 Posts
|
Interesting...
Here are some dictionaries for you: Kevin's Word List Page
Now find a good one and loop through the words, counting characters and picking out words from your concatenated string. In PHP, use strpos() to grab your first word, mark the position of the next character to start a new loop at, and check what is left over. Then print these or assign them to your array and discard the garbage.
Of course, you have to deal with someone trying xapplesandoranges. So for every first-word loop that finds an initial match, you have to run inner pattern matching until you run out of characters or dictionary words. Then move to your next potential phrase, all while checking the phrases found so far.
I think there are close to three-quarters of a million words in the English language, not counting slang. I'm not sure how many words are in any of those dictionaries.
Of course you would want to include a thesaurus so you can have related phrases sent back also (jokes).
Yeah, it would take some thought to optimize. The system would need to "learn" somehow, recording common phrases so that it becomes faster over time and uses the dictionary less and less. Might make for an interesting project. We'll develop it on your servers, though, since it may take hours to run, killing everything else while it runs. LOL
Good luck
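One possible way to handle input like xapplesandoranges is sketched below in Python. It is a swapped-in dynamic-programming approach, not the strpos() loop described above: it works backwards through the string and, at each position, either discards one character as garbage or starts a dictionary word there, keeping whichever choice discards the fewest characters. The `split_skipping_garbage` function, the `max_word_len` parameter, and the `WORDS` set are assumptions for illustration.

```python
def split_skipping_garbage(text, dictionary, max_word_len=20):
    """Split text into dictionary words, discarding as few characters as possible."""
    text = text.lower()
    n = len(text)
    # best[i] = (characters discarded, words found) for the suffix text[i:]
    best = [None] * (n + 1)
    best[n] = (0, [])
    for i in range(n - 1, -1, -1):
        # Option 1: treat text[i] as garbage and discard it.
        skipped, words = best[i + 1]
        choice = (skipped + 1, words)
        # Option 2: start a dictionary word at position i.
        for j in range(i + 1, min(n, i + max_word_len) + 1):
            word = text[i:j]
            if word in dictionary:
                skipped_j, words_j = best[j]
                candidate = (skipped_j, [word] + words_j)
                # Prefer fewer discarded characters, then fewer words.
                if (candidate[0], len(candidate[1])) < (choice[0], len(choice[1])):
                    choice = candidate
        best[i] = choice
    return best[0][1]

# Hypothetical stand-in for a real word list.
WORDS = {"apple", "apples", "and", "or", "orange", "oranges"}
result = split_skipping_garbage("xapplesandoranges", WORDS)
```

Because each position is solved once and reused, the work grows with the string length times `max_word_len` rather than with the number of possible phrases, which addresses some of the speed concern raised in the thread.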
|
|
|
|
|
|
| The Following User Says Thank You to HomeComputerGames For This Useful Post: |
|
|
#18 |
|
HyperActive Warrior
War Room Member
|
Hi,
As far as my programming experience goes, the logic can be very simple, excluding some special cases. Even a sentence can be split into words. Before that, we should do a structural analysis of the dictionary to be used. Once the permutation-and-combination procedure works for a simple word split, it can work for a sentence or even a paragraph, whatever the user inputs. It is of course hard to structure, but not too hard.
Satya Das
|
|
|
|
|
|
|
| Tags |
| splitting, words |