WarriorForum - Internet Marketing Forums > Warrior Support Forums > Programming Talk
08-27-2009, 08:15 AM   #1
chandan
words splitting

How do I do word splitting?

If I give buynow, it should give buy now.

If I give worldtraveltour, it should give world travel tour, or even other combinations (world, travel, rave, our, tour).

If I give domainsitea, it should give domain site a.

Etc.

Are any dictionary tools or class files available for this task?

Thanks

08-27-2009, 12:42 PM   #2
markfail
Re: words splitting

Hi,

If you're using PHP, try the explode function: PHP: explode - Manual

08-27-2009, 12:48 PM   #3
chandan
Re: words splitting

Thanks.

Actually, the input is random and can be anything, so the explode function doesn't fit.

buynow and worldtraveltour were just examples.

It can also be a junk name like ksadas, which should be split into the words sad and das too.

08-27-2009, 12:50 PM   #4
markfail
Re: words splitting

How would it know which words to split?

You can add the words you want to an array and then just check the array; if a word is found, split on it.

08-27-2009, 12:51 PM   #5
Steve Diamond
Re: words splitting

No, explode isn't going to help because you don't know where the words are divided. That's the whole point. The only way to do this is with a dictionary lookup, as the OP implied.

I don't know of any existing classes that do this. It wouldn't be too hard to write one if you had a good dictionary, but the tricky part would be making it quick and efficient. (Obviously, Google is very good at it.)

Steve
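
To make the dictionary-lookup idea concrete, here is a minimal sketch. It is written in Python rather than PHP, and the tiny WORDS set and the split_words name are assumptions made up for illustration; a real version would load a proper word list. It returns one possible split, trying longer words first, and remembers dead ends so the same position is not searched twice.

```python
from functools import lru_cache

# Tiny stand-in dictionary (an assumption); a real one would come from a word list file.
WORDS = frozenset({"buy", "now", "world", "travel", "tour", "domain", "site", "a"})


def split_words(text, words=WORDS):
    """Return one way to split text into dictionary words, or None if there is none."""
    longest = max(map(len, words))

    @lru_cache(maxsize=None)
    def solve(start):
        # The whole string has been consumed: success with no more words.
        if start == len(text):
            return ()
        # Try longer candidates first, never longer than the longest dictionary word.
        for end in range(min(len(text), start + longest), start, -1):
            piece = text[start:end]
            if piece in words:
                rest = solve(end)
                if rest is not None:
                    return (piece,) + rest
        return None  # no dictionary word starts a complete split here

    result = solve(0)
    return None if result is None else list(result)


print(split_words("buynow"))
print(split_words("worldtraveltour"))
print(split_words("domainsitea"))
```

The cache is what keeps this quick: each start position is solved at most once, so the work grows with the string length times the longest word length rather than with the number of possible splits.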

08-27-2009, 01:15 PM   #6
markfail
Re: words splitting

Steve, you misunderstand.

If you already have an array of words, you can use it to check whether a word exists within the string and then extract it using explode.

Either that or you can do it manually, but I know which one I would prefer...
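
A rough sketch of that array-plus-explode idea, in Python instead of PHP (str.partition plays the role of explode here; the split_around name and the word lists are made up for illustration). It only splits around the first known word it finds and does not check whether the leftover pieces are words themselves.

```python
def split_around(text, known_words):
    """Split text around the first known word found in it, much like explode() on that word."""
    for word in known_words:
        if word in text:
            before, _, after = text.partition(word)
            # Keep only the non-empty pieces, in their original order.
            return [part for part in (before, word, after) if part]
    return [text]  # no known word found, so there is nothing to split on


print(split_around("buynow", ["now"]))
print(split_around("worldtraveltour", ["travel"]))
```

The order of known_words decides which word is tried first, so the result depends on how the array is arranged.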

08-27-2009, 01:18 PM   #7
chandan
Re: words splitting

Quote:
Originally Posted by markfail
Steve, you misunderstand.

If you already have an array of words, you can use it to check whether a word exists within the string and then extract it using explode.

Either that or you can do it manually, but I know which one I would prefer...

No, it's not possible to keep the words in an array, because the dictionary would be too lengthy to put in an array.

08-27-2009, 01:37 PM   #8
Steve Diamond
Re: words splitting

Quote:
Originally Posted by chandan
No, it's not possible to keep the words in an array, because the dictionary would be too lengthy to put in an array.

Exactly. If you're thinking of PHP on a typical shared web server, the dictionary would be much too lengthy.

If you have a dedicated server with plenty of RAM, you could possibly write a C application taking this approach. Or you could virtualize the array. Or you could pre-load only a subset of the most common words in the dictionary, then do a database lookup as a last resort.

As I indicated in my first post, the tricky part is to do it quickly and efficiently.

Steve
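
The "common words in memory, database as a last resort" option could look roughly like this. It is a sketch in Python with an in-memory SQLite table standing in for the real dictionary database; the table name, the COMMON_WORDS set, and both function names are assumptions for illustration.

```python
import sqlite3

# Hypothetical small set of very common words kept in memory.
COMMON_WORDS = {"buy", "now", "the", "site"}


def make_word_db(words):
    """Build an in-memory SQLite table standing in for the full dictionary."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
    conn.commit()
    return conn


def is_word(candidate, conn, common=COMMON_WORDS):
    """Check the in-memory set first and query the database only as a last resort."""
    if candidate in common:
        return True
    row = conn.execute(
        "SELECT 1 FROM words WHERE word = ? LIMIT 1", (candidate,)
    ).fetchone()
    return row is not None


conn = make_word_db(["travel", "tour", "domain"])
print(is_word("buy", conn), is_word("travel", conn), is_word("ksadas", conn))
```

The primary key on the word column gives the database an index to search, so the fallback lookup does not have to scan the whole table.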

08-28-2009, 10:24 PM   #9
lisag
Re: words splitting

Sometimes we programmers are guilty of trying to provide a solution to a problem we don't fully understand.

Chandan, you told us WHAT you want to do, but not WHY you want to do it. If we understand why you are trying to do this, maybe a clear solution will pop up.

-- Lisa G

08-29-2009, 02:09 AM   #10
markfail
Re: words splitting

Quote:
Originally Posted by lisag
Sometimes we programmers are guilty of trying to provide a solution to a problem we don't fully understand.

Chandan, you told us WHAT you want to do, but not WHY you want to do it. If we understand why you are trying to do this, maybe a clear solution will pop up.

Ah, very well said.

08-31-2009, 06:43 AM   #11
chandan
Re: words splitting

Quote:
Originally Posted by lisag
Sometimes we programmers are guilty of trying to provide a solution to a problem we don't fully understand.

Chandan, you told us WHAT you want to do, but not WHY you want to do it. If we understand why you are trying to do this, maybe a clear solution will pop up.

It will be used for name suggestions, such as when a user does a whois search for a domain, or a simple name search.

08-31-2009, 07:39 AM   #12
lisag
Re: words splitting

I would start here:
Eight word lists to help you creating the perfect word game : Emanuele Feronato

Grab those keyword lists and build a MySQL table.

Since you aren't looking for anagrams (that is, you don't want to find characters in random order, just linear order), you need to iterate through the string one character at a time, concatenating the next character as you go.

So, you take the string and you search for the first character. If a word is found you push it on to an array.

Here's a matrix for the 11 character string: isthisright

Character Position
1
1,2
1,2,3
1,2,3,4
1,2,3,4,5
1,2,3,4,5,6
1,2,3,4,5,6,7
1,2,3,4,5,6,7,8
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9,10
1,2,3,4,5,6,7,8,9,10,11
2
2,3
2,3,4
2,3,4,5
2,3,4,5,6
2,3,4,5,6,7
2,3,4,5,6,7,8
2,3,4,5,6,7,8,9
2,3,4,5,6,7,8,9,10
2,3,4,5,6,7,8,9,10,11
3
3,4
3,4,5
3,4,5,6
3,4,5,6,7
3,4,5,6,7,8
3,4,5,6,7,8,9
3,4,5,6,7,8,9,10
3,4,5,6,7,8,9,10,11
...
Continue through all the substrings until you have tested every one of them against your word list.

I think this is the correct progression order but someone feel free to chime in if I got it wrong.

Let's test: isthisright
* = found word

1 = I*
1,2 = IS*
1,2,3 = IST
1,2,3,4 = ISTH
1,2,3,4,5 = ISTHI
1,2,3,4,5,6 = ISTHIS (ISTHIS is NOT a word). You already found IS; the word THIS will come later in the progression.

1,2,3,4,5,6,7 = ISTHISR
1,2,3,4,5,6,7,8 = ISTHISRI
1,2,3,4,5,6,7,8,9 = ISTHISRIG
1,2,3,4,5,6,7,8,9,10 = ISTHISRIGH
1,2,3,4,5,6,7,8,9,10,11 = ISTHISRIGHT
2 = S
2,3 = ST
2,3,4 = STH
...
Continue through the matrix and you'll eventually make all the words.

-- Lisa G
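
The progression above can be sketched as two nested loops. This is Python rather than PHP, with a plain set standing in for the MySQL word table; the find_words name and the four-word list are assumptions for illustration. An 11-character string has 11 x 12 / 2 = 66 substrings to test.

```python
def find_words(text, word_list):
    """Test every contiguous substring of text, in the start-then-length order of the matrix."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in word_list:
                # Report 1-based first and last character positions, as in the matrix.
                found.append((start + 1, end, piece))
    return found


for first, last, word in find_words("isthisright", {"i", "is", "this", "right"}):
    print(first, last, word)
```

Note that this only collects the words hiding in the string; choosing which of them fit together into a full split is a separate step.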

09-04-2009, 08:10 PM   #13
CMartin
Re: words splitting

Whatever solution you use, be careful with situations like:
wordsexpress
wordsexchange

Going character by character to find dictionary words might give you some unexpected/undesired results.

Even Google makes mistakes when splitting this kind of string into words... and it was (I don't know if it still is) one of the reasons that many domains were flagged as adult domains.

Carlos

09-04-2009, 09:06 PM   #14
lisag
Re: words splitting

Quote:
Originally Posted by CMartin
Whatever solution you use, be careful with situations like:
wordsexpress
wordsexchange

Going character by character to find dictionary words might give you some unexpected/undesired results.

Even Google makes mistakes when splitting this kind of string into words... and it was (I don't know if it still is) one of the reasons that many domains were flagged as adult domains.

Carlos

Good catch, Carlos. It would be a simple process to build a "kill list" of words you don't want to display.

-- Lisa G

09-04-2009, 09:15 PM   #15
lisag
Re: words splitting

Quote:
Originally Posted by lisag
Good catch, Carlos. It would be a simple process to build a "kill list" of words you don't want to display.

Here's a list to get you started.

** WARNING **
This link leads to a dirty word list that you may find offensive. Its intended use is to build a dirty word filter, not to cater to anyone's prurient interests. If dirty words offend you, don't click.

http://drupal.org/files/issues/dirtywords.txt

-- Lisa G

09-04-2009, 09:59 PM   #16
CMartin
Re: words splitting

Quote:
Originally Posted by lisag
Good catch, Carlos. It would be a simple process to build a "kill list" of words you don't want to display.

The point of the examples I provided was not to "kill" words from the string, but rather to split them correctly:
- wordsexpress should be split as: words express
- wordsexchange should be split as: words exchange

Hmmm... but then who can guarantee, to me or anyone else, that the way they are split is in fact the correct way? Maybe the domain owner really registered "word sex press" or "word sex change".

In other words... there will always be domain strings that can be split in several ways with very different meanings. Developing an algorithm to deal with these (and many other) types of situations can be very complex if there's a need to be somewhat "perfect" when splitting domain strings into words, not to mention if there's also a need to optimize it for speed.

Carlos
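
To see the ambiguity directly, a splitter can return every valid split instead of just one. The sketch below is Python, and the all_splits name and the small word set are assumptions for illustration. It only lists the candidates; deciding which split the domain owner meant is exactly the hard part described above.

```python
def all_splits(text, words):
    """Return every way to split text entirely into words from the given set."""
    if not text:
        return [[]]  # exactly one way to split the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in words:
            # Every split of the remainder extends this head word.
            for tail in all_splits(text[end:], words):
                results.append([head] + tail)
    return results


WORDS = {"word", "words", "sex", "express", "press", "exchange", "change"}
for name in ("wordsexpress", "wordsexchange"):
    print(name, all_splits(name, WORDS))
```

With a large dictionary the number of splits can grow very quickly with the length of the string, so a real version would likely need to cap or rank the results.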

09-05-2009, 04:37 AM   #17
Ken Durham
Re: words splitting

Interesting...

Here are some dictionaries: Kevin's Word List Page

Now find a good one and loop through the words, counting characters and picking out words from your concatenated string.
Via PHP, use strpos() to grab your first word, mark the position of the next character to start a new loop at, and check what is left over... then print or assign these to your array and discard the garbage.
Of course, you have to deal with someone trying xapplesandoranges.

So for every first-word loop that finds an initial match, you have to run inner pattern matching until you run out of characters or dictionary words.
Then move to your next potential phrase.
All while checking currently found phrases.
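
One way to cope with garbage such as the leading x is to let the splitter skip characters but prefer the split that skips the fewest. This is a swapped-in technique (a small dynamic program worked from the end of the string), not the strpos() loop described above, and it is sketched in Python with a made-up split_with_junk name and word set.

```python
def split_with_junk(text, words):
    """Split text into words and single junk characters, leaving as few junk characters as possible."""
    n = len(text)
    # best[i] = (junk character count, tagged pieces) for the suffix text[i:]
    best = [None] * (n + 1)
    best[n] = (0, [])
    for i in range(n - 1, -1, -1):
        # Option 1: treat text[i] as a junk character.
        junk_count, pieces = best[i + 1]
        best[i] = (junk_count + 1, [("junk", text[i])] + pieces)
        # Option 2: start a dictionary word at position i, if that leaves less junk.
        for j in range(i + 1, n + 1):
            piece = text[i:j]
            if piece in words:
                junk_count, pieces = best[j]
                if junk_count < best[i][0]:
                    best[i] = (junk_count, [("word", piece)] + pieces)
    return best[0][1]


print(split_with_junk("xapplesandoranges", {"apples", "and", "oranges"}))
```

When two splits leave the same amount of junk, this sketch simply keeps the first one it finds, so ties are not resolved in any meaningful way.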

I think there are close to 3/4 of a million words in the English language, not counting slang... not sure how many words are in any of those dictionaries.

Of course, you would want to include a thesaurus so you can have related phrases sent back too. (Joking.)

Yeah, that would take some thought how to optimize....
The system would need to "learn" somehow so it would record common phrases in order to become faster over time utilizing the dictionary less and less.
Might make for an interesting project.

We'll develop it on your servers though since it may take hours to run killing everything else while it ran LOL

good luck

09-11-2009, 10:37 AM   #18
sndas
Re: words splitting

Hi,
In my programming experience, the logic can be very simple, excluding some special cases. Even a sentence can be split into words.
Before that, we should do a structural analysis of the dictionary to be used. Once the permutation-and-combination procedure works for a simple word split, it can work for a sentence, or even for a paragraph, whatever the user inputs.
It is of course hard to structure, but not too hard.

Satya Das
