				August 22nd, 2003, 02:18 AM
			
			
			
		 |  
	| 
		
			|  | 
 Major General |  | 
					Join Date: Oct 2002 
						Posts: 2,174
			
		
	      |  |  
    
	| 
				 Re: Scam Or Not? 
 
	Apparently you missed the numerous disclaimers: Quote: 
	
		| Originally posted by Imperator Fyron: You do not have to load every single address into active memory at once. In fact, with that loop, each address is deleted from active memory (essentially) after it is checked against the one you are comparing it to.
 |  rather than "is"; my note at the bottom
 
	and the  to indicate it was a worst-case (for that algorithm, worst-case = no duplicates) analysis. Quote: 
	
		| Granted, there are several ways to shave time off of the above analysis, but that just gives a general idea of what it would take. |  
 In the worst case, the last entry checked must be checked against every other entry, and so all must be available (in memory, or accessed from the disk).  The point was to give a general idea of what was required, not the exact algorithm needed.  Things would be further complicated by the likelihood that more than a single database of addresses is involved.  There are a zillion (exaggeration) assumptions in my analysis, and several valid shortcuts that could be built into the algorithm.  It's an estimate to support what I said that DavidG had doubts about, not an exact analysis for that particular number set.
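
 For a rough sense of that worst case, here is a minimal sketch of a naive pairwise duplicate check. The loop from the earlier analysis is not reproduced in this part of the thread, so `dedupe_pairwise` and its comparison counter are hypothetical stand-ins for illustration, not that exact algorithm.

```python
def dedupe_pairwise(addresses):
    """Naive duplicate removal: compare each entry with every entry kept so far.

    Hypothetical helper for illustration. Returns (unique_entries, comparisons).
    """
    unique = []
    comparisons = 0
    for addr in addresses:
        is_duplicate = False
        for kept in unique:
            comparisons += 1
            if addr == kept:
                is_duplicate = True
                break
        if not is_duplicate:
            unique.append(addr)
    return unique, comparisons


# Worst case: no duplicates, so entry i is compared with all i earlier entries.
n = 1000
_, worst = dedupe_pairwise([f"user{i}@example.com" for i in range(n)])
assert worst == n * (n - 1) // 2
```

 With no duplicates the count is n(n-1)/2, which is where the n^2 growth comes from. Any duplicate that is found early cuts its own inner loop short.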
				__________________Of course, by the time I finish this post, it will already be obsolete.  C'est la vie.
 |  
	
		
	
	
	| 
			
			 
			
				August 22nd, 2003, 02:22 AM
			
			
			
		 |  
	| 
		
			|  | 
 Lieutenant Colonel |  | 
					Join Date: Jan 2002 Location: Dundas, Ontario, Canada 
						Posts: 1,498
			
		
	      |  |  
    
	| 
				 Re: Scam Or Not? 
 
	They could, it's just a matter of the computer time required.  The standard algorithm for removing duplicates goes something like: Quote: 
	
		| Originally posted by Jack Simth: 
 quote:Originally posted by DavidG:
 The biggest software company in the world that has written some of the most complex programs can't remove duplicate addresses from a list???  What's wrong with this picture.
 |  
 Well, I'm not going to dispute your math (frankly, I didn't take the time to really understand it).  But I have MSN Messenger at work.  I signed up for the service and provided my e-mail address to MS exactly ONE time.  And yet I got that message 10 times.  Face it, MS got something screwed up.  Other corps with large databases (like Symantec) seem to get things OK. |  
	
		
	
	
	| 
			
			 
			
				August 22nd, 2003, 02:25 AM
			
			
			
		 |  
	| 
		
			|  | 
 Lieutenant Colonel |  | 
					Join Date: Jan 2002 Location: Dundas, Ontario, Canada 
						Posts: 1,498
			
		
	      |  |  
    
	| 
				 Re: Scam Or Not? 
 
	Perhaps this is the problem.  Now surely this shouldn't be hard to avoid. Quote: 
	
		| Originally posted by Jack Simth: However, apparently, it's possible to get on their lists more than once
 |  |  
	
		
	
	
	| 
			
			 
			
				August 22nd, 2003, 02:26 AM
			
			
			
		 |  
	| 
		
			|  | 
 Major General |  | 
					Join Date: Oct 2002 
						Posts: 2,174
			
		
	      |  |  
    
	| 
				 Re: Scam Or Not? 
 
	Granted; MS could have done much better.  The analysis only applies when going back to fix it, not when compiling the list originally.  It is easier to maintain a no-duplicates list than it is to change a list to the no-duplicates variety. Quote: 
	
		| Originally posted by DavidG: Well, I'm not going to dispute your math (frankly, I didn't take the time to really understand it).  But I have MSN Messenger at work.  I signed up for the service and provided my e-mail address to MS exactly ONE time.  And yet I got that message 10 times.  Face it, MS got something screwed up.  Other corps with large databases (like Symantec) seem to get things OK. | 
 |  
	
		
	
	
	| 
			
			 
			
				August 22nd, 2003, 02:30 AM
			
			
			
		 |  
	| 
		
			|  | Shrapnel Fanatic |  | 
					Join Date: Feb 2001 Location: Waterloo, Ontario, Canada 
						Posts: 11,451
			
		
	      |  |  
    
	| 
				 Re: Scam Or Not? 
 Best would be an algorithm which checks new addresses against the list before even putting them in! Keep N small in the first place, and there's less trouble later.
 Each submitting machine could keep a filter list of the last couple of submissions so as to cut down on the work the main server has to do.
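
 A minimal sketch of those two ideas, under stated assumptions: `RecentFilter` and `AddressList` are hypothetical names, the filter size is arbitrary, and the main list uses a hash set for the membership check, which is a swapped-in technique rather than anything proposed in this thread.

```python
from collections import deque


class RecentFilter:
    """Per-machine cache of the last few submissions (size is an assumption)."""

    def __init__(self, size=3):
        self.recent = deque(maxlen=size)

    def should_forward(self, address):
        # Drop the address locally if this machine saw it very recently.
        if address in self.recent:
            return False
        self.recent.append(address)
        return True


class AddressList:
    """Main list that refuses duplicates at insertion time."""

    def __init__(self):
        self.addresses = set()

    def add(self, address):
        # Treating case and surrounding whitespace as insignificant is an
        # assumption about what counts as "the same" address.
        key = address.strip().lower()
        if key in self.addresses:
            return False
        self.addresses.add(key)
        return True


front = RecentFilter(size=2)
main = AddressList()
for addr in ["a@example.com", "a@example.com", "b@example.com", "A@example.com"]:
    if front.should_forward(addr):
        main.add(addr)
```

 The immediate repeat never reaches the main list, and the differently cased repeat is forwarded but rejected there, so only two addresses are stored.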
 
 If you had a sorted list, then the duplicate checking would be really easy.
 Decent insertion routines would help a lot too.
 Bucket sort to servers holding a piece of the list, then insert using your favorite routine.
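
 Roughly what that could look like on one machine, as a sketch only: plain lists stand in for the servers, `NUM_BUCKETS`, `bucket_for` and `insert_unique` are hypothetical names, and the split is by hash rather than by key range, so it is a partitioning step and not a true bucket sort.

```python
import bisect
import zlib

NUM_BUCKETS = 4  # stand-in for the number of servers; an arbitrary assumption


def bucket_for(address):
    # Stable hash, so the same address always lands in the same bucket.
    return zlib.crc32(address.encode("utf-8")) % NUM_BUCKETS


def insert_unique(sorted_list, address):
    """Binary-search insert; returns False if the address is already present."""
    i = bisect.bisect_left(sorted_list, address)
    if i < len(sorted_list) and sorted_list[i] == address:
        return False
    sorted_list.insert(i, address)
    return True


buckets = [[] for _ in range(NUM_BUCKETS)]
for addr in ["c@example.com", "a@example.com", "c@example.com", "b@example.com"]:
    insert_unique(buckets[bucket_for(addr)], addr)
```

 The duplicate check costs about lg(n) comparisons per insert. Note that `list.insert` still shifts elements, so a real system would probably want a tree or an index instead of a flat list.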
 
 Get the n^2 work done as it drips in, so you have years to spend on the problem, instead of rushing it just before trying to send emails.
 
	
		
	
	
	| 
			
			 
			
				August 22nd, 2003, 02:51 AM
			
			
			
		 |  
	| 
		
			|  | 
 Major General |  | 
					Join Date: Oct 2002 
						Posts: 2,174
			
		
	      |  |  
    
	| 
				 Re: Scam Or Not? 
 Actually, if you maintain it as a sorted, no-duplicates list from the start, the total computational effort is more along the lines of n*lg(n), which isn't unreasonable, even with a billion entries.  
 Come to think of it, one could always dump the current list into such an algorithm, and then only use ~ 30,000,000,000 comparisons - roughly eight hours, using the same numbers as the earlier analysis for a single machine.  Never mind; MS has no real excuse.
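
 To sanity-check that figure: lg(10^9) is just under 30, so n*lg(n) for a billion entries is about 3 x 10^10. Here is a sketch of the sort-then-scan approach, with `dedupe_sorted` as a hypothetical name and Python's built-in sort standing in for whatever sort would actually be used.

```python
import math


def dedupe_sorted(addresses):
    """Sort, then drop adjacent equal entries in a single pass."""
    result = []
    for addr in sorted(addresses):
        if not result or result[-1] != addr:
            result.append(addr)
    return result


n = 1_000_000_000
estimate = n * math.log2(n)  # roughly 3e10 comparisons for a billion entries
print(f"{estimate:.3g}")
```

 Eight hours for that many comparisons works out to roughly a million comparisons per second. The rate actually used in the earlier analysis is not shown in this part of the thread.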
 
 |  
	
		
	
	
	
	
	
	