---------------------------------------------------------
Brosco's Liberty Basic Newsletter - Issue #11 - July 98
---------------------------------------------------------

        In this Issue:
            1)  A proposal for you to consider
            2)  Indexing Concepts


1)  A proposal for you to consider

If you look at my previous newsletters you will see that most of
them hover around the 10KB mark.  The reason for this is
simple.  ListBot delivers the Newsletter as a free service, and
the free service has a 10KB maximum.  If I wanted to write a
larger Newsletter I would have to upgrade to their commercial
service - around US$100 per year.

Considering that I am already spending extra money with my
ISP to pay for access to my Web Site, plus I contribute
a substantial amount of time to produce this Newsletter, I
don't believe that I should incur yet another cost to
provide a FREE service.

There are now over 80 subscribers to this newsletter.  If
everyone contributed a dollar or so - we would have 12
months access to a Newsletter service with NO restrictions
and NO adverts.

In reality, the likelihood of everyone contributing $1.00 is
non-existent.  If there are any contributions at all, they would
come from just a handful of generous souls!

So - does that handful of generous contributors exist?  If you
are interested in contributing to (what I believe to be) an
important LB resource - please send me an email stating the
amount that you could contribute.  You never know, I might
get enough contributors to allow me to happily reply
that the contribution can be reduced!  Regardless,
at this stage, your email would NOT be a commitment - it
would purely be an expression of interest.  I would not ask
for any money to be paid until I was sure that the full $100
could be collected.  I also would not subscribe to the
commercial service until the full amount had been sent in.

By the way, this offer is NOT open to LBers who are already
substantial contributors in the way of Sites, code, assistance,
etc. - you know who I mean - but I won't list them here for fear
of leaving someone off the list (and a 10KB maximum).

This offer is for LBers who would like to contribute something
back to the community - but so far have not found a way to do so.

WHAT WOULD YOU GET IN RETURN?

Basically, very little, other than the warm feeling that comes
from knowing you are assisting others.

Your name (and amount contributed) would be listed in the 
Newsletter - every issue for the next 12 months.  You may 
remain anonymous, if you desire.

Would you get any special privileges - like priority with
support requests, or having your topic covered more quickly
in the Newsletter? - ABSOLUTELY NOT!!!

Would the Newsletters become bigger than 10KB?  Not substantially,
but regularly I need to edit out 1 or 2KB just to make it fit.  Also,
sometimes I need to direct you to a separate download for
extra material that accompanies the Newsletter.  This would
become unnecessary.

Could I potentially make money out of this?  ABSOLUTELY NOT.  All
contributions will be listed in the Newsletter.  You will be able
to see the status of the 'fund' and exactly how much money has
been donated.  Excess collections will either be refunded or
kept for the following year - depending on your preference.



If I get little or no response from this - I will not bring
the matter up again - we will just continue the way we are.  I
know that most people in the community are hobbyists, and many
are on limited budgets.  It will not be a personal affront to me if
we can't get this working, but it will be a major disappointment!

Please email me your thoughts on this topic.

----------------------------------------------------------
2)  Indexing Concepts


Let me get one thing cleared up immediately.  

In the PC world - many people refer to indexing a Random file as
using a DataBase.  This is technically incorrect.  A 'True' database
usually contains several files which are 'related' in some way.

For example - in a package that looks after the accounts of a business
there will be several files:

Customer info
Invoices
Inventory
BankBook info
etc.

It is all of these files in combination that make up the database - but
in isolation - they are just indexed files.

We have been using the word 'Database' to describe the file that is
holding our Movie Cassette collection - so we'll stick with it.  However,
to be technically correct - this newsletter is really about 'indexing'
techniques - not Database - OK?

First of all - what is an Index?
An Index is just another file that contains shortcuts to information 
stored in a data file.  For example - all your bank account information
is stored in data files on your bank's computer.  And when you use your
ATM card to withdraw money - the program must locate the information 
about your account to verify that you have sufficient funds available
to make the withdrawal.  If the program had to scan the entire data
file to find your account information - it would take far too long -
because most banks have hundreds of thousands (even millions) of 
accounts.

So to speed up the process of finding your account information, there
is a separate file - called an index.  This index just contains a list
of all the Account Numbers, each with the Record Number where that
account's information is stored in the data file.  Now scanning an index
file that contains millions of entries wouldn't be much faster than
scanning the data file - so how does this help?

The index file is created in such a way as to make searching very fast.
If you read my tutorial about array searching techniques - you will see
a very fast technique called 'Binary Search'.  An index works in a very
similar way.  You don't need to understand how this works - but if you
are a glutton for technical detail - here's a very simplistic example:


Suppose that a bank only had 256 active accounts.  The index would be
constructed like this:

Entry #1:  A pointer to another index entry that indexes Account Numbers
in the range '1' to '128' - let's assume that's Index Entry #2. And a
pointer to another index entry for the Account Numbers in the range '129'
to '256'.

Entry #2:  A pointer to another index entry that indexes Account Numbers
in the range '1' to '64' - say entry #3. And a pointer to the index 
entry for account numbers '65' to '128'.

Entry #3:  A pointer to another index entry that indexes Account Numbers
in the range '1' to '32', and a pointer to the index entries in the 
range '33' to '64'.


etc.

This 'Halving the keys' process is continued until there is only one
key left - and that index entry holds the Record number of the data 
in the data file.

So - to locate an Account Number - the Indexing software only needs to
access about 8 Index entries, because 256 keys can only be halved 8 times
before a single key is left.  Since these entries are very small - (just
the KEY and a KeyReference number) all the index entries for a small file
like this will normally be held in memory.  Even for a very large database
with thousands of records, the number of disk accesses to the index file
is minimal - usually just 2 or 3.
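
For the gluttons, here is a minimal Python sketch of the 'Halving the
keys' idea.  It is NOT how DBdll is written - the function name and the
range arguments are made up for illustration - it simply counts how many
times the key range has to be halved before one Account Number is left.

```python
# Hypothetical sketch: count the halving steps needed to isolate one key.
# The name entries_visited and its arguments are assumptions, not DBdll code.

def entries_visited(account, low=1, high=256):
    """Return how many times the range low..high is halved to reach account."""
    steps = 0
    while low < high:
        mid = (low + high) // 2   # split the range, e.g. 1-128 and 129-256
        if account <= mid:
            high = mid            # follow the pointer to the lower half
        else:
            low = mid + 1         # follow the pointer to the upper half
        steps += 1
    return steps

# Worst case over every account in the 256-account example.
worst = max(entries_visited(a) for a in range(1, 257))
print(worst)  # 8
```

The same function called with a range of 1 to 1,000,000 never needs more
than 20 steps, which is why halving copes so well with large files.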

On a test I did with DBdll - I created an index for a list of book
titles.  The title was 70 bytes long and there were 15,000 titles.
The index file created was 2 Megabytes! BUT - to find any
particular entry only required a maximum of 4 disk accesses!  Usually,
2 of these entries were still in memory (in the buffer) so, in reality,
there were only 2 physical disk accesses for any particular search!
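
A quick piece of arithmetic shows why 4 accesses is plausible.  Pure
halving would need 14 steps for 15,000 titles, so each index entry read
from disk presumably splits its range into more than two parts.  How
DBdll really lays out its index is not covered here; the function below
and its 'parts_per_entry' argument are assumptions, used only to show how
the number of levels shrinks as each entry covers more parts.

```python
# Hypothetical arithmetic sketch, not DBdll's actual index layout.

def levels_needed(num_keys, parts_per_entry):
    """Levels required when every index entry splits its range into
    parts_per_entry smaller ranges (2 means plain halving)."""
    levels = 0
    reach = 1                      # number of keys reachable so far
    while reach < num_keys:
        reach *= parts_per_entry
        levels += 1
    return levels

print(levels_needed(15000, 2))    # plain halving: 14 levels
print(levels_needed(15000, 12))   # 12-way split: 4 levels
```

Under this assumption, a 12-way split is the smallest that reaches 15,000
titles in 4 levels (11 x 11 x 11 x 11 is only 14,641).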


The software required to maintain an index is incredibly complex and
way beyond the scope of this newsletter.  But the DBdll does all of
this for you.


OK - enough technical stuff - let's get back to our Bank Account example.

You can understand that the data file needs to be indexed by the account
number so that your account information can be located very quickly.

The Account Number is referred to as the "PRIMARY KEY".  An Account 
Number is UNIQUE.  That is, every person will have a different account
number.
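
As a rough illustration of a PRIMARY KEY, here is a small Python sketch.
Everything in it is an assumption made for the example: the 32-byte record
layout, the function names and the sample accounts.  A plain dictionary
stands in for the index file and an in-memory buffer stands in for the
data file, so it shows the idea only - not what DBdll does internally.

```python
# Hypothetical sketch of a primary index over fixed-length records.
import io

RECORD_LEN = 32            # assumed layout: 8 bytes account + 24 bytes name

data_file = io.BytesIO()   # stands in for the random-access data file
primary_index = {}         # Account Number -> Record Number (from 1)

def add_account(account, name):
    """Append a record and index it, refusing duplicate Account Numbers."""
    if account in primary_index:           # a primary key must be UNIQUE
        raise ValueError(f"duplicate account number {account}")
    short_name = name[:24]                 # keep the record at fixed length
    data_file.seek(0, io.SEEK_END)
    record_number = data_file.tell() // RECORD_LEN + 1
    data_file.write(f"{account:>8}{short_name:<24}".encode("ascii"))
    primary_index[account] = record_number

def read_record(record_number):
    """Jump straight to one record without scanning the file."""
    data_file.seek((record_number - 1) * RECORD_LEN)
    raw = data_file.read(RECORD_LEN).decode("ascii")
    return int(raw[:8]), raw[8:].rstrip()

add_account(50417, "Brosco")   # made-up sample data
add_account(10023, "Smith")
print(read_record(primary_index[10023]))   # (10023, 'Smith')
```

The index turns an Account Number into a Record Number, and the Record
Number turns into a position in the data file by simple multiplication.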

You got that?  Do NOT progress any further into this until you fully
understand the above concepts.  Read it over a few times if necessary. 


OK - I believe you - it's not that hard to understand - is it?

Today we're having a bad day - we've lost our wallet and therefore
our ATM card.  So we walk into the bank to explain the problem and
request a new card.

The conversation will go something like this:

Bank Teller:   "What's your Account Number?".                        

ME:            "How the hell do I know?  That's recorded on my ATM card
                - and that's lost!  Geeeez - what a stupid question."

Bank Teller:   "OK - calm down sir, what's your name?"

ME:            "Brosco"

Bank Teller:   (after typing something in the computer)
               "ah yes, we have a couple of people by the name
                of Brosco with accounts here - could you just 
                give me your Address and Date-of-birth so that
                I can verify which account number is yours"

OK - the Accounts file is indexed by Account Number.  How did the 
bank teller find my account details so quickly?  The secret is that
there is a "SECONDARY INDEX".  This is a second index file, but 
instead of using account number as the Key - it uses customer Name
as a Key.

Now it's quite possible for people to have the same name - so this
key is NOT UNIQUE.  The indexing program must allow for DUPLICATE
keys pointing to different account information.
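
The bank teller's search could be sketched in Python like this.  The
sample records, the field order and the function names are all invented
for the example, and the dictionaries merely stand in for index files.
The point to notice is that each Name in the secondary index holds a LIST
of Record Numbers, because the key is allowed to repeat.

```python
# Hypothetical sketch of a secondary index that allows duplicate keys.
from collections import defaultdict

# Made-up data file: (Account Number, Name, Address), Record Numbers from 1.
records = [
    (1001, "Brosco", "1 First St"),
    (1002, "Smith", "2 Second St"),
    (1003, "Brosco", "3 Third St"),
]

primary_index = {}                    # Account Number -> one Record Number
secondary_index = defaultdict(list)   # Name -> every matching Record Number

for record_number, (account, name, _address) in enumerate(records, start=1):
    primary_index[account] = record_number
    secondary_index[name].append(record_number)

def find_by_name(name):
    """Return every record filed under this (possibly duplicated) name."""
    return [records[n - 1] for n in secondary_index.get(name, [])]

def find_by_name_and_address(name, address):
    """Narrow duplicate names down using the address, as the teller does."""
    return [r for r in find_by_name(name) if r[2] == address]

print(len(find_by_name("Brosco")))                       # 2
print(find_by_name_and_address("Brosco", "3 Third St"))
```

Both indexes point into the same data file; only the key used to get
there is different.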

The DBdll will allow you to create as many indexes as you need.  The
only restriction is the FILES limitation in Windows - usually 16.


   (another example of a newsletter that is only 10KB - 
    another 1 or 2K would have allowed just that little extra!)

---------------------------------------------------------
 Newsletter written by: Brosco.
 Comments, requests or corrections mailto:brosco@orac.net.au

 Translated from Australian to English by an American:
 Alyce Watson -  Chief Editor.  Thanks Alyce.

