Such constraints may formulate specific restrictions to web searches
This is implemented by scraping information for constraints from a web
page during parsing, and storing flags to the pages within the web index.
In this first step, only information for index pages ("index of", directory listings)
are scraped and stored in flags
- added new flag class kelondroBitfield
- added scraper method in condenser
- added bitfield structure for all scrape types (see also condenser)
- added bitfield structure for appearance locations (see RWIEntry)
- added handover protocol for remote search and index distribution
- extended kelondroColumn class to hold bitfield types
- added another search attribute on search page (index.html)
- extended search-filter to enable filtering of non-matching constraints
- set all new database types to be default
- refactoring: moved word hash generation to condenser class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542
- moved all url and index(RWI) entries to index package
- better naming to distinguish RWI entries and URL entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542
- enhanced local domain detection
- bugfixing for memory assignment in kelondroFlexSplit
- automatic memory assignment to caches according to available RAM
- bugfixes for details during search process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2924 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added test migration method to migrate the old LURL to a new LURL
the new LURL will be splitted into different tables for each month
this solves several problems:
- the biggest table in YaCy is splitted in different parts and can
also be managed in filesystems that are limited to 2GB
- the oldest entries can easily be identified, used for re-crawl und
deleted
- The complete database can be limited to a specific size (as wanted many times)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542
- removed settings that are in Basic Settings
- joined pages that belong together
- moved include pages from yacy/ to /
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2726 6c8d7289-2bf4-0310-a012-ef5d649a1542
snippet loading. Some access to header-db had been twice and even
more times in some cases. Snippet resource loading fixed.
Furthermore the snippet loading during remote search within the
remote peer has been disabled, but can be switched on remotely by
new flag 'includesnippet=true'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
there are now two full-controlled caches for incoming indexes:
- dhtIn
- dhtOut
during indexing, all indexes that shall not be transported to remote peers
because they belong to the own peer are stored to dhtIn. It is furthermore
ensured that received indexes are not again transmitted to other peers
directly. They may, however be transmitted later if the network grows.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542
this is a set of search processes that shall enrich search results
with specialized requests to realize a combination of search results
from different peers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is, that index storage uses much less RAM space,
which affects the index RAM cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
* store() is now called explicitely
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused that some elements had been iterated twice
* very strong speed enhancement for url/domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
- check can be disabled via property indexDistribution.dhtReceiptLimitEnabled
- upper bound can be configured via indexDistribution.dhtReceiptLimit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542
- null pointer exception during startup of a robinson-configured peer
- wrong time calculation of default value of re-crawl option
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2005 6c8d7289-2bf4-0310-a012-ef5d649a1542
*) new layout for bookmark list
(Allo: please take a look if it's acceptable for you)
*) crawlReceipt.java: displaying peer name in logging message
*) Network.html: adding button for manual peer ping
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1584 6c8d7289-2bf4-0310-a012-ef5d649a1542
- changed index collection process: indexes are not first flushed to indexEntity,
but now collected directly from ram cache and assortments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1489 6c8d7289-2bf4-0310-a012-ef5d649a1542
- replaced usage of temporary IndexEntity by EntryContainer
- added more attributes to word index
- added exact-string search (using quotes in query)
- disabled writing into WORDS during search; EntryContainers are used instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1485 6c8d7289-2bf4-0310-a012-ef5d649a1542
- implemented hand-over of new word index attributes during remote search
- implemented word-distance computation during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542