orbiter
bddc197453
restored a change from low012 (SVN 3068) that was removed by mistake
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3070 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1377c53aa3
extraction of media links from search results
...
these links are mixed into the snippets for testing purposes
(a final version will handle this differently)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3069 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
low012
586add4c6c
*) Better snippets: words like GNU/Linux will not prevent Linux or GNU from being marked if they are search words (see http://www.yacy-forum.de/viewtopic.php?t=2891 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3068 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
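Note on SVN 3068: the marking behaviour described there can be illustrated with a minimal sketch. It assumes that a word is a maximal run of letters and digits, that matching is case-insensitive, and that marking means wrapping in `<b>` tags. The class `SnippetMarker` and its method are hypothetical and are not taken from the YaCy sources.

```java
import java.util.Locale;
import java.util.Set;

final class SnippetMarker {
    // Hypothetical helper: wraps every token that equals a search word in <b>...</b>.
    // A token is a maximal run of letters and digits, so "GNU/Linux" is seen
    // as the two tokens "GNU" and "Linux" separated by '/'.
    static String mark(String snippet, Set<String> lowerCaseSearchWords) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < snippet.length()) {
            char c = snippet.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < snippet.length() && Character.isLetterOrDigit(snippet.charAt(i))) {
                    i++;
                }
                String token = snippet.substring(start, i);
                if (lowerCaseSearchWords.contains(token.toLowerCase(Locale.ROOT))) {
                    out.append("<b>").append(token).append("</b>");
                } else {
                    out.append(token);
                }
            } else {
                // separators (spaces, slashes, punctuation) are copied unchanged
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Because separators are copied through unchanged, a search word directly in front of a punctuation mark is marked as well.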
borg-0300
8b7c543885
NullPointer fix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3061 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
937ccd4e76
fix for snippet-generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3060 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
c086c71f17
*) fixed ArrayIndexOutOfBoundsException
...
--> http://www.yacy-forum.de/viewtopic.php?t=3210
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3058 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c93cfdc23a
fix for http://www.yacy-forum.de/viewtopic.php?p=28564#28564
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3057 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
93a5ace330
fix for http://www.yacy-forum.de/viewtopic.php?p=28544#28544
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3056 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bf0d820659
- added correct flagging of word properties
...
- added self-healing to database in case that wrong free-pointers exist
- added presentation of media links in snippets (does not yet work correctly)
- code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3055 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
10d888e70c
- added a media search for images, audio, video and applications
...
- new search options on search page
- new option in ViewInfo to display all links of a file
- enhanced collection data structure
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3054 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a603c4d5e8
more code simplifications
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3052 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9a85f5abc3
cleanup
...
- removed 'deleteComplete' flag; this was used especially for WORDS indexes
- shifted methods from plasmaSwitchboard to plasmaWordIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3051 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
fbe1ee402b
plasmaCrawlLURL$kiter cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3050 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
773ba1e91a
- generalized object order handling
...
- controlled object order for all database tables
- migrated DHT position computation to correct base64-decoded values
this also closed the 'gaps' in the dht positions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3049 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
15381cbf73
other bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3048 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
ad65cc9d2f
NullPointer fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3047 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
d33745a7ea
NullPointer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3046 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3a4933b63c
bugfix for
...
http://www.yacy-forum.de/viewtopic.php?p=28493#28493
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3045 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
109ed0a0bb
- cleaned up code; removed methods to write the old data structures
...
- added an assortment importer. the old database structures can
be imported with
java -classpath classes yacy -migrateassortments
- modified wordmigration. The indexes from WORDS are now imported
to the collection database. The call is
java -classpath classes yacy -migratewords
(as it was)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
052f28312a
removed assortments from indexing data structures
...
removed options to switch on assortments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3041 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
2372b4fe0c
release 0.49
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3040 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f8efb3c948
fixed a null pointer exception problem reported in the forum.
...
I can't find the forum entry any more because my girlfriend switched
off the power while the forum window was open.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3039 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ad1e4aa88e
added selection of audio, video, image and application resources
...
to search procedure. This function currently cannot be used through the
search interface, but only through remote search.
added accumulation of search attributes to enable the audio, video,
image and application selection.
fixed a problem with external URL representation generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3036 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7cc4cec9c9
bugfix for assertion bugs documented in
...
http://www.yacy-forum.de/viewtopic.php?p=28261#28261
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3030 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7dbcd358b4
fix for http://www.yacy-forum.de/viewtopic.php?p=28231#28231
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3021 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
86394e7a56
fix for cache-delete problem:
...
- better synchronization
- files are only deleted if they have been in the cache for 5 minutes
- hash-path for the HTCACHE is now default
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3018 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
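Note on SVN 3018: a minimal sketch of the five-minute rule, assuming the file's last-modified time is an acceptable stand-in for "time of entry into the cache" and that a class-level lock is enough synchronization. `CacheCleanupRule` and its methods are hypothetical names, not YaCy code.

```java
import java.io.File;

final class CacheCleanupRule {
    // Minimum age before a cached file may be deleted: 5 minutes.
    static final long MIN_AGE_MILLIS = 5L * 60L * 1000L;

    // Pure decision function: true if the file is old enough to be deleted.
    static boolean mayDelete(long lastModifiedMillis, long nowMillis) {
        return nowMillis - lastModifiedMillis >= MIN_AGE_MILLIS;
    }

    // Deletes the file only if it exists and is old enough.
    // Synchronized so that concurrent cleanup calls do not interleave.
    static synchronized boolean deleteIfOldEnough(File f, long nowMillis) {
        return f.isFile() && mayDelete(f.lastModified(), nowMillis) && f.delete();
    }
}
```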
orbiter
ceb9e3aa17
- enhanced parser: collection of audio, video, image and application links
...
- enhanced condenser: better handling of utf-8 and pre-formatted texts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3017 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0b9370a9dc
fix for http://www.yacy-forum.de/viewtopic.php?p=28108#28108
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3013 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b5a29e9651
- fix for snippets that are too short
...
- added keyword to snippet fetch to suppress removal of not-found snippet words (for debugging)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3009 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f1528672b1
filtering of non-index pages during index-of search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3004 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8e7215475b
- extended ViewFile to use it as a debugging tool: you can now use the
...
post-parameter url to submit an url directly
- fixed some bugs in text parser (not all parts had been analysed)
- fixed a bug in remote search interface (could not handle constraints)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3001 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
30888e7a2f
implementation of search constraints
...
Such constraints may formulate specific restrictions to web searches
This is implemented by scraping information for constraints from a web
page during parsing, and storing flags to the pages within the web index.
In this first step, only information for index pages ("index of", directory listings)
is scraped and stored in flags
- added new flag class kelondroBitfield
- added scraper method in condenser
- added bitfield structure for all scrape types (see also condenser)
- added bitfield structure for appearance locations (see RWIEntry)
- added handover protocol for remote search and index distribution
- extended kelondroColumn class to hold bitfield types
- added another search attribute on search page (index.html)
- extended search-filter to enable filtering of non-matching constraints
- set all new database types to be default
- refactoring: moved word hash generation to condenser class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
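Note on SVN 2999: the idea of storing per-page flags and filtering search results by a constraint can be sketched as follows. This is only an illustration under assumptions: `FlagField` is a hypothetical stand-in (it is not the kelondroBitfield class named in the commit), and the flag position used in the usage note is invented.

```java
final class FlagField {
    private final byte[] bits;

    FlagField(int byteLength) {
        this.bits = new byte[byteLength];
    }

    // Sets or clears the flag at the given bit position.
    void set(int pos, boolean value) {
        int idx = pos >> 3;
        int mask = 1 << (pos & 7);
        if (value) {
            bits[idx] |= mask;
        } else {
            bits[idx] &= ~mask;
        }
    }

    boolean get(int pos) {
        return (bits[pos >> 3] & (1 << (pos & 7))) != 0;
    }

    // True if every flag set in 'required' is also set in this field.
    // A search filter would drop all entries for which this returns false.
    boolean matches(FlagField required) {
        for (int i = 0; i < required.bits.length; i++) {
            int have = i < bits.length ? bits[i] : 0;
            if ((required.bits[i] & ~have) != 0) {
                return false;
            }
        }
        return true;
    }
}
```

Usage, with an assumed flag position 0 meaning "index of page": the parser sets `pageFlags.set(0, true)` while scraping, the search sets the same bit in a constraint field, and only entries with `pageFlags.matches(constraint)` are kept.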
orbiter
49a83f99d9
- fix for wrong DHT ordering in DHT selection
...
- fix for http://www.yacy-forum.de/viewtopic.php?t=3112&highlight=
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2995 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f4b547dc13
limited index transfer to peers with version 0.486 or higher
...
this protects peers with version below 0.486 from new RWI objects
(which they cannot handle)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2988 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
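Note on SVN 2988: a minimal sketch of such a version guard, assuming peer versions are available as floating-point numbers. `TransferGuard` and `acceptsNewRWI` are hypothetical names used for illustration only.

```java
final class TransferGuard {
    // Lowest peer version assumed to understand the new RWI objects.
    static final double MIN_RWI_VERSION = 0.486;

    // True if an index transfer to a peer with this version is allowed.
    static boolean acceptsNewRWI(double peerVersion) {
        return peerVersion >= MIN_RWI_VERSION;
    }
}
```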
orbiter
10a4ab5195
disabled some (more) write caches
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2987 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
09bcc10344
bugfix for some problems of last change with assortments
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2986 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e3d75f42bd
final version of collection entry type definition
...
- the test phase of the new collection data structure is finished
- test data that had been generated is void. There will be no migration
- the new collection files are located in DATA/INDEX/PUBLIC/TEXT/RICOLLECTION
- the index dump is void. There will be no migration
- the new index dump is in DATA/INDEX/PUBLIC/TEXT/RICACHE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2983 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c9364246cc
introduced new RWI-Object.
...
This will be used for the final version of the collections.
The new object is not yet used.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2966 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e628d34e16
patches for bad data
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2951 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
497428c8ec
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
76fceb9997
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2945 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
eeda881553
bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2938 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bb7d4b5d5e
refactoring to prepare new RWI entry object
...
- moved all url and index(RWI) entries to index package
- better naming to distinguish RWI entries and URL entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bdc9216366
- more asserts
...
- some bugfixes
- some patches for bugs that are already in the database
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2935 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1751a799ac
- deactivated all write buffers
...
- fixed a storage bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2933 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ba967c4875
- bugfixes and debug code
...
- new generalized index class indexCachedRI
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2930 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ee4715a21c
- more asserts
...
- bugfix for performanceMemory
- refactoring of index ram cache: renamed indexRAMCacheRI to indexRAMRI, to make space for a cached indexRI, which should be named indexRAMCacheRI
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2925 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
114a76a86e
- added flag to urlhash that shows that domain is a local domain
...
- enhanced local domain detection
- bugfixing for memory assignment in kelondroFlexSplit
- automatic memory assignment to caches according to available RAM
- bugfixes for details during search process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2924 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b2d51be33c
bugfix for latest changes to entry generalization
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2922 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
8385557672
Small fix for the Cache Monitor when using proxyCacheLayout=hash
...
see: http://www.yacy-forum.de/viewtopic.php?p=27394#27394
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2916 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f1ed55a5fc
bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2913 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8fdefd5c68
generalization of payload definition of index storage
...
this is one step forward to the migration to a new collection data format
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2912 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ad248d61ca
*) more verbose exception
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2901 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
7e8669b15c
*) added possibility to "recycle" a DHTChunk that failed to transfer.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2898 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
low012
4feaa91890
*) Added additional MIME-Type.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2895 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
low012
89af433879
*) Deleted parts of WebCat that were not needed for parsing SWFs.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2893 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
46a712e195
- more asserts
...
- simplified indexURLEntry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2891 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
low012
8c9bc7e341
*) extracting urls works now
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2890 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
low012
493391e42d
*) new flash parser, still experimental
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2888 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
215c4e65f1
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2887 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bd4f43cd66
- fixed a null pointer exception bug
...
- switched off more write caches
- re-enabled index-abstracts search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2885 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
194d42b6a7
*) changed PPM-calculation to be more accurate
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2884 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
fe8afaf426
switched off usage of write cache for important databases
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2883 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d3431433b0
more anonymization in logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2876 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e6044e5198
bugfix for
...
http://www.yacy-forum.de/viewtopic.php?p=27207#27207
and
http://www.yacy-forum.de/viewtopic.php?p=27219#27219
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2875 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
78b7f6f7fd
bugfix for index remove bug,
...
appeared after search where snippet-loading triggered word removal
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2869 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
147d88cf23
re-design of database caching
...
this should reduce IO a lot, because write caches are now activated for all databases
- added new caching class that combines a read- and write-cache.
- removed old read and write cache classes
- removed superfluous RAM index (can be replaced by kelondroRowSet)
- adapted all current classes that used the old caching methods
- more asserts, more bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2865 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4e363108e1
- removed bad debug code that caused a large and unnecessary delay during global search
...
- fixed problem that global search results disappear after a search
- removed some stopwords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2861 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
2a9d868f6d
- removed object cache from kelondroTree
...
- generalized object caching and added new object caching class
- added object caching wherever kelondroTree was used
- added object caching also to usage of kelondroFlex
- added object buffering (a write cache) to NURLs
- added many assert statements; fixed bugs here and there
- added missing close methods to latest added classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2858 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3ffc5b8793
fixed problem with serverCharBuffer.append(char)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2821 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
06854988da
- full integration of new LURL database in INDEX
...
- added migration method for urlHash.db into INDEX
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2819 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
octoate
e4a3574b77
StringBuffer now resets every time the parser is called
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2817 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
karlchenofhell
ce237aefad
- assortment-sizes table from PerformanceQueues_p.html is not shown if not used
...
- escape query- and fragment-part of an url as well
- new resolveBackpath for urls: http://www.yacy-forum.de/viewtopic.php?t=2679#24867
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2815 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a5b9b514c1
*) retry crawling without content-encoding if the content-encoding header was not correct
...
See: http://www.yacy-forum.de/viewtopic.php?p=26917#26917
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2811 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
92f774edd1
*) Better charset encoding detection
...
*) New testclass for charset encoding detection tests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2808 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b79e06615d
- added new LURL.Entry class for next database migration
...
- refactoring of affected classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
octoate
cc24dde5e0
First version of a MS Excel parser based on Apache POI
...
(event based parsing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2801 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
karlchenofhell
4c63129136
- stupid mistake...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2798 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
karlchenofhell
ebf0da2a45
- now the fix http://www.yacy-forum.de/viewtopic.php?t=2974 works
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2796 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
3d152bfe43
*) Logging message added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2794 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
karlchenofhell
b5e40e2fa2
- fix for http://www.yacy-forum.de/viewtopic.php?t=2974 (no cache-sizes for new db)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2792 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
77a59a115d
refactoring of indexing methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2787 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
cbb1e710b9
*) removing old class
...
- was replaced by plasma/urlPattern/defaultURLPattern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2765 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c6d46f7ebd
null pointer bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2761 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
decb09df6d
*) Trying to be more tolerant against wrong charset names
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2760 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
e9afe39cbb
*) Trying to be more tolerant against wrong charset names
...
See: http://www.yacy-forum.de/viewtopic.php?p=26662
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2759 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
7526c831a8
*) Suppressing stacktrace
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2758 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
50f2578c55
- some bugfixing and code cleanup
...
- assortments can now be left out completely if they do not exist
before startup and collection index is selected.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2757 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bdf4c7c51e
added missing files for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2756 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a5dd0d41af
- refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
...
- added test migration method to migrate the old LURL to a new LURL
the new LURL will be split into different tables for each month
this solves several problems:
- the biggest table in YaCy is split into different parts and can
also be managed in filesystems that are limited to 2GB
- the oldest entries can easily be identified, used for re-crawl and
deleted
- The complete database can be limited to a specific size (as wanted many times)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
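Note on SVN 2755: the per-month split can be sketched as below. This is an in-memory illustration under assumptions, not the migration code from the commit: the class `MonthlyTables`, the `lurl-` name prefix and the use of URL hash strings as entries are all hypothetical. Table names are built so that they sort chronologically, which makes the oldest table easy to find and drop.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class MonthlyTables {
    // table name -> entries; a TreeMap keeps the names sorted, oldest first
    private final TreeMap<String, List<String>> tables = new TreeMap<>();

    // Builds a name such as "lurl-200610"; zero padding keeps string order
    // identical to chronological order for four-digit years.
    static String tableName(YearMonth month) {
        return String.format(Locale.ROOT, "lurl-%04d%02d", month.getYear(), month.getMonthValue());
    }

    // Stores an entry in the table that belongs to its load month.
    void put(YearMonth loadMonth, String urlHash) {
        tables.computeIfAbsent(tableName(loadMonth), k -> new ArrayList<>()).add(urlHash);
    }

    // Removes the oldest table as a whole and returns its name, or null if empty.
    String dropOldest() {
        Map.Entry<String, List<String>> oldest = tables.pollFirstEntry();
        return oldest == null ? null : oldest.getKey();
    }

    int tableCount() {
        return tables.size();
    }
}
```

Limiting the total size then amounts to calling `dropOldest()` until the remaining tables fit the configured limit.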
octoate
1c4076da8a
First version of the MS Powerpoint parser based on Apache POI
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2753 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
5b75d64d7d
*) bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2750 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
71ed104bc7
*) adding additional rpm mimetype (used by packman)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2749 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6396f5971e
bugfixes and migration attempt toward new kelondroFlex db
...
- more synchronization
- bugfix for remove in collections
- bugfix in kelondroFlex (wrong exception condition!)
- options to use RAM, FLEX and TREE tables for Crawl URL stacker
- default for Crawl URL stacker is now FLEX (!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
48f81acc0e
revert SVN 2744, it is not needed
...
(this resulted from a small misunderstanding of the newest cache layout)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2745 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
1da9aece12
Repair DNS prefetch during cacheScan
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2744 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
22649408ad
*) Better error handling for charset encoding problem during content parsing
...
See: http://www.yacy-forum.de/viewtopic.php?t=2952
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2737 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a9c7e3f061
*) Bugfix for NoSuchElementException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2735 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c8f3a7d363
added snippet-url re-indexing
...
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
low012
2cfd4633ac
*) even better handling of searchwords in snippets, words can consist of letters and numbers now
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2732 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e17fea7015
files in htcache are now stored in different hash/tree subdirectories
...
according to storage method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2730 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
low012
2d3b7251a4
*) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2728 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
25ae3d3161
generalized definition of hexhash
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2725 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f0d747c723
removed deprecated method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2723 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5ff77612ac
bugfix for old WORDS storage method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2722 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0f10bdde22
more generic cache methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
6557112d8f
small fix for plasmaURLPool.getURL() needed for new alternative htcache layout
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2719 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
440c6ee657
Implement alternative htcache layout
...
mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
fd61209797
lines inside tags without punctuation are extended by a single dot.
...
This enables the condenser to distinguish the lines in a better way.
The result is a better preparation of snippets.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
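Note on SVN 2715: a minimal sketch of the dot extension, assuming the set of characters that count as closing punctuation is `. ! ? : ; ,` (the commit does not list them). `LineTerminator` is a hypothetical name.

```java
final class LineTerminator {
    // Appends a single dot to a line that does not already end in punctuation,
    // so that a sentence splitter can tell adjacent lines apart.
    static String terminate(String line) {
        String t = line.trim();
        if (t.isEmpty()) {
            return t;
        }
        char last = t.charAt(t.length() - 1);
        boolean punctuated = ".!?:;,".indexOf(last) >= 0;
        return punctuated ? t : t + ".";
    }
}
```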
orbiter
1969522dc1
removed lowercase of snippets (and other things):
...
- added new sentence parser to condenser
- sentence parsing can now handle charsets
to do: charsets must be handed over to new sentence parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
43614f1b36
bugfix in collection index. the index for collections was not created correctly
...
The bugfix includes a migration function which starts automatically
after startup of yacy.
This applies to you only if you are using the new collection index.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
db294687ea
enhanced logging
...
- more logging output
- fix in log line preparation
- added filter to log page
- some small bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a9a0f51303
*) suppressing InterruptedException error message
...
See: http://www.yacy-forum.de/viewtopic.php?t=2915
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2705 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
1d4fb680ce
*) CrawlWorker.java: only keep content in memory if size is equal or less than 5MB
...
TODO: make this limit configurable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2703 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
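Note on SVN 2703: the 5MB rule can be sketched as a small decision function. The sketch assumes that 5MB means 5 * 1024 * 1024 bytes and that an unknown content length (reported as a negative value) is treated like a large file; neither detail is stated in the commit. `ContentStorage` is a hypothetical name.

```java
final class ContentStorage {
    enum Target { MEMORY, TEMP_FILE }

    // Assumed limit: 5MB. The commit marks making this configurable as a TODO.
    static final long MAX_IN_MEMORY_BYTES = 5L * 1024L * 1024L;

    // Chooses where loaded content should be kept, based on its declared length.
    static Target choose(long contentLength) {
        if (contentLength >= 0 && contentLength <= MAX_IN_MEMORY_BYTES) {
            return Target.MEMORY;
        }
        return Target.TEMP_FILE;
    }
}
```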
theli
1586d57187
*) odtParser: better handling of large files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
f17ce28b6d
*) plasmaHTCache:
...
- method loadResourceContent defined as deprecated.
Please do not use this function to avoid OutOfMemory Exceptions
when loading large files
- new function getResourceContentStream to get an inputstream of a cache file
- new function getResourceContentLength to get the size of a cached file
*) httpc.java:
- Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
- new option to hold loaded resource content in memory
- adding option to use the worker class without the worker pool
(needed by the snippet fetcher)
*) plasmaSnippetCache
- snippet loader does not use a crawl-worker from pool but uses
a newly created instance to avoid blocking by normal crawling
activity.
- now operates on streams instead of byte arrays to avoid OutOfMemory
Exceptions when operating on large files
- snippet loader now forces the crawl-worker to keep the loaded
resource in memory to avoid IO
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
- keep resource in memory whenever possible (to avoid IO)
- when parsing from stream the content length must be passed to the parser function now.
this length value is needed by the parsers to decide if the parsed resource content is too large
to hold it in memory and must be stored to file
- AbstractParser.java: new function to pass the contentLength of a resource to the parsers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
630a955674
read snippets from cache in case they are not provided in RAM
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2700 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
dbc2e039bb
added time-out option parameter to call hierarchy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
00746ca232
identified and fixed search performance problem caused by
...
snippet loading. Some accesses to header-db were made twice or even
more often in some cases. Snippet resource loading fixed.
Furthermore the snippet loading during remote search within the
remote peer has been disabled, but can be switched on remotely by
new flag 'includesnippet=true'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
310f1c41cd
added option to see ranking scores in surftipps
...
and some cleanups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a2e3095044
*) Bugfix. Add missing plasmaParserDocument.close() calls
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
cd5f349666
*) Better handling of large files during parsing
...
Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory
*) plasmaParserDocument.java: getText now returns an input stream instead of a byte array
*) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array
Attention: the caller of this function has to ensure that enough memory is available to do this
to avoid OutOfMemory Exceptions
*) httpd.java: better error handling if the SOAP handler is not installed
*) pdfParser.java:
- better handling of documents with exotic charsets
- better handling of large documents
- better error logging of encrypted documents
*) rtfParser.java: Bugfix for UTF-8 support
*) tarParser.java: better handling of large documents
*) zipParser.java: better handling of large documents
*) plasmaCrawlEURL.java: new errorcode for encrypted documents
*) plasmaParserDocument.java: the extracted text can now be passed
to this object as byte array or temp file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
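The large-file handling described in the entry above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ExtractedText, the store method and the MAX_IN_MEMORY constant are hypothetical and are not taken from plasmaParserDocument.java; only the 5MB limit and the getText / getTextBytes split come from the commit message.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical holder for parsed text; not the actual plasmaParserDocument class. */
final class ExtractedText {
    // Assumed threshold, based on the "larger than 5MB" wording of the commit message.
    static final long MAX_IN_MEMORY = 5L * 1024 * 1024;

    private final byte[] bytes; // non-null when the text is kept in memory
    private final File file;    // non-null when the text was written to a temp file

    private ExtractedText(byte[] bytes, File file) {
        this.bytes = bytes;
        this.file = file;
    }

    /** Keeps small content in memory and writes larger content to a temp file. */
    static ExtractedText store(byte[] text, long contentLength) {
        if (contentLength <= MAX_IN_MEMORY) return new ExtractedText(text, null);
        try {
            File tmp = File.createTempFile("parsed", ".txt");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), text);
            return new ExtractedText(null, tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean inMemory() {
        return bytes != null;
    }

    /** Streams the text regardless of where it is stored. */
    InputStream getText() {
        try {
            return inMemory() ? new ByteArrayInputStream(bytes) : Files.newInputStream(file.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads the whole text into memory; the caller must ensure enough heap is available. */
    byte[] getTextBytes() {
        try {
            return inMemory() ? bytes : Files.readAllBytes(file.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller that only needs to scan the text would use getText() and read the stream; getTextBytes() is the convenience path that carries the memory risk the commit message warns about.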
low012
f8ac694e51
*) fixed a bug where search words in snippets were not displayed in bold directly before a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2677 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
df1629b05a
- code cleanup
...
- version 0.471
- moved surftipps to own web page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b73efd5565
*) missing changes needed because of last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
2463e5624a
'quick' release 0.47
...
- documentation update
- necessary bugfixes (missing css for new peers)
- reduced effect of search result redundancy filter
- removed some debug output, but not all
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
625c2ce6b1
*) bugfix for snippet fetching problem if content but not http header is available in cache
...
See: http://www.yacy-forum.de/viewtopic.php?p=25748
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2651 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
813a8a8179
*) migration of mimeTypeParser to jmimemagic 0.1
...
- better mimetype detection for rss feeds
- better mimetype detection for odt documents (less memory consuming)
- two new detector classes implementing MagicDetector interface of jmimemagic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2650 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
3f5a4153a0
Make Peers more receptive to transferred indexes
...
- Set MaxWordCount for dhtInCache to indexDistribution.dhtReceiptLimit
so that the inCache gets flushed when the limit is passed
- Modify flushCacheSome to flush enough words to get below MaxWordCount immediately
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2649 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b6c7b91582
*) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
...
*) better logging of parser failures
*) simplified usage of plasmaparser through switchboard
*) restructuring of crawler
- crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher)
*) snippet-fetcher: more verbose error messages
*) serverByteBuffer.java: adding new function append(String,encoding)
*) serverFileUtils.java: adding functions to copy only a given number of bytes between streams
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
1dc12d6659
*) Bugfix for shutdown problem caused by cacheScan thread
...
See: http://www.yacy-forum.de/viewtopic.php?p=25729
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2636 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
42173462f5
rename cutUrlText to shortenURLString;
...
other little things;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2635 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
26dfbb7499
*) Bugfix for UTF-8: url names are now stored properly in stackcrawl, crawler, indexing queue and should be displayed correctly in the GUI
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2630 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
cf6acff2c2
*) Bugfix. htmlFilterInputStream document analysis did not work properly for documents smaller than the
...
default InputStream Buffer size.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2629 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
5c6251bced
*) some improvements for extended html document charset support
...
- new class htmlFilterInputStream.java which allows pre-analyzing the html header to extract
the charset meta data. This is only enabled for the crawler at the moment. Integration into
proxy needs more testing.
- adding event listener interfaces to the htmlscraper to allow other classes to be informed
about detected tags (used by the htmlFilterInputStream.java)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2624 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f453c14b5d
removed unreachable catch blocks and unused imports
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2619 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ad7f600f25
*) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2618 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
97d2a08ef1
*) restructuring needed to support parsing of documents using various charsets
...
- serverFileUtils.java:
-- adding methods to copy from stream to writer and readers to writers
-- moving httpc writeX methods into serverFileUtils class
- serverCharBuffer.java: removing inheritance from Writer class
- replacing htmlFilterOutputStream by htmlFilterWriter class which handles
content as char stream
- htmlFilterContentTransformer.java: deactivating getText mode
(still needs to be migrated to use char streams instead of byte streams)
- changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
- changes in Scraper and Transformer classes to operate on chars instead of bytes
- httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3aac5b26da
- added automatic tag generation when a web page from the search results is added
...
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f644a1c3a7
better evaluation of index abstracts
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2604 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
2fd610b556
http://www.yacy-forum.de/viewtopic.php?p=25611#25611
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2601 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
06fa891152
*) htmlFilterContentScraper.java: using proper charset for document title
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2595 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
74c3e7cf29
*) storing document charset into plasmaParserDocument object (is needed later by the condenser)
...
*) htmlFilterContentScraper.java: using proper charset for document title
*) serverByteBuffer.java: adding new toString which allows to specify the charset for byte encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
c5d3020941
*) better errorhandling for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2592 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d0a5a53789
*) changes needed for multi-language support
...
- parsers may need to know the charset of the byte stream
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2591 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
26ab1fa885
fixed null pointer exception
...
See http://www.yacy-forum.de/viewtopic.php?p=25598#25598
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2588 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b0e8ff6eda
*) some TODO markers for UTF-8 problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2586 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
41e27b85b7
fix for crawler condition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2583 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
9ecf7f0da2
*) some TODO markers for UTF-8 problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2578 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c89d8142bb
replaced old 'kCache' by a fully controlled cache
...
there are now two fully controlled caches for incoming indexes:
- dhtIn
- dhtOut
during indexing, all indexes that shall not be transported to remote peers
because they belong to the own peer are stored to dhtIn. It is furthermore
ensured that received indexes are not again transmitted to other peers
directly. They may, however, be transmitted later if the network grows.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6e2907135a
bugfixes for remote search server part
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cf9884e22b
first attempt to implement a secondary search
...
this is a set of search processes that shall enrich search results
with specialized requests to realize a combination of search results
from different peers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b251076e64
avoid ConcurrentModificationException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2563 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
75b198bc02
- updated references to indexContainer
...
- more bugfixes and debugging for indexAbstract processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b7e7808ea6
wordmigration now works also for new index database
...
if the new database is switched on, no 'too big' messages appear,
all the WORDS files can be completely migrated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2553 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a0ddf2ec11
*) AbstractCrawlWorker.java: delete already downloaded data on crawling error
...
*) plasmaSwitchboard.java: log unexpected errors while parsing/indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2552 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4f9e42d5ed
more changes towards better join-search
...
- fixed problems with index-abstract generation
- added analysis output for index abstract receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a7281a9b4d
fix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2545 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
82a6054275
- fixed bug with new indexAbstract generation
...
- added partial evaluation of indexAbstracts during remote searches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
fded1f4a5d
*) better handling of maximum file size limit in crawler
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2543 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
74d1dea30b
changes towards better join-search
...
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ae4e8ce03e
- cut for 'probably last html-interface version': version number update
...
- small enhancement to ranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2536 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
64bed59ee8
enhancements to ranking
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2535 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
63893003be
*) Adding settings page for the crawler which allows specifying a file size limit and the timeout to use.
...
*) adding first version of maximum filesize check for the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2534 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
94d7ced900
fix for last ranking commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2529 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
03835c2ee8
enhanced search result computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2527 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ac3419b65f
better debugging for IndexOutOfBoundsException bug
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2525 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a8bc768206
enhancements to ranking evaluation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2523 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
33898ae7e9
*) ResourceInfoFactory.java: Bugfix for classNotFoundException
...
See: http://www.yacy-forum.de/viewtopic.php?t=2797
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2521 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
406e170e25
*) more verbose error message
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2519 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b298474e22
*) Bugfix needed because of changed plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2518 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
96c6e4e322
- enhancements to detailed search page
...
- enhancements to search ranking computation process
- removed bugs in postranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9340dbb501
fixed all possible problems with nullpointer exception for LURLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a5ed86105b
*) bugfix for handling of ResourceInfo object in proxy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
ff4362b02d
some more fixes for new plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
7aeadbe7cc
another NullPointerException in http.ResourceInfo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2510 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
141f9e5bb4
fix for new plasmaCrawlLURL.load behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2509 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
087f7511f8
prevent NullPointerException in http.ResourceInfo
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2507 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a2525072f2
bugfix for kelondroRow - property generation
...
this bug affected ranking parameters :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2506 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b44514242a
*) crawler/ftp/CrawlWorker.java: better errorhandling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2503 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
7d7f30139c
*) crawler/ftp/CrawlWorker.java: delete old cache file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2502 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
4ae0f122f8
*) ResourceInfo.java: License header added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2501 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
043edfa4d8
*) ftp/ResourceInfo.java ResourceInfo object for ftp resources added
...
*) ftp/CrawlWorker.java better errorhandling for ftp crawler
*) plasmaCrawlEURL.java: some errorcodes added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2499 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4866868c0e
added write cache for LURLs
...
This was necessary to speed up the index receive process during global search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8a0e35618b
enhancements to search result preparation
...
- added detailed count on remote search results
- enhanced search sequence during remote searches (doing local search in sequence)
- strict adherence to timeout limits
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
5c1bb53d2a
Missing description for last commit
...
*) next step of restructuring for new crawlers
> HTCaching should now work protocol independent
-- introduction of new ResourceInfo objects containing protocolspecific metadata
of a resource.
-- the ResourceInfo objects now implement old functions like shallIndexCacheForXXX,
shallStoreCacheForXXX in a protocol dependent manner
> Indexing should also work protocol independent now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2496 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
dae763d8e3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
4825bfaaf3
*) Bugfix for PrintWriter Problem
...
See: http://www.yacy-forum.de/viewtopic.php?t=2792
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2494 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
7930839594
*) URL.java: userinfo was not taken over when generating a new url from a base url and a rel. path
...
*) CrawlWorker.java: using new dirhtml function of ftpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2492 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
7a35b8e237
*) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2487 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ffbf416e76
*) direct access to requestheader of htCache.Entry removed to make it more http independent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
3870d615e3
*) setting htCache.Entry fields to private
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
393a7d10be
*) setting htCache.Entry fields to private
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ab5a9bee66
*) adding some copyright headers
...
*) next step of restructuring for new crawlers
- adding first testversion of ftp crawler class
-- does not create a htCache entry yet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
5847492537
*) next step of restructuring for new crawlers
...
- IndexCreate_p.java: correcting problems with ftp urls
- URL.java does not cutout the userinfo anymore
(needed to transport authentication info in ftp urls, e.g. ftp://username:pwd@ftp.irgendwas.de)
- plasmaCrawlLoader.java:
-- hack to re enable https urls
-- adding function getSupportedProtocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2482 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
fce9e7741b
*) next step of restructuring for new crawlers
...
- renaming of http specific crawler settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2480 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
e3f0136606
*) next step of restructuring for new crawlers
...
- adding function isSupportedProcotol to plasmaCrawlLoader.java
- disabling robots.txt check for protocols other than http(s)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
9ded4e8d5a
*) Bugfix for name resolution in proxy mode
...
See: http://www.yacy-forum.de/viewtopic.php?p=25241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2478 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
1c8300fcec
*) Bugfix for name resolution in proxy mode
...
See: http://www.yacy-forum.de/viewtopic.php?p=25241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2477 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
4e2a950ac9
*) next step of restructuring for new crawlers
...
- avoid using the http crawler class directly. Using the interface class instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2476 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
09b106eb04
*) next step of restructuring for new crawlers
...
- adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol specific crawl-worker threads
- moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java
- the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher)
to crawl a page and wait for the result use function plasmaCrawlLoader.loadSync([...])
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
eb9b138986
*) next step of restructuring for new crawlers
...
- conversion of the crawler pool into a keyed object pool
- crawlers are now loaded based on the url protocol (of course works only for http now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
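A keyed object pool of the kind mentioned in the entry above can be sketched with plain collections. This is an assumption-laden illustration, not the YaCy code: the class KeyedWorkerPool and all of its method names are hypothetical, and the original may well have relied on a pooling library instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical keyed pool: one queue of idle workers per URL protocol. */
final class KeyedWorkerPool<W> {
    private final Map<String, Supplier<W>> factories = new HashMap<>();
    private final Map<String, Deque<W>> idle = new HashMap<>();

    /** Registers a factory that creates workers for the given protocol. */
    synchronized void register(String protocol, Supplier<W> factory) {
        factories.put(protocol, factory);
    }

    synchronized boolean isSupported(String protocol) {
        return factories.containsKey(protocol);
    }

    /** Returns an idle worker for the URL's protocol, or creates a new one. */
    synchronized W borrow(String url) {
        String protocol = protocolOf(url);
        Supplier<W> factory = factories.get(protocol);
        if (factory == null) {
            throw new IllegalArgumentException("unsupported protocol: " + protocol);
        }
        W worker = idle.computeIfAbsent(protocol, k -> new ArrayDeque<>()).poll();
        return worker != null ? worker : factory.get();
    }

    /** Puts a worker back so that a later borrow for the same protocol can reuse it. */
    synchronized void giveBack(String protocol, W worker) {
        idle.computeIfAbsent(protocol, k -> new ArrayDeque<>()).push(worker);
    }

    /** Extracts the lower-cased protocol part in front of "://". */
    static String protocolOf(String url) {
        int p = url.indexOf("://");
        if (p < 0) throw new IllegalArgumentException("no protocol in: " + url);
        return url.substring(0, p).toLowerCase(Locale.ROOT);
    }
}
```

Keying the pool by protocol means that adding a new crawler type only requires registering another factory; the borrowing code does not change.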
theli
1395aae742
*) starting restructuring which is needed to add crawlers for additional protocols
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2472 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b4acbdaa97
*) better handling of server shutdown
...
See: e.g. http://www.yacy-forum.de/viewtopic.php?p=25234
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2470 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
f3ac4dbbb9
*) better handling of server shutdown
...
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
959b779aba
*) avoid performance loss if log level is greater than 'fine'
...
See: http://www.yacy-forum.de/viewtopic.php?p=25180
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2467 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
18b6876860
new cache flush configuration settings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2460 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
f0278b4092
Bugfix for / by zero when the AssortmentCluster is empty
...
See: http://www.yacy-forum.de/viewtopic.php?t=2746
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2459 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
14e0bb0dcf
allow more references per word for new db
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2458 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
985dcbde7f
changed some parameters, which may improve memory usage and indexing speed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2457 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b7f4a1521b
added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2456 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c26da4893b
reverted NURL to kelondroTree; kelondroFlexTable still has problems with deleted entries
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2454 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
db1eae0227
* simplified initialization of database objects
...
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
0b73f2b132
Repair DNS prefetch during cacheScan
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2451 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
27a159b401
* documentation update
...
* removed doc from release
* release information in doc/News.html
* release 0.46
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2442 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
f80f776b89
*) Trying to solve NullpointerException problem in function addURLtoErrorDB
...
See: http://www.yacy-forum.de/viewtopic.php?t=2705
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2441 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
1c99b5a484
*)fixed logging for urldbcleanup
...
*)changed exception handling in urldbcleanup so that it shows NullPointerException correctly
*)added more Blacklisting to urlcleaner
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2436 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8f3f4ab0eb
enhanced synchronisation in plasmaWordIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2433 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
23dd972608
fixed memory calculation in performanceMemory web page
...
fixed also maximum cache size computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2429 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1ce3c22761
better memory control:
...
- added memory monitor for preNURL-db in performanceMemory
- changed default memory assignments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2427 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
39b4c26bdc
more memory control:
...
- catching of OutOfMemoryError in server threads
- automatic adaptation of word cache size after a Short Mem Cycle
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2426 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3e9d509c39
some small fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2425 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
eb633c0a4f
server threads must now supply a method that can be called in case
...
of short memory. This has been realized for the indexing thread.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2421 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f5720cb2fa
removed most synchronization in wordIndex (for testing)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2420 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0187c60010
because of a bug in the JRE 1.4.2 there was no memory protection
...
see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4686462
this commit fixes the bug by using a memory-computation patch.
All uses of Runtime.maxMemory have been replaced by serverMemory.max
The bug is not present any more in Java 1.5
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2419 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cfb51fdef1
less synchronization in plasmaWordIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2416 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d6a928c2da
quickfix for http://www.yacy-forum.de/viewtopic.php?t=2705
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2415 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6ad471ef96
* applied many compiler warning recommendations
...
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
9da3aa74d3
silly me, fix for the fix as advised by theli
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2408 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
bb3d9a5582
*) e.getMessage().indexOf() can only be used if there is actually an ExceptionMessage.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2407 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
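The point of the entry above is that Throwable.getMessage() may return null, so calling indexOf() on it directly can itself throw. A minimal null-safe check might look like this; the class and method names are hypothetical and only illustrate the guard.

```java
/** Hypothetical helper illustrating a null-safe check on an exception message. */
final class ExceptionMessages {
    private ExceptionMessages() {}

    /** Returns true if the message contains the fragment; false when there is no message. */
    static boolean messageContains(Throwable e, String fragment) {
        String message = e.getMessage(); // may be null, e.g. for new RuntimeException()
        return message != null && message.indexOf(fragment) >= 0;
    }
}
```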
hydrox
7a54010a9c
*) Iterators cannot be cast to IndexContainer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2406 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cd5f7e137c
fixed problem with NURL-generation upon first startup
...
(a new kelondroFlexTable was generated, which should not happen)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2402 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8418af141a
added several consistency checks and small changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2400 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
9d13aeca13
*) removing class. does not work so far
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2399 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
95a84ae469
*) adding missing classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2398 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
eee44be602
*) adding an interface for customized blacklist classes
...
- now it's possible to use a customized blacklist engine
instead of the default one
- this can be done by configuring the property BlackLists.class
See: http://www.yacy-forum.de/viewtopic.php?t=2108
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6d2f15971a
there is a very strange error that causes the kelondroRecords structure
...
to become corrupted. The cause is that the deleted-records-chain has wrong entries,
and one of the pointers in that chain points to a place behind the file end.
This causes an IndexOutOfBoundsException within an IO operation.
I currently don't know the reason that the deleted-records-chain is
corrupted, but the error can be caught. If this now happens with the
assortment database, the database is deleted.
See also:
http://www.yacy-forum.de/viewtopic.php?p=24586#24586
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2396 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d2e8e76218
*) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
...
See: http://www.yacy-forum.de/viewtopic.php?t=2541
http://www.yacy-forum.de/viewtopic.php?p=24516
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9ae9062bd3
* disabled new kelondroFlex table for NURLs
...
* added new RAM index Class
* fixed possible synchronization problem in kelondroRecords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2388 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
689bbcf9cd
replaced kelondroTree db for NURLs by new kelondroFlexTable
...
The new database is only created if the old is deleted or does not exist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2387 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7fbba41962
synchronization fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2386 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
328f9859a5
more synchronization in plasmaWordIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2385 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
130e6d4719
generalized index object for eurl, nurl and lurl to prepare move
...
of these tables to new kelondroFlexTable Object
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2382 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
acdf24877f
more synchronization against outOfMemoryError in wordIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2381 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
95160d7f2c
fixed size computation of index elements from the collection index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2380 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
26116cabde
added missing rowdef assignment
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2379 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
abf22f6e60
removed url normalform computation from htmlFilterContentScraper.
...
This method was implemented in de.anomic.net.URL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
740d49751d
* strict type and size check in kelondroRow handling
...
* adapted all code to use the declaration form of kelondroRow
* fixed a bug in kelondroRow which caused wrong parsing of encoding type
* the bug caused bad database behaviour in new indexCollection data structure.
because of this bug, all test databases are now already void. A new database is created
* the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition
into a properties file along the database files.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
314021453f
* more logging
...
* option in yacy.init to set useCollectionIndex usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2374 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
61b151b083
* added another auto-fix for collection index inconsistency check
...
* fixed words size computation for collection index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2368 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f58283def2
better control of index flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2364 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4be21a3cab
oops
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2363 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
80b6c90d54
enhancements to prevent blocking during dht transfer receive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
9f298083cd
*) adding more urls to the error url
...
- old error strings were replaced with their corresponding constants
See: http://www.yacy-forum.de/viewtopic.php?t=2638
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2360 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
d56f06401e
- Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible
...
- Small logging updates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
c09f734d06
*) offer router configuration on ConfigBasic.html
...
- checkbox to allow router configuration is shown if
- a) the UPnP forwarder is installed
- b) a UPnP enabled router was found
- c) no other forwarder was configured
See: http://www.yacy-forum.de/viewtopic.php?p=24264
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2358 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
dcbb4d0a6b
Display the size of HashBlacklistedCache on PerformanceMemory page.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2357 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d799622da1
better flush limit for index collections
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2354 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
279b1d969d
Integrated new indexing data structure 'collections' into the main class
...
for indexing, the plasmaWordIndex.
The new data structure is ready-to-use, but currently disabled.
It can be activated by setting the static
plasmaWordIndex.useCollectionIndex
to true. This shall be done for testing purpose.
The new index is stored to
DATA/INDEX/PUBLIC/TEXT
The directory PLASMA shall be used only for crawler in the future.
Attention: during testing the data structure in INDEX may change,
and created indexes with the new data structure may get useless.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4ff742e42d
implemented indexCollectionRI
...
this is the new database structure that is supposed to replace the
plasmaAssortmentCluster AND the plasmaWordIndexFileCluster
The new structure is not yet active and needs to be integrated into
plasmaWordIndex. This has some migration constraints that are not yet
completely solved.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2347 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
01f95eccd3
re-write of kelondroCollectionIndex. This is the data structure that
...
shall replace the current assortment files.
* used the kelondroFlexTable to hold the index of collections
* used kelondroRow definitions to declare all data structures
* fixed several bugs that appeared in kelondroRowSet and kelondroRowCollection during testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2344 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ebc2233092
* implemented (finished) class indexRowSetContainer
...
* replaced indexTreeMapContainer by indexRowSetContainer
* deleted indexTreeMapContainer and abstract class
This is another step to the new database structure
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9183d21f25
renamed new index class to old name
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c4e922885a
replaced indexURLEntry by new class that uses a kelondroRow.Entry object
...
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is, that index storage uses much less RAM space,
which affects the index RAM cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e357599f92
* fixed problem with indexContainer iteration from RAM:
...
indexContainers from RAM must be cloned explicitly to prevent
side-effects on stored indexContainer objects in Cache
* changed behaviour of urlReference deletion from indexContainers:
deletion does not use retrieval of all Elements from the assortments
* added textual configuration of kelondroRow and kelondroColumn definition
* update of kelondroRow usage in yacyNews
* modified kelondroAttrSeq to use modified kelondroColumn parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8b77afd72c
some fixes to new container merger
...
and some code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2336 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
417ed5102e
redesign of database iterators:
...
an iteration of key elements in kelondroTree databases is no longer supported.
this is now replaced by an iteration of kelondroRow.Entry objects from the database
Iteration of keys from the database was mostly followed by retrieval of the row
from the database, which caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many functions; most important is the DHT selection routine, which is now much faster.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
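The difference between the two iteration styles in commit 417ed5102e can be sketched with standard collections. This is only an analogy: a `java.util.TreeMap` stands in for a kelondroTree database, and the class `RowIteration` and its methods are hypothetical names, not part of YaCy.

```java
import java.util.Map;
import java.util.TreeMap;

/** Analogy only: a TreeMap stands in for a kelondroTree database. */
final class RowIteration {

    /** Old style: iterate the keys, then fetch each row with a second lookup. */
    static int sumViaKeys(TreeMap<String, Integer> table) {
        int sum = 0;
        for (String key : table.keySet()) {
            sum += table.get(key); // one extra lookup per key
        }
        return sum;
    }

    /** New style: iterate the rows directly, so no second lookup is needed. */
    static int sumViaRows(TreeMap<String, Integer> table) {
        int sum = 0;
        for (Map.Entry<String, Integer> row : table.entrySet()) {
            sum += row.getValue();
        }
        return sum;
    }
}
```

Both methods return the same result; the second avoids one lookup per element, which matters most when each lookup touches the disk.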
orbiter
ad692fc6c7
implemented option to extract nurls from the database
...
(plus some iteration enhancements for nurls)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2325 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7fd90ca7c8
* strict handling of NURL entry element generation, storage and stacking
...
* more space for EURL reason strings (you must delete the EURL db to use this)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2324 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5f72be2a95
some redesign of EURL storage
...
* store() is now called explicitly
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1ed3e2daef
added option to extract domains and/or urls from the eurl database
...
when extracting from eurl, the html output format is recommended, since
this format also adds the failure reason to the domain/url.
The complete syntax for domain extraction is now
java -Xmx<megabytes>m -classpath classes yacy -domlist [ -source { lurl | eurl } ] [ -format { text | zip | gzip | html } ] [ <path to DATA folder> ]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2322 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
58df8b7bbf
a large collection of different changes
...
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* very strong speed enhancement for url/domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e4f1820b58
protection against too long authentication strings in switchboard
...
see also: http://www.yacy-forum.de/viewtopic.php?p=23943#23943
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2312 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b3c569f706
*) renaming of function getTransferedEntitySpeed to getTransferedEntrySpeed to avoid confusion
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2308 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5214f571cd
simplified method call in balancer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2303 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7935f27038
enhanced synchronization in balancer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2291 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3879a0ecd0
replaced java.net.URL usage by use of new class de.anomic.net.URL
...
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
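The motivation for commit 3879a0ecd0 is that `java.net.URL.equals()` and `hashCode()` may resolve host names, which is a blocking operation. A minimal sketch of a purely textual comparison follows. It is not the de.anomic.net.URL implementation; the class `UrlCompare`, the normalization rules, and the default-port handling are assumptions chosen for illustration.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper: compares URLs by their text only, without name resolution. */
final class UrlCompare {

    static boolean sameUrl(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    /** Builds a canonical string from scheme, host, port, path and query. */
    static String normalize(String url) {
        URI u = URI.create(url).normalize(); // resolves "." and ".." path segments
        String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
        int port = u.getPort();
        if (port == -1 && scheme.equals("http")) port = 80;   // assumed default
        if (port == -1 && scheme.equals("https")) port = 443; // assumed default
        String rawPath = u.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
        return scheme + "://" + host + ":" + port + path + query;
    }
}
```

Because only strings are compared, two host names that resolve to the same IP address are treated as different URLs, which is the trade-off such an approach accepts.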
orbiter
07900366ac
deactivated cache-initialization for file-indexes (files in WORDS)
...
see also: http://www.yacy-forum.de/viewtopic.php?p=23801#23801
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2289 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
40aa735520
fixed timing problem causing too long delay during initialization of kelondroTree objects
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2288 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
24a02cbeef
*) Bugfix for not parsable application/xhtml+xml resources if
...
an URL has no extension
See: http://www.yacy-forum.de/viewtopic.php?p=23687
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2280 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b0ca5fa784
some correction algorithm for preload time computation during assortment open
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2279 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e22cbaee97
- extended logging for preload
...
- reduced preload-time for IndexImport_p.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2278 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
671fd9a5c9
work towards new indexing database structure
...
(no effect on current functionality yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2277 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
92f4cb4d73
added option to configure the start-up delay time for kelondro database files.
...
the start-up delay is used to pre-load the database node cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6643da3fbd
bugfix for http://www.yacy-forum.de/viewtopic.php?p=23463#23463
...
(affected URL DB Cleaner)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2263 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
8ba8e2b7d9
*) added cache for blacklisted urlhashes received by DHT. DHT does not request URLs listed in this cache.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2251 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
53cbcc6d6e
Implement emergency break in index receive when the limit of the ramCache is exceeded by more than cacheLimit
...
See: http://www.yacy-forum.de/viewtopic.php?p=22911#22911
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2248 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
66964dc015
removed high/med/low from kelondroRecords cache control.
...
this was done because testing showed that cache-delete operations
slowed down record access most, even more than actual IO operations.
Cache-delete operations appeared when entries were shifted from low-priority
positions to high-priority positions. During a fill of x entries to a database,
x/2 delete situations happened, which caused two or more delete operations.
Removing the cache control means that these delete operations are not
necessary any more, but it is more difficult to decide which cache elements
shall be removed in case that the cache is full. There is not yet a stable
solution for this case, but the advantage of a faster cache is more important
than the flush problem.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2244 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
4c6083b264
network picture;
...
back to old version
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2242 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
955915385a
network picture;
...
small changes;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2241 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
027fa8ab1c
network picture;
...
bigger;
more dot steps;
small other;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2240 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b20496e42b
*) make DHT DoS check configurable (requested by KoH)
...
- check can be disabled via property indexDistribution.dhtReceiptLimitEnabled
- upper bound can be configured via indexDistribution.dhtReceiptLimit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
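How the two properties from commit b20496e42b might be evaluated can be sketched as follows. Only the property names come from the commit message; the class `DhtReceiptCheck`, the default values, and the assumption that the limit is compared against a count of received entries are hypothetical.

```java
import java.util.Properties;

/** Hypothetical sketch of a configurable receipt limit check. */
final class DhtReceiptCheck {

    /** Returns true if a transfer of the given size should be accepted. */
    static boolean accept(Properties config, int receivedEntries) {
        // Defaults below are assumptions for illustration, not YaCy's real defaults.
        boolean enabled = Boolean.parseBoolean(
            config.getProperty("indexDistribution.dhtReceiptLimitEnabled", "true"));
        int limit = Integer.parseInt(
            config.getProperty("indexDistribution.dhtReceiptLimit", "1000"));
        return !enabled || receivedEntries <= limit;
    }
}
```

With the check disabled, every transfer is accepted regardless of the configured upper bound.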
orbiter
12af69dd86
cosmetics
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2212 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
67a8c74be3
Fix for dynamic login with static password.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2210 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
ef9eb50c3c
fix for adminlogin
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2209 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
6fe2fed87e
cookieauth works with static Admin.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2208 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
45b39ee1be
*) solving unpacking problems with too long filenames by
...
a) renaming the parent folder in the tgz file to yacy
(can be configured via build properties file)
b) reconfiguring build file to throw an error if a file
name is too long
Please note that currently there is _no_ problem with too long
class names because of step a.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2207 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
fb090652df
*) use a more compact name for plasmaWordIndexAssortmentImporter.java because the long name
...
caused problems during untar operation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2206 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
4ca0857c0c
*) Index transfer now considers the pause time sent by busy peers during
...
index transfer / index distribution
See: http://www.yacy-forum.de/viewtopic.php?p=22647#22491
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2205 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
75ed507d39
some debugging of new kelondroFlexTable class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2190 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
370c481fa7
bugfixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2171 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c36e9fc8d3
full integration of kelondroRow
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2167 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c75cacda95
added a flex-width-array: this is a table where it is
...
possible to add columns to an existing table
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2163 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4a907a570f
1st step to migrate kelondroTree to usage of kelondroRow instead of byte[][]
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2162 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
09f780df27
more bugfixes for the new row/stack handling changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2160 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3c3c047d0a
integrated kelondroRow into kelondroStack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2156 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5bb565944f
integration of new kelondroRow into some parts of kelondro,
...
especially into the array storage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2155 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
eaa6f012f0
refactoring: better naming for classic DB (files in WORDS)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2151 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
5041d330ce
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7b3b12888c
refactoring: integrated indexContainer abstraction layer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2149 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cb295fbbdc
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2147 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
rramthun
bc94a714b2
Better explanation for the auto-dom-filter.
...
Some javadoc.
Small change to DetailedSearch.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2146 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
196b8abb30
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2144 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
b48327904a
Don't disconnect peers that report 'busy' during index transfer.
...
These peers are already being marked as not accepting remote index transmissions by yacyClient.transferIndex. That should be enough to prevent further transfer attempts until newer seed information is received.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2142 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4d8f8ba384
added cache-performance analysis for node caches
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2140 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bd057b44dd
- automatic setting of peer-does-not-accept-remote-crawl
...
- increased percentage of object cache to node cache to 30%
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2136 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
81e79f2caf
fixed new cache behaviour changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2134 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cda087f43b
- integrated cache miss storage into object cache
...
- removed cache-miss handling from indexURL
todo: new Monitoring in PerformanceMemory_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2132 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
757ec28430
refactoring: better data capsulation for indexURL
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2131 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
61078b3885
*) adding support for delayed shutdown
...
- needed by Ismael to receive the Steering page properly on shutdown
- now the steering page should always be displayed properly in the web browser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2129 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
90d569d70f
refactoring of index management:
...
url storage is part of index management; moved plasmaURL to indexURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a930be4ba3
refactoring of index management:
...
generalized the index entry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
df7e1d9df3
Changes to plasmaURL and subclasses:
...
- Improve performance of plasmaURL.exists() by remembering URL-hashes that are not present
- Use a more realistic estimation of memory usage by the existsIndex cache
- Routine cleanup of the existsIndex to limit its memory usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2113 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
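The idea in commit df7e1d9df3 (remembering URL hashes that are known to be absent, with a bounded cache) can be sketched like this. It is not the plasmaURL code: the class `ExistsCache`, the `Set<String>` stand-in for the URL database, and the LRU eviction policy are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an exists() cache that also remembers misses. */
final class ExistsCache {
    private final Set<String> backingStore;   // stands in for the URL database
    private final Map<String, Boolean> known; // url hash -> present or not, bounded LRU
    int storeLookups = 0;                     // counts how often the store was consulted

    ExistsCache(Set<String> backingStore, final int maxEntries) {
        this.backingStore = backingStore;
        // access-ordered map that drops its eldest entry once it grows too large
        this.known = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized boolean exists(String urlHash) {
        Boolean cached = known.get(urlHash);
        if (cached != null) return cached;    // hit, for present and absent hashes alike
        storeLookups++;
        boolean present = backingStore.contains(urlHash);
        known.put(urlHash, present);
        return present;
    }

    /** Must be called when a URL is stored, so a remembered miss does not go stale. */
    synchronized void stored(String urlHash) {
        backingStore.add(urlHash);
        known.put(urlHash, Boolean.TRUE);
    }
}
```

The size bound plays the role of the routine cleanup mentioned in the commit: without it, remembered misses would grow without limit.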
orbiter
a474669338
start with refactoring of index management
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
rramthun
f08e33680c
Added Blog-news-symbol as requested.
...
I think I will change the character distance a little bit later.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2101 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
f331def5d8
*) Bugfix for distribution. Incorrect behavior if peerCount == selectedCount
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2098 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
55ea4cbfe6
*)reverted patch for memory-display issue
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2095 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
5048b05bc6
*) Index Transfer should only restart at the beginning if the delete
...
option is configured. Otherwise we have an endless loop
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2092 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
53d9ab6db7
*)fixed bug in PerformanceMemory_p.java which caused negative memory-values on big peers
...
see http://www.yacy-forum.de/viewtopic.php?t=2370
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2091 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ddfe0f0e27
*) don't try to parse referer string if it's null
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2090 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
bcc950c533
*) Bugfix for Index Transfer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2088 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
015d044c25
tried to fix some problems with latest changes to httpc
...
very experimental!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2078 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3e31820c3d
- corrections to PerformanceMemory display of object cache
...
- configuration of object cache size in kelondroTree initializer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2075 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
461548698c
configuration of index transfer chunk size
...
see http://www.yacy-forum.de/viewtopic.php?p=20951#20951
new properties in yacy.init:
indexDistribution.minChunkSize = 5
indexDistribution.maxChunkSize = 1000
indexDistribution.startChunkSize = 50
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2073 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
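Reading the three properties from commit 461548698c could look like the following sketch. The property names and default values are taken from the commit message; the class `ChunkSizeConfig` and the clamping of a requested size into the configured range are assumptions, since the commit does not describe how the chunk size is adjusted at runtime.

```java
import java.util.Properties;

/** Hypothetical holder for the index transfer chunk size settings. */
final class ChunkSizeConfig {
    final int min;
    final int max;
    final int start;

    ChunkSizeConfig(Properties config) {
        min = Integer.parseInt(config.getProperty("indexDistribution.minChunkSize", "5"));
        max = Integer.parseInt(config.getProperty("indexDistribution.maxChunkSize", "1000"));
        // Assumption: a start value outside [min, max] is pulled back into range.
        start = clamp(Integer.parseInt(config.getProperty("indexDistribution.startChunkSize", "50")));
    }

    /** Restricts a proposed chunk size to the configured range. */
    int clamp(int size) {
        return Math.max(min, Math.min(max, size));
    }
}
```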
orbiter
29b1b0823c
added monitoring of new object cache to performanceMemory page
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2072 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
9104001e7c
*) Better error handling for assortment import
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2067 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
51e3bb576f
Don't increase dhtTransferIndexCount when the last transferred index was smaller
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2064 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
a0ca4c5fb8
Remove a possible race condition between DHT transfer and deQueue
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2059 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
0cfba8950f
Removing unnecessary and possibly dangerous synchronization of the wordIndex
...
when deleting transferred indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2058 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d6213f8a85
quickfix for http://www.yacy-forum.de/viewtopic.php?p=19482#19482
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2042 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b0036249c1
added some attributes to network picture
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2032 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
cbcf7418ef
Cleanup synchronization in plasmaWordIndex
...
- only synchronize when changing data in more than one database
see: http://www.yacy-forum.de/viewtopic.php?t=2167
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2031 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
60e5aff9fc
some enhancements to the remote crawl trigger
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2030 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
dbe96e6541
added hand-over of search filter and prefer ranking to yacy protocol
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2029 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
rramthun
0604203bce
Updated and corrected German language file
...
Changed Italian language file for an Italian/English interface and not Italian/German
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2024 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
00a5d435e2
- fixed some bugs with domain filter
...
- added new ranking filter "prefermask": urls that match the filter are ranked better
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2022 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
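The "prefermask" filter from commit 00a5d435e2 can be pictured as a regular expression that improves the rank of matching URLs. The sketch below is an assumption about the general shape only: the class `PreferMask`, the additive bonus, and the convention that a higher number means a better rank are not taken from the YaCy source.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: URLs matching the prefer mask get a ranking bonus. */
final class PreferMask {

    /** Returns the base rank, raised by the bonus if the whole URL matches the mask. */
    static long rank(long baseRank, String url, Pattern prefermask, long bonus) {
        return prefermask.matcher(url).matches() ? baseRank + bonus : baseRank;
    }
}
```

For example, with the mask `.*\.yacy\.net/.*` a result from `http://www.yacy.net/index.html` would be lifted above an otherwise equally ranked result from another domain.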
orbiter
14d6e476c9
tried to solve some problems with new picture viewer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2019 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9324425165
fix for remote crawl reject
...
see http://www.yacy-forum.de/viewtopic.php?p=20075#20075
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2017 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
30e4fc39a5
HTCache extended
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2015 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d0dd8b14d2
fixed picture tag and presentation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2014 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
da6a8bafa2
rename currCacheSize -> curCacheSize;
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2010 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
92110aea32
nullpointer fix for profile(); other minor change;
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2009 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f0833b0328
introduced simple search interface
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2007 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
47b541b2d1
added better option handling in yacysearch
...
added depth option for image presentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2001 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c9e16bfd48
first try to insert image search (does not work yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2000 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f77775220b
fixed parser error
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1999 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
22de954a57
added some log output to parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1996 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
83e0e765ec
redesigned some parts of the html scanner & parser
...
to better support image tags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1995 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ac114d69c0
tried to fix some problems with time-outs during search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1994 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e2e8d0c188
some kind of refactoring of yacysearch:
...
made 'room' for new picture search result presentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1993 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6b63e26cbb
- removed search function from index.html/java, only input left
...
- added media fetcher/crawler class (not ready yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1992 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bc3e80fe42
quickfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1990 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d8d0ac29c3
added image-viewer servlet that can do:
...
- each image that is requested is stored in the cache
- the image is taken from the cache if it exists there
- the image can be scaled
The purpose of creating a scaled image is to avoid copyright problems.
In a further step the retrieval of unscaled images will be restricted
to either access from localhost or with given authentication
This servlet can be used for image-preview purpose after an image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1989 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ddc6394d9b
fixed bug about auto-depth 0
...
see http://www.yacy-forum.de/viewtopic.php?p=19751#19751
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1988 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
60351fa3f7
small fix to previous commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1987 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a469874e3f
added and fixed time-out behaviour during search
...
see also: http://www.yacy-forum.de/viewtopic.php?p=19823#19823
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1986 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1d0b0d6e2a
synchronized local searches to prevent several searches from being performed at the same time
...
see also: http://www.yacy-forum.de/viewtopic.php?p=19761#19761
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1985 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
22b9d03bbf
Correcting remaining time issue in getContainers
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1984 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d58788b753
added some synchronisation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1982 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e566d1d8d6
some bugfixes regarding new crawling options
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1980 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c7f1300300
-fixes for last commit
...
-some more ranking attributes (comments only)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1979 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f2421f6a47
some small attribute changes regarding cache flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1974 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7a650d0023
several bugfixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1971 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
59d52fb4a9
fixed some problems with crawl profiles
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1967 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
708cc6c8d9
fixed some bugs for auto-filter and added monitor in profile list
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1959 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
rramthun
250864406f
...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1955 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e82899ba57
fixed missing urls map initializer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1950 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
63f39ac7b5
added 3 new crawling steering options:
...
- re-crawl by age of page (enter in minutes)
- auto-domain-filter
- maximum number of pages per domain
NOT YET TESTED!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1fc3b34be6
some pre-work (without function yet) to implement:
...
- re-crawl (by age of last crawl)
- auto-crawl-filter by crawl depth (to be explained..)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1948 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
c9e6b5e391
*) check size of indexing-queue and crawler pool before processing remote triggered crawl jobs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1946 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1509314ea6
set tighter control during DHT index and peer selection
...
see http://www.yacy-forum.de/viewtopic.php?p=19329#19329
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1945 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
fcc0683200
*) undoing last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1944 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
9411961eec
*) another little fix for DHT-Transfer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1943 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
8b14a0c833
*) little fix for DHT-Transfer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1941 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1f4412a146
adapted isListed to the new behavior as discussed (url, getFile)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
063ef4660a
bug?
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1936 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
82358677a9
added another shiftK2W to flushCacheSome
...
this should fix the bug that the DHT cache is not flushed if there is no indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1935 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
128e4ab199
- in serverSystem: maxPathLength is now a variable, not a method
...
- upon startup the calculated maximum path length is shown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1932 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
30e3e3a0fd
adapted MAXPATHLENGTH to host system capabilities
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1930 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
85bb8e32a1
Bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1928 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
3fe402069f
try to fix
...
see: http://www.yacy-forum.de/viewtopic.php?p=19175#19175
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1927 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f16f1f15cd
bugfix for 100% CPU bug; thanks to Matthias for analysis
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1926 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
254a13efd9
MAXPATHLENGTH used
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1925 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
8865948e4e
Cleanup;
...
Method replaceRegex added;
Constant MAXPATHLENGTH added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1923 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6c70f4a0cf
renamed wordHashes for a word hash set generation to wordHashSet
...
This was done because the wordHashes iterator will get another integer
parameter and then conflicts with the wordHashes set generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1921 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d5f8f40c31
removed correcting iterator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1920 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
488a0ed580
replaced old keyIterator and rowIterator by buffered iterators
...
that are synchronized with database access
Main change is done in kelondroTree, other classes are only adaptations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1918 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
4e9a8f41fd
rwiDBCleaner + dbImporter: Iterate over small excerpts of
...
word hashes instead of the whole DB especially while changing
the DB in the process.
see http://www.yacy-forum.de/viewtopic.php?p=19136#19136
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1917 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
474379ae63
remove TABs from plasmaDbImporter.java
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1916 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
dba02f399f
started re-design of kelondroTree iterator
...
- new access to iterator
- added IOException handling in many other classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1914 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f02b426073
made kelondroTree.nodeIterator private
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1910 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
5f6fdf1786
Bugfix for getCachePath(URL url)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1909 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
303b6463a8
added debug line to URL storage for testing
...
see http://www.yacy-forum.de/viewtopic.php?p=19129#19129
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1908 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
91dca2cd8d
fixed a bug in last commit: LURL entries could not be written,
...
because a stored property was set to true instead of false
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1906 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3286b1f498
re-organisation of lurl-creation and -stacking
...
this was necessary to prevent useless writes to the database
when the url appears in the blacklist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0b903c5317
removed usage of kelondroNaturalOrder from plasmaCondenser to experimentally
...
rule it out as the cause of a 100% CPU bug.
see http://www.yacy-forum.de/viewtopic.php?p=19076#19076
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1900 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4239db0d1c
fixed new ordering for backup iterator TreeSet
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1899 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
33eba5ecb8
temporarily disabling last change, does not work (cannot debug right now)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1896 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f0464042fc
fix for latest iterator-replacement-fix:
...
the iterator generated a TreeSet which did not respect rotations
this has now been implemented using kelondroOrder objects
and by adding these rotation rules to the ordering
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1895 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
ec21c585cb
try to fix path too long
...
see http://www.yacy-forum.de/viewtopic.php?p=19079
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1893 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a6a3f4b694
fix for svn 1888
...
this is a redesign of the no-iterator solution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1892 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
8da13088e9
*)removed multiple DHT_Distribution_Threads
...
*)boosted DHT_Distribution by sending chunks to multiple peers in parallel
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1890 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
283a7181c6
try to fix new 100% cpu bug, possibly caused by iterator method
...
see http://www.yacy-forum.de/viewtopic.php?p=18900#18900
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1888 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f588c0724f
removed cache flush in case of DHT receive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1885 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e94b374d56
update to cache flush method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1884 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bcd99fe83e
introduced a second RAM cache for DHT transfer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1880 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
360a460da8
*)URL-Cleaner: moved logging-statement to correct position
...
*)plasmaURLPattern: host is now added to the hashset in lowercase
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1879 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
02f9765013
quickfix for time problem during cache restore
...
see http://www.yacy-forum.de/viewtopic.php?p=18810#18810
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1878 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
ad119f06af
*) Don't overwrite new entries with older ones
...
see: http://www.yacy-forum.de/viewtopic.php?t=2015
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1874 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
be88687d8c
fixed some problems with new cache flush waiting period
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1873 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d3da7c9a08
*) Adding support for robots Allow directive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1872 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
f046e1814a
*) fix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1869 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
c55c51e2a8
*)added keywords to IndexCleaner_p.java
...
*)updated Logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1868 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ddbeda738e
added minimum age of word in cache to performance menu
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1866 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f188611fc6
apply blacklist on rwis during dht receive
...
very experimental!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1865 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0ec28b8f8e
added DBCleaner from Hydrox
...
see http://www.yacy-forum.de/viewtopic.php?p=18093#18093
The servlet is now named IndexCleaner_p.
See http://localhost:8080/IndexCleaner_p.html
The servlet was adapted to fit into the overall architecture
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1863 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
fb4100d47b
*) undoing last commit.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1856 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
a84cc71218
*) removing getTotalRuntime
...
- not needed anymore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1855 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
dce08771d1
*) Fix for wrong estimated and elapsed times when import was paused
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1850 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
b34713324a
DBImport: remove words from source index even if nothing has been added to home index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1849 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
520b60f15b
fix for http://www.yacy-forum.de/viewtopic.php?p=18610#18610
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1841 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bae3783d38
added a snippet marking
...
(search words are now bold in snippets)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f0a38873eb
* added yacysearch page with better view on search results
...
the old search page is obsolete and will be removed
* ConfigBasic.html is now the default page instead of index.html
as long as no password is set
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1815 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f0041d504d
removal of multiple results from a single domain is stopped if the result set is smaller than the wanted number of results
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1811 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
89286478e7
*) removing thread pool eviction for now. Not needed at the moment
...
See: http://www.yacy-forum.de/viewtopic.php?p=18290#18290
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1801 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
759800f543
*) Bugfix for storeHTCache problem
...
- content was not indexed if storeHTCache was off
See: http://www.yacy-forum.de/viewtopic.php?p=18269
See: http://www.yacy-forum.de/viewtopic.php?t=1882
See: http://www.yacy-forum.de/viewtopic.php?t=241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1800 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a8548c0484
* several bugfixes regarding basic configuration
...
* extended number of search target peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1794 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1b9b8922d9
* fixed problems with new basic 1-2-3 configuration (now authentication required)
...
* fixed graphics problem
* fixed some other problems with default values
* 1-2-3 config now appears automatically on start-up if no password is set
* added new config menu
* moved profile to new config menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1792 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
auron_x
8c6f38fe70
*) added Blog to YaCy (atm not reachable through interface) -> Blog.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1790 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
ce5274c194
yacybot user agent
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1786 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
351bd0a678
*) dbImport: convert cacheSize to kb when creating plasma* objects
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1773 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
eaffcfefe2
* added more ranking attributes (without function; this will be added later)
...
* added ranking coefficient transmission to remote peer (without evaluation on server side, will be added later)
* changed ranking coefficients slightly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1770 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
87e90b9d8c
refinements in ram cache flush procedure and default timing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1768 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d31a4e0b4f
some small enhancements with cache flushing parameters and data structures
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1767 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3703f76866
- fixed re-search bug: after a search with several words, a second search could not
...
find the same words as before. This happened because indexContainers stored the url references
in a hashtable. A tree was needed to work with the index conjunction-by-enumeration
- added permanent ram cache flush (again)
- removed direct flush of ram cache after a large container is added.
this happens especially during DHT transmission and therefore this fix should
speed up DHT transmission on server side.
- removed unused and out-dated methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1765 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
fbbbf5f411
*) remote trigger for proxy-crawl
...
- remote crawling can now be enabled for the proxy crawling profile
See: http://www.yacy-forum.de/viewtopic.php?p=17753#17753
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1758 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
dc9174c809
*) Implementing snippet fetching via ajax
...
Snippets that are not available at page load time will be fetched using ajax requests.
see: http://www.yacy-forum.de/viewtopic.php?p=16479
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1748 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1d8ca6e082
serialized dhtChunk deletion with indexing
...
The dht selection, transmission and deletion is now completely serialized with indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1731 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
2336f0f013
*) allow pausing/resuming of crawlJob Threads separately
...
- pausing/resuming localCrawls
- pausing/resuming remoteTriggeredCrawls
- pausing/resuming globalCrawlTrigger
See: http://www.yacy-forum.de/viewtopic.php?t=1591
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1723 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
60dac4325e
serialized indexing with dht selection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1719 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a840755964
moved parts of index transfer logic back to switchboard
...
this is needed to merge the dht selection with the indexing thread
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1718 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
134253a603
fixed bug with cache flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1717 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c2d863855d
different flush limit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1713 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
64441b1f78
ADDED: yacy.badwords list to filter the topwords
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1711 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f9063e2040
added some synchronization so that several tasks cannot trigger a cache flush simultaneously
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1708 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
2c4e4ae6a2
further refactoring of dht selection, transfer and flushing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1707 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
73dad68cf1
moved theli's DHT flush class into its own file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1706 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
aa4b04e3dd
reverted last change
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1705 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
4b0dae8fcf
added a possibility to get the ranking values for an url.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1703 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
85ac7d8386
* moved DHT transfer thread to own class file, needed for further modularization
...
* changed status handling
* added forced cache flush when cache has containers with too high number of index entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1702 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7df2e6e571
bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1700 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
cd41e9a0eb
moved DHT index selection to new object that holds indexes to be sent to another peer.
...
This was done so that RWI selections can be serialized with indexing.
Serialization will be implemented in another step.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1698 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
42a5f56723
*) Bugfix for broken dht thread configuration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1695 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
f95d98142f
*) displaying amount of items in the existsIndex caches
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1679 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
e2af2a3f45
*) it's now possible to run more than one indexDistribution-Thread
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1673 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
40dd6ec4fd
*) experimental restructuring of db import function
...
- trying to reduce IO load by avoiding unnecessary db access
- trying to presort url list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1671 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
2da18ab359
*) correcting logging output
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1667 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
8ffc6e35ad
*) correcting logging output
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1665 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
980e986b64
*) Re-enabling short cycle for already removed nurl entries
...
See: http://www.yacy-forum.de/viewtopic.php?p=17147#17147
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1660 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
3b6328ad02
*) Consistent use of minCount for index transfer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1645 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
0b60b9bf51
*) Remove entries from AssortmentCluster before reinserting the rest into the ramCache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1640 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
8ab1d6ff4b
*) fixed NullPointerException in plasmaWordIndexEntity
...
See: http://www.yacy-forum.de/viewtopic.php?t=1921
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1638 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
a26574c894
Migration from tagName as key to wordhash(tagName) as key for bookmarkTags.db
...
(just deleting the old db, rebuildTags does the rest)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1637 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7eb10675b3
re-organization of index management
...
this was done to be prepared for new storage algorithms
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1635 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
1e4578aab6
VERY EXPERIMENTAL removal of index ram cache flushing thread.
...
The cache will fill up and be flushed explicitly when it is full.
This shall remove double-access of assortments (indexing and flush)
during indexing process. Hopefully this should reduce IO.
The main idea is: the cache shall mainly be flushed by DHT transfer, and
only indexes that shall be hosted by the own peer are flushed to the
assortments. This needs further work.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1617 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
954f02d22e
*) Bugfix: Prevent wordIndex.getContainer() from returning and even manipulating
...
the containers from the ram cache. Return a new container instead.
*) Speedup flushFromMem by reducing the number of searches in the TreeMap
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1604 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
fe39493145
changed default ranking parameters
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1582 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
365a3fff8e
fixes for ranking attributes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1569 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
8e55098b74
fixed detailed search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1562 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0cb940a8e5
added detailed search.
...
ranking profiles do not work properly yet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1551 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
c695928f7c
adapted search page to new detailed search (to be committed later)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1550 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
45323e7b76
fixed null pointer exception during search
...
see http://www.yacy-forum.de/viewtopic.php?p=16429#16429
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1547 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
fb7411d7bb
re-structuring of ranking application:
...
concentration of all ranking attributes in the
plasmaSearchRankingProfile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1541 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d98418390b
- introduced rankingProfile Class
...
- selection of ranking and timing profiles for each search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1539 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
eab1805bca
refactoring: plasmaSearchProfile -> plasmaSearchTimingProfile
...
This was done to distinguish this profile from the
(to-be-implemented) plasmaSeachOrderProfile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1538 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
6eef848954
re-design of post-ranking process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1537 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
be77fe1a88
code clean-up
...
@Martin: please check why the variable assignments
in plasmaCrawlNURLImporter were there. As written, they were redundant.
That is surely not what you intended.
Should they perhaps be global variables?
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1529 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0bc2aaeb42
added normalization to search attributes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1528 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
008bcb7fb8
*) simplifying code by moving closeTransferIndexes into finally block
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1522 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
50d85657b8
*) new import function for IndexImport_p.html
...
- can be used to import the crawling queue (noticeUrlDB + stacks)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1518 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
214302284e
*) undoing last commit because of problems with getUpdateTime
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1514 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
408de3beee
*) avoid searching the treemap twice for the same key
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1513 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
139ba4e0c8
Bugfix for getCachePath(URL url)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1510 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
442807cb29
*) Bugfix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1506 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
22fd1ca9aa
*) minor changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1505 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
6a99304b2b
*) Redesign of db import functionality
...
- restructuring to allow different import tasks to be controlled via one gui
- adding possibility to import a single assortment file
- adding possibility to set the cache size that should be used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1504 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3834675084
fixed bug that caused wrong behavior of search result preparation
...
(second search on same topic resulted in fewer links)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1502 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
31c8476b5d
plasmaWordIndexCache.getContainer:
...
*) Also get entries from cache
*) calculate available remaining time for backend.getContainer correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1501 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3419b3bcdd
fix for bug that caused the peer-counter problem.
...
See http://www.yacy-forum.de/viewtopic.php?p=16016#16016
The kelondroDyn now uses a generic fill character.
kelondroDyn-Tables containing peer/word/url-hashes must not use '_'
as fill character.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1498 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
4f43816ec0
*) Fix wrong class cast in indexSize()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1495 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a7f0adf6fa
bugfix in entity iterator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1490 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
fa90c3ca7a
- removed some usage of indexEntity
...
- changed index collection process: indexes are no longer flushed to indexEntity first,
but are now collected directly from ram cache and assortments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1489 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
aea3e00864
cleanup: removed unused temporary index management in indexEntity.
...
This is replaced by indexContainers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1486 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
03c65742ba
changes towards the new index storage scheme:
...
- replaced usage of temporary IndexEntity by EntryContainer
- added more attributes to word index
- added exact-string search (using quotes in query)
- disabled writing into WORDS during search; EntryContainers are used instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1485 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ab7a911bb3
*) Trying to solve the "pool not open" problem
...
See: http://www.yacy-forum.de/viewtopic.php?t=1798
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1482 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hydrox
d665f3c39c
*) fixed thread names for stackCrawl threads
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1480 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
3d5347bc8e
*) changing loglevel for some messages
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1479 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
0fcd113c42
*) last bugfix part. Seems to work now for the stackCrawler
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1478 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b9c9eaeb44
*) next try to do a bugfix :-((
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1477 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
4b4b93c413
*) next try to do a bugfix :-(
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1476 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d9fbad71b9
*) next try to do a bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1475 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
6da97bd2e4
*) next bugfix for threadpool problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1474 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
bea2b9edee
*) further redesign of thread pools to solve the too-many-threads problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1473 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
784fd50437
*) more verbose thread names
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1471 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
56e4dbeb71
*) displaying current active + current idle threads in PerformanceQueues_p.html now
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1470 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
859c6a88f5
*) testing various thread pool eviction settings to avoid outOfMemory - Thread creation problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1467 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f2b18cede9
AND-bugfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1461 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
b946e28e61
some ranking enhancements
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1460 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
rramthun
6c02f889f7
Cosmetic changes.
...
Corrected version numbering as described in http://www.yacy-websuche.de/wiki/index.php/De:Versionsnummern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1453 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b191f06d16
*) Adding additional logging message to locate problems with stackcrawl threads
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1452 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d9bcd73d93
*) Bugfix for exception
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1448 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
f5abfe8d57
*) more failsafe threadpools
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1446 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a56fefe0d3
added missing forced-flush for index cache
...
see http://www.yacy-forum.de/viewtopic.php?p=15732#15732
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1434 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
78bcb8014a
*) Limit range for selection of indexes for distribution to a DHTDistance of 0.2
...
(For wider ranges, finding enough suitable targets is unlikely)
*) Migrate Indexes from ClassicDB back to AssortmentCluster if transfer fails
*) Remove class iterateFiles from plasmaWordIndex
(The class iterateFiles from plasmaWordIndexClassicDB is used instead)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1430 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
861aae678d
*) cleanup cacheAge database when cleaning up the HTCache
...
*) Log directory deletes with level Fine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1427 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
b4e2efef10
*) first test of new iteration function
...
ATTENTION: please don't use it at the moment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1418 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
eabf4a0386
fix for null pointer exception during shut-down
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1415 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
47843e69e2
auto-reset for switchboard queue stack
...
bugfix for http://www.yacy-forum.de/viewtopic.php?p=15684#15684
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1414 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
d6581c445b
added content iterator for corrupted database files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1406 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ecdc1f7547
*) Bugfix for crawling URLs with query parameters
...
See: http://www.yacy-forum.de/viewtopic.php?p=14065
*) Preparation for http://www.yacy-forum.de/viewtopic.php?t=1719
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1405 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
fc4ae899f7
added word-position to ranking (this is only a first step)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1395 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bb2095fe39
assortment files are no longer deleted, but moved to a backup directory.
...
See also: http://www.yacy-forum.de/viewtopic.php?p=15458#15458
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1394 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
7366e39dd3
tried to fix 100% CPU bug.
...
See http://www.yacy-forum.de/viewtopic.php?p=15569#15569
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1393 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f14d49fae9
enhancements, bugfixes and additions to word index attribute storage
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1392 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
4d33020f56
Migration to WORK
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1389 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
rramthun
1e5feedf0e
Fix for http://www.yacy-forum.de/viewtopic.php?p=15547#15547
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1388 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f4ffa9aee5
- implemented more attributes to index entries
...
- implemented hand-over of new word index attributes during remote search
- implemented word-distance computation during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
90b940e90e
fixed position storage problem.
...
Now the word position is properly stored.
It is not used yet, but can be used for better ranking.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1378 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0371494010
tried to add word position to index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1377 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
f1cfee7703
removed tabs from condenser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1376 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
37791fd529
*) Close indexEntities when "found not enough peers for distribution"
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1375 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
c5b6154136
added CRDistOn = true/false
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1372 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
71d5c2b2ca
better control for target peer selection for RWI transfer
...
see also http://www.yacy-forum.de/viewtopic.php?p=15343#15343
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1370 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
ca7407b7e1
*) Don't change maxTime if zero or negative
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1363 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3d7c8aaeae
removed confusing method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1339 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4cd0c45a77
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1337 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
hermens
971247b78f
- rotate merged indexes after merging
...
see: http://www.yacy-forum.de/viewtopic.php?t=1717
- fix -rwihashlist to shut down correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1336 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
e2ff1767b5
fix for last DHT distribution bug-fix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1330 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
060e5a0df0
fixed problem with DHT target peer selection:
...
- shifted selection in front of distribution
see http://www.yacy-forum.de/viewtopic.php?p=15131#15131
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1327 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
7c22afe3de
*) Bugfix for NullPointerException in deleteOldHTCache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1326 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago