allo
3730ec3440
moving to a _p page.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2738 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
22649408ad
*) Better error handling for charset encoding problems during content parsing
...
See: http://www.yacy-forum.de/viewtopic.php?t=2952
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2737 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
89ee215ff0
*) better detection of svn revision number in old xml format
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2736 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a9c7e3f061
*) Bugfix for NoSuchElementException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2735 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f25f61d9d3
documentation of compile problem. See http://www.yacy-forum.de/viewtopic.php?p=26407#26407
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2734 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c8f3a7d363
added snippet-url re-indexing
...
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
2cfd4633ac
*) even better handling of searchwords in snippets, words can consist of letters and numbers now
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2732 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b062847797
fix for http://www.yacy-forum.de/viewtopic.php?p=26439#26439
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2731 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
e17fea7015
files in htcache are now stored in different hash/tree subdirectories according to storage method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2730 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
661f005214
fix for seed upload build script
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2729 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
2d3b7251a4
*) better handling of searchwords in snippets (see http://www.yacy-forum.de/viewtopic.php?t=2891 for details)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2728 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
ddf8f220f6
fix for build fail
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2727 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
2e4aa6a170
refactoring of Advanced Config:
...
- removed settings that are in Basic Settings
- joined pages that belong together
- moved include pages from yacy/ to /
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2726 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
25ae3d3161
generalized definition of hexhash
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2725 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
86047f439d
removed very bad bug that prevented production of any remote search result
...
:-(((
Please update!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2724 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
f0d747c723
removed deprecated method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2723 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5ff77612ac
bugfix for old WORDS storage method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2722 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
0f10bdde22
more generic cache methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
72482b1426
fixed scraper
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2720 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
6557112d8f
small fix for plasmaURLPool.getURL() needed for new alternative htcache layout
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2719 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
hermens
440c6ee657
Implement alternative htcache layout
...
mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
226f2c5b2c
first version of the Servlet Debugger
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2717 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
adf1f74ab2
bugfix for java 1.5 compile problem with serverCharBuffer.append(char)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2716 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
fd61209797
lines inside tags without punctuation are extended by a single dot.
...
This enables the condenser to distinguish the lines in a better way.
The result is a better preparation of snippets.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
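The dot-extension rule in the commit above can be illustrated with a minimal sketch. The class name, the method name, and the set of characters treated as terminal punctuation are assumptions made for illustration; none of them are taken from the YaCy condenser sources.

```java
public class LineTerminatorSketch {

    // Hypothetical helper: appends a single dot to a text line that does not
    // already end in punctuation, so a later sentence splitter can tell
    // adjacent lines apart. The punctuation set below is an assumption.
    static String terminate(String line) {
        String t = line.trim();
        if (t.isEmpty()) {
            return t; // nothing to terminate
        }
        char last = t.charAt(t.length() - 1);
        if (last == '.' || last == '!' || last == '?' || last == ':' || last == ';') {
            return t; // already ends in punctuation, leave unchanged
        }
        return t + ".";
    }

    public static void main(String[] args) {
        System.out.println(terminate("Download YaCy")); // gains a trailing dot
        System.out.println(terminate("Is it free?"));   // stays as it is
    }
}
```

With such a rule, text taken from separate tags (for example a heading followed by a paragraph) would no longer run together into one sentence when snippets are cut.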
allo
e25172853a
fixed license notice
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2714 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
1d0c0edda3
first version of posts/get from the del.icio.us api
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1969522dc1
removed lowercasing of snippets (and other things):
...
- added new sentence parser to condenser
- sentence parsing can now handle charsets
to do: charsets must be handed over to new sentence parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
43614f1b36
bugfix in collection index. the index for collections was not created correctly
...
The bugfix includes a migration function which starts automatically
after startup of yacy.
This applies to you only if you are using the new collection index.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
low012
07155ef3b0
*) added a few constraints to prevent exceptions when clicking on stop or pause on IndexCleaner_p.html when no thread is started
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2710 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
1dfab1abe3
more control for seed receive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2709 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1c0e65f55f
*) Bugfix for problems with charset detection
...
See: http://www.yacy-forum.de/viewtopic.php?p=26196
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2708 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
db294687ea
enhanced logging
...
- more logging output
- fix in log line preparation
- added filter to log page
- some small bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
borg-0300
08aa9d4c07
duplicate removes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2706 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
a9a0f51303
*) suppressing InterruptedException error message
...
See: http://www.yacy-forum.de/viewtopic.php?t=2915
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2705 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
ce7ee74316
*) better error handling in file handler (try-catch block now starts before argument parsing)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2704 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1d4fb680ce
*) CrawlWorker.java: only keep content in memory if its size is less than or equal to 5 MB
...
TODO: make this limit configurable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2703 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
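A size threshold like the one in the commit above might look roughly like the following sketch. The class, method, and constant names are hypothetical and not taken from CrawlWorker.java. Reading "5MB" as 5 * 1024 * 1024 bytes and treating an unknown length as too large are both assumptions of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ContentSinkSketch {

    // Assumption: "5MB" is interpreted as 5 * 1024 * 1024 bytes.
    static final long MAX_IN_MEMORY_BYTES = 5L * 1024 * 1024;

    // Returns true if content of the given length should be held in memory.
    // A negative (unknown) length is treated as too large; this is a choice
    // made for the sketch, not documented behaviour.
    static boolean keepInMemory(long contentLength) {
        return contentLength >= 0 && contentLength <= MAX_IN_MEMORY_BYTES;
    }

    // Opens an in-memory buffer for small content, a file for everything else.
    static OutputStream openSink(long contentLength, File spillFile) throws IOException {
        if (keepInMemory(contentLength)) {
            // the cast is safe because the length is at most 5 MB here
            return new ByteArrayOutputStream((int) contentLength);
        }
        return new FileOutputStream(spillFile);
    }

    public static void main(String[] args) {
        System.out.println(keepInMemory(4096));                    // small page
        System.out.println(keepInMemory(MAX_IN_MEMORY_BYTES + 1)); // just over the limit
    }
}
```

Making the limit configurable, as the TODO asks, would amount to replacing the constant with a value read from the configuration.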
theli
1586d57187
*) odtParser: better handling of large files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
f17ce28b6d
*) plasmaHTCache:
...
- method loadResourceContent defined as deprecated. Please do not use this function to avoid OutOfMemory Exceptions when loading large files
- new function getResourceContentStream to get an inputstream of a cache file
- new function getResourceContentLength to get the size of a cached file
*) httpc.java:
- Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
- new option to hold loaded resource content in memory
- adding option to use the worker class without the worker pool (needed by the snippet fetcher)
*) plasmaSnippetCache
- snippet loader does not use a crawl-worker from pool but uses a newly created instance to avoid blocking by normal crawling activity.
- now operates on streams instead of byte arrays to avoid OutOfMemory Exceptions when operating on large files
- snippet loader now forces the crawl-worker to keep the loaded resource in memory to avoid IO
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
- keep resource in memory whenever possible (to avoid IO)
- when parsing from a stream, the content length must now be passed to the parser function. The parsers need this length to decide whether the parsed resource content is too large to hold in memory and must be stored to a file instead
- AbstractParser.java: new function to pass the contentLength of a resource to the parsers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
630a955674
read snippets from cache in case they are not provided in RAM
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2700 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
b114def2f8
duplicate classpath entry
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2699 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
2ab09e71a7
removing absolute Classpaths
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2698 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
a723c2809d
-t(aillog) option to start monitoring the log after startup. You can watch the log and stop viewing it with ctrl+c without stopping yacy.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2697 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
fda7031991
further cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2696 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
f0ed7f43c4
more sh (i.e. /bin/dash instead of /bin/bash as sh) compatibility
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2695 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
bcf2b800b4
applied UTF-8 encoding parameter to yacy-internal protocol communication
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c40fca08a2
fixed bad handling of string separation
...
you can now use a new encoding attribute to create strings from byte arrays
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2693 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
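Why an explicit encoding matters when strings are built from byte arrays can be shown with plain JDK calls. This is a sketch only: the class and method names are hypothetical, and it does not reproduce the YaCy method that received the new encoding attribute.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {

    // Hypothetical helper: decodes raw bytes with a caller-supplied charset
    // instead of relying on the platform default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The two bytes 0xC3 0xBC are the UTF-8 encoding of U+00FC (u with diaeresis).
        byte[] raw = { (byte) 0xC3, (byte) 0xBC };

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);        // one character
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1); // two characters

        System.out.println(asUtf8.length());
        System.out.println(asLatin1.length());
    }
}
```

The same bytes yield different strings depending on the charset, which is why passing the encoding explicitly avoids garbled text when peers run with different platform defaults.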
orbiter
5a40ea7866
refactoring of wget string list generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
dbc2e039bb
added time-out option parameter to call hierarchy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b59d4576af
increased version number to emphasise that the snippet fix _dramatically_ increased search speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2690 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d4c239e4be
- fixed problem in collection index with deletion of single url references
...
- added automatic deletion of not-found snippets after search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago