orbiter
ceb9e3aa17
- enhanced parser: collection of audio, video, image and application links
- enhanced condenser: better handling of utf-8 and pre-formatted texts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3017 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b5a29e9651
- fix for snippets that are too short
- added keyword to snippet fetch to suppress removal of not-found snippet words (for debugging)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3009 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
30888e7a2f
implementation of search constraints
Such constraints express specific restrictions on web searches.
This is implemented by scraping constraint information from a web
page during parsing, and storing flags for the page within the web index.
In this first step, only information for index pages ("index of", directory listings)
is scraped and stored in flags.
- added new flag class kelondroBitfield
- added scraper method in condenser
- added bitfield structure for all scrape types (see also condenser)
- added bitfield structure for appearance locations (see RWIEntry)
- added handover protocol for remote search and index distribution
- extended kelondroColumn class to hold bitfield types
- added another search attribute on search page (index.html)
- extended search-filter to enable filtering of non-matching constraints
- set all new database types to be default
- refactoring: moved word hash generation to condenser class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d34f10c63d
some tests with reverse dns lookup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2954 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
497428c8ec
refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
a75f895884
memory and traffic information
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2904 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
2ba56f70a8
XML-safe put.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2848 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
a17c43779f
removed wrong part of template
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2830 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
27f9e0b1c6
xml interface for blacklists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2829 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
74f09a0510
some more xml-backend files.
ConfigAdvanced_p.java: list settings after changing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2784 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
e25172853a
fixed license notice
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2714 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
1d0c0edda3
first version of posts/get from the del.icio.us api
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5a40ea7866
refactoring of wget string list generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
dbc2e039bb
added time-out option parameter to call hierarchy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
b59d4576af
increased version number to emphasise that the snippet fix
_dramatically_ increased search speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2690 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
d4c239e4be
- fixed problem in collection index with deletion of single url references
- added automatic deletion of not-found snippets after search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
df1629b05a
- code cleanup
- version 0.471
- moved surftipps to own web page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
3aac5b26da
- added automatic tag generation when a web page from the search results is added
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
5015e780c2
- simplified watchCrawler code
- changed display of watchCrawler slightly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2594 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
c89d8142bb
replaced old 'kCache' with a fully controlled cache
there are now two fully controlled caches for incoming indexes:
- dhtIn
- dhtOut
during indexing, all indexes that shall not be transported to remote peers
because they belong to the own peer are stored in dhtIn. It is furthermore
ensured that received indexes are not directly transmitted again to other
peers. They may, however, be transmitted later if the network grows.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
92e986bb91
*) adding missing return prop (requested by allo)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2532 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
allo
f0529fe53e
update for ftp urls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2531 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
413e6b9855
*) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2489 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
eb9b138986
*) next step of restructuring for new crawlers
- conversion of the crawler pool into a keyed object pool
- crawlers are now loaded based on the url protocol (of course works only for http now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
theli
1395aae742
*) starting restructuring which is needed to add crawlers for additional protocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2472 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
7df572756a
first step + attempt to solve the snippet marking problem.
See: http://www.yacy-forum.de/viewtopic.php?p=22855#22855
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2469 6c8d7289-2bf4-0310-a012-ef5d649a1542
18 years ago
orbiter
3879a0ecd0
replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
933a9e02ab
fix for broken build
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2284 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
360056b30c
fix ajax bug (no valid xml)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2283 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
90d569d70f
refactoring of index management:
url storage is part of index management; moved plasmaURL to indexURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
44d72f06c4
more Caching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1965 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
1a13c8b78e
correct wordCachesize after orbiter's commit.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1882 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
6b056610e3
updated watchcrawler for the recent changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1881 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bcd99fe83e
introduced a second RAM cache for DHT transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1880 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
bae3783d38
added a snippet marking
(search words are now bold in snippets)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
fb5d8fdc59
removed encoding attribute
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1776 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
f1b91b1266
xml with right encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1766 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
3703f76866
- fixed re-search bug: after a search with several words, a second search could not
find the same words as before. This was caused because indexContaines stored the url references
with a hashtable. A tree was needed to work with the index conjunction-by-numeration
- added permanent ram cache flush (again)
- removed direct flush of ram cache after a large container is added.
this happens especially during DHT transmission and therefore this fix should
speed up DHT transmission on server side.
- removed unused and out-dated methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1765 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
dc9174c809
*) Implementing snippet fetching via ajax
Snippets that are not available on page load time will be fetched using ajax requests.
see: http://www.yacy-forum.de/viewtopic.php?p=16479
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1748 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
7e7a72b108
display wordcaches number on WatchCrawler.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1746 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
3fd1641893
queuesizes in queues_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1714 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
62664d7252
AJAX Check for robots.txt before crawling.
Icons from herrlich
TODO: Style it nicely ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1689 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
26d7e8dd0d
more escapes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1677 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
127396436f
more queues in the xml backend
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1674 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
99a970eda1
xml backend with verifyAuthentication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1652 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
73f18ed5b2
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1627 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
a4513523d6
hide add/edit/import bookmarks per default.
xml-bookmark import (this does not work yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1619 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
27b6b3d714
public Tags.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1589 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
a8eff9a0ae
xml/bookmarks/posts/all.xml to list all public Bookmarks
bookmarksIterator now accepts an option to return either all bookmarks (including private ones) or only public bookmarks.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1577 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
allo
62a0bb475a
More values displayed on WatchCrawler.html
status_p.xml: to be extended.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1561 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago